Treebank Studio (PREVIEW)
PaCQL case studies using IcePaHC
This page shows some example queries that search for linguistically interesting patterns in the IcePaHC corpus. All the queries are written as coding queries with a binary dependent variable (1/0) because this is how a real study in quantitative syntax is normally designed. A typical workflow is to experiment with the query while looking at the "Web output" and then download the "TSV" output for analysis in a statistical package.
Oblique vs. nominative subjects
Basic query:
IP-(MAT|SUB) idoms NP-SBJ
oblique:1
NP-SBJ idoms .*-[ADG]
oblique:0
NP-SBJ idoms .*-N
The first line picks out all subjects in finite clauses. Note that label matching is done via regular expressions, so "IP-(MAT|SUB)" matches both "IP-MAT" and "IP-SUB". The "oblique" coding variable codes a result as "1" if the subject is accusative, dative or genitive ("[ADG]") and as "0" if the subject is nominative. Note that the first (or only) coding variable which has only "1" and "0" as its values is assumed to be the dependent variable in the summary view, see [Display summary].
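To see how these label patterns behave, here is a small Python sketch using the standard "re" module. It assumes that a pattern has to match the whole label; the page only says that matching uses regular expressions, so this anchoring is an assumption. The helper name "label_matches" and the sample labels are illustrative only.

```python
import re

def label_matches(pattern: str, label: str) -> bool:
    """Return True if the pattern matches the entire label (assumed anchoring)."""
    return re.fullmatch(pattern, label) is not None

# Alternation: one pattern covers both finite clause types.
print(label_matches("IP-(MAT|SUB)", "IP-MAT"))  # True
print(label_matches("IP-(MAT|SUB)", "IP-SUB"))  # True
print(label_matches("IP-(MAT|SUB)", "IP-INF"))  # False

# Character class: labels ending in -A, -D or -G count as oblique.
print(label_matches(".*-[ADG]", "PRO-D"))  # True
print(label_matches(".*-[ADG]", "PRO-N"))  # False
print(label_matches(".*-N", "PRO-N"))      # True
```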
Coding variables cause different versions of the query to be run. The above query runs both of the following queries, coding matches for the first one as "oblique:1" and matches to the second one as "oblique:0".
IP-(MAT|SUB) idoms NP-SBJ
NP-SBJ idoms .*-[ADG]
IP-(MAT|SUB) idoms NP-SBJ
NP-SBJ idoms .*-N
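The expansion described above can be sketched in Python. This is only an illustration of the idea, not the search engine's actual implementation: the function name "expand_coding_query" is hypothetical, and the sketch handles a single coding variable with no "define:" or "meta:" blocks.

```python
def expand_coding_query(lines):
    """Split a coding query into one plain query per coding value.

    Lines before the first 'name:value' header are shared by all versions;
    lines after a header belong to that value only.
    """
    shared, versions, current = [], {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # A coding header looks like 'oblique:1': a colon and no spaces.
        if " " not in line and ":" in line:
            current = line
            versions[current] = []
        elif current is None:
            shared.append(line)
        else:
            versions[current].append(line)
    return {code: shared + body for code, body in versions.items()}

query = """
IP-(MAT|SUB) idoms NP-SBJ
oblique:1
NP-SBJ idoms .*-[ADG]
oblique:0
NP-SBJ idoms .*-N
""".splitlines()

# Print each coding value followed by the plain query that is run for it.
for code, version in expand_coding_query(query).items():
    print(code)
    for line in version:
        print("   ", line)
```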
Note that an oblique NP is annotated as a subject whenever it passed subject tests for a modern annotator. It is up to the user to decide whether these phrases really are subjects in Old Icelandic, and how to use this annotation for earlier centuries.
Same query with some meta coding added:
IP-(MAT|SUB) idoms NP-SBJ
oblique:1
NP-SBJ idoms .*-[ADG]
oblique:0
NP-SBJ idoms .*-N
meta:
node label IP-(MAT|SUB)
text year
text century
text genre
The above query adds a coding column with the label of the matched IP, whose value will be either "IP-MAT" or "IP-SUB" depending on whether the result came from a main clause or a subordinate clause. Columns for text year, text century and text genre have also been added for analysis.
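Once the TSV output has been downloaded, the dependent variable can be summarized against the meta columns. The following Python sketch shows one way to do this with the standard library. The column names ("oblique", "label", "year", "century", "genre") are assumptions about the TSV header, and the rows are invented placeholders, not corpus results.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; NOT real IcePaHC results.
# The header names are assumed and may differ in the actual TSV output.
SAMPLE_TSV = (
    "oblique\tlabel\tyear\tcentury\tgenre\n"
    "1\tIP-MAT\t1200\t13\tgenre_a\n"
    "0\tIP-SUB\t1250\t13\tgenre_b\n"
    "0\tIP-MAT\t1900\t20\tgenre_a\n"
    "0\tIP-SUB\t1900\t20\tgenre_a\n"
)

def rate_by(rows, key, dependent="oblique"):
    """Proportion of rows coded 1 on the dependent variable, per value of key."""
    counts = defaultdict(lambda: [0, 0])  # key value -> [ones, total]
    for row in rows:
        counts[row[key]][0] += int(row[dependent])
        counts[row[key]][1] += 1
    return {value: ones / total for value, (ones, total) in counts.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
rates = rate_by(rows, "century")
print(rates)
```

For a real study, the same grouping would be applied to the downloaded file instead of the placeholder string, or done in a statistical package.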
Word order in the verb phrase - OV vs. VO
Basic query:
define:
modal MD[PD][IS]
object NP-OB[12]
IP-(MAT|SUB) idoms modal
ov:1
modal sprec object
object sprec VB
ov:0
modal sprec VB
VB sprec object
The query above uses a definition block headed by "define:" to define shortcuts for some labels we want to match. The first entry in the definition block replaces "modal" with "MD[PD][IS]" wherever "modal" appears in the query. "modal" thus matches modals in both the present and the past tense ("[PD]") and in both the indicative and the subjunctive mood ("[IS]"). The label "object" uses "NP-OB[12]" to match direct objects ("NP-OB1") and indirect objects ("NP-OB2"). Recall that label matching uses regular expression syntax.
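The substitution performed by a definition block can be pictured with a short Python sketch. This is an illustration of the idea only; the function name "apply_definitions" is hypothetical and the sketch assumes that shortcuts are replaced as whole whitespace-separated tokens.

```python
def apply_definitions(definitions, query_lines):
    """Replace each defined shortcut with its label pattern, token by token."""
    expanded = []
    for line in query_lines:
        tokens = line.split()
        # Tokens without a definition (labels, relation names) are kept as-is.
        expanded.append(" ".join(definitions.get(tok, tok) for tok in tokens))
    return expanded

definitions = {"modal": "MD[PD][IS]", "object": "NP-OB[12]"}
query_lines = [
    "IP-(MAT|SUB) idoms modal",
    "modal sprec object",
    "object sprec VB",
]

for line in apply_definitions(definitions, query_lines):
    print(line)
# IP-(MAT|SUB) idoms MD[PD][IS]
# MD[PD][IS] sprec NP-OB[12]
# NP-OB[12] sprec VB
```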
The query matches finite clauses (both matrix and subordinate clauses) which immediately dominate ("idoms") a modal. The "ov" coding variable gets the value "1" if the modal sisterwise-precedes ("sprec") the object and the object sisterwise-precedes a verb in the infinitive ("VB"). It gets the value "0" if the modal sisterwise-precedes a verb in the infinitive which in turn sisterwise-precedes an object.
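The logic of this query can be mimicked on toy trees in Python. The sketch below is not how the search engine works internally. It assumes that "sprec" holds between any two sisters where the first comes earlier (not only adjacent ones), it checks each relation independently rather than binding "object" to a single node, and the "Node" class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def idoms(parent, pattern):
    """Children of parent whose label fully matches the pattern."""
    return [c for c in parent.children if re.fullmatch(pattern, c.label)]

def sprec(parent, first, second):
    """True if a child matching `first` comes before a sister matching `second`."""
    labels = [c.label for c in parent.children]
    for i, label in enumerate(labels):
        if re.fullmatch(first, label):
            if any(re.fullmatch(second, later) for later in labels[i + 1:]):
                return True
    return False

def code_ov(ip):
    """Return 1 for modal-object-verb, 0 for modal-verb-object, None otherwise."""
    modal, obj = "MD[PD][IS]", "NP-OB[12]"
    if not idoms(ip, modal):
        return None
    if sprec(ip, modal, obj) and sprec(ip, obj, "VB"):
        return 1
    if sprec(ip, modal, "VB") and sprec(ip, "VB", obj):
        return 0
    return None

# Toy clauses with only the labels that matter for the query.
ov_clause = Node("IP-MAT", [Node("NP-SBJ"), Node("MDPI"), Node("NP-OB1"), Node("VB")])
vo_clause = Node("IP-SUB", [Node("NP-SBJ"), Node("MDPI"), Node("VB"), Node("NP-OB1")])
no_modal = Node("IP-MAT", [Node("NP-SBJ"), Node("VBPI"), Node("NP-OB1")])

print(code_ov(ov_clause), code_ov(vo_clause), code_ov(no_modal))  # 1 0 None
```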
More advanced OV/VO:
define:
modal MD[PD][IS]
object NP-OB[12]
IP-(MAT|SUB) idoms modal
ov:1
modal sprec object
object sprec VB
ov:0
modal sprec VB
VB sprec object
np:pro
object idomsonly PRO-.*
np:quant
object idoms Q[RS]?-.*
np:else
object idoms .*
Here we have added another coding variable, "np", which gets the value "pro" whenever the object immediately dominates a pronoun and nothing else (note "idomsonly"). "np" is coded as "quant" if the object immediately dominates a quantifier. Note that quantifiers can have the tag "Q", "QR" in the comparative or "QS" in the superlative; "[RS]?" indicates that one of these letters can optionally be included in the tag. The value "else" uses ".*", which matches any label, to cover the remaining objects.
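The effect of the optional character class in "Q[RS]?-.*" can be checked with Python's "re" module. As before, matching against the whole label is an assumption, and the sample labels are illustrative.

```python
import re

# "[RS]?" allows at most one of R or S after the Q.
pattern = re.compile(r"Q[RS]?-.*")
labels = ["Q-N", "QR-A", "QS-D", "PRO-N"]

matches = {label: bool(pattern.fullmatch(label)) for label in labels}
print(matches)
# {'Q-N': True, 'QR-A': True, 'QS-D': True, 'PRO-N': False}
```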
OV/VO with information about heaviness and style:
define:
modal MD[PD][IS]
object NP-OB[12]
IP-(MAT|SUB) idoms modal
ov:1
modal sprec object
object sprec VB
ov:0
modal sprec VB
VB sprec object
np:pro
object idomsonly PRO-.*
np:quant
object idoms Q[RS]?-.*
np:else
object idoms .*
meta:
node nodewords object
node nodestring object
node label IP-(MAT|SUB)
text genre
text lexicaldiversity
This final version of our OV/VO query is the same as before except that it adds some meta coding columns. The first gives the number of words dominated by the object, which allows us to study whether heaviness correlates with the OV/VO distinction. The second gives the string of terminal nodes which make up the object; this might help us study lexical effects or eyeball the types of noun phrases found in each word order. The third gives the label of the IP and tells us whether the result came from a matrix or a subordinate clause. Finally, we get the genre of the text and its lexical diversity, a measure relating the number of distinct word forms (types) to the number of words in the text. We can use these last two columns as a proxy for a stylistic dimension, e.g. formal vs. informal or complex vs. simple style.
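To make the two numeric measures concrete, here is a minimal Python sketch. It assumes that heaviness is a simple whitespace word count of the object string and that lexical diversity is a plain type/token ratio; the exact formula used by Treebank Studio is not given on this page, so both function names and definitions are assumptions. The example words are English placeholders.

```python
def word_count(nodestring):
    """Number of whitespace-separated words in an object string."""
    return len(nodestring.split())

def type_token_ratio(words):
    """Distinct word forms divided by total words, ignoring case."""
    if not words:
        return 0.0
    return len({w.lower() for w in words}) / len(words)

# Five tokens, four distinct forms ("the" occurs twice).
text_words = "the dog saw the cat".split()

print(word_count("the old book"))    # 3
print(type_token_ratio(text_words))  # 0.8
```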
Topicalized objects
Basic query:
define:
object NP-OB[12]
finiteverb (VB|HV|MD|DO)[PD][IS]
IP-(MAT|SUB) idoms finiteverb
topicalized:1
object sprec finiteverb
finiteverb sprec NP-SBJ
topicalized:0
NP-SBJ sprec finiteverb
finiteverb sprec object
This query defines "object" as a shortcut for direct and indirect objects and "finiteverb" as a shortcut for a finite verb, including main verbs ("VB"), have ("HV"), modals ("MD") and the verb gera 'do' ("DO"). The main section of the query picks out finite clauses which immediately dominate ("idoms") a finite verb (basically all of them). The IP of that clause is the first label in the query and becomes its anchor. The search engine answers the question "how many anchors match a given pattern?".
Using "haslabel" to detect object type:
define:
object NP-OB[12]
finiteverb (VB|HV|MD|DO)[PD][IS]
IP-(MAT|SUB) idoms finiteverb
topicalized:1
object sprec finiteverb
finiteverb sprec NP-SBJ
topicalized:0
NP-SBJ sprec finiteverb
finiteverb sprec object
direct:1
object haslabel NP-OB1
direct:0
object haslabel NP-OB2
The above query uses the special function "haslabel" to create a column for object type. The column is called "direct" and its value is "1" if the matched object is an "NP-OB1" but "0" if it is an indirect object, "NP-OB2".
Subjects and objects in relative clauses
Basic query:
CP-REL idoms WNP
object:1
WNP sameindex NP-OB[12]
object:0
WNP sameindex NP-SBJ
This query looks at extraction of arguments from relative clauses and codes each result as "object:1" if the extracted element is an object but "object:0" if the extracted element is a subject. Note that "sameindex" applies to the node which dominates the trace, even if the trace itself is where the index is written in the annotation.
Arguments with a particular verb
Query for the order of the two objects of gefa 'to give':
IP.* idoms VB
VB idomslemma gefa
iofirst:1
VB sprec NP-OB2
NP-OB2 sprec NP-OB1
iofirst:0
VB sprec NP-OB1
NP-OB1 sprec NP-OB2
That is all for now. Please check back for updates.