SYNTACTIC ANALYSIS – I (FORMAL GRAMMARS)
Jasmeet Singh, Thapar University
INTRODUCTION
The word "syntax", in natural language, refers to the grammatical arrangement of words in a sentence and their relationship with each other. The objective of syntactic analysis is to find the syntactic structure of the sentence. The syntactic structure is usually represented in the form of a tree whose nodes are the phrases and whose leaves correspond to the words of the language. The process of identifying the syntactic structure of a sentence is called syntactic parsing (or simply parsing). Syntactic parsing can also be defined as the process of assigning "phrase markers" to a sentence. Syntactic analysis or parsing is useful in determining the meaning of a sentence.
CONSTITUENCY
Constituency is an important aspect of natural language that is useful for syntactic analysis. Certain words go together with each other more than with others. In a language, words that usually group together to act as a single unit are called constituents or phrases. For example, […] court are all noun phrases, as they can all appear in the same syntactic context (subject or object of a verb). Constituents combine with other constituents to form a sentence. For instance, the noun phrase 'The bird' combines with the verb phrase 'flies' to form the sentence 'The bird flies'. The ordering of words in a constituent and the ordering of constituents are both important.
CONTEXT FREE GRAMMAR (CFG)
Context-free grammar is a widely used mathematical system for modeling constituent structure in natural language.
Context-free grammars are also called phrase structure grammars and were first defined for natural language by Chomsky (1956).
CFGs were first used for the Algol programming language by Backus (1959) and Naur (1960), so the formalism is also referred to as Backus-Naur Form (BNF).
CONTEXT FREE GRAMMAR (CFG)
A context-free grammar consists of
a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and
a lexicon of words.
For example, the following productions express that an NP (or noun phrase) can be composed of either a ProperNoun or a determiner (Det) followed by a Nominal:
NP → Det Nominal
NP → ProperNoun
Context-free rules can be hierarchically embedded, so we can combine the previous rules with others, like the following, which express facts about the lexicon:
Det → a
Det → the
Noun → flight
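The way productions and lexicon rules combine can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `expand` are choices made here, and the rules Nominal → Noun and ProperNoun → Chicago are assumptions added so that every symbol can be expanded (they are not part of the fragment above).

```python
from itertools import product

# Each nonterminal maps to a list of right-hand sides.
# ASSUMPTION: "Nominal -> Noun" and "ProperNoun -> Chicago" are added for
# illustration so that the grammar is complete; they are not in the fragment.
GRAMMAR = {
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"]],
    "ProperNoun": [["Chicago"]],
}


def expand(symbol):
    """Return every word sequence the symbol can produce (finite grammars only)."""
    if symbol not in GRAMMAR:  # not a key, so it is a word of the language
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


noun_phrases = [" ".join(words) for words in expand("NP")]
print(noun_phrases)  # ['a flight', 'the flight', 'Chicago']
```

Exhaustive expansion only terminates because this toy grammar has no recursive rule; a rule such as Nominal → Nominal Noun would make the set of noun phrases infinite.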
CONTEXT FREE GRAMMAR (CFG)
The symbols that are used in a CFG are divided into two classes:
1) The symbols that correspond to words in the language ("the", "bird") are called terminal symbols; the lexicon is the set of rules that introduce these terminal symbols.
2) The symbols that express clusters or generalizations of these are called nonterminals.
In each context-free rule, the item to the right of the arrow (→) is an ordered list of one or more terminals and nonterminals, while to the left of the arrow is a single nonterminal symbol expressing some cluster or generalization.
Each grammar must have one designated start symbol, which is often called S. Since context-free grammars are often used to define sentences, S is usually interpreted as the "sentence" node.
EXAMPLE OF PRODUCTION RULES AND LEXICON

Production Rules
S → NP VP                    (I + want a morning flight)
NP → Pronoun                 (I)
   | Proper-Noun             (Los Angeles)
   | Det Nominal             (a + flight)
Nominal → Nominal Noun       (morning + flight)
   | Noun                    (flights)
VP → Verb                    (do)
   | Verb NP                 (want + a flight)
   | Verb NP PP              (leave + Boston + in the morning)
   | Verb PP                 (leaving + on Thursday)
PP → Preposition NP          (from + Los Angeles)

Lexicon
Noun → flights | breeze | trip | morning | ...
Verb → is | prefer | like | need | want | fly
Adjective → cheapest | non-stop | first | latest | other | direct | ...
Pronoun → me | I | you | it | ...
Proper-Noun → Alaska | Baltimore | Los Angeles | Chicago | United | American | ...
Determiner → the | a | an | this | these | that | ...
Preposition → from | to | on | near | ...
Conjunction → and | or | but | ...
EXAMPLE OF PARSING USING CFG
The parse tree for "I prefer a morning flight" according to the grammar defined on the previous slide [parse tree figure not reproduced].
• It can also be represented in a more compact, bracketed form.
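As a rough illustration of the bracketed form, the sketch below encodes one parse of "I prefer a morning flight" as nested tuples and prints it in bracket notation. The tree is an assumption: it is one reading of the example grammar (using NP → Pronoun, VP → Verb NP, NP → Det Nominal, Nominal → Nominal Noun), not a copy of the original figure, and the names `TREE`, `bracketed` and `leaves` are hypothetical.

```python
# A tree node is (label, child, child, ...); a leaf child is a plain word string.
# ASSUMPTION: this tree is derived by hand from the example grammar.
TREE = (
    "S",
    ("NP", ("Pronoun", "I")),
    ("VP",
        ("Verb", "prefer"),
        ("NP",
            ("Det", "a"),
            ("Nominal",
                ("Nominal", ("Noun", "morning")),
                ("Noun", "flight")))),
)


def bracketed(node):
    """Render a tree in bracket notation: [Label child child ...]."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"


def leaves(node):
    """Read the words back off the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


print(bracketed(TREE))
# [S [NP [Pronoun I]] [VP [Verb prefer] [NP [Det a] [Nominal [Nominal [Noun morning]] [Noun flight]]]]]
```

Reading the leaves from left to right recovers the original sentence, which is a quick sanity check on any hand-written tree.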
FORMAL DEFINITION OF CFG
A context-free grammar G is defined by four parameters N, Σ, P, S (technically "is a 4-tuple"):
1) N: a set of nonterminal symbols (or variables)
2) Σ: a set of terminal symbols (disjoint from N)
3) P: a set of rules or productions, each of the form A → β, where A is a nonterminal and β is a string of symbols from (Σ ∪ N)∗
4) S: a designated start symbol
Notation:
Capital letters like A, B, and S: nonterminals
S: the start symbol
Lower-case Greek letters like α, β, and γ: strings drawn from (Σ ∪ N)∗
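The 4-tuple can be written down directly as a small data structure. The following is a minimal sketch, assuming a hypothetical class `CFG` with a `problems` method that checks the conditions in the definition (N and Σ disjoint, S in N, every rule of the form A → β with β over Σ ∪ N); the toy grammar is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ
    productions: tuple       # P: pairs (A, beta), where beta is a tuple of symbols
    start: str               # S

    def problems(self):
        """List violations of the definition; an empty list means well-formed."""
        found = []
        if self.nonterminals & self.terminals:
            found.append("N and Σ are not disjoint")
        if self.start not in self.nonterminals:
            found.append("start symbol is not in N")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                found.append(f"left-hand side {lhs!r} is not a nonterminal")
            for sym in rhs:
                if sym not in symbols:
                    found.append(f"unknown symbol {sym!r} in rule for {lhs}")
        return found


# ASSUMPTION: a three-rule toy grammar, used only to exercise the checks.
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP"}),
    terminals=frozenset({"birds", "fly"}),
    productions=(("S", ("NP", "VP")), ("NP", ("birds",)), ("VP", ("fly",))),
    start="S",
)
print(toy.problems())  # []
```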
FORMAL LANGUAGE
Derivation: Let α1, α2, ..., αm be strings in (Σ ∪ N)∗, m ≥ 1, such that α1 ⇒ α2, α2 ⇒ α3, ..., αm−1 ⇒ αm. We say that α1 derives αm, or α1 ⇒∗ αm.
The language L_G generated by a grammar G is the set of strings composed of terminal symbols which can be derived from the designated start symbol S:
L_G = { w | w is in Σ∗ and S ⇒∗ w }
Sentences (strings of words) that can be derived by a grammar are in the formal language defined by that grammar, and are called grammatical sentences. Sentences that cannot be derived by a given formal grammar are not in the language defined by that grammar, and are referred to as ungrammatical.
In linguistics, the use of formal languages to model natural languages is called generative grammar, since the language is defined by the set of possible sentences "generated" by the grammar.
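Whether a sentence is grammatical, that is, whether S ⇒∗ w holds, can be checked mechanically. The sketch below is one simple way to do it, not an efficient parser: it asks, for every symbol and every span of the sentence, whether the symbol derives that span. The rule table is a cut-down version of the earlier example grammar, and the names `RULES` and `is_grammatical` are hypothetical. It assumes the grammar has no empty right-hand sides and no cycles of unit rules such as A → B, B → A.

```python
from functools import lru_cache

# ASSUMPTION: a reduced version of the example grammar, with a tiny lexicon.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Pronoun",), ("Det", "Nominal")],
    "Nominal": [("Nominal", "Noun"), ("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "Pronoun": [("I",)],
    "Det": [("a",)],
    "Noun": [("morning",), ("flight",)],
    "Verb": [("prefer",)],
}


def is_grammatical(sentence, start="S"):
    """True if the start symbol derives the whole sentence."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """True if symbol =>* words[i:j]."""
        if symbol not in RULES:  # terminal: must match exactly one word
            return j == i + 1 and words[i] == symbol
        return any(covers(rhs, i, j) for rhs in RULES[symbol])

    def covers(rhs, i, j):
        """True if the symbols of rhs derive words[i:j], each taking >= 1 word."""
        if len(rhs) == 1:
            return derives(rhs[0], i, j)
        # Try every split point that leaves at least one word per remaining symbol.
        return any(
            derives(rhs[0], i, k) and covers(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return len(words) > 0 and derives(start, 0, len(words))


print(is_grammatical("I prefer a morning flight"))  # True
print(is_grammatical("flight prefer I a"))          # False
```

Because every symbol on a multi-symbol right-hand side must cover at least one word, the left-recursive rule Nominal → Nominal Noun always recurses on a shorter span, so the search terminates.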
ENGLISH SENTENCE LEVEL CONSTRUCTIONS
A sentence in a language can have varying structure.
In English, the four commonly known structures of sentences are:
1) Declarative Structure
2) Imperative Structure
3) Yes-No Question Structure
4) Wh-Question Structure
ENGLISH DECLARATIVE SENTENCE CONSTRUCTIONS
Sentences with a declarative structure have a subject followed by a predicate.
The subject of a declarative sentence is a noun phrase and the predicate is a verb phrase.
Examples of declarative sentences include:
I like horse riding.
The flight should leave at around seven p.m.
I prefer a morning flight.
The phrase structure rule for declarative sentences is: S → NP VP
ENGLISH IMPERATIVE SENTENCE CONSTRUCTIONS
Sentences with imperative structure usually begin with a verb phrase and do not have a subject.
The subject of these types of sentences is implicit and is understood to be "you".
These types of sentences are used for commands and suggestions; hence they are called imperative sentences.
Examples of imperative sentences include:
Show me the notebook.
Stop talking.
Move to the classroom.
The phrase structure rule for imperative sentences is: S → VP
ENGLISH YES-NO QUESTION SENTENCE CONSTRUCTIONS
Sentences with yes-no question structure are often (though not always) used to ask questions (hence the name).
These questions perform different pragmatic functions such as asking, requesting, or suggesting.
Some examples of these types of sentences are:
Do you have LP book?
Is the cricket match over?
Can you show me your photograph?
These sentences begin with an auxiliary verb, followed by a subject NP, followed by a VP: S → Aux NP VP
ENGLISH WH-QUESTION SENTENCE CONSTRUCTIONS
Sentences with wh-question structure are more complex.
These are so named because one of their constituents is a wh-phrase, that is, one that includes a wh-word (who, whose, when, where, what, which, how, why).
These may be broadly grouped into two classes of sentence-level structures: wh-subject-question and wh-non-subject-question.
The wh-subject-question structure is identical to the declarative structure, except that the first noun phrase contains some wh-word.
Examples of wh-subject-questions include:
Which team won the match?
Which flights serve breakfast?
The phrase structure rule for wh-subject-questions is: S → Wh-NP VP
ENGLISH WH-QUESTION SENTENCE CONSTRUCTIONS CONTD.
In the wh-non-subject-question structure, the wh-phrase is not the subject of the sentence, and so the sentence includes another subject.
In these types of sentences the auxiliary appears before the subject NP, just as in the yes-no question structures.
Examples of these sentences include:
Which cameras can you show me in your shop?
What flights do you have from India to Washington?
The phrase structure rule for the wh-non-subject-question is: S → Wh-NP Aux NP VP
Constructions like the wh-non-subject-question contain what are called long-distance dependencies, because the Wh-NP "what flights" is far away from the predicate that it is semantically related to, the main verb "have" in the VP.
PHRASE LEVEL CONSTRUCTIONS
A fundamental notion in natural language is that certain groups of words behave as constituents or phrases.
A phrase is named after its head, which is the lexical category that determines the properties of the phrase.
For instance, if the head (central word) is a noun, then it is called a noun phrase.
One of the simplest ways to test whether a group of words is a phrase is to see whether it can be substituted with another group of words without changing the meaning (Substitution Test).
Elements that can substitute for each other in a certain syntactic position are said to be members of the same paradigm.
CONSTITUENT SUBSTITUTION TEST
THE NOUN PHRASE
• Three of the most popular NPs:
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun | Nominal Noun
• General structure of an NP: Modifiers HEAD-Noun Modifiers
• The determiner:
The role can be filled by simple lexical determiners or by more complex expressions.
Examples 1: a flight | this flight | any flights | those flights | some flights
Examples 2: United's flight | United's pilot's union | Denver's mayor's mother's canceled flight
Possessive expressions are defined by: Det → NP 's
• The nominal:
Can be either a simple noun (Nominal → Noun) or a construction in which a noun is in the center and also has pre- and post-head modifiers.
BEFORE THE HEAD NOUN
Several word classes may appear before the head: cardinal numbers, ordinal numbers and quantifiers.
• Examples: cardinal numbers – two friends | one stop
• Examples: ordinal numbers – the first friend | the next stop | the other flight
• Examples: quantifiers – many friends | several stops | few flights | much noise
Adjectives occur after quantifiers but before nouns.
• Adjectives can also be grouped into an adjective phrase (AP).
• Adjectives can have an adverb before the adjective. Example: the least expensive fare
• A rule which defines all prenominal modifiers:
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
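The parentheses in the prenominal rule mark optional constituents, so the single rule is shorthand for a whole family of ordinary context-free rules. A minimal sketch of that expansion follows; the function name `expand_optional` and the "(X)" string convention are assumptions made for this illustration.

```python
from itertools import product


def expand_optional(lhs, rhs):
    """Expand a rule whose optional symbols are written '(X)' into plain rules."""
    choices = []
    for sym in rhs:
        if sym.startswith("(") and sym.endswith(")"):
            choices.append([(), (sym[1:-1],)])  # either leave it out or include it
        else:
            choices.append([(sym,)])            # obligatory symbol
    rules = []
    for combo in product(*choices):
        rules.append((lhs, tuple(s for part in combo for s in part)))
    return rules


rules = expand_optional(
    "NP", ["(Det)", "(Card)", "(Ord)", "(Quant)", "(AP)", "Nominal"]
)
print(len(rules))  # 32
print(rules[0])    # ('NP', ('Nominal',))
print(rules[-1])   # ('NP', ('Det', 'Card', 'Ord', 'Quant', 'AP', 'Nominal'))
```

With five optional slots there are 2^5 = 32 plain rules, ranging from NP → Nominal to the rule with every modifier present.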
AFTER THE HEAD NOUN (1/2)
A head noun can be followed by three kinds of postmodifiers:
1. prepositional phrases – Example: all flights from Dubai
2. non-finite clauses – Example: any flights arriving after eleven a.m.
3. relative clauses – Example: a flight that serves breakfast
A nominal rule that accounts for PPs:
Nominal → Nominal PP
Three most common kinds of non-finite postmodifiers:
a. Gerundive (-ing)
b. -ed
c. Infinitive forms
• Gerundives consist of a VP that begins with the gerundive form of the verb. Example: any of those leaving on Thursday
• A nominal rule that accounts for gerundive modifiers:
Nominal → Nominal GerundVP
and the rules for gerunds are:
GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP

AFTER THE HEAD NOUN (2/2)
Postmodifiers based on -ed forms and infinitives. Examples:
I need to have dinner served.
Which is the aircraft used by this flight?
The last flight to arrive in Boston.
Postnominal relative clauses (a.k.a. restrictive relative clauses) begin with a relative pronoun (that or who). The relative pronoun functions as the subject of the embedded verb.
• Examples:
A flight that serves breakfast
Flights that leave in the morning
The one that leaves at ten thirty five
To deal with relative clauses, we add the rules:
Nominal → Nominal RelClause
RelClause → (who | that) VP
PARSE TREE FOR "ALL THE MORNING FLIGHTS FROM DENVER TO TAMPA LEAVING BEFORE 10"
THE VERB PHRASE
• VPs consist of a verb and a number of other constituents. These constituents include NPs and PPs:
VP → Verb          Example: disappear
VP → Verb NP       Example: prefer a morning flight
VP → Verb NP PP    Example: leave Boston in the morning
VP → Verb PP       Example: leaving on Thursday
An entire embedded sentence may follow a verb. Such sentences are called sentential complements. Examples:
You [VP [V said [S there were two flights that were the cheapest]]]
[VP [V Tell] [NP me] [S how to get from the airport in Philadelphia to downtown]]
To deal with sentential complements we add the rule:
VP → Verb S
Another potential constituent of a VP is another VP. This happens for some verbs, e.g. want, try, would like, intend, need. Examples:
I want [VP to fly from Dallas to Orlando]
Hello, I'm trying [VP to find a flight from Dallas to Orlando]
PREPOSITIONAL, ADJECTIVE & ADVERB PHRASES
A Preposition Phrase (PP) contains a preposition followed by another constituent, usually a noun phrase.
Examples: We played volleyball on the beach. John went outside.
The phrase structure rule is PP → Prep (NP)
An Adjective Phrase (AP) contains an adjective, optionally preceded by an adverb and optionally followed by a PP.
Examples: Ashish is clever. The train is very late. My sister is fond of animals.
The phrase structure rule is AP → (Adv) Adj (PP)
An adverb phrase consists of an adverb, optionally preceded by a degree adverb (intensifier).
Example: Time passes very quickly.
The phrase structure rule is AdvP → (Intens) Adv
COORDINATION
• Phrases can be conjoined with conjunctions (e.g. and, or, but). Forms of coordination:
• For noun phrases: NP → NP and NP
Please repeat [NP [NP the flights] and [NP the costs]]
• For nominals: Nominal → Nominal and Nominal
Please repeat the [Nom [Nom flights] and [Nom costs]]
• For verb phrases: VP → VP and VP
[VP [VP …] and [VP … Francisco]]
• For S conjunction:
[S [S I'm interested in a flight from Denver to Dallas] and [S I'm also interested in going to San Francisco]]
• Coordination makes use of metarules: X → X and X
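A metarule is a rule schema: substituting a concrete category for X yields an ordinary production. The sketch below instantiates X → X and X for the categories listed on this slide. The function names `coordination_rules` and `show` are hypothetical, and the `conjunctions` parameter is an assumption added so that other conjunctions can be plugged in.

```python
def coordination_rules(categories, conjunctions=("and",)):
    """Instantiate the metarule X -> X conj X for each category and conjunction."""
    return [(x, (x, conj, x)) for x in categories for conj in conjunctions]


def show(rule):
    """Format a (lhs, rhs) pair in the arrow notation used on the slides."""
    lhs, rhs = rule
    return f"{lhs} → {' '.join(rhs)}"


rules = coordination_rules(["NP", "Nominal", "VP", "S"])
for rule in rules:
    print(show(rule))
# NP → NP and NP
# Nominal → Nominal and Nominal
# VP → VP and VP
# S → S and S
```

Passing `conjunctions=("and", "or", "but")` would generate three rules per category instead of one.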
TREEBANKS
• A treebank is a corpus in which each sentence is syntactically annotated with a parse tree.
• The Penn Treebank uses corpora in English from Brown, Switchboard, ATIS and Wall Street Journal. There are treebanks for Chinese and Arabic as well.
• Other Treebanks: Prague Dependency Treebank for C
Brown Corpus