Seminar: Recent Advances in Parsing Technology (WS 2011/2012)
Collins Parser
Collins Parser (Collins 1997, 2005)
• What is a supervised parser?
• When is it lexicalized?
• How are dependencies used for CFG parsing?
• What is a generative model?
• Why discriminative reranking?
• How is it evaluated?
• How good are the results?
Outline
• basics
– (P)CFG
– supervised learning
– (lexicalized) PCFG
• Collins 1997: Probabilistic parser
– model 1: generative version of (Collins 1996)
– model 2: + complement/adjunct distinction
– (model 3: + wh-movement model)
• Collins 2005: Reranking
– reranker architecture
– generative / discriminative, (log-)linear
• conclusion
Probabilistic CFG
• CFG:

S → NP VP

• PCFG:

S → NP VP (90%)

which means:
– P(rule_r = NP VP | rule_l = S) = 0.9
– with normalization: Σ_{rule_r} P(rule_r | rule_l) = 1
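As a minimal sketch (toy grammar and probabilities invented for illustration), the normalization constraint can be checked like this:

```python
# Toy PCFG: for each left-hand side, a map from right-hand side to
# P(rule_r | rule_l). All numbers are invented for illustration.
pcfg = {
    "S":  {("NP", "VP"): 0.9, ("VP",): 0.1},
    "NP": {("DT", "NN"): 0.6, ("NNP",): 0.4},
}

def is_normalized(grammar, tol=1e-9):
    """Check the PCFG constraint: sum over rule_r of P(rule_r | rule_l) = 1."""
    return all(abs(sum(rhs.values()) - 1.0) < tol for rhs in grammar.values())

# A grammar that violates the constraint:
bad = {"S": {("NP", "VP"): 0.5}}
```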
Supervised Parsing Architecture
[diagram] A treebank (S1,T1),(S2,T4),(S3,T5),(S4,T8) is split into training data (S3,T5),(S4,T8) and test data (S1,T1),(S2,T4). A training algorithm learns a model from the training data; the parser applies the model to the test sentences (S1,?),(S2,?) and outputs candidate trees (S1,T1'),(S1,T2'),(S2,T3'),(S2,T4'), which are compared against the gold test trees in evaluation.
Finding the Best Parse

T_best = argmax_T P(T | S) = argmax_T P(T, S) / P(S) = argmax_T P(T, S)

Two types of models
• discriminative:
– P(T | S) estimated directly
– P(T, S) distribution not available
– no model parameters for generating S
• generative:
– estimation of P(T, S)
– PCFG: P(T, S) = Π_{rule ∈ T} P(rule_r | rule_l)
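A sketch of the generative selection rule (candidate trees and rule probabilities are invented; a real parser searches a chart rather than enumerating trees):

```python
import math

# P(rule_r | rule_l) for a toy PCFG, keyed by (lhs, rhs). Invented numbers.
pcfg = {
    ("S", ("NP", "VP")): 0.9,
    ("S", ("VP",)): 0.1,
    ("NP", ("NNP",)): 0.4,
    ("VP", ("VB", "NP")): 0.5,
    ("VP", ("VB",)): 0.5,
}

def tree_prob(rules):
    """Generative PCFG score: P(T, S) = product of P(rule_r | rule_l)."""
    return math.prod(pcfg[r] for r in rules)

def best_parse(candidates):
    """argmax_T P(T | S); since P(S) is constant, argmax_T P(T, S) suffices."""
    return max(candidates, key=tree_prob)

t1 = [("S", ("NP", "VP")), ("NP", ("NNP",)), ("VP", ("VB",))]
t2 = [("S", ("VP",)), ("VP", ("VB", "NP")), ("NP", ("NNP",))]
```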
Lexicalization of Rules
add the head word and its PoS tag to each nonterminal:

S → NP VP

becomes

S(loves,VB) → NP(John,NNP) VP(loves,VB)

let's write this as

P(h) → L_1(l_1) H(h)
Collins 1997: Model 1
Tell a head-driven (lexicalized) generative story:

P(rule_r | rule_l) = P(L_n(l_n), …, L_1(l_1), H(h), R_1(r_1), …, R_m(r_m) | P(h))

• generate the head first, then the left and right modifiers (independently):

= P(H(h) | P(h))
· Π_{i=1..n+1} P(L_i(l_i) | P(h), H(h), Δ(i))
· Π_{i=1..m+1} P(R_i(r_i) | P(h), H(h), Δ(i))

• stop generating modifiers when L_{n+1}(l_{n+1}) = STOP or R_{m+1}(r_{m+1}) = STOP
• Δ(i): distance features: adjacent to the head?, verb in between?, (0, 1, 2, >2) commas in between?
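The factorization above can be sketched as follows; the probability tables are invented, and the distance conditioning Δ(i) is left out for brevity:

```python
# Toy conditional tables (invented). Keys: (modifier, parent, head-label).
P_head  = {("VP", "S"): 1.0}
P_left  = {("NP", "S", "VP"): 0.8, ("STOP", "S", "VP"): 0.2}
P_right = {("STOP", "S", "VP"): 1.0}

def rule_prob(parent, head, lefts, rights):
    """Model-1-style score of one lexicalized rule: head first, then each
    left/right modifier independently, terminated by STOP on both sides.
    The Delta(i) distance features are omitted from this sketch."""
    p = P_head[(head, parent)]
    for mod in lefts + ["STOP"]:
        p *= P_left[(mod, parent, head)]
    for mod in rights + ["STOP"]:
        p *= P_right[(mod, parent, head)]
    return p
```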
Parameter Estimation

P(rule_r | rule_l) = P(H(h) | P(h))
· Π_{i=1..n+1} P(L_i(l_i) | P(h), H(h), Δ(i))
· Π_{i=1..m+1} P(R_i(r_i) | P(h), H(h), Δ(i))

parameters are estimated by relative frequency in the training set (maximum likelihood):

P(H(h) | P(h)) = C(H(h), P(h)) / C(P(h))
P(L_i(l_i) | P(h), H(h), Δ(i)) = C(L_i(l_i), P(h), H(h), Δ(i)) / C(P(h), H(h), Δ(i))

linearly smoothed with counts under less specific conditioning (backoff)
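The relative-frequency estimates can be sketched with simple counters (the events are invented, not from the treebank):

```python
from collections import Counter

# Hypothetical (head-label, parent-label) events read off a training treebank.
events = [("VP", "S"), ("VP", "S"), ("NP", "S"), ("NN", "NP")]

joint  = Counter(events)                  # C(H(h), P(h))
parent = Counter(p for _, p in events)    # C(P(h))

def p_head(head, par):
    """Maximum-likelihood estimate: P(H(h) | P(h)) = C(H(h), P(h)) / C(P(h))."""
    return joint[(head, par)] / parent[par]
```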
Parsing
• bottom-up chart parsing
• PoS-tag the sentence
• each word is a potential head of a phrase
• calculate probabilities of modifiers
• continue combining phrases until the whole sentence is covered
Dataset
Penn Treebank: Wall Street Journal portion
• sections 2-21 for training (40k sentences)
• section 23 for testing (2,416 sentences)
Evaluation
PARSEVAL evaluation measures:

Labeled Precision (LP) = nr of correctly predicted constituents / nr of all predicted constituents
Labeled Recall (LR) = nr of correctly predicted constituents / nr of all constituents in the gold parse

where a 'correct' constituent ↔ same boundaries, same label

Crossing Brackets (CB) = nr of predicted constituents violating constituent boundaries in the gold parse
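A sketch of the LP/LR computation; constituents are triples (label, start, end), and the toy trees are invented. Real PARSEVAL scoring also handles duplicate constituents and punctuation conventions, which this sketch ignores.

```python
def parseval(gold, predicted):
    """Labeled precision and recall: a predicted constituent is correct iff
    some gold constituent has the same label and the same span."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    return correct / len(pred_set), correct / len(gold_set)

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
pred = [("S", 0, 5), ("NP", 0, 3), ("VP", 2, 5)]
lp, lr = parseval(gold, pred)
```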
Results Model 1
Subcategorization Problem
consider this parse:
Subcategorization Problem
due to the independence of modifiers, Model 1 may parse:
Subcategorization Problem
Solution: distinguish modifiers into complements ('-C') and adjuncts
→ estimate separate probabilities
→ learn that VP(was) prefers exactly one complement
complement information might also help to identify functional information such as the subject
Model 2
Extend Model 1:

P(H(h) | P(h)) · P(LC | P(h), H(h)) · P(RC | P(h), H(h))
· Π_{i=1..n+1} P(L_i(l_i) | P(h), H(h), Δ(i), LC_i)
· Π_{i=1..m+1} P(R_i(r_i) | P(h), H(h), Δ(i), RC_i)

• draw sets of allowed complements (subcat sets) for the left (LC) and right (RC) side
• generate each complement in LC/RC exactly once
• no STOP before the subcat set is satisfied
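Setting the probabilities aside, the subcat bookkeeping can be sketched as a legality check on one side's modifier sequence (labels invented):

```python
def satisfies_subcat(modifiers, subcat):
    """True iff the modifier sequence (ending in 'STOP') is legal under
    Model 2: every complement in `subcat` is generated exactly once, and
    STOP only comes once the subcat set has been emptied."""
    remaining = set(subcat)
    for mod in modifiers:
        if mod == "STOP":
            return not remaining          # no STOP before subcat satisfied
        if mod in remaining:
            remaining.remove(mod)         # complement generated once
        elif mod in subcat:
            return False                  # complement generated twice
    return False                          # sequence never stopped
```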
Results Model 2
Reranking (Collins 2005) Architecture
Why rank again?
• consider more features of a parse tree
– CFG rule occurrence (lexicalized / with grandparent node)
– bigram (nonterminals only / lexicalized) occurrence
– ...
• parser: generative model
– new random variables needed for every feature
→ nr of joint-probability parameters grows exponentially with the nr of features (must be avoided by a generative story introducing conditional independencies)
• reranker: discriminative (log-)linear classifier
– treats every feature independently
– simple to extend the feature set
Log-Linear Models
for a PCFG, one step is an application of a CFG rule:

P(T, S) = Π_{rule ∈ T} P(rule_r | rule_l) = Π_{rule ∈ G} P(rule_r | rule_l)^{C_T(rule)}

⇔ log P(T, S) = Σ_{rule ∈ G} log(P(rule_r | rule_l)) · C_T(rule)

where C_T(rule) is the number of times the rule occurs in T,
i.e. a linear combination in log space:
call log(P(rule_r | rule_l)) the 'feature weight' and C_T(rule) the 'feature value'
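The log-space view can be sketched as a dot product of feature weights and feature values (rule set and numbers invented):

```python
import math

# Feature weights: log rule probabilities. Feature values: how often each
# rule occurs in the derivation of (T, S). All numbers are invented.
weights = {"S->NP VP":  math.log(0.9),
           "NP->NNP":   math.log(0.4),
           "VP->VB NP": math.log(0.5)}
counts  = {"S->NP VP": 1, "NP->NNP": 2, "VP->VB NP": 1}

def log_score(weights, values):
    """log P(T, S) as a linear combination: sum_f weight(f) * value(f)."""
    return sum(weights[f] * values[f] for f in values)

score = log_score(weights, counts)
```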
Results after Reranking
Conclusion
• Lexicalized parser
• 'Head-centric' generative process
• Extensions for subcategorization (and wh-movement)
• Discriminative reranking of results
Thanks for your attention!
Questions & discussion
Parsing 1/3 bottom up chart parsing: choose a complete(+) phrase as head for a new phrase
Parsing 2/3 add completed neighbouring phrases as modifiers
Parsing 3/3 complete by adding STOP modifiers
wh-Movement Rules Solution: Account for (+gap) rules separately. → Allow generation of a TRACE under a (+gap)-version of a nonterminal.
wh-Movement Rule Analysis

we observe: a gap can be
• passed down the head (rule 3)
• passed down to one of the left / right modifiers
• discharged as a TRACE
Model 3
Extend Model 2 with a new random variable G with values:
• Head: the gap is passed down the head (rule 3)
• Left / Right: the gap is passed down to one of the left / right modifiers (LC += gap / RC += gap); the gap entry in LC / RC is later discharged as a TRACE or as a (+gap) modifier phrase

P(H(h) | P(h)) · P(LC | P(h), H(h)) · P(RC | P(h), H(h)) · P(G | P(h), H(h))
· Π_{i=1..n+1} P(L_i(l_i) | P(h), H(h), Δ(i), LC_i)
· Π_{i=1..m+1} P(R_i(r_i) | P(h), H(h), Δ(i), RC_i)
Results Model 3
Practical Issues - Smoothing
sparse data for the full conditioning set → needs backoff

linear combination: p̂ = λ · p̂_mle + (1 − λ) · p̂_backoff
recursively stacked: p̂_backoff = λ′ · p̂′_mle + (1 − λ′) · p̂′_backoff

all words occurring less than 5 times are replaced by UNKNOWN
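The stacked interpolation can be sketched as follows; the estimates and λ values are invented (in the parser, the λs are derived from the backoff counts):

```python
def smoothed(estimates, lambdas):
    """Recursively interpolate MLE estimates ordered from most to least
    specific: p = l * p_mle + (1 - l) * p_backoff, where p_backoff is
    itself the smoothed estimate of the next (less specific) level."""
    if len(estimates) == 1:
        return estimates[0]               # least specific level: no backoff
    lam = lambdas[0]
    return lam * estimates[0] + (1 - lam) * smoothed(estimates[1:], lambdas[1:])
```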
History-Based Models

history-based model (generative, structured):

P(T, S) = Π_{i=1..n} P(d_i | Φ(d_1, …, d_{i−1}))

i.e. a pair (T, S) is generated by a sequence of steps D = d_1, …, d_n
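The history-based decomposition can be sketched as follows; the decision names, Φ, and probabilities are invented (for the PCFG case, each d_i is a rule application and Φ picks out the left-hand side being expanded):

```python
# Toy step probabilities, keyed by (Phi(history), decision). Invented numbers.
P_step = {(None, "expand-S"): 1.0,
          ("expand-S", "expand-NP"): 0.6,
          ("expand-NP", "expand-VP"): 0.5}

def phi(history):
    """Toy history function: condition only on the previous decision."""
    return history[-1] if history else None

def derivation_prob(decisions):
    """P(T, S) = product over i of P(d_i | Phi(d_1, ..., d_{i-1}))."""
    p, history = 1.0, []
    for d in decisions:
        p *= P_step[(phi(history), d)]
        history.append(d)
    return p
```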