A Probabilistic Lexical Model for Ranking Textual Inference

Eyal Shnarch, Ido Dagan, Jacob Goldberger
Bar Ilan University @ *SEM 2012

Outline
1. Lexical textual inference
2. A probabilistic lexical model
3. Evaluations

Textual inference is useful in many NLP apps

Text T: "In the Battle of Waterloo, Jun 18 1815, the French army, led by Napoleon, was crushed."

Statements it may or may not support:
- "Napoleon was not tall enough to win the Battle of Waterloo"
- "At Waterloo Napoleon did surrender..." ("Waterloo - finally facing my Waterloo")
- "Napoleon engaged in a series of wars, and won many"
- "In Belgium Napoleon was defeated"
- "Napoleon was the Emperor of the French from 1804 to 1815."

Lexical textual inference
- Complex systems use a parser or 2nd-order co-occurrence to connect a text such as "In the Battle of Waterloo ... the French army, led by Napoleon, was crushed" with "In Belgium Napoleon was defeated".
- Lexical inference rules link terms from T to H.
- Lexical rules come from lexical resources.
- H is inferred from T iff all its terms are inferred.

Textual inference for ranking
Hypothesis H: "In which battle was Napoleon defeated?"
Candidate texts, ranked by how strongly each supports H:
1. "In the Battle of Waterloo, Jun 18 1815, the French army, led by Napoleon, was crushed."
2. "At Waterloo Napoleon did surrender..." ("Waterloo - finally facing my Waterloo")
3. "Napoleon was not tall enough to win the Battle of Waterloo"
4. "Napoleon engaged in a series of wars, and won many"
5. "Napoleon was the Emperor of the French from 1804 to 1815."
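In code, the ranking task amounts to scoring every candidate text against the hypothesis and sorting; a minimal sketch, where `score` is a stand-in for the model P(T -> H) developed in the following slides:

```python
def rank_candidates(candidates, hypothesis, score):
    """Order candidate texts by the probability that each one infers the hypothesis."""
    return sorted(candidates, key=lambda text: score(text, hypothesis), reverse=True)
```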

Ranking textual inference - prior work

Syntactic-based methods (Wang et al. 07, Heilman and Smith 10, Wang and Manning 10):
- Transform T's parsed tree into H's parsed tree
- Based on principled ML models

Heuristic lexical methods (MacKinlay and Baldwin 09, Clark and Harrison 10, Majumdar and Bhattacharyya 10):
- Fast, easy to implement, highly competitive
- Practical across genres and languages

Probabilistic model - overview

T: t1 .. t6 = "Battle of Waterloo / French army led by Napoleon was crushed"
H: h1 .. h3 = "which battle was Napoleon defeated"

- Knowledge integration: lexical resources supply rules linking the terms of T to the terms of H.
- Term level: estimate P(T -> h1), P(T -> h2), P(T -> h3).
- Sentence level: combine the term-level probabilities, through hidden variables x1, x2, x3, into P(T -> H).
- Annotations are available at sentence level only.

Probabilistic model - term level

- A rule r links a term of T to a term of H; rules compose into transitive chains (e.g. rule1 followed by rule2 through an intermediate term t').
- $\theta_{R(r)}$ is the reliability level of the resource $R(r)$ which suggested r.
- A chain c infers h only if all its rules are valid:

  $P(t \xrightarrow{c} h) = \prod_{r \in c} \theta_{R(r)}$

- Multiple chains provide multiple evidence, combined by an OR (a noisy-OR):

  $P(T \rightarrow h) = 1 - \prod_{c \in chains(h)} \left[ 1 - P(t \xrightarrow{c} h) \right]$

- This level's parameters: one per input lexical resource (ACL 11 short paper).
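To make the term level concrete, here is a minimal sketch of the chain product and the noisy-OR combination. It assumes each rule is identified by the resource that suggested it; the resource names and reliability values are invented for illustration, and this is not the authors' released code.

```python
def chain_probability(chain, reliability):
    """P(t -c-> h): a chain infers h only if every rule in it is valid."""
    p = 1.0
    for resource in chain:
        p *= reliability[resource]  # theta_{R(r)}, one parameter per resource
    return p

def term_probability(chains, reliability):
    """P(T -> h): noisy-OR over all chains linking some term of T to h."""
    p_all_chains_fail = 1.0
    for chain in chains:
        p_all_chains_fail *= 1.0 - chain_probability(chain, reliability)
    return 1.0 - p_all_chains_fail

# Hypothetical reliability parameters, one per lexical resource (learned in practice):
theta = {"WordNet": 0.9, "Wikipedia": 0.7}
# Two chains of rules linking a term of T to the H term "defeated":
chains = [["WordNet"], ["Wikipedia", "WordNet"]]
print(term_probability(chains, theta))  # 1 - (1 - 0.9)(1 - 0.63) = 0.963
```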

Probabilistic model - sentence level

From the term-level probabilities P(T -> h1), P(T -> h2), P(T -> h3) we define hidden binary random variables:
x_t = 1 iff h_t is inferred from T (zero otherwise).

Modeling the final sentence-level decision with an AND gate over the x_t is the most intuitive choice; however, it is too strict and does not model the dependency between terms.
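For contrast, a minimal sketch of that AND-gate alternative, assuming independent terms so the gate reduces to a product (an illustration, not the authors' code):

```python
def and_gate(p_terms):
    """AND-gate sentence level: H is inferred only if every h_t is inferred.

    Assuming independent terms, P(T -> H) is the product of the term-level
    probabilities. A single unmatched term (probability near zero) collapses
    the whole score, which is why the slides call this gate too strict.
    """
    p = 1.0
    for p_t in p_terms:
        p *= p_t
    return p

print(and_gate([0.963, 0.8, 0.4]))  # ~0.31: one weak term dominates the score
```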

We therefore define another binary random variable: y_t, the inference decision for the prefix h_1 .. h_t. P(y_t = 1) depends on y_{t-1} and x_t, and the last variable, y_n, carries the final sentence-level decision.

The parameters of this level, completing the Markov-PLM (M-PLM), are

$q_{ij}(k) = P(y_t = k \mid y_{t-1} = i, x_t = j), \qquad i, j, k \in \{0, 1\}$

M-PLM inference

The sentence-level probability marginalizes over all hidden variables:

$P(y_n = 1) = \sum_{x_1, \ldots, x_n} \; \sum_{y_2, \ldots, y_{n-1},\, y_n = 1} P(x_1) \prod_{t=2}^{n} P(x_t)\, P(y_t \mid y_{t-1}, x_t)$  (1)

It can be computed efficiently with a forward algorithm:

$\alpha_t(k) = P(y_t = k) = \sum_{i,j \in \{0,1\}} \alpha_{t-1}(i)\, P(x_t = j)\, q_{ij}(k)$  (2)

$\alpha_1(k) = P(x_1 = k)$  (3)

$P(y_n = 1) = \alpha_n(1)$  (4)
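A minimal sketch of this forward pass; the transition values below are invented for illustration, and this is not the authors' BIU NLP release.

```python
def mplm_forward(p_x, q):
    """Forward pass for M-PLM sentence-level inference, eqs. (2)-(4).

    p_x[t]     : term-level probability P(x_{t+1} = 1) for the t-th H term
    q[i][j][k] : transition parameter q_ij(k) = P(y_t = k | y_{t-1} = i, x_t = j)
    """
    # Eq. (3): alpha_1(k) = P(x_1 = k)
    alpha = [1.0 - p_x[0], p_x[0]]
    for t in range(1, len(p_x)):
        p_j = [1.0 - p_x[t], p_x[t]]  # P(x_t = j)
        # Eq. (2): alpha_t(k) = sum_{i,j} alpha_{t-1}(i) P(x_t = j) q_ij(k)
        alpha = [sum(alpha[i] * p_j[j] * q[i][j][k]
                     for i in (0, 1) for j in (0, 1))
                 for k in (0, 1)]
    return alpha[1]  # Eq. (4): P(y_n = 1) = alpha_n(1)

# Hypothetical transition parameters: a softened AND gate.
q = [[[0.95, 0.05], [0.90, 0.10]],   # rows for y_{t-1} = 0, columns x_t = 0 / 1
     [[0.70, 0.30], [0.05, 0.95]]]   # rows for y_{t-1} = 1
print(mplm_forward([0.963, 0.8, 0.4], q))  # compare with the strict AND gate above
```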

M-PLM summary
- Observed: the lexical rules which link terms.
- We developed an EM learning scheme to jointly learn all the parameters.

Evaluations - data sets

Ranking in passage retrieval for QA (Wang et al. 07):
- 5700/1500 question-candidate answer pairs from TREC 8-13
- Manually annotated
- A notable line of work from recent years: Punyakanok et al. 04, Cui et al. 05, Wang et al. 07, Heilman and Smith 10, Wang and Manning 10

Recognizing textual entailment within a corpus:
- 20,000 text-hypothesis pairs in each of RTE-5 and RTE-6
- Originally constructed for classification

Evaluations - baselines

Syntactic generative models (Punyakanok et al. 04, Cui et al. 05, Wang et al. 07, Heilman and Smith 10, Wang and Manning 10):
- Require parsing
- Apply sophisticated machine learning methods

Lexical model - Heuristically Normalized-PLM (HN-PLM, TextInfer workshop 11):
- AND gate for the sentence level, with heuristic normalizations added to address its disadvantages
- Performance in line with the best RTE systems

QA results

- Adding HN-PLM improves over the syntactic baselines by +0.7% and +1% on the two reported measures.
- M-PLM improves over the baselines by +3.2% and +3.5%.

RTE results - M-PLM vs. HN-PLM

- M-PLM outperforms HN-PLM by +1.9%, +7.3%, +3.6% and +6.0% on the reported measures.

Summary

- A clean probabilistic lexical model, usable as a lexical component or as a stand-alone inference system.
- Demonstrates the superiority of principled methods over heuristic ones.
- An attractive passage-retrieval ranking method.
- Code available from the BIU NLP downloads page.

M-PLM limits:
- Processing is term-order dependent.
- Lower performance on classification vs. HN-PLM: M-PLM does not normalize well across hypothesis lengths.

Future work
- Further explore a broader range of related probabilistic models.

Thank You
