Part of Speech Tagging with MaxEnt Re-ranked Hidden Markov Model

Brian Highfill

Part of Speech Tagging
  • Train a model on a set of hand-tagged sentences
  • Find the best sequence of POS tags for a new sentence
  • Generative models: Hidden Markov Model (HMM)
  • Discriminative models: Maximum Entropy Markov Model (MEMM)

Brown Corpus
  • ~57,000 tagged sentences
  • 87 tags (reduced to 45 for Penn TreeBank tagging)
  • ~300 tags including compound tags
  • Example: that_DT fire's_NN+BEZ too_QL big_JJ ._.
  • Compound tag: fire's = fire_NN is_BEZ

Hidden Markov Models
  • Example tagging:
      that    DT (singular determiner)
      fire's  NN+BEZ (common noun + is)
      too     QL (qualifier)
      big     JJ (adjective)
  • Set of hidden states (POS tags)
  • Set of observations (word tokens)
  • Each observation depends ONLY on the current tag

HMM parameters
  • Transition probabilities: P(t_i | t_0 ... t_{i-1}) = P(t_i | t_{i-1})
  • Observation probabilities: P(w_i | t_0 ... t_n, w_0 ... w_n) = P(w_i | t_i)
  • Initial tag distribution: P(t_0)

HMM Best Tag Sequence
  • For an HMM, the Viterbi algorithm finds the most probable tagging for a new sentence
  • For re-ranking later, we want not just the single best tagging but the k best taggings for each sentence
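The Viterbi step described above can be sketched as follows. This is a minimal illustration, assuming a bigram HMM stored in plain dictionaries. The function name `viterbi`, the `floor` constant used for unseen events, and every probability in the toy model are assumptions made for the example, not values estimated from the Brown Corpus.

```python
import math

def viterbi(words, tags, init, trans, emit, floor=1e-12):
    """Return (best_tag_sequence, log_probability) under a bigram HMM."""
    def lp(table, key):
        # Unseen events get a tiny floor probability (an assumption of this sketch).
        return math.log(table.get(key, floor))

    # best[i][t] = (log prob of the best path ending in tag t at word i, backpointer)
    best = [{t: (lp(init, t) + lp(emit, (t, words[0])), None) for t in tags}]
    for w in words[1:]:
        prev = best[-1]
        column = {}
        for t in tags:
            score, back = max((prev[p][0] + lp(trans, (p, t)), p) for p in tags)
            column[t] = (score + lp(emit, (t, w)), back)
        best.append(column)

    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    score = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, score

# Toy model built around the slide's example sentence (made-up probabilities).
tags = ["DT", "NN+BEZ", "QL", "JJ"]
init = {"DT": 0.8, "NN+BEZ": 0.1, "QL": 0.05, "JJ": 0.05}
trans = {("DT", "NN+BEZ"): 0.9, ("NN+BEZ", "QL"): 0.8, ("QL", "JJ"): 0.9}
emit = {("DT", "that"): 0.5, ("QL", "that"): 0.05,
        ("NN+BEZ", "fire's"): 0.1, ("QL", "too"): 0.4, ("JJ", "big"): 0.2}

path, score = viterbi(["that", "fire's", "too", "big"], tags, init, trans, emit)
print(path, math.exp(score))
```

Working in log space avoids numeric underflow on long sentences. Viterbi returns only the single best path, which is why the deck turns to beam search to obtain k candidates.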

HMM Beam Search
  • Step 1: Enumerate all possible tags for the first word
  • Step 2: Evaluate each tagging using the trained HMM and keep only the best k one-word taggings
  • Step 3: For each of the k taggings from the previous step, enumerate all possible tags for the second word
  • Step 4: Evaluate each two-word tagging and discard all but the k best
  • Repeat for all words in the sentence

MaxEnt Re-ranking
  • After beam search, we have the k best taggings for our sentence
  • Use the trained MaxEnt model to select the most probable sequence of tags
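The beam search steps (enumerate, score with the HMM, prune to k, repeat) can be sketched as below. This is one possible reading of the procedure, not the author's code. The function name `beam_search`, the dictionary layout of the HMM, the `floor` constant for unseen events, and the toy probabilities are all assumptions of the example.

```python
import heapq
import math

def beam_search(words, tags, init, trans, emit, k=3, floor=1e-12):
    """Return up to k (log_prob, tag_sequence) pairs, best first."""
    def lp(table, key):
        # Unseen events get a tiny floor probability (an assumption of this sketch).
        return math.log(table.get(key, floor))

    # Steps 1-2: score every tag for the first word and keep the k best.
    beam = [(lp(init, t) + lp(emit, (t, words[0])), [t]) for t in tags]
    beam = heapq.nlargest(k, beam, key=lambda h: h[0])

    # Steps 3-4: extend each surviving tagging by every tag, then prune to k again.
    for w in words[1:]:
        candidates = [
            (score + lp(trans, (seq[-1], t)) + lp(emit, (t, w)), seq + [t])
            for score, seq in beam
            for t in tags
        ]
        beam = heapq.nlargest(k, candidates, key=lambda h: h[0])
    return beam

# Toy model with made-up probabilities; "that" is ambiguous between DT and QL.
tags = ["DT", "QL", "JJ"]
init = {"DT": 0.6, "QL": 0.3, "JJ": 0.1}
trans = {("DT", "JJ"): 0.4, ("QL", "JJ"): 0.9}
emit = {("DT", "that"): 0.5, ("QL", "that"): 0.1, ("JJ", "big"): 0.2}

kbest = beam_search(["that", "big"], tags, init, trans, emit, k=2)
for log_prob, seq in kbest:
    print(seq, math.exp(log_prob))
```

Because pruning happens at every word, a beam of width k is not guaranteed to contain the true k most probable taggings. It trades that guarantee for a cost that grows linearly with sentence length.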

Results
  Feature
  • Current word
  • Previous tag
  • Word contains a numeral
  • Word suffix: -ing, -ness, -ity, -ed, -able, -s, -ion, -al, -ive, -ly
  • Word is capitalized
  • Word is hyphenated
  • Word is all uppercase
  • Word is all uppercase with a numeral
  • Word is capitalized and a word ending in Co. or Inc. is found within 3 words ahead

Results
  Accuracy
  • Baseline (most frequent class tagger): 73.41% (24%)
  • HMM Viterbi tagger: 92.96% (32.76% on )

[Figure: Known Word Accuracy vs. Beam Search Width (K)]
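To make the feature list and the re-ranking step concrete, here is a hedged sketch of how the listed features might feed a log-linear (MaxEnt) scorer that picks among k candidate taggings. The slides do not give the model's exact form, so the per-position factorization, the names `features`, `log_prob_tag` and `rerank`, the `<s>` start symbol, and the hand-set weights are all assumptions made for illustration.

```python
import math

SUFFIXES = ("ing", "ness", "ity", "ed", "able", "s", "ion", "al", "ive", "ly")

def features(words, i, prev_tag):
    """Binary features for the word at position i, following the slide's list."""
    w = words[i]
    feats = {f"word={w}", f"prev_tag={prev_tag}"}
    has_digit = any(c.isdigit() for c in w)
    if has_digit:
        feats.add("has_numeral")
    for s in SUFFIXES:
        if w.lower().endswith(s):
            feats.add(f"suffix=-{s}")
    if w[:1].isupper():
        feats.add("capitalized")
        # Look at most 3 words ahead for a company marker.
        if any(x.endswith(("Co.", "Inc.")) for x in words[i + 1:i + 4]):
            feats.add("capitalized_near_co_inc")
    if "-" in w:
        feats.add("hyphenated")
    if w.isupper():
        feats.add("all_upper")
        if has_digit:
            feats.add("all_upper_with_numeral")
    return feats

def log_prob_tag(tag, feats, tags, weights):
    """log P(tag | feats) under a log-linear model with (feature, tag) weights."""
    def raw(t):
        return sum(weights.get((f, t), 0.0) for f in feats)
    log_z = math.log(sum(math.exp(raw(t)) for t in tags))
    return raw(tag) - log_z

def rerank(words, candidates, tags, weights):
    """Return the candidate tag sequence with the highest MaxEnt score."""
    def score(seq):
        total, prev = 0.0, "<s>"
        for i, tag in enumerate(seq):
            total += log_prob_tag(tag, features(words, i, prev), tags, weights)
            prev = tag
        return total
    return max(candidates, key=score)

# Hypothetical weights; a trained model would learn these from tagged data.
tags = ["DT", "QL", "JJ"]
weights = {("word=that", "QL"): 2.0, ("prev_tag=QL", "JJ"): 1.0}
candidates = [["DT", "JJ"], ["QL", "JJ"]]  # e.g. the k best taggings from the HMM

best = rerank(["that", "big"], candidates, tags, weights)
print(best)
```

With these toy weights the re-ranker prefers the candidate the HMM ranked second, which is the situation re-ranking is meant to exploit: the discriminative model can use word-shape evidence the HMM never sees.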
