Part of Speech Tagging with MaxEnt Re-ranked Hidden Markov Model
Brian Highfill

Part of Speech Tagging
  Train a model on a set of hand-tagged sentences
  Find the best sequence of POS tags for a new sentence
  Generative models: Hidden Markov Model (HMM)
  Discriminative models: Maximum Entropy Markov Model (MEMM)

Brown Corpus
  ~57,000 tagged sentences
  87 tags (reduced to 45 for Penn TreeBank tagging)
  ~300 tags including compound tags
  Example: that_DT fire's_NN+BEZ too_QL big_JJ ._.
  fire's = fire_NN is_BEZ

Hidden Markov Models
  Example: that/DT (singular determiner) fire's/NN+BEZ (common noun + is) too/QL (qualifier) big/JJ (adjective)
  Set of hidden states (POS tags)
  Set of observations (word tokens)
  Each observation depends ONLY on the current tag

HMM parameters
  Transition probabilities: P(t_i | t_0 ... t_{i-1}) = P(t_i | t_{i-1})
  Observation probabilities: P(w_i | t_0 ... t_n, w_0 ... w_n) = P(w_i | t_i)
  Initial tag distribution: P(t_0)

HMM Best Tag Sequence
  For an HMM, the Viterbi algorithm finds the most probable tagging for a new sentence
  For re-ranking later, we want not just the single best tagging but the k best taggings for each sentence
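The Viterbi step can be made concrete with a small sketch. This is a minimal illustration, not the tagger used for the reported results: the toy probabilities, the `floor` smoothing constant, and the names `viterbi`, `INIT`, `TRANS`, and `EMIT` are all assumptions introduced here.

```python
import math

# Toy HMM parameters (hypothetical values, for illustration only).
TAGS = ["DT", "NN", "JJ"]
INIT = {"DT": 0.8, "NN": 0.1, "JJ": 0.1}                   # P(t_0)
TRANS = {("DT", "NN"): 0.7, ("DT", "JJ"): 0.3,             # P(t_i | t_{i-1})
         ("JJ", "NN"): 0.9}
EMIT = {("DT", "that"): 0.5, ("JJ", "big"): 0.2,           # P(w_i | t_i)
        ("NN", "fire"): 0.1}


def viterbi(words, tags, init, trans, emit, floor=1e-12):
    """Return (best tag sequence, log probability) under a first-order HMM.

    Missing probabilities are replaced by `floor`, a crude stand-in for
    real smoothing (an assumption of this sketch).
    """
    if not words:
        return [], 0.0

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log score of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (lp(init.get(t, 0)) + lp(emit.get((t, words[0]), 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + lp(trans.get((p, t), 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + lp(emit.get((t, words[i]), 0)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


path, logp = viterbi(["that", "big", "fire"], TAGS, INIT, TRANS, EMIT)
print(path, logp)
```

Working in log space avoids numeric underflow on long sentences. Viterbi returns only the single best path, which is why a separate search is needed to produce k candidates for re-ranking.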
HMM Beam Search
  Step 1: Enumerate all possible tags for the first word
  Step 2: Evaluate each tagging using the trained HMM and keep only the best k (one-word sentence taggings)
  Step 3: For each of the k taggings from the previous step, enumerate all possible tags for the second word
  Step 4: Evaluate each two-word sentence tagging and discard all but the k best
  Repeat for all words in the sentence
  [Diagram: search tree from Start through candidate tags for Word 0, Word 1, Word 2, ...]

MaxEnt Re-ranking
  After beam search, we have the k best taggings for our sentence
  Use a trained MaxEnt model to select the most probable sequence of tags
  [Diagram: k candidate tag paths from Start through Word 1 ... Word t]
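The beam search and re-ranking stages can be sketched together as follows. This is a rough illustration under stated assumptions: the toy probabilities, the `floor` constant, and the names `beam_search` and `rerank` are invented here, and `score_fn` is a hypothetical stand-in for a trained MaxEnt model, which the slides do not specify.

```python
import math

# Toy HMM parameters (hypothetical values, for illustration only).
TAGS = ["DT", "NN", "JJ"]
INIT = {"DT": 0.8, "NN": 0.1, "JJ": 0.1}
TRANS = {("DT", "NN"): 0.7, ("DT", "JJ"): 0.3, ("JJ", "NN"): 0.9}
EMIT = {("DT", "that"): 0.5, ("JJ", "big"): 0.2, ("NN", "fire"): 0.1}


def beam_search(words, tags, init, trans, emit, k=3, floor=1e-12):
    """Return up to k (tag_tuple, log_score) pairs, best first."""
    def lp(p):
        return math.log(max(p, floor))

    beam = [((), 0.0)]  # the empty tagging at Start
    for i, word in enumerate(words):
        candidates = []
        for seq, score in beam:
            for t in tags:  # enumerate every tag for the next word
                step = (lp(init.get(t, 0)) if i == 0
                        else lp(trans.get((seq[-1], t), 0)))
                candidates.append(
                    (seq + (t,), score + step + lp(emit.get((t, word), 0))))
        # Evaluate with the HMM and discard all but the k best.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:k]
    return beam


def rerank(words, kbest, score_fn):
    """Pick the candidate tagging that score_fn rates highest.

    score_fn(words, tags) is a placeholder for the MaxEnt model's score.
    """
    return max(kbest, key=lambda c: score_fn(words, c[0]))[0]


words = ["that", "big", "fire"]
kbest = beam_search(words, TAGS, INIT, TRANS, EMIT, k=2)
# Dummy scorer (an assumption): favors taggings containing more NN tags.
print(rerank(words, kbest, lambda w, tags: tags.count("NN")))
```

The re-ranker sees only the k candidates that survive the beam, so it can reorder the HMM's preferences but cannot recover a tagging that was pruned earlier.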
Results
  Feature
  Current word
  Previous tag
  Word contains a numeral
  Suffixes: -ing, -ness, -ity, -ed, -able, -s, -ion, -al, -ive, -ly
  Word is capitalized
  Word is hyphenated
  Word is all uppercase
  Word is all uppercase with a numeral
  Word is capitalized and a word ending in Co. or Inc. is found within 3 words ahead

Results
  Accuracy
  Baseline most frequent class tagger: 73.41% (24%)
  HMM Viterbi tagger: 92.96% (32.76% on )
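The feature list above could be turned into binary features along these lines. This is only a sketch: the function name `extract_features`, the feature-name strings, and the exact matching rules (for example, treating "all uppercase" as "every letter is uppercase") are assumptions, not details given in the slides.

```python
SUFFIXES = ("ing", "ness", "ity", "ed", "able", "s", "ion", "al", "ive", "ly")


def extract_features(words, i, prev_tag):
    """Return a dict of active binary features for words[i].

    Feature names and matching rules are illustrative assumptions.
    """
    word = words[i]
    lower = word.lower()
    feats = {"word=" + lower: 1, "prev_tag=" + prev_tag: 1}

    has_numeral = any(c.isdigit() for c in word)
    if has_numeral:
        feats["has_numeral"] = 1

    for suffix in SUFFIXES:
        if lower.endswith(suffix):
            feats["suffix=-" + suffix] = 1

    capitalized = word[:1].isupper()
    if capitalized:
        feats["capitalized"] = 1
    if "-" in word:
        feats["hyphenated"] = 1

    letters = [c for c in word if c.isalpha()]
    all_upper = bool(letters) and all(c.isupper() for c in letters)
    if all_upper:
        feats["all_upper"] = 1
        if has_numeral:
            feats["all_upper_numeral"] = 1

    # Capitalized word followed within 3 words by a token ending in Co. or Inc.
    if capitalized and any(w.endswith(("Co.", "Inc."))
                           for w in words[i + 1:i + 4]):
        feats["company_context"] = 1
    return feats


print(sorted(extract_features(["Acme", "Widget", "Co.", "is", "hiring"], 0, "<s>")))
```

A dict of active features like this is a common input shape for a MaxEnt (multinomial logistic regression) classifier, though the slides do not say which implementation was used.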