Lecture 15. Pattern Classification (I): Statistical Formulation - University of Wisconsin-Madison

Intro. ANN & Fuzzy Systems
Lecture 15. Pattern Classification (I): Statistical Formulation
(C) 2001 by Yu Hen Hu

Outline
- Statistical pattern recognition
- Maximum a posteriori (MAP) classifier
- Maximum likelihood (ML) classifier
- k-nearest-neighbor classifier
- MLP classifier

An Example

Consider classifying eggs into three categories, labeled medium, large, or jumbo, based on the weight W and length L of each egg.

Decision rules:
1. If W < 10 g and L < 3 cm, the egg is medium.
2. If W > 20 g and L > 5 cm, the egg is jumbo.
3. Otherwise, the egg is large.

A pattern classifier has three components: the category (target) labels, the features, and the decision rule.

Statistical Pattern Classification

The objective of statistical pattern classification is to derive an optimal decision rule given a set of training samples. The decision rule is optimal in the sense that it minimizes a cost function, called the expected risk, in making classification decisions. This is a learning problem!

Assumptions:
1. The features are given; the feature-selection problem must be solved separately. Training samples are randomly drawn from a population.
2. The target labels are given. Each sample is assumed to be assigned a specific, unique label by nature, and the labels of the training samples are known.

Pattern Classification Problem

Let X be the feature space and C = {c(i), 1 <= i <= M} a set of M class labels. For each x in X, it is assumed that nature assigns a class label t(x) in C according to some probabilistic rule. For a feature vector x drawn randomly from X:
- P(c(i)) = P(x in c(i)) is the a priori probability that t(x) = c(i), without referring to x.
- P(c(i)|x) = P(x in c(i) | x) is the a posteriori probability that t(x) = c(i) given the value of x.
- P(x|c(i)) = P(x | x in c(i)) is the conditional probability (a.k.a. the likelihood function) that x assumes its value given that it is drawn from class c(i).
- P(x) is the marginal probability that x assumes its value, without referring to which class it belongs to.
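To make the three components concrete, the egg-sorting rules from the example above can be sketched as a small decision function (the function name is illustrative, not from the slides):

```python
# Hypothetical sketch of the egg example: features (weight W in grams,
# length L in cm), category labels, and the slide's three decision rules.
def classify_egg(weight_g, length_cm):
    """Return 'medium', 'large', or 'jumbo' per the decision rules."""
    if weight_g < 10 and length_cm < 3:
        return "medium"
    if weight_g > 20 and length_cm > 5:
        return "jumbo"
    return "large"

print(classify_egg(8, 2.5))   # medium
print(classify_egg(25, 6))    # jumbo
print(classify_egg(15, 4))    # large
```

Here the decision rule is hand-crafted; the rest of the lecture is about deriving such rules from training data instead.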

By Bayes rule, P(x|c(i)) P(c(i)) = P(c(i)|x) P(x), and therefore

    P(c(i)|x) = P(x|c(i)) P(c(i)) / [ sum_{j=1}^{M} P(x|c(j)) P(c(j)) ]

Decision Function and Probability of Misclassification

Given a sample x in X, the objective of statistical pattern classification is to design a decision rule g(x) in C that assigns a label to x. If g(x) = t(x), the naturally assigned class label, the classification is correct; otherwise it is a misclassification. Define a 0-1 loss function:

    l(x | g(x)) = 0 if g(x) = t(x)
    l(x | g(x)) = 1 if g(x) != t(x)
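A minimal numeric sketch of this Bayes-rule computation, with made-up priors and likelihood values for three hypothetical classes:

```python
# Sketch of Bayes rule: posteriors P(c(i)|x) from priors P(c(i)) and
# likelihoods P(x|c(i)) at a fixed x; class names and numbers are illustrative.
priors = {"c1": 0.5, "c2": 0.3, "c3": 0.2}          # P(c(i))
likelihoods = {"c1": 0.10, "c2": 0.40, "c3": 0.30}  # P(x|c(i)) at this x

# marginal P(x) = sum_i P(x|c(i)) P(c(i))
p_x = sum(likelihoods[c] * priors[c] for c in priors)

posteriors = {c: likelihoods[c] * priors[c] / p_x for c in priors}
print(posteriors)                            # the posteriors sum to 1
print(max(posteriors, key=posteriors.get))   # highest-posterior class: "c2"
```

Note that c2 has the highest posterior even though c1 has the highest prior: the likelihood of the observed x outweighs it.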

Given g(x) = c(i*),

    P( l(x | g(x) = c(i*)) = 0 | x ) = P( t(x) = c(i*) | x ) = P(c(i*)|x)

Hence the probability of misclassification for the specific decision g(x) = c(i*) is

    P( l(x | g(x) = c(i*)) = 1 | x ) = 1 - P(c(i*)|x)

Clearly, to minimize the probability of misclassification for a given x, the best choice is g(x) = c(i*) such that P(c(i*)|x) > P(c(i)|x) for all i != i*.

MAP: Maximum A Posteriori Classifier

The MAP classifier stipulates that, to minimize the probability of misclassification, the classifier should choose g(x) = c(i*) if P(c(i*)|x) > P(c(i)|x) for all i != i*. This is an optimal decision rule. Unfortunately, in real-world applications it is often difficult to estimate P(c(i)|x). Fortunately, to derive the optimal

MAP decision rule, one can instead estimate a discriminant function Gi(x) such that for any x in X and all i != i*,

    Gi*(x) > Gi(x)  iff  P(c(i*)|x) > P(c(i)|x)

Gi(x) can be an approximation of P(c(i)|x) or any function satisfying the relationship above.

Maximum Likelihood Classifier

By Bayes rule, p(c(i)|x) = p(x|c(i)) p(c(i)) / p(x). Hence the MAP decision rule can be expressed as: g(x) = c(i*) if p(c(i*)) p(x|c(i*)) > p(c(i)) p(x|c(i)) for all i != i*. If the a priori probabilities are unknown, we may assume p(c(i)) = 1/M for every class; maximizing the likelihood p(x|c(i)) is then equivalent to maximizing the posterior p(c(i)|x).
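A minimal sketch of an ML classifier under this equal-prior assumption, using a univariate Gaussian likelihood per class (the class names, sample values, and helper names are illustrative):

```python
import math

# Sketch of an ML classifier assuming a univariate Gaussian likelihood
# per class; means and variances are estimated from labeled samples.
samples = {  # illustrative labeled data: class -> feature values (weights)
    "medium": [7.0, 8.5, 9.0, 8.0],
    "jumbo": [22.0, 24.0, 23.5, 25.0],
}

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

params = {}
for c, xs in samples.items():
    mu = sum(xs) / len(xs)
    var = sum((v - mu) ** 2 for v in xs) / len(xs)
    params[c] = (mu, var)

def ml_classify(x):
    # equal priors assumed, so maximize the likelihood p(x|c(i))
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]))

print(ml_classify(8.2))    # "medium"
print(ml_classify(23.0))   # "jumbo"
```

With unequal priors, each likelihood would instead be weighted by p(c(i)) before taking the maximum, recovering the MAP rule.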

The likelihood function p(x|c(i)) may assume, for example, a univariate Gaussian model: p(x|c(i)) ~ N(mu_i, sigma_i^2), where mu_i and sigma_i^2 are estimated using the samples {x | t(x) = c(i)}. The a priori probability p(c(i)) can be estimated by the class fraction

    P(c(i)) = #{x : t(x) = c(i)} / |X|

Nearest-Neighbor Classifier

Let {y(1), ..., y(n)} in X be n samples that have already been classified. Given a new sample x, the NN decision rule chooses g(x) = c(i) if the nearest stored sample

    y(i*) = arg min_{1 <= i <= n} || y(i) - x ||

is labeled with c(i). As n -> infinity, the probability of misclassification of the NN classifier is at

most twice that of the optimal (MAP) classifier. The k-nearest-neighbor classifier examines the k nearest classified samples and assigns x to the majority class among them. An implementation drawback: ALL of the training samples must be stored, which may require large storage.

MLP Classifier

Each output of an MLP is used to approximate an a posteriori probability P(c(i)|x) directly. The classification decision then amounts to assigning the feature vector to the class whose corresponding MLP output is maximum. During training, the classification labels (1-of-N encoding) are presented as target values (rather than the true, but

unknown, a posteriori probabilities). Denote by y(x, W) the ith output of the MLP and by t(x) the corresponding target value (0 or 1) during training. The mean-squared training error decomposes as

    e^2(t) = E|| t(x) - y(x, W) ||^2
           = E|| t(x) - E[t(x)|x] + E[t(x)|x] - y(x, W) ||^2
           = E|| t(x) - E[t(x)|x] ||^2 + E|| E[t(x)|x] - y(x, W) ||^2
           >= E|| E[t(x)|x] - y(x, W) ||^2

(the cross term vanishes, and the first term does not depend on W). Hence minimizing e^2(t) drives y(x, W) to approximate E[t(x)|x] = P(c(i)|x).
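Returning to the nearest-neighbor rules above, a minimal k-NN sketch (the training pairs and function names are illustrative); note that, as the slide points out, all classified samples must be stored:

```python
from collections import Counter

# Sketch of the k-NN rule: store all classified samples and label a new x
# by majority vote among its k nearest neighbors (Euclidean distance).
train = [  # (feature vector, label) pairs; the data is illustrative
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

def knn_classify(x, k=3):
    dist = lambda y: sum((yi - xi) ** 2 for yi, xi in zip(y, x)) ** 0.5
    nearest = sorted(train, key=lambda yc: dist(yc[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_classify((1.1, 0.9)))   # "a"
print(knn_classify((5.1, 5.0)))   # "b"
```

Setting k = 1 recovers the plain NN rule; larger k makes the vote more robust to mislabeled neighbors at the cost of blurring class boundaries.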
