# ECE 5984: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning

Topics:
• SVM
• Multi-class SVMs
• Neural Networks
• Multi-layer Perceptron

Readings: Barber 17.5, Murphy 16.5

Stefan Lee, Virginia Tech

## HW2 Graded

• Mean: 63/61 = 103%
• Max: 76
• Min: 20

(C) Dhruv Batra

## Administrativia

• HW3 due: Nov 7th, 11:55 PM
• You will implement primal & dual SVMs
• Kaggle competition: Higgs Boson Signal vs Background classification

## Recap of Last Time

## Linear classifiers

Which line is better?

$w \cdot x = \sum_j w^{(j)} x^{(j)}$

## Dual SVM derivation (1): the linearly separable case

Slide Credit: Carlos Guestrin

## Dual SVM formulation: the linearly separable case

## Dual SVM formulation: the non-separable case
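The equations on the dual-formulation slides did not survive in this transcript. As a reference point, the standard textbook statement of the two duals is sketched below. The notation is an assumption and may differ from what was used in lecture: $\alpha_i$ are the dual variables, $y_i \in \{-1, +1\}$ are the labels, and $C$ is the slack penalty.

```latex
% Linearly separable (hard-margin) dual
\max_{\alpha} \; \sum_i \alpha_i
  - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \;\; \forall i

% Non-separable (soft-margin) dual: same objective, box-constrained alphas
\max_{\alpha} \; \sum_i \alpha_i
  - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C \;\; \forall i

% Primal weights recovered from the dual solution
w = \sum_i \alpha_i y_i x_i
```

In this form the training points enter only through the dot products $x_i \cdot x_j$, and only points with $\alpha_i > 0$ contribute to $w$.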

## Why did we learn about the dual SVM?

• Builds character!
• Exposes structure about the problem
• There are some quadratic programming algorithms that can solve the dual faster than the primal
• The kernel trick!!!

## Dual SVM interpretation: Sparsity

## Dual formulation only depends on dot-products, not on w!

## Common kernels

• Polynomials of degree d
• Polynomials of degree up to d
• Gaussian kernel / Radial Basis Function
• Sigmoid
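The kernel formulas themselves are missing from this transcript. The sketch below uses common textbook parameterizations of the four kernels named above. These forms, and the parameter names `sigma`, `eta`, and `nu`, are assumptions rather than the slide's exact notation.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u, v, d):
    """Polynomials of degree exactly d: (u . v)^d."""
    return dot(u, v) ** d


def poly_up_to_kernel(u, v, d):
    """Polynomials of degree up to d: (u . v + 1)^d."""
    return (dot(u, v) + 1) ** d


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian / RBF kernel: exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid_kernel(u, v, eta=1.0, nu=0.0):
    """Sigmoid kernel: tanh(eta * (u . v) + nu)."""
    return math.tanh(eta * dot(u, v) + nu)
```

Any of these can stand in for the dot product $x_i \cdot x_j$ in the dual, which is what makes the kernel trick possible.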

## Plan for Today

• SVMs: Multi-class
• Neural Networks

## What about multiple classes?

## One against All (Rest)

Learn N classifiers:
• y1 vs. Not y1
• y2 vs. Not y2
• y3 vs. Not y3

## One against One

Learn N-choose-2 classifiers:
• y1 vs. y2
• y1 vs. y3
• y2 vs. y3

## Problems

Image Credit: Kevin Murphy

## Learn 1 classifier: Multiclass SVM

Simultaneously learn 3 sets of weights
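To make the multi-class strategies above concrete, here is a minimal sketch of how prediction might look once the binary linear classifiers are trained. The dictionary layout, the function names, and the tie-breaking behavior are illustrative assumptions. A jointly trained multiclass SVM with one weight vector per class would typically predict with the same highest-score rule as the one-against-all case.

```python
from itertools import combinations


def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_one_vs_rest(classifiers, x):
    """classifiers maps label -> (w, b) for 'label vs. rest'.

    Returns the label whose classifier gives the highest score.
    """
    return max(classifiers, key=lambda label: score(*classifiers[label], x))


def predict_one_vs_one(pairwise, x):
    """pairwise maps (label_a, label_b) -> (w, b).

    A positive score is a vote for label_a, otherwise for label_b.
    Ties are broken by whichever label received its first vote earliest.
    """
    votes = {}
    for (label_a, label_b), (w, b) in pairwise.items():
        winner = label_a if score(w, b, x) > 0 else label_b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)


labels = ["y1", "y2", "y3"]
# One against one needs N-choose-2 classifiers; one against all needs N.
num_pairwise = len(list(combinations(labels, 2)))
```

With three classes both schemes need three classifiers, but the one-against-one count grows quadratically with the number of classes.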

## Addressing non-linearly separable data: Option 1, nonlinear features

Choose non-linear features, e.g.,
• Typical linear features: $w_0 + \sum_i w_i x_i$
• Example of non-linear features: degree 2 polynomials, $w_0 + \sum_i w_i x_i + \sum_{ij} w_{ij} x_i x_j$

Classifier $h_w(x)$ still linear in parameters $w$
• As easy to learn
• Data is linearly separable in higher dimensional spaces
• Express via kernels
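As a small illustration of Option 1, the sketch below expands a 2-D input into degree 2 polynomial features and shows that XOR-style data, which no line separates in the original space, is separated by a classifier that is still linear in its weights. The weight vector is hand-picked for illustration, not learned, and the function names are hypothetical.

```python
def degree2_features(x):
    """Map x to [1, x_1, ..., x_n, then x_i * x_j for all i <= j]."""
    n = len(x)
    feats = [1.0] + list(x)
    feats += [x[i] * x[j] for i in range(n) for j in range(i, n)]
    return feats


def linear_classifier(w, feats):
    """Sign of w . feats, returned as +1 or -1."""
    return 1 if sum(wi * fi for wi, fi in zip(w, feats)) > 0 else -1


# XOR with inputs in {-1, +1}: label is +1 when the two signs differ.
data = [((1, 1), -1), ((-1, -1), -1), ((1, -1), 1), ((-1, 1), 1)]

# For 2-D input the features are [1, x1, x2, x1*x1, x1*x2, x2*x2].
# Hand-picked weights: only the x1*x2 cross term is used.
w = [0, 0, 0, 0, -1, 0]

predictions = [linear_classifier(w, degree2_features(x)) for x, _ in data]
```

The classifier is still a weighted sum of features, so it is as easy to train as before; only the feature space changed.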

## Addressing non-linearly separable data: Option 2, nonlinear classifier

Choose a classifier $h_w(x)$ that is non-linear in parameters $w$, e.g., decision trees, neural networks
• More general than linear classifiers
• But, can often be harder to learn (non-convex optimization required)
• Often very useful (outperforms linear classifiers)
• In a way, both ideas are related

## New Topic: Neural Networks

## Synonyms

• Neural Networks
• Artificial Neural Network (ANN)
• Feed-forward Networks
• Multilayer Perceptrons (MLP)

Types of ANN:
• Convolutional Nets
• Autoencoders
• Recurrent Neural Nets

[Back with a new name]: Deep Nets / Deep Learning

## Biological Neuron

## Artificial Neuron

• Perceptron (with step function)
• Logistic Regression (with sigmoid)
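The two artificial neurons named above share the same weighted sum and differ only in the response function applied to it. A minimal sketch, assuming a bias term `w0` and a threshold at zero for the step function (both conventions are assumptions, as the slide's diagram is not in the transcript):

```python
import math


def weighted_sum(w0, w, x):
    """Bias plus dot product: w0 + sum_i w_i * x_i."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


def perceptron(w0, w, x):
    """Step-function neuron: outputs 1 if the weighted sum is >= 0, else 0."""
    return 1 if weighted_sum(w0, w, x) >= 0 else 0


def logistic_neuron(w0, w, x):
    """Sigmoid neuron, as in logistic regression: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(w0, w, x)))
```

For example, with the hand-picked weights `w0 = -1.5` and `w = [1, 1]`, `perceptron` computes a logical AND of two binary inputs.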

## Sigmoid

[Figure: three sigmoid curves, each plotted for inputs from -6 to 6 with outputs from 0 to 1, for (w0=2, w1=1), (w0=0, w1=1), and (w0=0, w1=0.5).]
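The three sigmoid plots can be reproduced numerically. The sketch below assumes the plotted function is $1 / (1 + e^{-(w_0 + w_1 x)})$, which matches the parameter names on the slide but is not stated in the transcript.

```python
import math


def sigmoid_response(x, w0, w1):
    """Sigmoid of the affine input w0 + w1 * x."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))


# The three (w0, w1) settings shown on the slide.
settings = [(2, 1), (0, 1), (0, 0.5)]

for w0, w1 in settings:
    row = [round(sigmoid_response(x, w0, w1), 3) for x in (-6, -2, 0, 2, 6)]
    print(f"w0={w0}, w1={w1}: {row}")
```

Under this assumption, `w0` shifts the curve sideways (the output crosses 0.5 at x = -w0/w1) and `w1` controls how steep the transition is.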

## Many possible response functions

• Linear
• Sigmoid
• Exponential
• Gaussian

## Limitation

• A single neuron is still a linear decision boundary
• What to do?
• Idea: Stack a bunch of them together!
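A minimal sketch of what stacking buys: XOR cannot be computed by a single step-function neuron, but two hidden neurons feeding one output neuron can compute it. The weights below are hand-picked for illustration, not learned, and the function names are hypothetical.

```python
def step(z):
    """Step response: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def neuron(w0, w, x):
    """Single step-function neuron with bias w0 and weights w."""
    return step(w0 + sum(wi * xi for wi, xi in zip(w, x)))


def xor_network(x1, x2):
    """One hidden layer (two neurons) followed by one output neuron."""
    h_or = neuron(-0.5, [1, 1], [x1, x2])    # fires if at least one input is 1
    h_and = neuron(-1.5, [1, 1], [x1, x2])   # fires only if both inputs are 1
    # Output fires when OR is on and AND is off, which is XOR.
    return neuron(-0.5, [1, -1], [h_or, h_and])


truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

Each hidden neuron is still a linear boundary; the output neuron combines the two boundaries into a region no single line could carve out.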

## Hidden layer

1-hidden layer feed-forward network: On board

## Neural Nets

• Best performers on OCR: http://yann.lecun.com/exdb/lenet/index.html
• NetTalk, a text-to-speech system from 1987: http://youtu.be/tXMaFhO6dIY?t=45m15s
• Rick Rashid speaks Mandarin: http://youtu.be/Nu-nlQqFCKg?t=7m30s

## Universal Function Approximators

Theorem: a 3-layer network (input, one hidden layer, output) with linear outputs can uniformly approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units [Funahashi 89]

## Neural Networks Demo

http://playground.tensorflow.org/
