# CS 2750: Machine Learning
## Linear Algebra and Matlab
Prof. Adriana Kovashka, University of Pittsburgh, January 10, 2017

### Announcements
- The TA won't be back until the last week of January.
- Skype office hours: Tuesday/Wednesday 9:30am to 11:30am (username: vicl720).

- Please fill out the Doodle poll: http://doodle.com/poll/gtaxphum9mutd74x

These slides are adapted from the Linear Algebra Primer by Professor Fei-Fei Li, Stanford. Another, very in-depth linear algebra review from CS229 is available here: http://cs229.stanford.edu/section/cs229-linalg.pdf. A video discussion of linear algebra from EE263 is here (lectures 3 and 4): http://see.stanford.edu/see/lecturelist.aspx?coll=17005383-19c6-49ed-9497-2ba8bfcfe5f6

## Vectors and Matrices
Vectors and matrices are just collections of ordered numbers that represent something: movements in space, scaling factors, word counts, movie ratings, pixel brightnesses, etc. We'll define some common uses and standard operations on them.

## Vectors
A column vector $x \in \mathbb{R}^{n \times 1}$:

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

A row vector $x^T \in \mathbb{R}^{1 \times n}$:

$$x^T = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$$

where $T$ denotes the transpose operation.

You'll want to keep track of the orientation of your vectors when programming in MATLAB. You can transpose a vector V in MATLAB by writing V'.
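For readers following along outside MATLAB, here is an assumed NumPy analogue of the transpose operation (MATLAB's `V'` corresponds to `.T` in NumPy):

```python
import numpy as np

# A column vector (3 x 1) and its transpose, a row vector (1 x 3)
v = np.array([[1.0], [2.0], [3.0]])
v_t = v.T  # analogous to MATLAB's V'

print(v.shape)    # (3, 1)
print(v_t.shape)  # (1, 3)
```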

## Vectors have two main uses
- Vectors can represent an offset in 2D or 3D space. Points are just vectors from the origin.

- Data can also be treated as a vector. Such vectors don't have a geometric interpretation, but calculations like distance still have value.

## Matrix
A matrix $A \in \mathbb{R}^{m \times n}$ is an array of numbers with size $m$ by $n$, i.e. $m$ rows and $n$ columns. If $m = n$, we say that $A$ is square.

## Matrix Operations
- Addition: you can only add a matrix with matching dimensions, or a scalar.
- Scaling: multiplying a matrix by a scalar scales every entry.

### Inner product (dot product) of vectors
Multiply corresponding entries of two vectors and add up the result:

$$x \cdot y = x^T y = \sum_i x_i y_i$$

$x \cdot y$ also equals $|x|\,|y|\cos(\theta)$, where $\theta$ is the angle between $x$ and $y$.

If $B$ is a unit vector, then $A \cdot B$ gives the length of the projection of $A$ onto the direction of $B$ (since $B$ is unit-length, its norm is 1).

## Norms
- L1 norm: $\|x\|_1 = \sum_i |x_i|$
- L2 norm: $\|x\|_2 = \sqrt{\sum_i x_i^2}$

- Lp norm (for real numbers $p \ge 1$): $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$

## Matrix Operations: Multiplication
The product AB is defined entry-wise: each entry in the result is (that row of A) dot product with (that column of B).
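The dot product, norms, and matrix multiplication above, sketched in NumPy (the slides use MATLAB; this is an assumed Python equivalent):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

# Inner (dot) product: multiply corresponding entries and sum
dot = x @ y                      # 1*3 + 2*0 + 2*4 = 11

# Norms
l1 = np.sum(np.abs(x))           # L1 norm = 5
l2 = np.sqrt(np.sum(x**2))       # L2 norm = 3

# Matrix multiplication: entry (i, j) is row i of A dotted with column j of B
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])
C = A @ B                        # [[19, 22], [43, 50]]
```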

Multiplication example: each entry of the matrix product is made by taking the dot product of the corresponding row in the left matrix with the corresponding column in the right one.

## Matrix Operations: Transpose
Transposing flips a matrix, so row 1 becomes column 1. A useful identity: $(AB)^T = B^T A^T$.
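The useful transpose identity, $(AB)^T = B^T A^T$, can be checked numerically; a NumPy sketch:

```python
import numpy as np

# Verify (AB)^T = B^T A^T on a small example
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # 2 x 3
B = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])        # 3 x 2

left = (A @ B).T
right = B.T @ A.T
print(np.allclose(left, right))   # True
```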

## Special Matrices
- Identity matrix I: a square matrix with 1s along the diagonal and 0s elsewhere. I times another matrix returns that matrix unchanged.
- Diagonal matrix: a square matrix with numbers along the diagonal and 0s elsewhere. A diagonal matrix times another matrix scales the rows of that matrix.
- Symmetric matrix: a matrix satisfying $A^T = A$.

## Inverse
Given a matrix A, its inverse $A^{-1}$ is a matrix such that $AA^{-1} = A^{-1}A = I$. The inverse does not always exist. If $A^{-1}$ exists, A is invertible or non-singular; otherwise, it's singular.
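A NumPy sketch of inverses and singularity (an assumed analogue of MATLAB's inv):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])

A_inv = np.linalg.inv(A)                   # [[0.5, 0], [0, 0.25]]
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# A singular matrix (rank 1) has no inverse; calling inv() on it raises LinAlgError
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.matrix_rank(S))            # 1
```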

## Matrix Operations: MATLAB example
`>> x = A\B` returns `x = [1.0000; -0.5000]`.

## Pseudo-inverse

Say you have the matrix equation AX = B, where A and B are known, and you want to solve for X.

You could use MATLAB to calculate the inverse and premultiply by it: $A^{-1}AX = A^{-1}B$, so $X = A^{-1}B$. The MATLAB command would be inv(A)*B. But calculating the inverse for large matrices often brings problems with computer floating-point resolution, or your matrix might not even have an inverse. Fortunately, there are workarounds: instead of taking an inverse, directly ask MATLAB to solve for X in AX = B by typing A\B. MATLAB will try several appropriate numerical methods (including the pseudoinverse if the inverse doesn't exist) and will return the value of X which solves the equation.
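A NumPy analogue of MATLAB's A\B (an assumed equivalent; np.linalg.lstsq plays the pseudoinverse role when there is no exact solution):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
B = np.array([2.0, -2.0])

# Solve AX = B directly, without forming the inverse explicitly
x = np.linalg.solve(A, B)                      # [1.0, -0.5]

# For non-square A, least squares plays the role of A\B
A2 = np.array([[1.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])                    # overdetermined: 3 equations, 2 unknowns
b2 = np.array([1.0, 3.0, 2.0])
x2, *_ = np.linalg.lstsq(A2, b2, rcond=None)   # closest solution: [2.0, 2.0]
```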

- If there is no exact solution, it will return the closest one.
- If there are many solutions, it will return the smallest one.

## Matrix Operation Properties
- Matrix addition is commutative and associative: A+B = B+A and A+(B+C) = (A+B)+C.
- Matrix multiplication is associative and distributive, but not commutative: A(B*C) = (A*B)C and A(B+C) = A*B + A*C, but in general A*B != B*A.

## Linear independence
Suppose we have a set of vectors v1, ..., vn. If we can express v1 as a linear combination of the other vectors v2...vn, then v1 is linearly dependent on the other vectors: the direction v1 can be expressed as a combination of the directions v2...vn (e.g. v1 = .7 v2 - .7 v4).

If no vector is linearly dependent on the rest of the set, the set is linearly independent. Common case: a set of vectors v1, ..., vn is always linearly independent if each vector is perpendicular to every other vector (and non-zero).

(Figure: an example of a linearly independent set, and a set that is not linearly independent.)

## Matrix rank
- Column rank: the maximum number of linearly independent columns. Row rank: the maximum number of linearly independent rows.
- Column rank always equals row rank, so we simply speak of the rank of the matrix.

- If a matrix is not full rank, its inverse doesn't exist. The inverse also doesn't exist for non-square matrices.

## Singular Value Decomposition (SVD)
There are several computer algorithms that can factor a matrix, representing it as the product of some other matrices. The most useful of these is the Singular Value Decomposition. It represents any matrix A as a product of three matrices:

$$A = U \Sigma V^T$$

MATLAB command: [U,S,V] = svd(A);

Here $U$ and $V$ are rotation matrices, and $\Sigma$ is a scaling matrix. In general, if A is m x n, then $U$ will be m x m, $\Sigma$ will be m x n, and $V^T$ will be n x n.

$U$ and $V$ are always rotation matrices. Geometric rotation may not be an applicable concept, depending on the matrix, so we call them unitary matrices: each column is a unit vector. $\Sigma$ is a diagonal matrix:
- The number of nonzero entries equals the rank of A.
- The algorithm always sorts the entries high to low.
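These properties can be seen in a NumPy sketch (np.linalg.svd returns the singular values as a vector s rather than a diagonal matrix, so we rebuild the m x n scaling matrix ourselves):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 0.0]])            # 3 x 2, rank 2

U, s, Vt = np.linalg.svd(A)
print(U.shape, s.shape, Vt.shape)     # (3, 3) (2,) (2, 2)

# Singular values are sorted high to low
print(s[0] >= s[1])                   # True

# Rebuild A: embed s into the m x n Sigma matrix
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A)) # True
```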

$$M = U \Sigma V^T$$

(Illustration from Wikipedia.)

## SVD Applications
We've discussed SVD in terms of geometric transformation matrices, but the SVD of a data matrix can also be very useful. To understand this, we'll look at a less geometric interpretation of what SVD is doing.

Look at how the multiplication works out, left to right: column 1 of $U$ gets scaled by the first value from $\Sigma$. The resulting vector gets scaled by row 1 of $V^T$ to produce a contribution to the columns of A. Each product of (column i of $U$)(value i from $\Sigma$)(row i of $V^T$) produces a component of the final A.

We're building A as a linear combination of the columns of $U$. Using all columns of $U$, we'll rebuild the original matrix perfectly. But in real-world data, often we can just use the first few columns of $U$ and we'll get something close (e.g. the first A_partial, above).

We can call those first few columns of $U$ the Principal Components of the data. They show the major patterns that can be added to produce the columns of the original matrix. The rows of $V^T$ show how the principal components are mixed to produce the columns of the matrix. We can look at $\Sigma$ to see that the first column has a large effect, while the second column has a much smaller effect in this example.

## Principal Component Analysis
Remember, columns of $U$ are the Principal Components of the data: the major patterns that can be added to produce the columns of the original matrix. One use of this is to construct a matrix where each column is a separate data sample. Run SVD on that matrix, and look at the first few columns of $U$ to see patterns that are common among the columns. This is called Principal Component Analysis (or PCA) of the data samples.

Often, raw data samples have a lot of redundancy and patterns. PCA can allow you to represent data samples as weights on the principal components, rather than using the original raw form of the data. By representing each sample as just those weights, you can represent just the "meat" of what's different between samples. This minimal representation makes machine learning and other algorithms much more efficient.
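A minimal PCA-via-SVD sketch in NumPy, under the slides' convention that each column of the matrix is one data sample (an assumed Python analogue of the MATLAB workflow):

```python
import numpy as np

# Each column is one data sample; all samples share one underlying pattern
pattern = np.array([1.0, 2.0, 3.0, 4.0])
weights_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
A = np.outer(pattern, weights_true)        # rank-1 data matrix (4 x 5)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# One dominant singular value: a single principal component explains the data
print(s[0] > 1e-8, s[1] < 1e-8)            # True True

# Represent each sample by just its weight on the first principal component
weights = s[0] * Vt[0, :]
A_approx = np.outer(U[:, 0], weights)
print(np.allclose(A, A_approx))            # True
```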

## Example: Eigenfaces
Images of faces: represent each image as the concatenation of its pixels (a column vector), and stack the columns together horizontally to get a matrix A. Then run [U, S, V] = svd(A);.

(Figure: the first four columns of U, displayed as images. Image from Alexander Ihler.)

We can represent each face as a linear combination of the first few columns of U.
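The same idea as a generic NumPy sketch, with synthetic data standing in for face images (hypothetical data, not the slides' face dataset): a "face" column is rewritten as weights on the first k columns of U.

```python
import numpy as np

# Synthetic "images": each column of A is one image (16 pixels, 6 images),
# and every image is a mixture of two underlying patterns
rng = np.random.default_rng(1)
base = rng.normal(size=(16, 2))            # two face-like patterns
coeffs = rng.normal(size=(2, 6))
A = base @ coeffs

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Represent image 0 as a combination of the first k columns of U
k = 2
face0 = A[:, 0]
w = U[:, :k].T @ face0                     # k weights instead of 16 pixels
reconstructed = U[:, :k] @ w
print(np.allclose(face0, reconstructed))   # True: rank-2 data needs only 2 weights
```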

## Addendum: How is SVD computed?
For this class: tell MATLAB to do it, and use the result. But if you're interested, one computer algorithm to do it makes use of eigenvectors. The following material is presented to make SVD less of a magical black box; you will do fine in this class if you treat SVD as a magical black box, as long as you remember its properties from the previous slides.

## Eigenvector definition
Suppose we have a square matrix A. We can solve for a vector x and scalar $\lambda$ such that $Ax = \lambda x$. In other words, find vectors where, if we transform them with A, the only effect is to scale them, with no change in direction. These vectors are called eigenvectors ("eigen" is German for "self": they are the matrix's own vectors), and the scaling factors $\lambda$ are called eigenvalues. An m x m matrix will have m eigenvectors where $\lambda$ is nonzero.

## Finding eigenvectors
Computers can find an x such that $Ax = \lambda x$ using this iterative algorithm (power iteration):
1. x = random unit vector
2. while x hasn't converged: x = Ax, then normalize x

x will quickly converge to an eigenvector. Some simple modifications will let this algorithm find all eigenvectors.

## Finding SVD
Eigenvectors are for square matrices, but SVD is for all matrices.

To do svd(A), computers can do this:
- Take the eigenvectors of $AA^T$ (that matrix is always square). These eigenvectors are the columns of $U$, and the square roots of the eigenvalues are the singular values (the entries of $\Sigma$).
- Take the eigenvectors of $A^TA$ (that matrix is always square). These eigenvectors are the columns of $V$ (or the rows of $V^T$).
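The two slides above can be tied together in a NumPy sketch: power iteration on the square matrix $A^TA$ recovers the first right singular vector of $A$, and the square root of its eigenvalue matches the first singular value.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])            # any shape works: A^T A is always square

M = A.T @ A                            # 2 x 2, symmetric

# Power iteration: repeatedly multiply and normalize
x = np.array([1.0, 0.0])               # "random" starting unit vector
for _ in range(100):
    x = M @ x
    x = x / np.linalg.norm(x)          # converges to the top eigenvector of M

# Compare with the first right singular vector from SVD (sign may flip)
U, s, Vt = np.linalg.svd(A)
v1 = Vt[0, :]
print(min(np.linalg.norm(x - v1), np.linalg.norm(x + v1)) < 1e-6)  # True

# Square root of the eigenvalue of A^T A = first singular value
lam = x @ (M @ x)                      # Rayleigh quotient, since ||x|| = 1
print(abs(np.sqrt(lam) - s[0]) < 1e-6) # True
```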

Moral of the story: SVD is fast, even for large matrices, and it's useful for a lot of stuff. There are also other algorithms to compute the SVD or part of it; MATLAB's svd() command has options to efficiently compute only what you need, if performance becomes an issue. A detailed geometric explanation of SVD is here: http://www.ams.org/samplings/feature-column/fcarc-svd

## Matlab Tutorial
- http://www.cs.pitt.edu/~kovashka/cs2750_sp17/tutorial.m
- http://www.cs.pitt.edu/~kovashka/cs2750_sp17/myfunction.m
- http://www.cs.pitt.edu/~kovashka/cs2750_sp17/myotherfunction.m

Please cover whatever we don't finish at home.

## Tutorials and Exercises
- https://people.cs.pitt.edu/~milos/courses/cs2750/Tutorial/
- http://www.math.udel.edu/~braun/M349/Matlab_probs2.pdf
- http://www.facstaff.bucknell.edu/maneval/help211/basicexercises.html (do Problems 1-8, 12; most also have solutions)

Ask the TA if you have any problems.