Multinomial Logit
Sociology 229: Advanced Regression
Copyright 2010 by Evan Schofer
Do not copy or distribute without permission

Announcements
  Short assignment 1 handed out today
    Due at start of class next week
  Agenda
    Minor follow-up to last class: marginal change in logistic regression
    Models for polytomous outcomes
      Ordered logistic regression
      Multinomial logistic regression
      Conditional logit: models for alternative-specific data

Marginal Change in Logit
  Issue: How to best capture effect size in nonlinear models?
    % change in odds ratios for a 1-unit change in X
    Change in actual probability for a 1-unit change in X
      Either for hypothetical cases or an actual case
  Another option: marginal change
    The actual slope of the curve at a specific point
    Again, can be computed for real or hypothetical cases
    Use adjust (Stata 9/10) or margins (Stata 11)
  Recall from calculus: derivatives are slopes...
    So, a marginal change is just a derivative.
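The statement that a marginal change is a derivative can be written out explicitly. This is a sketch assuming the standard binary logit; the symbols $p$, $x_k$, and $\beta_k$ are notation introduced here, not taken from the slides:

```latex
p = \Pr(Y = 1 \mid x) = \frac{e^{x\beta}}{1 + e^{x\beta}},
\qquad
\frac{\partial p}{\partial x_k} = \beta_k \, p \, (1 - p)
```

The slope is steepest at $p = 0.5$, where it equals $\beta_k / 4$, and flattens toward zero as $p$ approaches 0 or 1. That is why a marginal change has to be evaluated at specific values of the X variables.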

Marginal vs Discrete Change in Logit
  Long and Freese 2006:169

Ordered Logit: Motivation
  Issue: Many categorical dependent variables are ordered
    Ex: strongly disagree, disagree, agree, strongly agree
    Ex: social class
  Linear regression is often used for ordered categorical outcomes
    Ex: Strongly disagree=0, disagree=1, agree=2, etc.
    This makes arbitrary, usually unjustifiable, assumptions about the distance between categories
      Why not: Strongly disagree=0, disagree=3, agree=3.5?
  If the numerical values assigned to categories do not accurately reflect the true distance, linear regression may not be appropriate

Ordered Logit: Motivation
  Strategies to deal with ordered categorical variables
  1. Use OLS regression anyway
    Commonly done, but can give incorrect results
    Possibly check robustness by varying the coding of the interval between outcomes
  2. Collapse variables to a dichotomy, use a binary model such as logit or probit

    Combine strongly disagree & disagree, strongly agree & agree
    Model disagree vs. agree
    Works fine, but throws away useful information.

Ordered Logit: Motivation
  Strategies to deal with ordered categorical variables (cont'd):
  3. If you aren't confident about ordering, use multinomial logistic regression (discussed later)
  4. Ordered logit / ordinal probit
  5. Stereotype logit
    Not discussed.

Ordered Logit
  Ordered logit is often conceptualized as a latent variable model
  Observed responses result from individuals falling within ranges on an underlying continuous measure
    Example: There is some underlying variable "agreement"
    If you fall below a certain (unobserved) threshold, you'll respond "strongly disagree"
  Whereas logit looks at P(Y=1), ologit looks at the probability of falling in particular ranges

Ordered Logit Example: Environment Spending
  Government spending on the environment

    GSS question: Are we spending too little money, about the right amount, or too much?
    GSS variable NATENVIR from years 2000, 02, 04, 06
    Recoded: 1 = too little, 2 = about right, 3 = too much

Ordered logit Example
  Government spending on environment

. ologit envspend educ incomea female age dblack class city suburb attendchurch

Ordered logistic regression                       Number of obs   =       5169
                                                  LR chi2(9)      =     192.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -4191.1232                       Pseudo R2       =     0.0225

------------------------------------------------------------------------------
    envspend |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0419784   .0108409     3.87   0.000     .0207307    .0632261
      income |   .0023984   .0057545     0.42   0.677    -.0088802     .013677
      female |   .2753095   .0591542     4.65   0.000     .1593693    .3912496
         age |   -.012762   .0017667    -7.22   0.000    -.0162247   -.0092994
      dblack |   .2898025   .0930178     3.12   0.002     .1074911     .472114
       class |  -.0719344   .0485173    -1.48   0.138    -.1670266    .0231578
        city |    .227895    .080983     2.81   0.005     .0691711    .3866188
      suburb |   .0752643   .0695921     1.08   0.279    -.0611337    .2116624
attendchurch |   -.086372   .0109998    -7.85   0.000    -.1079312   -.0648128
-------------+----------------------------------------------------------------
       /cut1 |  -2.872315   .1930206                     -3.250628   -2.494001
       /cut2 |  -.8156047   .1867621                     -1.181652   -.4495577
------------------------------------------------------------------------------

  Instead of a constant, ologit reports cutpoints, which can be used to compute the probability of falling into a particular value of Y

Ordered logit Example
  Ologit results can be shown as odds ratios

. ologit envspend educ incomea female age dblack class city suburb attendchur, or

Ordered logistic regression                       Number of obs   =       5169
                                                  LR chi2(9)      =     192.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -4191.1232                       Pseudo R2       =     0.0225

------------------------------------------------------------------------------
    envspend | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   1.042872   .0113056     3.87   0.000     1.020947    1.065268
     incomea |   1.002401   .0057683     0.42   0.677     .9911591    1.013771
      female |   1.316938   .0779025     4.65   0.000     1.172771    1.478828
         age |    .987319   .0017443    -7.22   0.000     .9839063    .9907437
      dblack |   1.336164    .124287     3.12   0.002     1.113481     1.60338
       class |    .930592   .0451498    -1.48   0.138     .8461771    1.023428
        city |   1.255953   .1017109     2.81   0.005      1.07162    1.471995
      suburb |   1.078169   .0750321     1.08   0.279     .9406974    1.235731
      attend |   .9172529   .0100896    -7.85   0.000     .8976894    .9372429
-------------+----------------------------------------------------------------
       /cut1 |  -2.872315   .1930206                     -3.250628   -2.494001
       /cut2 |  -.8156047   .1867621                     -1.181652   -.4495577
------------------------------------------------------------------------------

  Women have 1.32 times the odds of falling in a higher category than men: a difference of (1.32 - 1) * 100 = 32%.

Proportional Odds Assumption
  The fact that you can calculate odds ratios highlights a key assumption of ordered logit:
  Proportional odds assumption
    Also known as the parallel regression assumption
    Which also applies to ordered probit
  Model assumes that variable effects on the odds of lower vs. higher outcomes are consistent
    Effect on odds of too little vs. about right is the same as for about right vs. too much
    Controlling for all other vars in the model
  If this assumption doesn't seem reasonable, consider stereotype logit or multinomial logit.

Ologit Interpretation
  Like logit, interpretation is difficult because the effect of Xs on Y is nonlinear
    Effects vary with the values of all X variables
  Interpretation strategies are similar to logit:
    You can produce predicted probabilities
      For each category of Y: Y=1, Y=2, Y=3
      For real or hypothetical cases
    You can look at the effect of a change in X on predicted probabilities of Y
      Given particular values of the X variables
    You can present marginal effects.

Ordered logit vs. OLS
  Government spending on environment

. reg envspend educ incomea female age dblack class city suburb attendchur

      Source |       SS       df       MS              Number of obs =    5169
-------------+------------------------------           F(  9,  5159) =   21.27
       Model |  71.1243142     9  7.90270158           Prob > F      =  0.0000
    Residual |   1916.7124  5159  .371527894           R-squared     =  0.0358
-------------+------------------------------           Adj R-squared =  0.0341
       Total |  1987.83672  5168  .384643328           Root MSE      =  .60953

------------------------------------------------------------------------------
    envspend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .012701   .0032069     3.96   0.000     .0064141    .0189878
      income |   .0006037   .0016821     0.36   0.720     -.002694    .0039013
      female |   .0900251   .0173081     5.20   0.000     .0560938    .1239563
         age |  -.0038736   .0005258    -7.37   0.000    -.0049044   -.0028428
      dblack |   .0726494   .0261632     2.78   0.006     .0213585    .1239403
       class |  -.0165553   .0142495    -1.16   0.245    -.0444904    .0113797
        city |   .0555329   .0229917     2.42   0.016     .0104594    .1006065
      suburb |    .031217   .0205407     1.52   0.129    -.0090515    .0714855
  attendchur |  -.0243782   .0032213    -7.57   0.000    -.0306934   -.0180631
       _cons |   2.618234   .0547459    47.83   0.000     2.510909     2.72556
------------------------------------------------------------------------------

  In this case, OLS produced similar results to ordered logit. But that doesn't always happen, and you won't know if you don't check.

Multinomial Logistic Regression
  What if you have a dependent variable with several non-ordinal outcomes?
    Ex: Mullen, Goyette, Soares (2003): What kind of grad school?
      None vs. MA vs. MBA vs. Professional School vs. PhD.
    Ex: McVeigh & Smith (1999). Political action
      Action can take different forms: institutionalized action (e.g., voting) or protest
      Inactive vs. conventional political action vs. protest
    Other examples?

Multinomial Logistic Regression
  Multinomial logit strategy: Contrast outcomes with a common reference point
    Similar to conducting a series of 2-outcome logit models comparing pairs of categories

    The reference category is like the reference group when using dummy variables in regression
      It serves as the contrast point for all analyses
  Example: Mullen et al. 2003: Analysis of 5 categories yields 4 tables of results:
    No grad school vs. MA
    No grad school vs. MBA
    No grad school vs. Professional school
    No grad school vs. PhD.

Multinomial Logistic Regression
  Imagine a dependent variable with J categories
    Ex: 2000 Presidential Election: J = 3; voting for Bush, Gore, or Nader
  Probabilities of person i choosing each category j must add to 1.0:

    $$\sum_{j=1}^{J} p_{ij} = p_{i1(\text{Bush})} + p_{i2(\text{Gore})} + p_{i3(\text{Nader})} = 1$$

Multinomial Logistic Regression
  Option #1: Conduct binomial logit models for all possible combinations of outcomes
    Probability of Gore vs. Bush
    Probability of Nader vs. Bush
    Probability of Gore vs. Nader
  Note: This will produce results fairly similar to multinomial output
    But: Sample varies across models
    Also, multinomial imposes additional constraints
    So, results will differ somewhat from multinomial logistic regression.

Multinomial Logistic Regression
  We can model the probability of each outcome as:

    $$p_{ij} = \frac{e^{\sum_{k=1}^{K} \beta_{kj} X_{kji}}}{\sum_{m=1}^{J} e^{\sum_{k=1}^{K} \beta_{km} X_{kmi}}}$$

    i = cases, j = categories (m indexes categories in the denominator sum), k = independent variables
  As written, the coefficients are not uniquely determined; this is solved by adding a constraint
    Coefficients sum to zero:

    $$\sum_{j=1}^{J} \beta_{kj} = 0$$

Multinomial Logistic Regression
  Option #2: Multinomial logistic regression
    Choose one category as reference
      Probability of Gore vs. Bush
      Probability of Nader vs. Bush
      Probability of Gore vs. Nader

    Let's make Bush the reference category
    Output will include two tables:
      Factors affecting probability of voting for Gore vs. Bush
      Factors affecting probability of Nader vs. Bush.

Multinomial Logistic Regression
  Choice of reference category drives interpretation of multinomial logit results
    Similar to when you use dummy variables
    Example: Variables affecting vote for Gore would change if the reference was Bush or Nader!
      What would matter in each case?
  1. Choose the contrast(s) that make the most sense
    Try out different possible contrasts
  2. Be aware of the reference category when interpreting results
    Otherwise, you can make BIG mistakes
    Effects are always in reference to the contrast category.

MLogit Example: Family Vacation
  Mode of Travel. Reference category = Train

. mlogit mode income familysize

Multinomial logistic regression                   Number of obs   =        152
                                                  LR chi2(4)      =      42.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -138.68742                       Pseudo R2       =     0.1332

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Bus          |
      income |   .0311874   .0141811     2.20   0.028     .0033929    .0589818
  familysize |  -.6731862   .3312153    -2.03   0.042    -1.322356   -.0240161
       _cons |  -.5659882    .580605    -0.97   0.330    -1.703953    .5719767
-------------+----------------------------------------------------------------
Car          |
      income |    .057199   .0125151     4.57   0.000     .0326698    .0817282
  familysize |   .1978772   .1989113     0.99   0.320    -.1919817    .5877361
       _cons |  -2.272809   .5201972    -4.37   0.000    -3.292377   -1.253241
------------------------------------------------------------------------------
(mode==Train is the base outcome)

  Large families are less likely to take the bus (vs. train)
  Note: It is hard to directly compare Car vs. Bus in this table

MLogit Example: Car vs. Bus vs. Train
  Mode of Travel. Reference category = Car

. mlogit mode income familysize, base(3)

Multinomial logistic regression                   Number of obs   =        152
                                                  LR chi2(4)      =      42.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -138.68742                       Pseudo R2       =     0.1332

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Train        |
      income |   -.057199   .0125151    -4.57   0.000    -.0817282   -.0326698
  familysize |  -.1978772   .1989113    -0.99   0.320    -.5877361    .1919817
       _cons |   2.272809   .5201972     4.37   0.000     1.253241    3.292377
-------------+----------------------------------------------------------------
Bus          |
      income |  -.0260117   .0139822    -1.86   0.063    -.0534164     .001393
  familysize |  -.8710634   .3275472    -2.66   0.008    -1.513044   -.2290827
       _cons |   1.706821   .6464476     2.64   0.008      .439807    2.973835
------------------------------------------------------------------------------
(mode==Car is the base outcome)

  Here, the pattern is clearer: Wealthy & large families use cars

Stata Notes: mlogit
  Dependent variable: any categorical variable
    Values don't need to be positive or sequential
    Ex: Bus = 1, Train = 2, Car = 3
    Or: Bus = 0, Train = 10, Car = 35
  Base category can be set with an option:
    mlogit mode income familysize, baseoutcome(3)
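Changing the base category does not change the fitted model, only the contrasts that are reported. The Python sketch below, which uses only the coefficients printed in the two mlogit tables above, illustrates this: the Bus vs. Car coefficients equal the Bus vs. Train coefficients minus the Car vs. Train coefficients. The function name `rebase` and the dictionary layout are hypothetical, chosen for illustration.

```python
# Coefficients copied from the mlogit output with Train as the base outcome.
bus_vs_train = {"income": 0.0311874, "familysize": -0.6731862, "_cons": -0.5659882}
car_vs_train = {"income": 0.0571990, "familysize": 0.1978772, "_cons": -2.272809}


def rebase(outcome, new_base):
    """Return the contrast outcome-vs-new_base.

    Both inputs must be contrasts against the same original base category,
    so the original base cancels out in the subtraction.
    """
    return {name: outcome[name] - new_base[name] for name in outcome}


# Bus vs. Car, derived without re-estimating the model.
bus_vs_car = rebase(bus_vs_train, car_vs_train)

# Train vs. Car is simply the sign-flipped Car vs. Train contrast.
train_vs_car = {name: -value for name, value in car_vs_train.items()}

for name, value in bus_vs_car.items():
    print(f"Bus vs. Car   {name:<10} {value: .7f}")
for name, value in train_vs_car.items():
    print(f"Train vs. Car {name:<10} {value: .7f}")
```

The derived values match the coefficients in the Car-as-base table up to rounding. Standard errors cannot be recovered this way, because they depend on the covariance between the two sets of estimates; re-running the model with a different base is the simple way to get them.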

  Exponentiated coefficients are called relative risk ratios, rather than odds ratios
    mlogit mode income familysize, rrr

MLogit Example: Car vs. Bus vs. Train
  Exponentiated coefficients: relative risk ratios

Multinomial logistic regression                   Number of obs   =        152
                                                  LR chi2(4)      =      42.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -138.68742                       Pseudo R2       =     0.1332

------------------------------------------------------------------------------
        mode |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Train        |
      income |   .9444061   .0118194    -4.57   0.000     .9215224    .9678581
  familysize |   .8204706   .1632009    -0.99   0.320     .5555836    1.211648
-------------+----------------------------------------------------------------
Bus          |
      income |   .9743237   .0136232    -1.86   0.063     .9479852    1.001394
  familysize |   .4185063   .1370806    -2.66   0.008     .2202385    .7952627
------------------------------------------------------------------------------
(mode==Car is the base outcome)

  exp(-.057) = .94. Interpretation is just like odds ratios, BUT the comparison is with the reference category.

Predicted Probabilities
  You can predict probabilities for each case
    Each outcome has its own probability (they add up to 1)

. predict predtrain predbus predcar if e(sample), pr
. list predtrain predbus predcar

     +--------------------------------+
     | predtrain   predbus    predcar |
     |--------------------------------|
  1. |  .3581157  .3089684   .3329159 |
  2. |   .448882  .1690205   .3820975 |
  3. |  .3080929  .3106668   .3812403 |
  4. |  .0840841  .0562263   .8596895 |
  5. |  .2771111  .1665822   .5563067 |
  6. |  .5169058   .279341   .2037531 |
  7. |  .5986157  .2520666   .1493177 |
  8. |  .3080929  .3106668   .3812403 |
  9. |  .0934616  .1225238   .7840146 |
 10. |  .6262593  .1477046   .2260361 |

  Some cases have a high predicted probability of traveling by car (e.g., .86 in row 4)
  For other cases the three probabilities are pretty similar (e.g., row 1)

Classification of Cases
  Stata doesn't have a fancy command to compute classification tables for mlogit
  But you can do it manually
    Assign cases based on highest probability
  You can make a table of all classifications, or just of whether cases were classified correctly
    First, I calculated the predicted mode (pmode) and a dummy indicating whether the prediction was correct

. gen predcorrect = 0
. replace predcorrect = 1 if pmode == mode
(85 real changes made)
. tab predcorrect

predcorrect |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         67       44.08       44.08
          1 |         85       55.92      100.00
------------+-----------------------------------
      Total |        152      100.00

  56% of cases were classified correctly

Predicted Probability Across X Vars
  Like logit, you can show how probabilities change across independent variables
    However, the adjust command doesn't work with mlogit
    So, manually compute the mean of predicted probabilities
    Note: Other variables will be left as is unless you set them manually before you use predict

. mean predcar, over(familysize)

--------------------------
        Over |       Mean
-------------+------------
predcar      |
           1 |   .2714656
           2 |   .4240544
           3 |   .6051399
           4 |   .6232910
           5 |   .8719671
           6 |   .8097709

  Probability of using a car increases with family size
  Note: Values bounce around because other vars are not set to a common value.
  Note 2: Again, scatter plots aid in summarizing such results

Stata Notes: mlogit
  Like logit, you can't include variables that perfectly predict the outcome
    Note: the Stata logit command gives a warning of this
    The mlogit command doesn't give a warning, but the coefficient will have a z-value of zero and a p-value of 1
    Remove problematic variables if this occurs!

Hypothesis Tests
  Individual coefficients can be tested as usual
    Wald test/z-values provided for each variable
  However, adding a new variable to the model actually yields more than one coefficient
    If you have 4 categories, you'll get 3 coefficients
    LR tests are especially useful because you can test for improved fit across the whole model

LR Tests in Multinomial Logit
  Example: Does familysize improve the model?
    Recall: It wasn't always significant... maybe not!

  Run full model, save results
    mlogit mode income familysize
    estimates store fullmodel
  Run restricted model, save results
    mlogit mode income
    estimates store smallmodel
  Compare:
    lrtest fullmodel smallmodel

    Likelihood-ratio test                              LR chi2(2)  =      9.55
    (Assumption: smallmodel nested in fullmodel)       Prob > chi2 =    0.0084

  Yes, model fit is significantly improved

Multinomial Logit Assumptions: IIA
  Multinomial logit is designed for outcomes that are not complexly interrelated
  Critical assumption: Independence of Irrelevant Alternatives (IIA)
    Odds of one outcome versus another should be independent of other alternatives
    Problems often come up when dealing with individual choices

  Multinomial logit is not appropriate if the assumption is violated.

Multinomial Logit Assumptions: IIA
  IIA Assumption Example:
    Odds of voting for Gore vs. Bush should not change if Nader is added to or removed from the ballot
    If Nader is removed, those voters should choose Bush & Gore in a similar pattern to the rest of the sample
  Is the IIA assumption likely met in an election model?
    NO! If Nader were removed, those voters would likely vote for Gore
    Removal of Nader would change the odds ratio for Bush/Gore.

Multinomial Logit Assumptions: IIA
  IIA Example 2: Consumer Preferences
    Options: coffee, Gatorade, Coke
      Might meet the IIA assumption
    Options: coffee, Gatorade, Coke, Pepsi
      Won't meet the IIA assumption. Coke & Pepsi are very similar: substitutable.
      Removal of Pepsi will drastically change the odds ratios for Coke vs. others.

Multinomial Logit Assumptions: IIA
  Issue: Choose categories carefully when doing multinomial logit!
  Long and Freese (2006), quoting McFadden:
    "Multinomial and conditional logit models should only be used in cases where the alternatives can plausibly be assumed to be distinct and weighed independently in the eyes of the decisionmaker."
  Categories should be distinct alternatives, not substitutes
  Note: There are some formal tests for violation of IIA. But they don't work well. Not recommended.
    See Long and Freese (2006) p. 243

Multinomial Logit Assumptions: IIA
  Ways to cope with violations of IIA
  1. Combine similar options to avoid substitutes
    Example: coffee, Gatorade, Coke, Pepsi
    Combine into: coffee, Gatorade, carbonated drinks
  2. Or, model outcomes as a set of choices
    First, whether to have a carbonated drink
    And then conduct a subsequent analysis for the choice of Coke vs. Pepsi
    Nested logit
  3. Use a model that doesn't require the IIA assumption
    Ex: Multinomial probit, which doesn't make this assumption but is computationally intensive.

Multinomial Assumptions/Problems
  Aside from IIA, assumptions & problems of multinomial logit are similar to standard logit

    Sample size
      You often want to estimate MANY coefficients, so watch out for small N
    Outliers
    Multicollinearity
    Model specification / omitted variable bias
    Etc.

Real World Multinomial Example
  Gerber (2000): Russian political views
    Prefer state control or market reforms vs. uncertain
  Older Russians more likely to support state control of economy (vs. being uncertain)
  Younger Russians prefer market reform (vs. uncertain)

Multinomial Example 2
  McVeigh, Rory and Christian Smith. 1999. "Who Protests in America: An Analysis of Three Political Alternatives: Inaction, Institutionalized Politics, or Protest." Sociological Forum, 14, 4:685-702.

Alternative Specific Data
  Most variables of interest pertain to cases

    Example: Travel to work by car, bus, or train?
    Individual cases vary in income, which affects choices
  BUT, the various alternatives have differences
    Ex: The cost of travel differs for car vs. bus vs. train
    Ex: Cost can vary for individuals and each alternative
      Train might be cheap for people in some cities, not others
    Ex: The time of the trip also varies
  Sometimes we wish to model the impact of these alternative-specific differences on choice
    Either alone, or in conjunction with case-specific variables.

Alternative Specific Data
  Issue: Alternative-specific data requires a different kind of dataset
  Case-specific multinomial data simply requires a dependent variable indicating the option chosen
    1 line of data per case
    Dependent variable coded 1=bus, 2=car, 3=train
  Alternative-specific data requires multiple lines of data for each case
    One line of data for each possible outcome
    With information on variables like cost, travel time, etc.
    Plus a dummy variable indicating which of the outcomes was actually chosen.
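The difference between the two layouts can be sketched in a few lines of Python. This is an illustration only: the function `to_long` and the variable names `cases` and `attributes` are hypothetical, and the values correspond to the first two cases of the travel-mode listing from Long and Freese (2006) shown later in these notes, using that listing's coding (1 = train, 2 = bus, 3 = car).

```python
# Case-specific layout: one record per case, with the chosen mode in one variable.
cases = [
    {"id": 1, "mode": 3, "income": 35, "famsize": 1},
    {"id": 2, "mode": 3, "income": 30, "famsize": 2},
]

# Alternative-specific attributes, keyed by (case id, mode): (time, cost).
attributes = {
    (1, 1): (406, 31), (1, 2): (452, 25), (1, 3): (180, 10),
    (2, 1): (398, 31), (2, 2): (452, 25), (2, 3): (255, 11),
}

ALTERNATIVES = (1, 2, 3)  # 1 = train, 2 = bus, 3 = car


def to_long(cases, attributes):
    """Expand one-line-per-case data to one line per case and alternative."""
    rows = []
    for case in cases:
        for alt in ALTERNATIVES:
            time, cost = attributes[(case["id"], alt)]
            rows.append({
                "id": case["id"],
                "mode": alt,
                # Dummy marking the alternative that was actually chosen.
                "choice": int(case["mode"] == alt),
                "time": time,
                "cost": cost,
                # Case-specific variables are repeated on every line of the case.
                "income": case["income"],
                "famsize": case["famsize"],
            })
    return rows


long_rows = to_long(cases, attributes)
for row in long_rows:
    print(row)
```

Each case now occupies three lines, exactly one of which has choice equal to 1. Case-specific variables such as income are constant within a case, while time and cost vary across the alternatives.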

Case vs Alternative Specific Data
  Example from Long and Freese 2006:294
    Data from Greene & Hensher 1997
  Case-specific data on travel modes:

. list id mode income famsize

     +--------------------------------+
     | id    mode   income   famsize |
     |--------------------------------|
  1. |  1     Car       35         1 |
  2. |  2     Car       30         2 |
  3. |  3     Car       40         1 |
  4. |  4     Car       70         3 |
  5. |  5     Car       45         2 |
  6. |  6   Train       20         1 |
  7. |  8     Car       12         1 |
  8. |  9     Car       40         1 |
  9. | 10     Car       70         2 |
 10. | 11     Car       15         2 |
 11. | 12     Car       35         2 |
 12. | 13     Car       50         4 |
 13. | 14     Car       40         1 |
 14. | 15     Car       26         4 |

  Each line of data represents a case
  The dependent variable is coded in a single variable:
    Mode: Train = 1, Bus = 2, Car = 3

Case vs Alternative Specific Data
  Alternative-specific data on travel modes:

. list id mode choice train bus car time cost income famsize, nolab sepby(id)

     +--------------------------------------------------------------------------+
     | id   mode   choice   train   bus   car   time   cost   income   famsize |
     |--------------------------------------------------------------------------|
  1. |  1      1        0       1     0     0    406     31       35         1 |
  2. |  1      2        0       0     1     0    452     25       35         1 |
  3. |  1      3        1       0     0     1    180     10       35         1 |
     |--------------------------------------------------------------------------|
  4. |  2      1        0       1     0     0    398     31       30         2 |
  5. |  2      2        0       0     1     0    452     25       30         2 |
  6. |  2      3        1       0     0     1    255     11       30         2 |
     |--------------------------------------------------------------------------|
  7. |  3      1        0       1     0     0    926     98       40         1 |
  8. |  3      2        0       0     1     0    917     53       40         1 |
  9. |  3      3        1       0     0     1    720     23       40         1 |

  Now there are 3 lines of data for each case: one for each possible choice
  Another dummy (choice) indicates the one actually chosen
  Alternative-specific variables (e.g., travel cost) vary for car, bus, train

Analyzing alternative-specific data
  Conditional logit models can be used for models with alternative-specific data
    Stata: clogit
    But: case-specific variables must be manually entered as interactions between X and each choice
    Note: conditional logit with case- and alternative-specific data is called McFadden's choice model
  Stata now has simple options for these models
    You don't have to create interaction variables
    asclogit: alternative-specific conditional logit (McFadden's choice model)
    Also: asmprobit: alternative-specific multinomial probit

Example: McFadden's Choice Model
  Mode of Travel. Reference category = car

. asclogit choice time cost, casevars(income famsize) case(id) alternatives(mode)

Alternative-specific conditional logit          Number of obs      =       456
Case variable: id                               Number of cases    =       152

                                                Wald chi2(6)       =     69.09
Log likelihood = -77.504846                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mode         |
        time |  -.0185035   .0025035    -7.39   0.000    -.0234103   -.0135966
        cost |  -.0402791   .0134851    -2.99   0.003    -.0667095   -.0138488
-------------+----------------------------------------------------------------
Train        |
      income |  -.0342841   .0158471    -2.16   0.031    -.0653438   -.0032243
     famsize |  -.0038421   .3098075    -0.01   0.990    -.6110537    .6033695
       _cons |   3.499641   .7579665     4.62   0.000     2.014054    4.985228
-------------+----------------------------------------------------------------
Bus          |
      income |  -.0080174   .0200322    -0.40   0.689    -.0472798     .031245
     famsize |  -.5141037   .4007015    -1.28   0.199    -1.299464    .2712569
       _cons |   2.486465   .8803649     2.82   0.005     .7609815    4.211949
-------------+----------------------------------------------------------------
Car          |  (base alternative)
------------------------------------------------------------------------------

  Alternative-specific variables have intuitive effects
    All things being equal, people avoid choices that are slow or costly

McFadden's Choice Model
  Stata syntax: asclogit
    Ex: asclogit choice time cost, casevars(income famsize) case(id) alternatives(mode)
  Alternative-specific variables are in the main variable list
  Case-specific variables are included in the casevars() option
  NOTE: A case ID variable must be specified
  The alternatives() option identifies the alternatives
    Car vs. train vs. bus.

McFadden's Choice Model
  Uses of alternative-specific data
  1. Political choice depends on characteristics of the person AND the candidate
    Case-specific variables: education, income
    Alternative-specific: candidate's characteristics
      OR agreement/similarity between person & candidate
      Ex: dummies indicating similar views on abortion
  2. Type of college you attend
    None vs. community vs. 4-year public vs. 4-year private
    Case-specific: GPA, family income
    Alternative-specific: cost, selectivity of admissions
  Other examples?
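To see how the two kinds of coefficients combine, the Python sketch below computes predicted choice probabilities for case 1 of the travel-mode data from the asclogit coefficients reported above. It assumes the standard conditional logit form, in which each alternative's linear index is exponentiated and divided by the sum over alternatives. The function and variable names are hypothetical, and the calculation is an illustration rather than a reproduction of Stata's predict output.

```python
import math

# Coefficients copied from the asclogit output above (Car is the base alternative).
B_TIME, B_COST = -0.0185035, -0.0402791
CASE_COEFS = {
    "train": {"income": -0.0342841, "famsize": -0.0038421, "_cons": 3.499641},
    "bus": {"income": -0.0080174, "famsize": -0.5141037, "_cons": 2.486465},
    # Base alternative: case-specific coefficients are fixed at zero.
    "car": {"income": 0.0, "famsize": 0.0, "_cons": 0.0},
}

# Case 1 from the alternative-specific listing: income 35, family size 1.
case = {"income": 35, "famsize": 1}
alternatives = {
    "train": {"time": 406, "cost": 31},
    "bus": {"time": 452, "cost": 25},
    "car": {"time": 180, "cost": 10},
}


def linear_index(alt_name, alt_vars, case_vars):
    """Linear predictor for one alternative: shared effects of time and cost,
    plus alternative-specific effects of the case variables and a constant."""
    coefs = CASE_COEFS[alt_name]
    return (B_TIME * alt_vars["time"] + B_COST * alt_vars["cost"]
            + coefs["income"] * case_vars["income"]
            + coefs["famsize"] * case_vars["famsize"]
            + coefs["_cons"])


indexes = {name: linear_index(name, alt_vars, case)
           for name, alt_vars in alternatives.items()}

# Subtract the maximum before exponentiating for numerical stability.
top = max(indexes.values())
weights = {name: math.exp(value - top) for name, value in indexes.items()}
total = sum(weights.values())
probs = {name: weight / total for name, weight in weights.items()}

for name, p in probs.items():
    print(f"P({name}) = {p:.3f}")
```

For this case the car is much faster and cheaper than the other two modes, so it receives by far the largest predicted probability, consistent with the observed choice (choice = 1 on the car line).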