Regression Models for Categorical Dependent Variables Using Stata, Third Edition, by J. Scott Long and Jeremy Freese, is an essential reference for those who use Stata to fit and interpret regression models for categorical data. Although regression models for categorical dependent variables are common, few texts explain how to interpret such models; this text decisively fills the void.
The third edition is divided into two parts. Part I begins with an excellent introduction to Stata and follows with general treatments of the estimation, testing, fitting, and interpretation of models for categorical dependent variables. The book is thus accessible to new users of Stata and those who are new to categorical data analysis. Part II is devoted to a comprehensive treatment of estimation and interpretation for binary, ordinal, nominal, and count outcomes.
Readers familiar with previous editions will find many changes in the third edition. An entire chapter is now devoted to interpretation of regression models using predictions. This concept is explored in greater depth in Part II. The authors also discuss how improvements made to Stata in recent years, such as factor variables, marginal effects with margins, and plotting predictions with marginsplot, facilitate the analysis of categorical data.
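As a rough illustration of that workflow, the sketch below fits a binary logit with factor-variable notation and then uses margins and marginsplot. The dataset and variable names (Stata's lbw example data) are assumptions chosen for illustration, not an example taken from the book.

    * Minimal sketch: factor variables, margins, and marginsplot
    webuse lbw, clear
    logit low age i.race i.smoke        // i. declares race and smoke as factor variables
    margins, dydx(*)                    // average marginal effects on Pr(low = 1)
    margins race, at(age=(15(5)45))     // predicted probabilities by race across ages
    marginsplot                         // plot the predictions from the last margins call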
The authors advocate a variety of new methods that use predictions to interpret the effect of variables in regression models. Readers will find all discussion of statistical concepts firmly grounded in concrete examples. All the examples, datasets, and author-written commands are available on the authors’ website, so readers can easily replicate the examples with Stata.
Examples in the new edition also illustrate changes to the authors’ popular SPost commands after a recent rewrite inspired by the authors’ evolving views on interpretation. Readers will note that SPost now takes full advantage of the power of the margins command and the flexibility of factor-variable notation. Long and Freese also provide a suite of new commands, including mchange, mtable, and mgen. These commands complement margins, aiding model interpretation, hypothesis testing, and model diagnostics. They offer the syntactic conveniences that Stata users expect, such as support for powers and interactions of covariates in regression models and seamless handling of complex survey data. The authors also discuss how to use these commands to estimate marginal effects, either averaged over the sample or evaluated at fixed values of the regressors.
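To give a feel for these commands, here is a minimal sketch of how mchange, mtable, and mgen might be used after fitting a binary logit. It assumes the SPost13 package has been installed (installation is covered in the book's first chapter) and again uses Stata's lbw example data rather than an example from the text.

    * Minimal sketch of the SPost13 commands (assumes SPost13 is installed)
    webuse lbw, clear
    logit low age i.race i.smoke
    mchange                                    // average marginal changes for every regressor
    mtable, at(smoke=(0 1)) atmeans            // predicted probabilities at fixed covariate values
    mgen, at(age=(15(5)45)) atmeans stub(pr_)  // save predictions as new pr_* variables for plotting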
The third edition of Regression Models for Categorical Dependent Variables Using Stata continues to provide the same high-quality, practical tutorials of previous editions. It also offers significant improvements over previous editions—new content, updated information about Stata, and updates to the authors’ own commands. This book should be on the bookshelf of every applied researcher analyzing categorical data and is an invaluable learning resource for students and others who are new to categorical data analysis.
List of figures
PART I GENERAL INFORMATION
1. INTRODUCTION
What is this book about?
Which models are considered?
Whom is this book for?
How is the book organized?
The SPost software
Updating Stata
Installing SPost13
Uninstalling SPost9
Installing SPost13 using search
Installing SPost13 using net install
Uninstalling SPost13
Sample do-files and datasets
Installing the spost13_do package
Using spex to load data and run examples
Getting help with SPost
What if an SPost command does not work?
Getting help from the authors
What we need to help you
Where can I learn more about the models?
2. INTRODUCTION TO STATA
The Stata interface
Abbreviations
Getting help
Online help
PDF manuals
Error messages
Asking for help
Other resources
The working directory
Stata file types
Saving output to log files
Using and saving datasets
Data in Stata format
Data in other formats
Entering data by hand
Size limitations on datasets
Do-files
Adding comments
Long lines
Stopping a do-file while it is running
Creating do-files
Recommended structure for do-files
Using Stata for serious data analysis
Syntax of Stata commands
Commands
Variable lists
if and in qualifiers
Options
Managing data
Looking at your data
Getting information about variables
Missing values
Selecting observations
Selecting variables
Creating new variables
The generate command
The replace command
The recode command
Labeling variables and values
Variable labels
Value labels
The notes command
Global and local macros
Loops using foreach and forvalues
Graphics
The graph command
A brief tutorial
A do-file template
Conclusion
3. ESTIMATION, TESTING, AND FIT
Estimation
Stata’s output for ML estimation
ML and sample size
Problems in obtaining ML estimates
Syntax of estimation commands
Variable lists
Using factor-variable notation in the variable list
Specifying interactions and polynomials
More on factor-variable notation
Specifying the estimation sample
Missing data
Information about missing values
Postestimation commands and the estimation sample
Weights and survey data
Complex survey designs
Options for regression models
Robust standard errors
Reading the estimation output
Storing estimation results
(Advanced) Saving estimates to a file
Reformatting output with estimates table
Testing
One-tailed and two-tailed tests
Wald and likelihood-ratio tests
Wald tests with test and testparm
LR tests with lrtest
Avoiding invalid LR tests
Measures of fit
Syntax of fitstat
Methods and formulas used by fitstat
Example of fitstat
estat postestimation commands
Conclusion
4. METHODS OF INTERPRETATION
Comparing linear and nonlinear models
Approaches to interpretation
Method of interpretation based on predictions
Method of interpretation using parameters
Stata and SPost commands for interpretation
Predictions for each observation
Predictions at specified values
Why use the m* commands instead of margins?
Using margins for predictions
Predictions using interaction and polynomial terms
Making multiple predictions
Predictions for groups defined by levels of categorical variables
(Advanced) Nondefault predictions using margins
The predict() option
The expression() option
Tables of predictions using mtable
mtable with categorical and count outcomes
(Advanced) Combining and formatting tables using mtable
Marginal effects: Changes in predictions
Marginal effects using margins
Marginal effects using mtable
Posting predictions and using mlincom
Marginal effects using mchange
Plotting predictions
Plotting predictions with marginsplot
Plotting predictions using mgen
Interpretation of parameters
The listcoef command
Standardized coefficients
Factor and percentage change coefficients
Next steps
PART II MODELS FOR SPECIFIC KINDS OF OUTCOMES
5. MODELS FOR BINARY OUTCOMES: ESTIMATION, TESTING, AND FIT
The statistical model
A latent-variable model
A nonlinear probability model
Estimation using logit and probit commands
Example of logit model
Comparing logit and probit
(Advanced) Observations predicted perfectly
Hypothesis testing
Testing individual coefficients
Testing multiple coefficients
Comparing LR and Wald tests
Predicted probabilities, residuals, and influential observations
Predicted probabilities using predict
Residuals and influential observations using predict
Least likely observations
Measures of fit
Information criteria
Pseudo-R²’s
(Advanced) Hosmer–Lemeshow statistic
Other commands for binary outcomes
Conclusion
6. MODELS FOR BINARY OUTCOMES: INTERPRETATION
Interpretation using regression coefficients
Interpretation using odds ratios
(Advanced) Interpretation using y*
Marginal effects: Changes in probabilities
Linked variables
Summary measures of change
MEMs and MERs
AMEs
Standard errors of marginal effects
Should you use the AME, the MEM, or the MER?
Examples of marginal effects
AMEs for continuous variables
AMEs for factor variables
Summary table of AMEs
Marginal effects for subgroups
MEMs and MERs
Marginal effects with powers and interactions
The distribution of marginal effects
(Advanced) Algorithm for computing the distribution of effects
Ideal types
Using local means with ideal types
Comparing ideal types with statistical tests
(Advanced) Using macros to test differences between ideal types
Marginal effects for ideal types
Tables of predicted probabilities
Second differences comparing marginal effects
Graphing predicted probabilities
Using marginsplot
Using mgen with the graph command
Graphing multiple predictions
Overlapping confidence intervals
Adding power terms and plotting predictions
(Advanced) Graphs with local means
Conclusion
7. MODELS FOR ORDINAL OUTCOMES
The statistical model
A latent-variable model
A nonlinear probability model
Estimation using ologit and oprobit
Example of ordinal logit model
Predicting perfectly
Hypothesis testing
Testing individual coefficients
Testing multiple coefficients
Measures of fit using fitstat
(Advanced) Converting to a different parameterization
The parallel regression assumption
Testing the parallel regression assumption using oparallel
Testing the parallel regression assumption using brant
Caveat regarding the parallel regression assumption
Overview of interpretation
Interpreting transformed coefficients
Marginal change in y*
Odds ratios
Interpretations based on predicted probabilities
Predicted probabilities with predict
Marginal effects
Plotting marginal effects
Marginal effects for a quick overview
Predicted probabilities for ideal types
(Advanced) Testing differences between ideal types
Tables of predicted probabilities
Plotting predicted probabilities
Probability plots and marginal effects
Less common models for ordinal outcomes
The stereotype logistic model
The generalized ordered logit model
(Advanced) Predictions without using factor-variable notation
The sequential logit model
Conclusion
8. MODELS FOR NOMINAL OUTCOMES
The multinomial logit model
Formal statement of the model
Estimation using the mlogit command
Weights and complex samples
Options
Examples of MNLM
Selecting different base outcomes
Predicting perfectly
Hypothesis testing
mlogtest for tests of the MNLM
Testing the effects of the independent variables
Tests for combining alternatives
Independence of irrelevant alternatives
Hausman–McFadden test of IIA
Small–Hsiao test of IIA
Measures of fit
Overview of interpretation
Predicted probabilities with predict
Marginal effects
(Advanced) The distribution of marginal effects
Tables of predicted probabilities
(Advanced) Testing second differences
(Advanced) Predictions using local means and subsamples
Graphing predicted probabilities
Odds ratios
Listing odds ratios with listcoef
Plotting odds ratios
(Advanced) Additional models for nominal outcomes
Stereotype logistic regression
Conditional logit model
Multinomial probit model with IIA
Alternative-specific multinomial probit
Rank-ordered logit model
Conclusion
9. MODELS FOR COUNT OUTCOMES
The Poisson distribution
Fitting the Poisson distribution with the poisson command
Comparing observed and predicted counts with mgen
The Poisson regression model
Estimation using poisson
Example of the PRM
Factor and percentage changes in E(y | x)
Example of factor and percentage change
Marginal effects on E(y | x)
Examples of marginal effects
Interpretation using predicted probabilities
Predicted probabilities using mtable and mchange
Treating a count independent variable as a factor variable
Predicted probabilities using mgen
Comparing observed and predicted counts to evaluate model specification
(Advanced) Exposure time
The negative binomial regression model
Estimation using nbreg
NB1 and NB2 variance functions
Example of NBRM
Testing for overdispersion
Comparing the PRM and NBRM using estimates table
Robust standard errors
Interpretation using E(y | x)
Interpretation using predicted probabilities
Models for truncated counts
Estimation using tpoisson and tnbreg
Example of zero-truncated model
Interpretation using E(y | x)
Predictions in the estimation sample
Interpretation using predicted rates and probabilities
(Advanced) The hurdle regression model
Fitting the hurdle model
Predictions in the sample
Predictions at user-specified values
Warning regarding sample specification
Zero-inflated count models
Estimation using zinb and zip
Example of zero-inflated models
Interpretation of coefficients
Interpretation of predicted probabilities
Predicted probabilities with mtable
Plotting predicted probabilities with mgen
Comparisons among count models
Comparing mean probabilities
Tests to compare count models
Using countfit to compare count models
Conclusion