Regression Models for Categorical Dependent Variables Using Stata

Regression Models for Categorical Dependent Variables Using Stata, Third Edition, by J. Scott Long and Jeremy Freese, is an essential reference for those who use Stata to fit and interpret regression models for categorical data. Although regression models for categorical dependent variables are common, few texts explain how to interpret such models; this text decisively fills the void.

 

The third edition is divided into two parts. Part I begins with an excellent introduction to Stata and follows with general treatments of the estimation, testing, fitting, and interpretation of models for categorical dependent variables. The book is thus accessible to new users of Stata and those who are new to categorical data analysis. Part II is devoted to a comprehensive treatment of estimation and interpretation for binary, ordinal, nominal, and count outcomes.

 

Readers familiar with previous editions will find many changes in the third edition. An entire chapter is now devoted to interpretation of regression models using predictions. This concept is explored in greater depth in Part II. The authors also discuss how several recent improvements to Stata (factor variables, marginal effects with margins, and plotting predictions with marginsplot) facilitate the analysis of categorical data.

 

The authors advocate a variety of new methods that use predictions to interpret the effect of variables in regression models. Readers will find all discussion of statistical concepts firmly grounded in concrete examples. All the examples, datasets, and author-written commands are available on the authors’ website, so readers can easily replicate the examples with Stata.

Examples in the new edition also illustrate changes to the authors’ popular SPost commands after a recent rewrite inspired by the authors’ evolving views on interpretation. SPost now takes full advantage of the power of the margins command and the flexibility of factor-variable notation. Long and Freese also provide a suite of new commands, including mchange, mtable, and mgen. These commands complement margins, aiding model interpretation, hypothesis testing, and model diagnostics. They offer the syntactical conveniences that Stata users expect, such as including powers or interactions of covariates in regression models and working seamlessly with complex survey data. The authors also discuss how to use these commands to estimate marginal effects, either averaged over the sample or evaluated at fixed values of the regressors.

The third edition of Regression Models for Categorical Dependent Variables Using Stata continues to provide the same high-quality, practical tutorials of previous editions. It also offers significant improvements over previous editions—new content, updated information about Stata, and updates to the authors’ own commands. This book should be on the bookshelf of every applied researcher analyzing categorical data and is an invaluable learning resource for students and others who are new to categorical data analysis.

List of figures

 

PART I GENERAL INFORMATION

 

1. INTRODUCTION

What is this book about?
Which models are considered?
Whom is this book for?
How is the book organized?
The SPost software

Updating Stata
Installing SPost13

Uninstalling SPost9
Installing SPost13 using search
Installing SPost13 using net install

Uninstalling SPost13

Sample do-files and datasets

Installing the spost13_do package
Using spex to load data and run examples

Getting help with SPost

What if an SPost command does not work?
Getting help from the authors

What we need to help you

Where can I learn more about the models?

 

2. INTRODUCTION TO STATA

The Stata interface
Abbreviations
Getting help

Online help
PDF manuals
Error messages
Asking for help
Other resources

The working directory
Stata file types
Saving output to log files
Using and saving datasets

Data in Stata format
Data in other formats
Entering data by hand

Size limitations on datasets
Do-files

Adding comments
Long lines
Stopping a do-file while it is running
Creating do-files
Recommended structure for do-files

Using Stata for serious data analysis
Syntax of Stata commands

Commands
Variable lists
if and in qualifiers
Options

Managing data

Looking at your data
Getting information about variables
Missing values
Selecting observations
Selecting variables

Creating new variables

The generate command
The replace command
The recode command

Labeling variables and values

Variable labels
Value labels
The notes command

Global and local macros
Loops using foreach and forvalues
Graphics

The graph command

A brief tutorial
A do-file template
Conclusion

 

3. ESTIMATION, TESTING, AND FIT

Estimation

Stata’s output for ML estimation
ML and sample size
Problems in obtaining ML estimates
Syntax of estimation commands
Variable lists

Using factor-variable notation in the variable list
Specifying interaction and polynomials
More on factor-variable notation

Specifying the estimation sample

Missing data
Information about missing values
Postestimation commands and the estimation sample

Weights and survey data

Complex survey designs

Options for regression models
Robust standard errors
Reading the estimation output
Storing estimation results

(Advanced) Saving estimates to a file

Reformatting output with estimates table

Testing

One-tailed and two-tailed tests
Wald and likelihood-ratio tests
Wald tests with test and testparm
LR tests with lrtest

Avoiding invalid LR tests

Measures of fit

Syntax of fitstat
Methods and formulas used by fitstat
Example of fitstat

estat postestimation commands
Conclusion

 

4. METHODS OF INTERPRETATION

Comparing linear and nonlinear models
Approaches to interpretation

Method of interpretation based on predictions
Method of interpretation using parameters
Stata and SPost commands for interpretation

Predictions for each observation
Predictions at specified values

Why use the m* commands instead of margins?
Using margins for predictions

Predictions using interaction and polynomial terms
Making multiple predictions
Predictions for groups defined by levels of categorical variables

(Advanced) Nondefault predictions using margins

The predict() option
The expression() option

Tables of predictions using mtable

mtable with categorical and count outcomes
(Advanced) Combining and formatting tables using mtable

Marginal effects: Changes in predictions

Marginal effects using margins
Marginal effects using mtable
Posting predictions and using mlincom
Marginal effects using mchange

Plotting predictions

Plotting predictions with marginsplot
Plotting predictions using mgen

Interpretation of parameters

The listcoef command
Standardized coefficients
Factor and percentage change coefficients

Next steps

 

PART II MODELS FOR SPECIFIC KINDS OF OUTCOMES

 

5. MODELS FOR BINARY OUTCOMES: ESTIMATION, TESTING, AND FIT

The statistical model

A latent-variable model
A nonlinear probability model

Estimation using logit and probit commands

Example of logit model
Comparing logit and probit
(Advanced) Observations predicted perfectly

Hypothesis testing

Testing individual coefficients
Testing multiple coefficients
Comparing LR and Wald tests

Predicted probabilities, residuals, and influential observations

Predicted probabilities using predict
Residuals and influential observations using predict
Least likely observations

Measures of fit

Information criteria
Pseudo-R²’s
(Advanced) Hosmer–Lemeshow statistic

Other commands for binary outcomes
Conclusion

 

6. MODELS FOR BINARY OUTCOMES: INTERPRETATION

Interpretation using regression coefficients

Interpretation using odds ratios
(Advanced) Interpretation using y*

Marginal effects: Changes in probabilities

Linked variables
Summary measures of change

MEMs and MERs
AMEs
Standard errors of marginal effects

Should you use the AME, the MEM, or the MER?
Examples of marginal effects

AMEs for continuous variables
AMEs for factor variables
Summary table of AMEs
Marginal effects for subgroups
MEMs and MERs
Marginal effects with powers and interactions

The distribution of marginal effects
(Advanced) Algorithm for computing the distribution of effects

Ideal types

Using local means with ideal types
Comparing ideal types with statistical tests
(Advanced) Using macros to test differences between ideal types
Marginal effects for ideal types

Tables of predicted probabilities
Second differences comparing marginal effects
Graphing predicted probabilities

Using marginsplot
Using mgen with the graph command
Graphing multiple predictions
Overlapping confidence intervals
Adding power terms and plotting predictions
(Advanced) Graphs with local means

Conclusion

 

7. MODELS FOR ORDINAL OUTCOMES

The statistical model

A latent-variable model
A nonlinear probability model

Estimation using ologit and oprobit

Example of ordinal logit model
Predicting perfectly

Hypothesis testing

Testing individual coefficients
Testing multiple coefficients

Measures of fit using fitstat
(Advanced) Converting to a different parameterization
The parallel regression assumption

Testing the parallel regression assumption using oparallel
Testing the parallel regression assumption using brant
Caveat regarding the parallel regression assumption

Overview of interpretation
Interpreting transformed coefficients

Marginal change in y*
Odds ratios

Interpretations based on predicted probabilities
Predicted probabilities with predict
Marginal effects

Plotting marginal effects
Marginal effects for a quick overview

Predicted probabilities for ideal types

(Advanced) Testing differences between ideal types

Tables of predicted probabilities
Plotting predicted probabilities
Probability plots and marginal effects
Less common models for ordinal outcomes

The stereotype logistic model
The generalized ordered logit model
(Advanced) Predictions without using factor-variable notation
The sequential logit model

Conclusion

 

8. MODELS FOR NOMINAL OUTCOMES

The multinomial logit model

Formal statement of the model

Estimation using the mlogit command

Weights and complex samples
Options

Examples of MNLM
Selecting different base outcomes
Predicting perfectly

Hypothesis testing

mlogtest for tests of the MNLM
Testing the effects of the independent variables
Tests for combining alternatives

Independence of irrelevant alternatives

Hausman–McFadden test of IIA
Small–Hsiao test of IIA

Measures of fit
Overview of interpretation
Predicted probabilities with predict
Marginal effects

(Advanced) The distribution of marginal effects

Tables of predicted probabilities

(Advanced) Testing second differences
(Advanced) Predictions using local means and subsamples

Graphing predicted probabilities
Odds ratios

Listing odds ratios with listcoef
Plotting odds ratios

(Advanced) Additional models for nominal outcomes

Stereotype logistic regression
Conditional logit model
Multinomial probit model with IIA
Alternative-specific multinomial probit
Rank-ordered logit model

Conclusion

 

9. MODELS FOR COUNT OUTCOMES

The Poisson distribution

Fitting the Poisson distribution with the poisson command
Comparing observed and predicted counts with mgen

The Poisson regression model

Estimation using poisson

Example of the PRM

Factor and percentage changes in E(y | x)

Example of factor and percentage change

Marginal effects on E(y | x)

Examples of marginal effects

Interpretation using predicted probabilities

Predicted probabilities using mtable and mchange
Treating a count independent variable as a factor variable
Predicted probabilities using mgen

Comparing observed and predicted counts to evaluate model specification
(Advanced) Exposure time

The negative binomial regression model

Estimation using nbreg

NB1 and NB2 variance functions

Example of NBRM
Testing for overdispersion
Comparing the PRM and NBRM using estimates table
Robust standard errors
Interpretation using E(y | x)
Interpretation using predicted probabilities

Models for truncated counts

Estimation using tpoisson and tnbreg

Example of zero-truncated model

Interpretation using E(y | x)
Predictions in the estimation sample
Interpretation using predicted rates and probabilities

(Advanced) The hurdle regression model

Fitting the hurdle model
Predictions in the sample
Predictions at user-specified values
Warning regarding sample specification

Zero-inflated count models

Estimation using zinb and zip
Example of zero-inflated models
Interpretation of coefficients
Interpretation of predicted probabilities

Predicted probabilities with mtable
Plotting predicted probabilities with mgen

Comparisons among count models

Comparing mean probabilities
Tests to compare count models
Using countfit to compare count models

Conclusion

Authors: J. Scott Long and Jeremy Freese
Edition: Third Edition
ISBN-13: 978-1-59718-111-2
Copyright: © 2014
E-book version available
