Multilevel and Longitudinal Modeling Using Stata, Fourth Edition, by Sophia Rabe-Hesketh and Anders Skrondal, is a complete resource for learning to model data in which observations are grouped—whether those groups are formed by a nesting structure, such as children nested in classrooms, or formed by repeated observations on the same individuals. This text introduces random-effects models, fixed-effects models, mixed-effects models, marginal models, dynamic models, and growth-curve models, all of which account for the grouped nature of these types of data. As Rabe-Hesketh and Skrondal introduce each model, they explain when the model is useful, its assumptions, how to fit and evaluate the model using Stata, and how to interpret the results. With this comprehensive coverage, researchers who need to apply multilevel models will find this book to be the perfect companion. It is also the ideal text for courses in multilevel modeling because it provides examples from a variety of disciplines as well as end-of-chapter exercises that allow students to practice newly learned material.
The book comprises two volumes. Volume I focuses on linear models for continuous outcomes, while volume II focuses on generalized linear models for binary, ordinal, count, and other types of outcomes.
Volume I begins with a review of linear regression and then builds on this review to introduce two-level models, the simplest extensions of linear regression to models for multilevel and longitudinal/panel data. Rabe-Hesketh and Skrondal introduce the random-intercept model without covariates, developing the model from principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimating strategies, and providing historical perspective. Once the authors have established the foundation, they smoothly generalize to random-intercept models with covariates and then to a discussion of the various estimators (between, within, and random effects). The authors also discuss models with random coefficients. The text then turns to models specifically designed for longitudinal and panel data—dynamic models, marginal models, and growth-curve models. The last portion of volume I covers models with more than two levels and models with crossed random effects.
The foundation and in-depth coverage of linear-model principles provided in volume I allow for a straightforward transition to generalized linear models for noncontinuous outcomes, which are described in volume II. This second volume begins with chapters introducing multilevel and longitudinal models for binary, ordinal, nominal, and count data. Focus then turns to survival analysis, introducing multilevel models for both discrete-time survival data and continuous-time survival data. The volume concludes by extending the two-level generalized linear models introduced in previous chapters to models with three or more levels and to models with crossed random effects.
In both volumes, readers will find extensive applications of multilevel and longitudinal models. Using many datasets that appeal to a broad audience, Rabe-Hesketh and Skrondal provide worked examples in each chapter. They also show the breadth of Stata’s commands for fitting the models discussed. They demonstrate Stata’s xt suite of commands (xtreg, xtlogit, xtpoisson, etc.), which is designed for two-level random-intercept models for longitudinal/panel data. They demonstrate the me suite of commands (mixed, melogit, mepoisson, etc.), which is designed for multilevel models, including those with random coefficients and those with three or more levels. In volume II, they discuss gllamm, a community-contributed Stata command developed by Rabe-Hesketh and Skrondal that can fit many latent-variable models, of which the generalized linear mixed-effects model is a special case. The types of models fit by the xt commands, the me commands, and gllamm sometimes overlap; when this happens, the authors highlight the differences in syntax, data organization, and output for the commands. The authors also point out the strengths and weaknesses of these commands, based on considerations such as computational speed, accuracy, available predictions, and available postestimation statistics.
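As a rough illustration of how the three approaches overlap, the sketch below fits the same two-level random-intercept logistic model with each command. It is not taken from the book: the variable names y (a binary response), x (a covariate), and id (a cluster identifier) are hypothetical, and the options shown are only one reasonable choice.

```stata
* Hypothetical variables (not from the book):
*   y  = binary response, x = covariate, id = cluster identifier

* xt suite: declare the panel/cluster variable, then fit the model
xtset id
xtlogit y x, re

* me suite: the random intercept for id is specified after ||
melogit y x || id:

* gllamm is community-contributed and must be installed first
ssc install gllamm
gllamm y x, i(id) family(binomial) link(logit) adapt
```

All three commands fit the model by maximum likelihood with numerical integration, so the estimates should be close but need not match exactly, because the default integration methods and numbers of quadrature points can differ between commands.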
The fourth edition of Multilevel and Longitudinal Modeling Using Stata has been thoroughly revised and updated. In it, you will find new material on Kenward–Roger degrees-of-freedom adjustments for small sample sizes, difference-in-differences estimation for natural experiments, instrumental-variables estimation to account for level-one endogeneity, and Bayesian estimation for crossed-effects models. In addition, you will find new discussions of meologit, cmxtmixlogit, mestreg, menbreg, and other commands introduced in Stata since the third edition of the book.
In summary, Multilevel and Longitudinal Modeling Using Stata, Fourth Edition is the most complete, up-to-date depiction of Stata’s capacity for fitting models to multilevel and longitudinal data. Readers will also find thorough explanations of the methods and practical advice for using these techniques. This text is a great introduction for researchers and students wanting to learn about these powerful data analysis tools.
V MODELS FOR CATEGORICAL RESPONSES
DICHOTOMOUS OR BINARY RESPONSES
Introduction
Single-level logit and probit regression models for dichotomous responses
Generalized linear model formulation
Labor-participation data
Estimation using logit
Estimation using glm
Latent-response formulation
Logistic regression
Probit regression
Estimation using probit
Which treatment is best for toenail infection?
Longitudinal data structure
Proportions and fitted population-averaged or marginal probabilities
Random-intercept logistic regression
Model specification
Reduced-form specification
Two-stage formulation
Model assumptions
Estimation
Using xtlogit
Using melogit
Using gllamm
Subject-specific or conditional vs. population-averaged or marginal relationships
Measures of dependence and heterogeneity
Conditional or residual intraclass correlation of the latent responses
Median odds ratio
Measures of association for observed responses at median fixed part of the model
Inference for random-intercept logistic models
Tests and confidence intervals for odds ratios
Tests of variance components
Maximum likelihood estimation
Adaptive quadrature
Some speed and accuracy considerations
Integration methods and number of quadrature points
Starting values
Using melogit and gllamm for collapsible data
Spherical quadrature in gllamm
Assigning values to random effects
Maximum “likelihood” estimation
Empirical Bayes prediction
Empirical Bayes modal prediction
Different kinds of predicted probabilities
Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities
Predictions for hypothetical subjects: Conditional probabilities
Predictions for the subjects in the sample: Posterior mean probabilities
Other approaches to clustered dichotomous data
Conditional logistic regression
Estimation using clogit
Generalized estimating equations (GEE)
Estimation using xtgee
Summary and further reading
Exercises
ORDINAL RESPONSES
Introduction
Single-level cumulative models for ordinal responses
Generalized linear model formulation
Latent-response formulation
Proportional odds
Identification
Are antipsychotic drugs effective for patients with schizophrenia?
Longitudinal data structure and graphs
Longitudinal data structure
Plotting cumulative proportions
Plotting cumulative sample logits and transforming the time scale
Single-level proportional-odds model
Model specification
Estimation using ologit
Random-intercept proportional-odds model
Model specification
Estimation using meologit
Estimation using gllamm
Measures of dependence and heterogeneity
Residual intraclass correlation of latent responses
Median odds ratio
Random-coefficient proportional odds model
Model specification
Estimation using meologit
Estimation using gllamm
Different kinds of predicted probabilities
Predicted population-averaged or marginal probabilities
Predicted subject-specific probabilities: Posterior mean
Do experts differ in their grading of student essays?
A random-intercept probit model with grader bias
Model specification
Estimation using gllamm
Including grader-specific measurement error variances
Model specification
Estimation using gllamm
Including grader-specific thresholds
Model specification
Estimation using gllamm
Other link functions
Cumulative complementary log-log model
Continuation-ratio logit model
Adjacent-category logit model
Baseline-category logit and stereotype models
Summary and further reading
Exercises
NOMINAL RESPONSES AND DISCRETE CHOICE
Introduction
Single-level models for nominal responses
Multinomial logit models
Transport data version 1
Estimation using mlogit
Conditional logit models with alternative-specific covariates
Transport data version 2: Expanded form
Estimation using clogit
Estimation using cmclogit
Conditional logit models with alternative- and unit-specific covariates
Estimation using clogit
Estimation using cmclogit
Independence from irrelevant alternatives
Utility-maximization formulation
Does marketing affect choice of yogurt?
Single-level conditional logit models
Conditional logit models with alternative-specific intercepts
Estimation using clogit
Estimation using cmclogit
Multilevel conditional logit models
Preference heterogeneity: Brand-specific random intercepts
Estimation using cmxtmixlogit
Estimation using gllamm
Response heterogeneity: Marketing variables with random coefficients
Estimation using cmxtmixlogit
Estimation using gllamm
Preference and response heterogeneity
Estimation using cmxtmixlogit
Estimation using gllamm
Prediction of random effects and response probabilities
Prediction of random effects and household-specific choice probabilities
Summary and further reading
Exercises
VI MODELS FOR COUNTS
COUNTS
Introduction
What are counts?
Counts versus proportions
Counts as aggregated event-history data
Single-level Poisson models for counts
Did the German health-care reform reduce the number of doctor visits?
Longitudinal data structure
Single-level Poisson regression
Model specification
Estimation using poisson
Estimation using glm
Random-intercept Poisson regression
Model specification
Measures of dependence and heterogeneity
Estimation
Using xtpoisson
Using mepoisson
Using gllamm
Random-coefficient Poisson regression
Model specification
Estimation using mepoisson
Estimation using gllamm
Overdispersion in single-level models
Normally distributed random intercept
Estimation using xtpoisson
Negative binomial models
Mean dispersion or NB2
Constant dispersion or NB1
Quasilikelihood
Estimation using glm
Level-1 overdispersion in two-level models
Random-intercept Poisson model with robust standard errors
Estimation using mepoisson
Three-level random-intercept model
Negative binomial models with random intercepts
Estimation using menbreg
The HHG model
Other approaches to two-level count data
Conditional Poisson regression
Estimation using xtpoisson, fe
Estimation using Poisson regression with dummy variables for clusters
Conditional negative binomial regression
Generalized estimating equations
Estimation using xtgee
Marginal and conditional effects when responses are MAR
Which Scottish counties have a high risk of lip cancer?
Standardized mortality ratios
Random-intercept Poisson regression
Model specification
Estimation using gllamm
Prediction of standardized mortality ratios
Nonparametric maximum likelihood estimation
Specification
Estimation using gllamm
Prediction
Summary and further reading
Exercises
VII MODELS FOR SURVIVAL OR DURATION DATA
Introduction to models for survival or duration data (part VII)
DISCRETE-TIME SURVIVAL
Introduction
Single-level models for discrete-time survival data
Discrete-time hazard and discrete-time survival
Promotions data
Data expansion for discrete-time survival analysis
Estimation via regression models for dichotomous responses
Estimation using logit
Including time-constant covariates
Estimation using logit
Including time-varying covariates
Estimation using logit
Multiple absorbing events and competing risks
Estimation using mlogit
Handling left-truncated data
How does birth history affect child mortality?
Data expansion
Proportional hazards and interval-censoring
Complementary log-log models
Marginal baseline hazard
Estimation using cloglog
Including covariates
Random-intercept complementary log-log model
Model specification
Estimation using mecloglog
Population-averaged or marginal vs. subject-specific or conditional survival probabilities
Summary and further reading
Exercises
CONTINUOUS-TIME SURVIVAL
Introduction
What makes marriages fail?
Hazards and survival
Proportional hazards models
Piecewise exponential model
Estimation using streg
Estimation using poisson
Cox regression model
Estimation using stcox
Cox regression via Poisson regression for expanded data
Approximate Cox regression: Poisson regression, smooth baseline hazard
Accelerated failure-time models
Log-normal model
Estimation using streg
Estimation using stintreg
Time-varying covariates
Estimation using streg
Does nitrate reduce the risk of angina pectoris?
Marginal modeling
Cox regression with occasion-specific dummy variables
Cox regression with occasion-specific baseline hazards
Approximate Cox regression
Multilevel proportional hazards models
Cox regression with gamma shared frailty
Estimation using stcox, shared
Approximate Cox regression with log-normal shared frailty
Approximate Cox regression with normal random intercept and coefficient
Multilevel accelerated failure-time models
Log-normal model with gamma shared frailty
Estimation using streg
Log-normal model with log-normal shared frailty
Estimation using mestreg
Log-normal model with normal random intercept and random coefficient
Fixed-effects approach
Stratified Cox regression with subject-specific baseline hazards
Different approaches to recurrent-event data
Counting process risk interval
Gap-time risk interval
Summary and further reading
Exercises
VIII MODELS WITH NESTED AND CROSSED RANDOM EFFECTS
MODELS WITH NESTED AND CROSSED RANDOM EFFECTS
Introduction
Did the Guatemalan immunization campaign work?
A three-level random-intercept logistic regression model
Model specification
Measures of dependence and heterogeneity
Types of residual intraclass correlations of the latent responses
Types of median odds ratios
Three-stage formulation
Estimation
Using gllamm
Using xtmelogit
A three-level random-coefficient logistic regression model
Estimation
Using gllamm
Using xtmelogit
Prediction of random effects
Empirical Bayes prediction
Empirical Bayes modal prediction
Different kinds of predicted probabilities
Predicted population-averaged or marginal probabilities: New clusters
Predicted median or conditional probabilities
Predicted posterior mean probabilities: Existing clusters
Do salamanders from different populations mate successfully?
Crossed random-effects logistic regression
Setup for estimating crossed random-effects model using melogit
Approximate maximum likelihood estimation
Bayesian estimation
Priors for the salamander data
Estimates compared
Fully Bayesian versus empirical Bayesian inference for random effects
Summary and further reading
Exercises