Multilevel and Longitudinal Modeling Using Stata, Fourth Edition, by Sophia Rabe-Hesketh and Anders Skrondal, is a complete resource for learning to model data in which observations are grouped—whether those groups are formed by a nesting structure, such as children nested in classrooms, or formed by repeated observations on the same individuals. This text introduces random-effects models, fixed-effects models, mixed-effects models, marginal models, dynamic models, and growth-curve models, all of which account for the grouped nature of these types of data. As Rabe-Hesketh and Skrondal introduce each model, they explain when the model is useful, its assumptions, how to fit and evaluate the model using Stata, and how to interpret the results. With this comprehensive coverage, researchers who need to apply multilevel models will find this book to be the perfect companion. It is also the ideal text for courses in multilevel modeling because it provides examples from a variety of disciplines as well as end-of-chapter exercises that allow students to practice newly learned material.
The book comprises two volumes. Volume I focuses on linear models for continuous outcomes, while volume II focuses on generalized linear models for binary, ordinal, count, and other types of outcomes.
Volume I begins with a review of linear regression and then builds on this review to introduce two-level models, the simplest extensions of linear regression to models for multilevel and longitudinal/panel data. Rabe-Hesketh and Skrondal introduce the random-intercept model without covariates, developing the model from principles and thereby familiarizing the reader with terminology, summarizing and relating the widely used estimating strategies, and providing historical perspective. Once the authors have established the foundation, they smoothly generalize to random-intercept models with covariates and then to a discussion of the various estimators (between, within, and random effects). The authors also discuss models with random coefficients. The text then turns to models specifically designed for longitudinal and panel data—dynamic models, marginal models, and growth-curve models. The last portion of volume I covers models with more than two levels and models with crossed random effects.
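For orientation, the random-intercept model without covariates mentioned above can be sketched as follows. The notation is an assumption made for this illustration and is not taken from this description: y_ij is the response for unit i in cluster j, β is the overall mean, ζ_j is the cluster-specific random intercept with variance ψ, and ε_ij is the level-1 residual with variance θ.

```latex
\begin{align}
  y_{ij} &= \beta + \zeta_j + \epsilon_{ij},
  \qquad \zeta_j \sim (0, \psi), \quad \epsilon_{ij} \sim (0, \theta), \\
  \operatorname{Var}(y_{ij}) &= \psi + \theta, \\
  \rho &= \frac{\psi}{\psi + \theta}.
\end{align}
```

Here ζ_j and ε_ij are taken to be mutually independent, and ρ is the intraclass correlation: the share of the total variance that lies between clusters, which is also the correlation between two units in the same cluster.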
The foundation and in-depth coverage of linear-model principles provided in volume I allow for a straightforward transition to generalized linear models for noncontinuous outcomes, which are described in volume II. This second volume begins with chapters introducing multilevel and longitudinal models for binary, ordinal, nominal, and count data. Focus then turns to survival analysis, introducing multilevel models for both discrete-time survival data and continuous-time survival data. The volume concludes by extending the two-level generalized linear models introduced in previous chapters to models with three or more levels and to models with crossed random effects.
In both volumes, readers will find extensive applications of multilevel and longitudinal models. Using many datasets that appeal to a broad audience, Rabe-Hesketh and Skrondal provide worked examples in each chapter. They also show the breadth of Stata’s commands for fitting the models discussed. They demonstrate Stata’s xt suite of commands (xtreg, xtlogit, xtpoisson, etc.), which is designed for two-level random-intercept models for longitudinal/panel data. They demonstrate the me suite of commands (mixed, melogit, mepoisson, etc.), which is designed for multilevel models, including those with random coefficients and those with three or more levels. In volume II, they discuss gllamm, a community-contributed Stata command developed by Rabe-Hesketh and Skrondal that can fit many latent-variable models, of which the generalized linear mixed-effects model is a special case.
The types of models fit by the xt commands, the me commands, and gllamm sometimes overlap; when this happens, the authors highlight the differences in syntax, data organization, and output for the commands. The authors also point out the strengths and weaknesses of these commands, based on considerations such as computational speed, accuracy, available predictions, and available postestimation statistics.
The fourth edition of Multilevel and Longitudinal Modeling Using Stata has been thoroughly revised and updated. In it, you will find new material on Kenward–Roger degrees-of-freedom adjustments for small sample sizes, difference-in-differences estimation for natural experiments, instrumental-variables estimation to account for level-one endogeneity, and Bayesian estimation for crossed-effects models. In addition, you will find new discussions of meologit, cmxtmixlogit, mestreg, menbreg, and other commands introduced in Stata since the third edition of the book.
In summary, Multilevel and Longitudinal Modeling Using Stata, Fourth Edition is the most complete, up-to-date depiction of Stata’s capacity for fitting models to multilevel and longitudinal data. Readers will also find thorough explanations of the methods and practical advice for using these techniques. This text is a great introduction for researchers and students wanting to learn about these powerful data analysis tools.
List of tables
List of figures
Preface
Multilevel and longitudinal models: When and why?
I PRELIMINARIES
REVIEW OF LINEAR REGRESSION
Introduction
Is there gender discrimination in faculty salaries?
Independent-samples t test
One-way analysis of variance
Simple linear regression
Dummy variables
Multiple linear regression
Interactions
Dummy variables for more than two groups
Other types of interactions
Interaction between dummy variables
Interaction between continuous covariates
Nonlinear effects
Residual diagnostics
Causal and noncausal interpretations of regression coefficients
Regression as conditional expectation
Regression as structural model
Summary and further reading
Exercises
II TWO-LEVEL MODELS
VARIANCE-COMPONENTS MODELS
Introduction
How reliable are peak-expiratory-flow measurements?
Inspecting within-subject dependence
The variance-components model
Model specification
Path diagram
Between-subject heterogeneity
Within-subject dependence
Intraclass correlation
Intraclass correlation versus Pearson correlation
Estimation using Stata
Data preparation: Reshaping from wide form to long form
Using xtreg
Using mixed
Hypothesis tests and confidence intervals
Hypothesis test and confidence interval for the population mean
Hypothesis test and confidence interval for the between-cluster variance
Likelihood-ratio test
Score test
F test
Confidence interval
Model as data-generating mechanism
Fixed versus random effects
Crossed versus nested effects
Parameter estimation
Model assumptions
Mean structure and covariance structure
Distributional assumptions
Different estimation methods
Inference for β
Estimate and standard error: Balanced case
Estimate: Unbalanced case
Assigning values to the random intercepts
Maximum “likelihood” estimation
Implementation via OLS regression
Implementation via the mean total residual
Empirical Bayes prediction
Empirical Bayes standard errors
Posterior and comparative standard errors
Diagnostic standard errors
Accounting for uncertainty in β̂
Bayesian interpretation of REML estimation and prediction
Summary and further reading
Exercises
RANDOM-INTERCEPT MODELS WITH COVARIATES
Introduction
Does smoking during pregnancy affect birthweight?
Data structure and descriptive statistics
The linear random-intercept model with covariates
Model specification
Model assumptions
Mean structure
Residual covariance structure
Graphical illustration of random-intercept model
Estimation using Stata
Using xtreg
Using mixed
Coefficients of determination or variance explained
Hypothesis tests and confidence intervals
Hypothesis tests for individual regression coefficients
Joint hypothesis tests for several regression coefficients
Predicted means and confidence intervals
Hypothesis test for random-intercept variance
Between and within effects of level-1 covariates
Between-mother effects
Within-mother effects
Relations among estimators
Level-2 endogeneity and cluster-level confounding
Allowing for different within and between effects
Robust Hausman test
Fixed versus random effects revisited
Assigning values to random effects: Residual diagnostics
More on statistical inference
Overview of estimation methods
Pooled OLS
Feasible generalized least squares (FGLS)
ML by iterative GLS (IGLS)
ML by Newton–Raphson and Fisher scoring
ML by the expectation-maximization (EM) algorithm
REML
Consequences of using standard regression modeling for clustered data
Purely between-cluster covariate
Purely within-cluster covariate
Power and sample-size determination
Purely between-cluster covariate
Purely within-cluster covariate
Summary and further reading
Exercises
RANDOM-COEFFICIENT MODELS
Introduction
How effective are different schools?
Separate linear regressions for each school
Specification and interpretation of a random-coefficient model
Specification of a random-coefficient model
Interpretation of the random-effects variances and covariances
Estimation using mixed
Random-intercept model
Random-coefficient model
Testing the slope variance
Interpretation of estimates
Assigning values to the random intercepts and slopes
Maximum “likelihood” estimation
Empirical Bayes prediction
Model visualization
Residual diagnostics
Inferences for individual schools
Two-stage model formulation
Some warnings about random-coefficient models
Meaningful specification
Many random coefficients
Convergence problems
Lack of identification
Summary and further reading
Exercises
III MODELS FOR LONGITUDINAL AND PANEL DATA
Introduction to models for longitudinal and panel data (part III)
SUBJECT-SPECIFIC EFFECTS AND DYNAMIC MODELS
Introduction
Random-effects approach: No endogeneity
Fixed-effects approach: Level-2 endogeneity
De-meaning and subject dummies
De-meaning
Subject dummies
Hausman test
Mundlak approach and robust Hausman test
First-differencing
Difference-in-differences and repeated-measures ANOVA
Does raising the minimum wage reduce employment?
Repeated-measures ANOVA
Subject-specific coefficients
Random-coefficient model: No endogeneity
Fixed-coefficient model: Level-2 endogeneity
Hausman–Taylor: Level-2 endogeneity for level-1 and level-2 covariates
Instrumental-variable methods: Level-1 (and level-2) endogeneity
Conventional fixed-effects approach
Fixed-effects IV estimator
Random-effects IV estimator
More Hausman tests
Dynamic models
Dynamic model without subject-specific intercepts
Dynamic model with subject-specific intercepts
Missing data and dropout
Maximum likelihood estimation under MAR: A simulation
Summary and further reading
Exercises
MARGINAL MODELS
Introduction
Mean structure
Covariance structures
Unstructured covariance matrix
Random-intercept or compound symmetric/exchangeable structure
Random-coefficient structure
Autoregressive and exponential structures
Moving-average residual structure
Banded and Toeplitz structures
Hybrid and complex marginal models
Random effects and correlated level-1 residuals
Heteroskedastic level-1 residuals over occasions
Heteroskedastic level-1 residuals over groups
Different covariance matrices over groups
Comparing the fit of marginal models
Generalized estimating equations (GEE)
Marginal modeling with few units and many occasions
Is a highly organized labor market beneficial for economic growth?
Marginal modeling for long panels
Fitting marginal models for long panels in Stata
Summary and further reading
Exercises
GROWTH-CURVE MODELS
Introduction
How do children grow?
Observed growth trajectories
Models for nonlinear growth
Polynomial models
Fitting the models
Predicting the mean trajectory
Predicting trajectories for individual children
Piecewise linear models
Fitting the models
Predicting the mean trajectory
Two-stage model formulation
Heteroskedasticity
Heteroskedasticity at level 1
Heteroskedasticity at level 2
How does reading improve from kindergarten through third grade?
Growth-curve model as a structural equation model
Estimation using sem
Estimation using mixed
Summary and further reading
Exercises
IV MODELS WITH NESTED AND CROSSED RANDOM EFFECTS
HIGHER-LEVEL MODELS WITH NESTED RANDOM EFFECTS
Introduction
Do peak-expiratory-flow measurements vary between methods within subjects?
Inspecting sources of variability
Three-level variance-components models
Different types of intraclass correlation
Estimation using mixed
Empirical Bayes prediction
Testing variance components
Crossed versus nested random effects revisited
Does nutrition affect cognitive development of Kenyan children?
Describing and plotting three-level data
Data structure and missing data
Level-1 variables
Level-2 variables
Level-3 variables
Plotting growth trajectories
Three-level random-intercept model
Model specification: Reduced form
Model specification: Three-stage formulation
Estimation using mixed
Three-level random-coefficient models
Random coefficient at the child level
Estimation using mixed
Random coefficient at the child and school levels
Estimation using mixed
Residual diagnostics and predictions
Summary and further reading
Exercises
CROSSED RANDOM EFFECTS
Introduction
How does investment depend on expected profit and capital stock?
A two-way error-components model
Model specification
Residual variances, covariances, and intraclass correlations
Longitudinal correlations
Cross-sectional correlations
Estimation using mixed
Prediction
How much do primary and secondary schools affect attainment at age 16?
Data structure
Additive crossed random-effects model
Specification
Intraclass correlations
Estimation using mixed
Crossed random-effects model with random interaction
Model specification
Intraclass correlations
Estimation using mixed
Testing variance components
Some diagnostics
A trick requiring fewer random effects
Summary and further reading
Exercises