Contents of Generalized Linear Models and Extensions, Third Edition

by James W. Hardin and Joseph M. Hilbe

Generalized linear models (GLMs) extend linear regression to models with a non-Gaussian, or even discrete, response. GLM theory is predicated on the exponential family of distributions—a class so rich that it includes the commonly used logit, probit, and Poisson models. Although one can fit these models in Stata by using specialized commands (for example, logit for logit models), fitting them as GLMs with Stata’s glm command offers some advantages. For example, model diagnostics may be calculated and interpreted similarly regardless of the assumed distribution.

This text thoroughly covers GLMs, both theoretically and computationally, with an emphasis on Stata. The theory consists of showing how the various GLMs are special cases of the exponential family, showing general properties of this family of distributions, and showing the derivation of maximum likelihood (ML) estimators and standard errors. Hardin and Hilbe show how iteratively reweighted least squares, another method of parameter estimation, is a consequence of ML estimation using Fisher scoring. The authors also discuss different methods of estimating standard errors, including robust methods, robust methods with clustering, Newey–West, outer product of the gradient, bootstrap, and jackknife. The thorough coverage of model diagnostics includes measures of influence such as Cook’s distance, several forms of residuals, the Akaike and Bayesian information criteria, and various R2-type measures of explained variability.
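To make the link between Fisher scoring and iteratively reweighted least squares concrete, here is a minimal Python sketch for a Poisson model with a log link and a single covariate. It is not taken from the book, which works in Stata; the function name `irls_poisson`, the starting values, and the toy data are assumptions made for illustration. Each pass forms a working response and weights from the current fit, then solves a weighted least-squares problem.

```python
import math

def irls_poisson(x, y, tol=1e-10, max_iter=50):
    """Fit log(mu) = b0 + b1*x by IRLS (hypothetical helper, one covariate)."""
    # Assumed starting values: intercept-only fit, zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Log link with Poisson variance: weight = mu,
        # working response z = eta + (y - mu) / mu.
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares of z on (1, x) via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Toy data, invented for this example.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 3, 4, 9, 15]
b0_hat, b1_hat = irls_poisson(x_demo, y_demo)
```

Because the log link is canonical for the Poisson family, the expected and observed Hessians coincide, so in this case the IRLS update is also a Newton–Raphson step.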

After presenting general theory, Hardin and Hilbe then break down each distribution. Each distribution has its own chapter that explains the computational details of applying the general theory to that particular distribution. Pseudocode plays a valuable role here, because it lets the authors describe computational algorithms relatively simply. Devoting an entire chapter to each distribution (or family, in GLM terms) also allows for the inclusion of real-data examples showing how Stata fits such models, as well as presenting certain diagnostics and analytical strategies that are unique to that family. The chapters on binary data and on count (Poisson) data are excellent in this regard. Hardin and Hilbe give ample attention to the problems of overdispersion and zero inflation in count-data models.
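As a rough illustration of what an overdispersion check looks like, the Python sketch below computes the Pearson chi-square statistic divided by its residual degrees of freedom for an intercept-only Poisson fit, and compares the observed share of zeros with the share a Poisson model would predict. The counts are invented for this example and the function name `pearson_dispersion` is hypothetical; the book itself demonstrates these ideas in Stata.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square over residual df, using the Poisson variance V(mu) = mu."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Invented counts with many zeros and a few large values.
counts = [0, 0, 1, 0, 7, 0, 2, 12, 0, 3]

# For an intercept-only Poisson model the fitted mean is the sample mean.
mean = sum(counts) / len(counts)
dispersion = pearson_dispersion(counts, [mean] * len(counts), n_params=1)

# Share of zeros observed versus the share implied by Poisson(mean).
observed_zero_share = counts.count(0) / len(counts)
poisson_zero_share = math.exp(-mean)
```

A dispersion value well above 1 suggests more variability than the Poisson variance allows, and an observed zero share far above the Poisson prediction points toward zero inflation.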

The final part of the text concerns extensions of GLMs, which come in three forms. First, the authors cover multinomial responses, both ordered and unordered. Although multinomial responses are not strictly a part of GLM, the theory is similar in that one can think of a multinomial response as an extension of a binary response. The examples presented in these chapters often use the authors’ own Stata programs, augmenting official Stata’s capabilities. Second, GLMs may be extended to clustered data through generalized estimating equations (GEEs), and one chapter covers GEE theory and examples. Finally, GLMs may be extended by programming one’s own family and link functions for use with Stata’s official glm command, and the authors detail this process.

In addition to other enhancements (for example, a new section on marginal effects), the third edition contains several new extended GLMs, giving Stata users new ways to capture the complexity of count data. New count models include a three-parameter negative binomial known as NB-P, Poisson inverse Gaussian (PIG), zero-inflated generalized Poisson (ZIGP), a rewritten generalized Poisson, two- and three-component finite mixture models, and generalized censored Poisson and negative binomial models. This edition adds a new chapter on simulation and data synthesis and also shows how to construct a wide variety of synthetic and Monte Carlo models throughout the book.
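Data synthesis in this sense means drawing a response from a model whose parameters are known, so that an estimation method can be checked against the truth. The following Python sketch generates Poisson counts with a log-linear mean. The names `rpoisson` and `synthesize_poisson`, the seed, the uniform covariate, and the coefficient values are assumptions for illustration only and do not reproduce the book's Stata code.

```python
import math
import random

def rpoisson(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def synthesize_poisson(n, b0, b1, seed=12345):
    """Return (x, y) with x ~ Uniform(0, 1) and y ~ Poisson(exp(b0 + b1*x))."""
    rng = random.Random(seed)  # fixed seed so the synthetic data are reproducible
    x = [rng.random() for _ in range(n)]
    y = [rpoisson(math.exp(b0 + b1 * xi), rng) for xi in x]
    return x, y

# Synthetic sample with assumed true coefficients b0 = 0.5 and b1 = 1.0.
x_syn, y_syn = synthesize_poisson(5000, 0.5, 1.0)
```

Fitting a Poisson regression to `x_syn` and `y_syn` should recover coefficients close to the assumed values, which is the basic check that a Monte Carlo study repeats over many samples.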

Table of Contents

List of Tables

List of Figures

Preface

Introduction
1. Origins and motivation
2. Notational conventions
3. Applied or theoretical?
4. Road map
5. Installing the support materials

I Foundations of Generalized Linear Models

Generalized Linear Models
1. Components
2. Assumptions
3. Exponential family
4. Example: Using an offset in a GLM
5. Summary

GLM estimation algorithms
1. Newton–Raphson (using the observed Hessian)
2. Starting values for Newton–Raphson
3. IRLS (using the expected Hessian)
4. Starting values for IRLS
5. Goodness of fit
6. Estimated variance matrices
  1. Hessian
  2. Outer product of the gradient
  3. Sandwich
  4. Modified sandwich
  5. Unbiased sandwich
  6. Modified unbiased sandwich
  7. Weighted sandwich: Newey–West
  8. Jackknife
    1. Usual jackknife
    2. One-step jackknife
    3. Weighted jackknife
    4. Variable jackknife
  9. Bootstrap
    1. Usual bootstrap
    2. Grouped bootstrap
7. Estimation algorithms
8. Summary

Analysis of fit
1. Deviance
2. Diagnostics
  1. Cook's distance
  2. Overdispersion
3. Assessing the link function
4. Residual analysis
  1. Response residuals
  2. Working residuals
  3. Pearson residuals
  4. Partial residuals
  5. Anscombe residuals
  6. Deviance residuals
  7. Adjusted deviance residuals
  8. Likelihood residuals
  9. Score residuals
5. Checks for systematic departure from the model
6. Model statistics
  1. Criterion measures
    1. AIC
    2. BIC
  2. The interpretation of R2 in linear regression
    1. Percent variance explained
    2. The ratio of variances
    3. A transformation of the likelihood ratio
    4. A transformation of the F test
    5. Squared correlation
  3. Generalizations of linear regression R2 interpretations
    1. Efron's pseudo-R2
    2. McFadden's likelihood-ratio index
    3. Ben-Akiva and Lerman adjusted likelihood-ratio index
    4. McKelvey and Zavoina ratio of variances
    5. Transformation of likelihood ratio
    6. Cragg and Uhler normed measure
  4. More R2 measures
    1. The count R2
    2. The adjusted count R2
    3. Veall and Zimmermann R2
    4. Cameron–Windmeijer R2
7. Marginal effects

Data synthesis
1. Generating correlated data
2. Generating data from a specified population
  1. Generating data for linear regression
  2. Generating data for logistic regression
  3. Generating data for probit regression
  4. Generating data for cloglog regression
  5. Generating data for Gaussian variance and log link
  6. Generating underdispersed count data
3. Simulation
  1. Heteroskedasticity in linear regression
  2. Power analysis
  3. Comparing fit of Poisson and negative binomial
  4. Effect of omitted covariate on Efron's R2 in Poisson regression

II Continuous Response Models

The Gaussian family
1. Derivation of the GLM Gaussian family
2. Derivation in terms of the mean
3. IRLS GLM algorithm (nonbinomial)
4. ML estimation
5. GLM log-normal models
6. Expected versus observed information matrix
7. Other Gaussian links
8. Example: Relation to OLS
9. Example: Beta-carotene

The gamma family
1. Derivation of the gamma model
2. Example: Reciprocal link
3. Maximum likelihood estimation
4. Log-gamma models
5. Identity-gamma models
6. Using the gamma model for survival analysis

The inverse Gaussian family
1. Derivation of the inverse Gaussian model
2. The inverse Gaussian algorithm
3. Maximum likelihood algorithm
4. Example: The canonical inverse Gaussian
5. Non-canonical links

The power family and link
1. Power links
2. Example: Power link
3. The power family

III Binomial Response Models

The binomial-logit family
1. Derivation of the binomial model
2. Derivation of the Bernoulli model
3. The binomial regression algorithm
4. Example: Logistic regression
  1. Model producing logistic coefficients: The heart data
  2. Model producing logistic odds ratios
5. GOF statistics
6. Interpretation of parameter estimates

The general binomial family
1. Non-canonical binomial models
2. Non-canonical binomial links (binary form)
3. The probit model
4. The clog-log and log-log models
5. Other links
6. Interpretation of coefficients
  1. Identity link
  2. Logit link
  3. Log link
  4. Log complement link
  5. Summary
7. Generalized binomial regression

The problem of overdispersion
1. Overdispersion
2. Scaling of standard errors
3. Williams' procedure
4. Robust standard errors

IV Count Response Models

The Poisson family
1. Count response regression models
2. Derivation of the Poisson algorithm
3. Poisson regression: Examples
4. Example: Testing overdispersion in the Poisson model
5. Using the Poisson model for survival analysis
6. Using offsets to compare models
7. Interpretation of coefficients

The negative binomial family
1. Constant overdispersion
2. Variable overdispersion
  1. Derivation in terms of a Poisson–gamma mixture
  2. Derivation in terms of the negative binomial probability function
  3. The canonical link negative binomial parameterization
3. The log-negative binomial parameterization
4. Negative binomial examples
5. The geometric family
6. Interpretation of coefficients

Other count data models
1. Count response regression models
2. Zero-truncated models
3. Zero-inflated models
4. Hurdle models
5. Negative binomial(P) models
6. Heterogeneous negative binomial models
7. Generalized Poisson regression models
8. Poisson inverse Gaussian models
9. Censored count response models
10. Finite mixture models

V Multinomial Response Models

The ordered response family
1. Interpretation of coefficients: Single binary predictor
2. Ordered outcomes for general link
3. Ordered outcomes for specific links
  1. Ordered logit
  2. Ordered probit
  3. Ordered clog-log
  4. Ordered log-log
  5. Ordered cauchit
4. Generalized ordered outcome models
5. Example: Synthetic data
6. Example: Automobile data
7. Partial proportional-odds models
8. Continuation ratio models

Unordered response family
1. The multinomial logit model
2. The multinomial probit model
  1. Example: A comparison of the models
  2. Example: Comparing probit and multinomial probit
  3. Example: Concluding remarks

VI Extensions to the GLM

Extending the likelihood
1. The quasi-likelihood
2. Example: Wedderburn's leaf blotch data
3. Generalized additive models

Clustered data
1. Generalization from individual to clustered data
2. Pooled estimators
3. Fixed effects
4. Random effects
6. GEEs
7. Other models

VII Stata Software

Programs for Stata
1. The glm command
  1. Syntax
  2. Description
  3. Options
2. The predict command after glm
  1. Syntax
  2. Options
3. User-written programs
  1. Global macros available for user-written programs
  2. User-written variance functions
  3. User-written programs for link functions
  4. User-written programs for Newey–West weights
4. Remarks
  1. Equivalent commands
  2. Special comments on family(Gaussian) models
  3. Special comments on family(binomial) models
  4. Special comments on family(nbinomial) models
  5. Special comment on family(gamma) link(log) models

A Tables

References

Author index

Subject index
