An Introduction to Survival Analysis Using Stata, Revised Third Edition is the ideal tutorial for professional data analysts who want to learn survival analysis for the first time or who are well versed in survival analysis but are not as dexterous in using Stata to analyze survival data. This text also serves as a valuable reference to those readers who already have experience using Stata’s survival analysis routines.
The revised third edition has been updated for Stata 14, and it includes a new section on predictive margins and marginal effects, which demonstrates how to obtain and visualize marginal predictions and marginal effects using the margins and marginsplot commands after survival regression models.
Survival analysis is a field of its own that requires specialized data management and analysis procedures. To meet this requirement, Stata provides the st family of commands for organizing and summarizing survival data.
This book provides statistical theory, step-by-step procedures for analyzing survival data, an in-depth usage guide for Stata’s most widely used st commands, and a collection of tips for using Stata to analyze survival data and to present the results. This book develops from first principles the statistical concepts unique to survival data and assumes only a knowledge of basic probability and statistics and a working knowledge of Stata.
The first three chapters of the text cover basic theoretical concepts: hazard functions, cumulative hazard functions, and their interpretations; survivor functions; hazard models; and a comparison of nonparametric, semiparametric, and parametric methodologies. Chapter 4 deals with censoring and truncation. The next three chapters cover the formatting, manipulation, stsetting, and error checking involved in preparing survival data for analysis using Stata’s st analysis commands. Chapter 8 covers nonparametric methods, including the Kaplan–Meier and Nelson–Aalen estimators and the various nonparametric tests for the equality of survival experience.
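For orientation, the quantities named above are linked by a few standard identities, sketched here for a continuous failure time T with density f(t). The notation is an assumption chosen for illustration and may differ from the book's own; d_j and n_j denote the number of failures and the number at risk at the j-th distinct failure time t_j.

```latex
% Survivor, hazard, and cumulative hazard functions
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}, \qquad
H(t) = \int_0^t h(u)\,du = -\ln S(t)

% Kaplan--Meier estimate of S(t) and Nelson--Aalen estimate of H(t)
\widehat{S}(t) = \prod_{j \,:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right),
\qquad
\widehat{H}(t) = \sum_{j \,:\, t_j \le t} \frac{d_j}{n_j}
```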
Chapters 9–11 discuss Cox regression and include various examples of fitting a Cox model, obtaining predictions, interpreting results, building models, model diagnostics, and regression with survey data. The next four chapters cover parametric models, which are fit using Stata’s streg command. These chapters include detailed derivations of all six parametric models currently supported in Stata and methods for determining which model is appropriate, as well as information on stratification, obtaining predictions, and advanced topics such as frailty models. Chapter 16 is devoted to power and sample-size calculations for survival studies. The final chapter covers survival analysis in the presence of competing risks.
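The workflow these chapters walk through can be sketched in a few Stata commands. This is a minimal illustration rather than an excerpt from the book: the drugtr example dataset and the choice of covariates are assumptions made here, and the book's own examples may use other data.

```stata
* Assumption: drugtr is one of Stata's downloadable example datasets
* (variables studytime, died, drug, age); it is not taken from the book.
webuse drugtr, clear

* Declare the data to be survival data: analysis time and failure indicator
stset studytime, failure(died)

* Check and summarize the declared survival data
stdescribe
stsum, by(drug)

* Nonparametric analysis: Kaplan-Meier estimates and the log-rank test
sts list, by(drug)
sts graph, by(drug)
sts test drug

* Semiparametric analysis: Cox proportional hazards regression
stcox i.drug age

* Parametric analysis: Weibull regression fit with streg
streg i.drug age, distribution(weibull)

* Predictive margins: marginal mean survival time by treatment, then a plot
margins drug, predict(mean time)
marginsplot
```

Entering drug as a factor variable (i.drug) is what allows margins to report a separate marginal prediction for each treatment level.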
List of tables
List of figures
Preface to the Revised Third Edition
Preface to the Third Edition
Preface to the Second Edition
Preface to the Revised Edition
Preface to the First Edition
Notation and typography
1. THE PROBLEM OF SURVIVAL ANALYSIS
Parametric modeling
Semiparametric modeling
Nonparametric analysis
Linking the three approaches
2. DESCRIBING THE DISTRIBUTION OF FAILURE TIMES
The survivor and hazard functions
The quantile function
Interpreting the cumulative hazard and hazard rate
Interpreting the cumulative hazard
Interpreting the hazard rate
Means and medians
3. HAZARD MODELS
Parametric models
Semiparametric models
Analysis time (time at risk)
4. CENSORING AND TRUNCATION
Censoring
Right-censoring
Interval-censoring
Left-censoring
Truncation
Left-truncation (delayed entry)
Right-truncation
Gaps
5. RECORDING SURVIVAL DATA
The desired format
Other formats
Example: Wide-form snapshot data
6. USING STSET
A short lesson on dates
Purposes of the stset command
Syntax of the stset command
Specifying analysis time
Variables defined by stset
Specifying what constitutes failure
Specifying when subjects exit from the analysis
Specifying when subjects enter the analysis
Specifying the subject-ID variable
Specifying the begin-of-span variable
Convenience options
7. AFTER STSET
Look at stset’s output
List some of your data
Use stdescribe
Use stvary
Perhaps use stfill
Example: Hip-fracture data
8. NONPARAMETRIC ANALYSIS
Inadequacies of standard univariate methods
The Kaplan–Meier estimator
Calculation
Censoring
Left-truncation (delayed entry)
Gaps
Relationship to the empirical distribution function
Other uses of sts list
Graphing the Kaplan–Meier estimate
The Nelson–Aalen estimator
Estimating the hazard function
Estimating mean and median survival times
Tests of hypothesis
The log-rank test
The Wilcoxon test
Other tests
Stratified tests
9. THE COX PROPORTIONAL HAZARDS MODEL
Using stcox
The Cox model has no intercept
Interpreting coefficients
The effect of units on coefficients
Estimating the baseline cumulative hazard and survivor functions
Estimating the baseline hazard function
The effect of units on the baseline functions
Likelihood calculations
No tied failures
Tied failures
The marginal calculation
The partial calculation
The Breslow approximation
The Efron approximation
Summary
Stratified analysis
Obtaining coefficient estimates
Obtaining estimates of baseline functions
Cox models with shared frailty
Parameter estimation
Obtaining estimates of baseline functions
Cox models with survey data
Declaring survey characteristics
Fitting a Cox model with survey data
Some caveats of analyzing survival data from complex survey designs
Cox model with missing data—multiple imputation
Imputing missing values
Multiple-imputation inference
10. MODEL BUILDING USING STCOX
Indicator variables
Categorical variables
Continuous variables
Fractional polynomials
Interactions
Time-varying variables
Using stcox, tvc() texp()
Using stsplit
Modeling group effects: fixed-effects, random-effects, stratification, and clustering
11. THE COX MODEL: DIAGNOSTICS
Testing the proportional-hazards assumption
Tests based on reestimation
Test based on Schoenfeld residuals
Graphical methods
Residuals and diagnostic measures
Reye’s syndrome data
Determining functional form
Goodness of fit
Outliers and influential points
12. PARAMETRIC MODELS
Motivation
Classes of parametric models
Parametric proportional hazards models
Accelerated failure-time models
Comparing the two parameterizations
13. A SURVEY OF PARAMETRIC REGRESSION MODELS IN STATA
The exponential model
Exponential regression in the PH metric
Exponential regression in the AFT metric
Weibull regression
Weibull regression in the PH metric
Fitting null models
Weibull regression in the AFT metric
Gompertz regression (PH metric)
Lognormal regression (AFT metric)
Loglogistic regression (AFT metric)
Generalized gamma regression (AFT metric)
Choosing among parametric models
Nested models
Nonnested models
14. POSTESTIMATION COMMANDS FOR PARAMETRIC MODELS
Use of predict after streg
Predicting the time of failure
Predicting the hazard and related functions
Calculating residuals
Using stcurve
Predictive margins and marginal effects
Predictive margins
Marginal mean survival time
Marginal survival probabilities
Multiple-record data
Marginal effects
15. GENERALIZING THE PARAMETRIC REGRESSION MODEL
Using the ancillary() option
Stratified models
Frailty models
Unshared-frailty models
Example: Kidney data
Testing for heterogeneity
Shared-frailty models
16. POWER AND SAMPLE-SIZE DETERMINATION FOR SURVIVAL ANALYSIS
Estimating sample size
Multiple-myeloma data
Comparing two survivor functions nonparametrically
Comparing two exponential survivor functions
Cox regression models
Accounting for withdrawal and accrual of subjects
The effect of withdrawal or loss to follow-up
The effect of accrual
Examples
Estimating power and effect size
Tabulating or graphing results
17. COMPETING RISKS
Cause-specific hazards
Cumulative incidence functions
Nonparametric analysis
Breast cancer data
Cause-specific hazards
Cumulative incidence functions
Semiparametric analysis
Cause-specific hazards
Simultaneous regressions for cause-specific hazards
Cumulative incidence functions
Using stcrreg
Using stcox
Parametric analysis