Researchers wishing to fit regression models to survival data have long faced the difficult task of choosing between the Cox model and a parametric survival model such as the Weibull. Cox models can be fit using Stata’s stcox command, and parametric models are fit using streg, which offers five parametric forms in addition to the Weibull. While the Cox model makes minimal assumptions about the form of the baseline hazard function, this same lack of assumptions hinders prediction of hazards and related functions for a given set of covariates: the estimated curves are not smooth and carry no information about what happens between the observed failure times. Parametric models give smooth predictions by assuming a functional form for the hazard, but the assumed form is often too rigid for real data, especially when the shape of the hazard changes markedly over time.
This text is concerned with obtaining a compromise between Cox and parametric models that retains the desired features of both types of models. The book is aimed at researchers who are familiar with the basic concepts of survival analysis and with the stcox and streg commands in Stata. As such, it is an excellent complement to An Introduction to Survival Analysis Using Stata, by Cleves et al. (2010).
This book is written for Stata 12 but is fully compatible with Stata 11.
Much of the text is dedicated to estimation with Royston–Parmar models using the stpm2 command, which is maintained by the authors and available from the Statistical Software Components (SSC) archive at http://www.repec.org. Royston–Parmar models are highly flexible alternatives to the exponential, Weibull, loglogistic, and lognormal models (fit using streg) that allow extension from proportional hazards to proportional odds and to scaled probit models. Additional flexibility is obtained by the use of restricted cubic spline functions as alternatives to the linear functions of log time used in standard models. The authors demonstrate fitting these models and graphing predicted hazards, cumulative hazards, and survival functions with real data from breast cancer and prostate cancer studies.
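The following minimal Python sketch shows how a spline in log time generalizes these standard models. It is not stpm2 and not the authors' code. It assumes a model of the form g(S(t)) = s(ln t) + xβ, where s is a restricted cubic spline in log time. The link g is ln(-ln S) on the proportional hazards scale, ln((1 - S)/S) on the proportional odds scale, and -Φ⁻¹(S) on the probit scale. The function names `rcs_basis` and `rp_survival`, the `scale` labels, and the knot values are hypothetical choices for illustration. With only the two boundary knots, the spline reduces to γ0 + γ1 ln t. The three scales then give the Weibull, loglogistic, and lognormal survival functions.

```python
import math
from statistics import NormalDist


def rcs_basis(x, knots):
    """Restricted cubic spline basis evaluated at x (here x = ln t).

    knots: sorted [k_min, k_1, ..., k_m, k_max] on the same scale as x.
    Returns m + 1 values: x itself plus one term per interior knot.
    Each term is cubic between the boundary knots and linear outside them.
    """
    k_min, k_max = knots[0], knots[-1]

    def pos3(u):
        # truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    basis = [x]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        basis.append(pos3(x - k) - lam * pos3(x - k_min) - (1.0 - lam) * pos3(x - k_max))
    return basis


def rp_survival(t, gammas, knots, xb=0.0, scale="ph"):
    """S(t) from g(S(t)) = s(ln t; gammas) + xb.

    gammas: [gamma_0, gamma_1, ..., gamma_{m+1}], intercept first.
    scale: hypothetical labels "ph" (log cumulative hazard),
           "po" (log odds of failure) or "probit" (probit of failure).
    """
    v = rcs_basis(math.log(t), knots)
    if len(gammas) != len(v) + 1:
        raise ValueError("need an intercept plus one gamma per basis function")
    eta = gammas[0] + sum(g * b for g, b in zip(gammas[1:], v)) + xb
    if scale == "ph":
        return math.exp(-math.exp(eta))      # ln(-ln S) = eta
    if scale == "po":
        return 1.0 / (1.0 + math.exp(eta))  # ln((1 - S) / S) = eta
    if scale == "probit":
        return NormalDist().cdf(-eta)        # -Phi^{-1}(S) = eta
    raise ValueError("scale must be 'ph', 'po' or 'probit'")
```

For example, `rp_survival(2.0, [-1.2, 1.5], [0.0, 3.0])` is the Weibull survival exp(-exp(-1.2) * 2**1.5) at t = 2. Adding interior knots lets the log cumulative hazard bend in the middle of follow-up while staying linear in log time beyond the boundary knots.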
After some introductory material on the motivation behind flexible parametric models and on working with survival data in Stata, the authors show that a Cox model can be expressed as a Poisson model by splitting the time scale at the observed failure times. The Poisson formulation can then be extended by changing how the time scale is split and by introducing restricted cubic splines and fractional polynomials. Royston–Parmar models are then introduced, followed by material on model building and diagnostics for these models. Considerable attention is given to time-dependent effects, how they may be modeled, and how to interpret graphs of the predicted functions the models produce. This material is followed by a chapter on relative survival models such as those used in population-based cancer studies. This chapter is very thorough, relates well to the previous material, and is an ideal introduction for those new to the concepts of relative survival and excess mortality. The final chapter is devoted to advanced topics such as determining the number needed to treat (NNT), handling multiple-event data, and analyzing competing risks.
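The time-splitting step can be sketched in a few lines of Python. This is a hypothetical illustration, not the book's code; in Stata the splitting is done with stsplit. The sketch makes several assumptions:

- Every subject enters at time 0 and has a positive exit time.
- Each record is a tuple (id, exit time, died, covariate).
- The names `split_at_failures` and `interval_rates` are invented here.

Each subject's follow-up is cut at every distinct observed failure time. The resulting records carry an interval index, the exposure time in that interval, and a 0/1 event flag. Follow-up after the last failure goes into one final, event-free interval. Summing events and exposure per interval gives the crude piecewise-exponential rates. The Cox estimates (with the Breslow treatment of ties) come from fitting a Poisson regression with one log-rate parameter per interval plus covariates to such records. That fit is not shown here.

```python
from collections import defaultdict


def split_at_failures(records):
    """Split each subject's follow-up at every distinct observed failure time.

    records: iterable of (id, t_exit, died, x); everyone enters at time 0.
    Returns one dict per (subject, interval) with exposure and event flag.
    """
    records = list(records)
    cuts = sorted({t for _, t, died, _ in records if died})
    rows = []
    for pid, t_exit, died, x in records:
        t0 = 0.0
        for j, cut in enumerate(cuts):
            if t0 >= t_exit:
                break
            t1 = min(cut, t_exit)
            rows.append({"id": pid, "interval": j, "t0": t0, "t1": t1,
                         "exposure": t1 - t0,
                         # the event belongs to the interval ending at the exit time
                         "event": int(bool(died) and t1 == t_exit),
                         "x": x})
            t0 = t1
        if t0 < t_exit:
            # censored after the last observed failure: one extra event-free interval
            rows.append({"id": pid, "interval": len(cuts), "t0": t0, "t1": t_exit,
                         "exposure": t_exit - t0, "event": 0, "x": x})
    return rows


def interval_rates(rows):
    """Events / person-time per interval (piecewise-exponential rates, no covariates)."""
    events = defaultdict(int)
    exposure = defaultdict(float)
    for r in rows:
        events[r["interval"]] += r["event"]
        exposure[r["interval"]] += r["exposure"]
    return {j: events[j] / exposure[j] for j in sorted(exposure) if exposure[j] > 0}
```

Consider four toy subjects: `[(1, 2.0, 1, 0), (2, 3.0, 0, 1), (3, 5.0, 1, 1), (4, 4.0, 1, 0)]`. Their follow-up is cut at times 2, 4, and 5, which gives 8 records with 14 units of total exposure and 3 events. `interval_rates` then returns `{0: 0.125, 1: 0.2, 2: 1.0}`.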
List of Tables
List of Figures
Preface
1. THEORY AND PRACTICE
Goals
A brief review of the Cox proportional hazards model
Beyond the Cox model
Estimating the baseline hazard
The baseline hazard contains useful information
Advantages of smooth survival functions
Some requirements of a practical survival analysis
When the proportional-hazards assumption is breached
Why parametric models?
Smooth baseline hazard and survival functions
Time-dependent HRs
Modeling on different scales
Relative survival
Prediction out of sample
Multiple time scales
Why not standard parametric models?
A brief introduction to stpm2
Estimation (model fitting)
Postestimation facilities (prediction)
Basic relationships in survival analysis
Comparing models
The delta method
Ado-file resources
How our book is organized
2. USING STSET AND STSPLIT
What is the stset command?
Some key concepts
Syntax of the stset command
Variables created by the stset command
Examples of using stset
Standard survival data
Using the scale( ) option
Date of diagnosis and date of exit
Date of diagnosis and date of exit with the scale( ) option
Restricting the follow-up time
Left-truncation
Age as the time scale
The stsplit command
Time-dependent effects
Time-varying covariates
Conclusion
3. GRAPHICAL INTRODUCTION TO THE PRINCIPAL DATASETS
Introduction
Rotterdam breast cancer data
England and Wales breast cancer data
Orchiectomy data
Conclusion
4. POISSON MODELS
Introduction
Modeling rates with the Poisson distribution
Splitting the time scale
The piecewise exponential model
Time as just another covariate
Collapsing the data to speed up computation
Splitting at unique failure times
Technical note: Why the Cox and Poisson approaches are equivalent
Comparing a different number of intervals
Fine splitting of the time scale
Splines: Motivation and definition
Calculating splines
Restricted cubic splines
Splines: Application to the Rotterdam data
Varying the number of knots
Varying the location of the knots
Estimating the survival function
FPs: Motivation and definition
Application to Rotterdam data
Higher order FP models
FP function selection procedure
Discussion
5. ROYSTON–PARMAR MODELS
Motivation and introduction
The exponential distribution
The Weibull distribution
Generalizing the Weibull
Estimating the hazard function
Proportional hazards models
Generalizing the Weibull
Example
Comparing parameters of PH(1) and Weibull models
Selecting a spline function
Knot positions
Example
How many knots?
PO models
Introduction
The loglogistic model
Generalizing the loglogistic model
Comparing parameters of PO(1) and loglogistic models
Example
Probit models
Motivation
Generalizing the probit model
Comparing parameters of probit(1) and lognormal models
Comments on probit and PO models
Royston–Parmar (RP) models
Models with θ not equal to 0 or 1
Example
Likelihood function and parameter estimation
Comparing regression coefficients
Model selection
Sensitivity to number of knots
Sensitivity to location of knots
Concluding remarks
6. PROGNOSTIC MODELS
Introduction
Developing and reporting a prognostic model
What does the baseline hazard function mean?
Example
Model selection
Choice of scale and baseline complexity
Example
Selection of variables and functional forms
Example
Quantitative outputs from the model
Survival probabilities for individuals
Survival probabilities across the risk spectrum
Survival probabilities at given covariate values
Survival probabilities in groups
Plotting adjusted survival curves
Plotting differences between survival curves
Centiles of the survival distribution
Goodness of fit
Example
Discrimination and explained variation
Example
Harrell’s C index of concordance
Out-of-sample prediction: Concept and applications
Extrapolation of survival functions: Basic technique
Extrapolation of survival functions: Further investigations
Validation of prognostic models: Basics
Validation of prognostic models: Further comments
Visualization of survival times
Example
Discussion
7. TIME-DEPENDENT EFFECTS
Introduction
Definitions
What do we mean by a TD effect?
Proportional on which scale?
Poisson models with TD effects
Piecewise models
Using restricted cubic splines
RP models with TD effects
Piecewise HRs
Continuous TD effects
More than one TD effect
Stratification is the same as including TD effects
TD effects for continuous variables
Attained age as the time scale
The orchiectomy data
Proportional hazards model
TD model
Multiple time scales
Prognostic models with TD effects
Example
Discussion
8. RELATIVE SURVIVAL
Introduction
What is relative survival?
Excess mortality and relative survival
Excess mortality
Relative survival is a ratio
Motivating example
Life-table estimation of relative survival
Using strs
Poisson models for relative survival
Piecewise models
Restricted cubic splines
RP models for relative survival
Likelihood for relative survival models
Proportional cumulative excess hazards
RP models on other scales
Application to England and Wales breast cancer data
Relative survival models on other scales
Time-dependent effects
Some comments on model selection
Age as a continuous variable
Concluding remarks
9. FURTHER TOPICS
Introduction
Number needed to treat
Example
Average and adjusted survival curves
Renal data
Modeling distributions with RP models
Example 1: Rotterdam breast cancer data
Example 2: CD4 lymphocyte data
Example 3: Prostate cancer data
Multiple events
Introduction
The AG model
The WLW model
The PWP model
Multiple events in RP models
Summary
Bayesian RP models
Introduction
The “zeros trick” in WinBUGS
Fitting a RP model
Summary
Competing risks
Summary
Period analysis
Introduction
What is period analysis?
Application to England and Wales breast cancer data
Crude probability of death from relative survival models
Introduction
Application to England and Wales breast cancer data
Conclusion
Final remarks