Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model
by Patrick Royston and Paul C. Lambert
data:image/s3,"s3://crabby-images/db8f0/db8f0fbe8b45f83dfaf86d0ce7809557bfb7a993" alt="" Researchers
wishing to fit regression models to survival data have long faced the
difficult task of choosing between the Cox model and a parametric
survival model such as Weibull. Cox models can be fit using Stata’s
stcox command, and parametric models are fit using streg, which offers
five parametric forms in addition to Weibull. While the Cox model makes
minimal assumptions about the form of the baseline hazard function,
prediction of hazards and other related functions for a given set of
covariates is hindered by this lack of assumptions; the resulting
estimated curves are not smooth and do not possess information about
what occurs between the observed failure times. Parametric models offer
nice, smooth predictions by assuming a functional form of the hazard,
but often the assumed form is too structured for use with real data,
especially if there exist significant changes in the shape of the
hazard over time.
This text is concerned with obtaining a
compromise between Cox and parametric models that retains the desired
features of both types of models. The book is aimed at researchers who
are familiar with the basic concepts of survival analysis and with the
stcox and streg commands in Stata. As such, it is an excellent
complement to An Introduction to Survival Analysis Using Stata, by Cleves et al. (2010).
This book is written for Stata 12 but is also fully compatible with Stata 11.
Much of the text is dedicated to
estimation with Royston–Parmar models using the stpm2 command, which is
maintained by the authors and available from the Statistical Software
Components (SSC) archive at http://www.repec.org.
Royston–Parmar models are highly flexible alternatives to the
exponential, Weibull, loglogistic, and lognormal models (fit using
streg) that allow extension from proportional hazards to proportional
odds and to scaled probit models. Additional flexibility is obtained by
the use of restricted cubic spline functions as alternatives to the
linear functions of log time used in standard models. The authors
demonstrate fitting these models and graphing predicted hazards,
cumulative hazards, and survival functions with real data from breast
cancer and prostate cancer studies.
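As an illustration of how restricted cubic splines replace the linear function of log time, here is a minimal Python sketch; it is not the stpm2 implementation. It assumes the usual restricted cubic spline basis in ln t, with boundary knots at the extremes (stpm2 itself may also transform the basis, for example by orthogonalizing it, so its coefficients need not match these), and the function names, the `scale` labels, and any knot values are hypothetical. The linear predictor eta(t) = gamma_0 + sum_j gamma_j z_j(ln t) + x*beta is linked to survival on the proportional hazards scale, ln(-ln S) = eta, the proportional odds scale, ln((1 - S)/S) = eta, or the probit scale, -Phi^{-1}(S) = eta.

```python
import math
from statistics import NormalDist


def rcs_basis(x, knots):
    """Restricted cubic spline basis evaluated at x = ln(t).

    knots: sorted knot positions on the log-time scale; the first and last
    are the boundary knots, any in between are interior knots.
    Returns [z1, ..., zm]: z1 = x, plus one term per interior knot.
    Any linear combination is cubic between knots and linear outside
    the boundary knots.
    """
    kmin, kmax = knots[0], knots[-1]

    def cube_plus(u):
        # truncated power function (u)_+^3
        return u ** 3 if u > 0 else 0.0

    basis = [x]
    for kj in knots[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        basis.append(cube_plus(x - kj)
                     - lam * cube_plus(x - kmin)
                     - (1 - lam) * cube_plus(x - kmax))
    return basis


def linear_predictor(t, gammas, knots, xb=0.0):
    """eta = gamma_0 + sum_j gamma_j * z_j(ln t) + x*beta (hypothetical helper)."""
    z = rcs_basis(math.log(t), knots)
    return gammas[0] + sum(g * zj for g, zj in zip(gammas[1:], z)) + xb


def survival(t, gammas, knots, xb=0.0, scale="hazard"):
    """Invert the link g(S(t)) = eta; the scale labels are this sketch's own."""
    eta = linear_predictor(t, gammas, knots, xb)
    if scale == "hazard":   # ln(-ln S) = eta: proportional hazards
        return math.exp(-math.exp(eta))
    if scale == "odds":     # ln((1 - S) / S) = eta: proportional odds
        return 1.0 / (1.0 + math.exp(eta))
    if scale == "normal":   # -Phi^{-1}(S) = eta: probit
        return NormalDist().cdf(-eta)
    raise ValueError(f"unknown scale: {scale!r}")
```

With only the two boundary knots the basis is just ln t, so the three scales reduce to the Weibull, loglogistic, and lognormal models; each interior knot adds one spline term and one extra gamma. For example, `survival(3.0, [-1.0, 1.2, 0.05], [math.log(0.5), math.log(2.0), math.log(8.0)], xb=0.4)` evaluates a proportional hazards model with one interior knot.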
After some introductory material on the
motivation behind flexible parametric models and on working with
survival data in Stata, the authors proceed by demonstrating that Cox
models may instead be expressed as Poisson models by splitting the time
scale at the observed failures. The Poisson-model expression allows
extension by changing how the time scale is split and by introducing
restricted cubic splines and fractional polynomials. Royston–Parmar
models are then introduced, followed by material on model building and
diagnostics for these models. Considerable attention is then given to
time-dependent effects, how these may be modeled, and how to interpret
the graphs of the predicted functions the models produce. This material
is followed by a chapter on relative survival models such as those used
for population-based cancer studies. This chapter is very thorough,
relates well to the previous material, and is an ideal introduction for
those new to the concepts of relative survival and excess mortality.
The final chapter is devoted to advanced topics such as determining the
number needed to treat (NNT), handling multiple-event data, and
analyzing competing risks.
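The Poisson connection can be seen in its simplest form, a piecewise exponential model with no covariates. The Python sketch below uses hypothetical helpers and is not Stata's stsplit: it splits each subject's follow-up (entry, exit] at chosen cutpoints, allowing a delayed entry time for left-truncated data, accumulates the events d_j and person-time y_j in each interval, and returns the maximum likelihood estimate of the constant hazard, d_j / y_j. Covariates would enter through a Poisson regression of d_j with offset ln y_j; placing a cutpoint at every distinct failure time is the split under which that Poisson model reproduces the Cox model's coefficient estimates (with Breslow's handling of ties).

```python
import math


def split_episodes(entry, exit_time, died, cuts):
    """Split one subject's follow-up (entry, exit_time] at the cutpoints.

    Returns (interval_index, start, stop, event) rows, one per interval in
    which the subject is at risk; the event is assigned only to the
    interval that contains exit_time. Times are assumed non-negative.
    """
    bounds = [0.0] + sorted(cuts) + [math.inf]
    rows = []
    for j in range(len(bounds) - 1):
        lo, hi = bounds[j], bounds[j + 1]
        start, stop = max(entry, lo), min(exit_time, hi)
        if stop > start:
            event = int(bool(died) and exit_time <= hi)
            rows.append((j, start, stop, event))
    return rows


def piecewise_rates(subjects, cuts):
    """Piecewise exponential hazard estimates without covariates.

    subjects: iterable of (entry, exit_time, died) tuples.
    Returns one (events, person_time, rate) tuple per interval, where
    rate = events / person_time is the maximum likelihood estimate of the
    constant hazard in that interval (None if nobody was at risk).
    """
    n_intervals = len(cuts) + 1
    d = [0] * n_intervals
    y = [0.0] * n_intervals
    for entry, exit_time, died in subjects:
        for j, start, stop, event in split_episodes(entry, exit_time, died, cuts):
            d[j] += event
            y[j] += stop - start
    return [(d[j], y[j], d[j] / y[j] if y[j] > 0 else None)
            for j in range(n_intervals)]
```

Finer cutpoints trade bias for variance: with very few intervals the step-shaped hazard is crude, while splitting at every failure time gives many intervals with a single event each, which is where smooth functions such as splines become attractive.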
Table of contents
List of Tables
List of Figures
Preface
1 Theory and Practice
- 1.1 Goals
- 1.2 A brief review of the Cox proportional hazards model
- 1.3 Beyond the Cox model
- 1.3.1 Estimating the baseline hazard
- 1.3.2 The baseline hazard contains useful information
- 1.3.3 Advantages of smooth survival functions
- 1.3.4 Some requirements of a practical survival analysis
- 1.3.5 When the proportional-hazards assumption is breached
- 1.4 Why parametric models?
- 1.4.1 Smooth baseline hazard and survival functions
- 1.4.2 Time-dependent HRs
- 1.4.3 Modeling on different scales
- 1.4.4 Relative survival
- 1.4.5 Prediction out of sample
- 1.4.6 Multiple time scales
- 1.5 Why not standard parametric models?
- 1.6 A brief introduction to stpm2
- 1.6.1 Estimation (model fitting)
- 1.6.2 Postestimation facilities (prediction)
- 1.7 Basic relationships in survival analysis
- 1.8 Comparing models
- 1.9 The delta method
- 1.10 Ado-file resources
- 1.11 How our book is organized
2 Using stset and stsplit
- 2.1 What is the stset command?
- 2.2 Some key concepts
- 2.3 Syntax of the stset command
- 2.4 Variables created by the stset command
- 2.5 Examples of using stset
- 2.5.1 Standard survival data
- 2.5.2 Using the scale() option
- 2.5.3 Date of diagnosis and date of exit
- 2.5.4 Date of diagnosis and date of exit with the scale() option
- 2.5.5 Restricting the follow-up time
- 2.5.6 Left-truncation
- 2.5.7 Age as the time scale
- 2.6 The stsplit command
- 2.6.1 Time-dependent effects
- 2.6.2 Time-varying covariates
- 2.7 Conclusion
3 Graphical introduction to the principal datasets
- 3.1 Introduction
- 3.2 Rotterdam breast cancer data
- 3.3 England and Wales breast cancer data
- 3.4 Orchiectomy data
- 3.5 Conclusion
4 Poisson models
- 4.1 Introduction
- 4.2 Modeling rates with the Poisson distribution
- 4.3 Splitting the time scale
- 4.3.1 The piecewise exponential model
- 4.3.2 Time as just another covariate
- 4.4 Collapsing the data to speed up computation
- 4.5 Splitting at unique failure times
- 4.5.1 Technical note: Why the Cox and Poisson approaches are equivalent
- 4.6 Comparing a different number of intervals
- 4.7 Fine splitting of the time scale
- 4.8 Splines: Motivation and definition
- 4.8.1 Calculating splines
- 4.8.2 Restricted cubic splines
- 4.8.3 Splines: Application to the Rotterdam data
- 4.8.4 Varying the number of knots
- 4.8.5 Varying the location of the knots
- 4.8.6 Estimating the survival function
- 4.9 FPs: Motivation and definition
- 4.9.1 Application to Rotterdam data
- 4.9.2 Higher order FP models
- 4.9.3 FP function selection procedure
- 4.10 Discussion
5 Royston–Parmar models
- 5.1 Motivation and introduction
- 5.1.1 The exponential distribution
- 5.1.2 The Weibull distribution
- 5.1.3 Generalizing the Weibull
- 5.1.4 Estimating the hazard function
- 5.2 Proportional hazards models
- 5.2.1 Generalizing the Weibull
- 5.2.2 Example
- 5.2.3 Comparing parameters of PH(1) and Weibull models
- 5.3 Selecting a spline function
- 5.3.1 Knot positions
- Example
- 5.3.2 How many knots?
- 5.4 PO models
- 5.4.1 Introduction
- 5.4.2 The loglogistic model
- 5.4.3 Generalizing the loglogistic model
- 5.4.4 Comparing parameters of PO(1) and loglogistic models
- Example
- 5.5 Probit models
- 5.5.1 Motivation
- 5.5.2 Generalizing the probit model
- 5.5.3 Comparing parameters of probit(1) and lognormal models
- 5.5.4 Comments on probit and POs models
- 5.6 Royston–Parmar (RP) models
- 5.6.1 Models with θ not equal to 0 or 1
- 5.6.2 Example
- 5.6.3 Likelihood function and parameter estimation
- 5.6.4 Comparing regression coefficients
- 5.6.5 Model selection
- 5.6.6 Sensitivity to number of knots
- 5.6.7 Sensitivity to location of knots
- 5.7 Concluding remarks
6 Prognostic models
- 6.1 Introduction
- 6.2 Developing and reporting a prognostic model
- 6.3 What does the baseline hazard function mean?
- 6.3.1 Example
- 6.4 Model selection
- 6.4.1 Choice of scale and baseline complexity
- Example
- 6.4.2 Selection of variables and functional forms
- Example
- 6.5 Quantitative outputs from the model
- 6.5.1 Survival probabilities for individuals
- 6.5.2 Survival probabilities across the risk spectrum
- 6.5.3 Survival probabilities at given covariate values
- 6.5.4 Survival probabilities in groups
- 6.5.5 Plotting adjusted survival curves
- 6.5.6 Plotting differences between survival curves
- 6.5.7 Centiles of the survival distribution
- 6.6 Goodness of fit
- 6.6.1 Example
- 6.7 Discrimination and explained variation
- 6.7.1 Example
- 6.7.2 Harrell’s C index of concordance
- 6.8 Out-of-sample prediction: Concept and applications
- 6.8.1 Extrapolation of survival functions: Basic technique
- 6.8.2 Extrapolation of survival functions: Further investigations
- 6.8.3 Validation of prognostic models: Basics
- 6.8.4 Validation of prognostic models: Further comments
- 6.9 Visualization of survival times
- 6.9.1 Example
- 6.10 Discussion
7 Time-dependent effects
- 7.1 Introduction
- 7.2 Definitions
- 7.3 What do we mean by a TD effect?
- 7.4 Proportional on which scale?
- 7.5 Poisson models with TD effects
- 7.5.1 Piecewise models
- 7.5.2 Using restricted cubic splines
- 7.6 RP models with TD effects
- 7.6.1 Piecewise HRs
- 7.6.2 Continuous TD effects
- 7.6.3 More than one TD effect
- 7.6.4 Stratification is the same as including TD effects
- 7.7 TD effects for continuous variables
- 7.8 Attained age as the time scale
- 7.8.1 The orchiectomy data
- 7.8.2 Proportional hazards model
- 7.8.3 TD model
- 7.9 Multiple time scales
- 7.10 Prognostic models with TD effects
- 7.10.1 Example
- 7.11 Discussion
8 Relative survival
- 8.1 Introduction
- 8.2 What is relative survival?
- 8.3 Excess mortality and relative survival
- 8.3.1 Excess mortality
- 8.3.2 Relative survival is a ratio
- 8.4 Motivating example
- 8.5 Life-table estimation of relative survival
- 8.5.1 Using strs
- 8.6 Poisson models for relative survival
- 8.6.1 Piecewise models
- 8.6.2 Restricted cubic splines
- 8.7 RP models for relative survival
- 8.7.1 Likelihood for relative survival models
- 8.7.2 Proportional cumulative excess hazards
- 8.7.3 RP models on other scales
- 8.7.4 Application to England and Wales breast cancer data
- 8.7.5 Relative survival models on other scales
- 8.7.6 Time-dependent effects
- 8.8 Some comments on model selection
- 8.9 Age as a continuous variable
- 8.10 Concluding remarks
9 Further topics
- 9.1 Introduction
- 9.2 Number needed to treat
- 9.2.1 Example
- 9.3 Average and adjusted survival curves
- 9.3.1 Renal data
- 9.4 Modeling distributions with RP models
- 9.4.1 Example 1: Rotterdam breast cancer data
- 9.4.2 Example 2: CD4 lymphocyte data
- 9.4.3 Example 3: Prostate cancer data
- 9.5 Multiple events
- 9.5.1 Introduction
- 9.5.2 The AG model
- 9.5.3 The WLW model
- 9.5.4 The PWP model
- 9.5.5 Multiple events in RP models
- 9.5.6 Summary
- 9.6 Bayesian RP models
- 9.6.1 Introduction
- 9.6.2 The “zeros trick” in WinBUGS
- 9.6.3 Fitting a RP model
- 9.6.4 Summary
- 9.7 Competing risks
- 9.7.1 Summary
- 9.8 Period analysis
- 9.8.1 Introduction
- 9.8.2 What is period analysis?
- 9.8.3 Application to England and Wales breast cancer data
- 9.9 Crude probability of death from relative survival models
- 9.9.1 Introduction
- 9.9.2 Application to England and Wales breast cancer data
- 9.9.3 Conclusion
- 9.10 Final remarks
References
Author Index
Subject Index
© Copyright StataCorp LP 2002-2015.
data:image/s3,"s3://crabby-images/c5276/c5276908e337619276266c450350d9315cfd9522" alt=""
data:image/s3,"s3://crabby-images/1af26/1af264c1fb16fdf15e3798e76de5dc8a49e91469" alt=""
|