Simulating complex survival data

Simulation studies are essential for understanding and evaluating both current and new statistical models. When simulating survival times, one often assumes an exponential or Weibull distribution for the baseline hazard function, with survival times generated using the method of Bender, Augustin, and Blettner (2005, Statistics in Medicine 24: 1713–1723). Assuming a constant or monotonic hazard is often too simplistic and may lack biological plausibility. We describe a new user-written command, survsim, which allows the user to simulate survival times from two-component parametric mixture models, providing much more flexibility in the shape of the underlying hazard. Standard parametric distributions can also be used, including the exponential, Weibull, and Gompertz. Furthermore, survival times can be simulated from the all-cause distribution of cause-specific hazards for competing risks by using the method of Beyersmann et al. (2009, Statistics in Medicine 28: 956–971). A multinomial distribution is used to create the event indicator, whereby the probability of experiencing each event at a simulated time t is the corresponding cause-specific hazard divided by the all-cause hazard, both evaluated at time t. Baseline covariates can be included in all scenarios. We also describe the extension to incorporate nonproportional hazards in standard parametric and competing-risks scenarios.
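
The two simulation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the survsim implementation. It assumes Weibull hazards parameterized as h(t) = lam * gamma * t^(gamma - 1) * exp(xb), and the function names (weibull_time, competing_risks_draw, and the two helpers) are hypothetical. The first function inverts the survival function at a uniform draw, in the spirit of Bender, Augustin, and Blettner; the last one draws a time from the all-cause distribution by bisection and then picks the cause with probability proportional to its cause-specific hazard at that time, following the description attributed to Beyersmann et al.

```python
import math
import random


def weibull_time(u, lam, gamma, xb=0.0):
    """Invert S(t | x) = exp(-lam * t**gamma * exp(xb)) at S = u, with 0 < u <= 1."""
    return (-math.log(u) / (lam * math.exp(xb))) ** (1.0 / gamma)


def cause_specific_hazards(t, lams, gammas, xbs):
    """Assumed Weibull form: h_k(t) = lam_k * gamma_k * t**(gamma_k - 1) * exp(xb_k)."""
    return [l * g * t ** (g - 1.0) * math.exp(xb)
            for l, g, xb in zip(lams, gammas, xbs)]


def all_cause_cumulative_hazard(t, lams, gammas, xbs):
    """Sum of the cause-specific cumulative hazards lam_k * t**gamma_k * exp(xb_k)."""
    return sum(l * t ** g * math.exp(xb)
               for l, g, xb in zip(lams, gammas, xbs))


def competing_risks_draw(rng, lams, gammas, xbs, iterations=100):
    """Return one (time, cause) pair; causes are numbered from 1."""
    # Solve H(t) = -log(u) for t, where H is the all-cause cumulative hazard.
    target = -math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
    lo, hi = 0.0, 1.0
    while all_cause_cumulative_hazard(hi, lams, gammas, xbs) < target:
        hi *= 2.0  # expand the bracket until it contains the root
    for _ in range(iterations):  # plain bisection; H is increasing in t
        mid = 0.5 * (lo + hi)
        if all_cause_cumulative_hazard(mid, lams, gammas, xbs) < target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    # Event indicator: cause k is chosen with probability h_k(t) / sum_j h_j(t).
    hazards = cause_specific_hazards(t, lams, gammas, xbs)
    cause = rng.choices(range(1, len(hazards) + 1), weights=hazards)[0]
    return t, cause
```

For example, competing_risks_draw(random.Random(2012), [0.1, 0.3], [1.0, 1.5], [0.0, 0.0]) returns one simulated time together with the cause that occurred. Because the two shapes differ, the cause probabilities change with the simulated time.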

Publication Type:
Journal Article
DOI and Other Identifiers:
st0275 (Other)
Published in:
Stata Journal, Volume 12, Number 4

 Record created 2017-04-01, last modified 2018-01-23
