1. Introducing H2OCoxPH
Patrick Aboyoun
Senior Data Scientist, Training
H2O.ai
linkedin.com/in/patrickaboyoun
2. Cox Proportional Hazards Model
• Common approach for analyzing time-to-event data
• Individuals in population at risk for event of interest
• One of three outcomes for each individual
• Event happens
• Event may still happen in the future
• Circumstances changed and event no longer possible
• Composed of a linear combination of predictors and a baseline hazard function of unspecified (possibly non-linear) form
• Semi-parametric model
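• Coefficients quantify the influence of predictors: $e^{\beta_j}$ is the hazard ratio for a one-unit increase in predictor $j$, holding the others fixed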
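To make the three outcomes concrete, here is a minimal Python sketch of how one individual's follow-up might be encoded as a (time, event) pair, the usual input to a Cox model. The names `Outcome`, `SurvivalRecord`, and `to_record` are hypothetical. Treating "event no longer possible" as censored is an assumption: it matches a standard single-event Cox fit, whereas competing-risk methods model that case separately.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """The three possible outcomes for an individual (hypothetical labels)."""
    EVENT = "event happens"
    STILL_AT_RISK = "event may still happen in the future"
    NO_LONGER_POSSIBLE = "event no longer possible"


@dataclass(frozen=True)
class SurvivalRecord:
    time: float  # time of the event, or of the last observation if censored
    event: int   # 1 = event observed, 0 = censored


def to_record(follow_up_time: float, outcome: Outcome) -> SurvivalRecord:
    """Encode one individual's outcome as a (time, event) pair.

    Assumption: as in a standard single-event Cox model, both "still at risk"
    and "no longer possible" are recorded as censored at the last observed time.
    """
    if follow_up_time < 0:
        raise ValueError("follow-up time must be non-negative")
    event = 1 if outcome is Outcome.EVENT else 0
    return SurvivalRecord(time=follow_up_time, event=event)


if __name__ == "__main__":
    people = [
        (5.0, Outcome.EVENT),
        (8.0, Outcome.STILL_AT_RISK),
        (3.5, Outcome.NO_LONGER_POSSIBLE),
    ]
    for t, outcome in people:
        print(outcome.value, "->", to_record(t, outcome))
```

Only the first individual contributes an observed event; the other two contribute information only up to their censoring times.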
3. Cox Proportional Hazards Model
• The instantaneous rate of an event occurrence is expressed as
$\lambda_k(t \mid x_i) = \lambda_k(t)\, e^{x_i^T \beta}$
• where
• $\lambda_k(t)$ is the baseline hazard for stratum $k$
• $x_i$ is the data vector for observation $i$
• $\beta$ is the coefficient vector
• Semi-parametric formulation avoids any distributional assumption on the baseline hazard function
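A minimal Python sketch of the hazard formula for one stratum, assuming a hypothetical Weibull baseline hazard (the Cox model itself leaves $\lambda_k(t)$ unspecified) and made-up coefficients and predictors. It illustrates why the model is called proportional: the ratio of two individuals' hazards is $e^{(x_a - x_b)^T \beta}$, which does not depend on $t$. The function names `cox_hazard` and `weibull_baseline` are hypothetical.

```python
import math


def cox_hazard(baseline_hazard, x, beta, t):
    """Hazard lambda_k(t | x) = lambda_k(t) * exp(x^T beta) within one stratum."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear_predictor = sum(xj * bj for xj, bj in zip(x, beta))
    return baseline_hazard(t) * math.exp(linear_predictor)


def weibull_baseline(t, shape=1.5, scale=10.0):
    """Hypothetical baseline hazard; any non-negative function of t would do."""
    return (shape / scale) * (t / scale) ** (shape - 1)


beta = [0.7, -0.2]  # hypothetical coefficients
x_a = [1.0, 30.0]   # hypothetical predictors for individual A
x_b = [0.0, 30.0]   # individual B differs only in the first predictor

for t in (1.0, 5.0, 20.0):
    h_a = cox_hazard(weibull_baseline, x_a, beta, t)
    h_b = cox_hazard(weibull_baseline, x_b, beta, t)
    print(f"t={t:>4}: hazard A={h_a:.6f}, hazard B={h_b:.6f}, ratio A/B={h_a / h_b:.4f}")
```

Both hazards change with $t$, but because A and B differ by one unit in the first predictor, their ratio stays at $e^{0.7} \approx 2.01$ for every $t$.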
4. Survival Analysis Terminology
• Hazard Function
• Instantaneous rate at which the event occurs, given that it hasn’t happened yet
• $\lambda(t) = \lim_{h \downarrow 0} \frac{\Pr(t \le T < t + h \mid T \ge t)}{h}$
• Survival Function
• Probability the event has not happened by time $t$: $S(t) = \Pr(T > t)$
• Cumulative Hazard Function
• $\Lambda(t) = \int_0^t \lambda(u)\, du$
• $\Lambda(t) = -\log S(t)$
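The relationships on this slide can be checked numerically. This Python sketch assumes a hypothetical Weibull hazard with shape 1.5 and scale 10, integrates it with the composite trapezoidal rule to obtain $\Lambda(t)$, converts that to $S(t) = e^{-\Lambda(t)}$, and compares the result with the Weibull closed form $S(t) = e^{-(t/10)^{1.5}}$. The Weibull choice and the function names are illustrative only.

```python
import math


def hazard(u, shape=1.5, scale=10.0):
    """Weibull hazard, used here only as a concrete, hypothetical example."""
    return (shape / scale) * (u / scale) ** (shape - 1)


def cumulative_hazard(t, n=10_000):
    """Lambda(t) = integral of lambda(u) from 0 to t, via the trapezoidal rule."""
    h = t / n
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * h) for i in range(1, n))
    return total * h


def survival(t):
    """S(t) = exp(-Lambda(t)), which follows from Lambda(t) = -log S(t)."""
    return math.exp(-cumulative_hazard(t))


for t in (2.0, 10.0, 25.0):
    exact = math.exp(-((t / 10.0) ** 1.5))  # closed-form Weibull survival
    print(f"t={t:>4}: numeric S(t)={survival(t):.6f}  closed form={exact:.6f}")
```

The numeric and closed-form columns should agree closely; in a Cox model the same conversion is applied to an estimated baseline cumulative hazard rather than a known formula.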
5. Big Data and Cox Proportional Hazards Model
• Data set with millions or tens of millions of individuals
• Predictors that change over time multiply the size of the data
• Additional rows with start and stop times and the current values of the predictors
• Known as time-dependent covariates
• Low-probability events require informed random sampling or large data sets to understand
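A minimal Python sketch of the row expansion for time-dependent covariates: one individual's follow-up is split at each covariate change into (start, stop] intervals, each carrying the predictor values in effect on that interval, with the event flag set only on the final interval. The function name `to_counting_process` and the `dose` covariate are hypothetical, and change times are assumed to be strictly increasing and to begin at time 0.

```python
from typing import Dict, List, Tuple


def to_counting_process(
    follow_up: float,
    event: int,
    changes: List[Tuple[float, Dict[str, float]]],
) -> List[dict]:
    """Expand one individual into (start, stop] rows.

    changes: (time, covariate values) pairs sorted by time, starting at 0.
    Each row carries the covariates in effect on its interval; only the
    last row can carry the event (1) or censoring (0) indicator.
    """
    if not changes or changes[0][0] != 0.0:
        raise ValueError("covariate history must start at time 0")
    rows = []
    for idx, (start, values) in enumerate(changes):
        if start >= follow_up:
            break  # change occurs after follow-up ended; it produces no row
        next_change = changes[idx + 1][0] if idx + 1 < len(changes) else follow_up
        stop = min(next_change, follow_up)
        rows.append({"start": start, "stop": stop, "event": 0, **values})
    rows[-1]["event"] = event  # event or censoring happens at end of follow-up
    return rows


if __name__ == "__main__":
    history = [(0.0, {"dose": 10.0}), (4.0, {"dose": 20.0}), (9.0, {"dose": 15.0})]
    for row in to_counting_process(12.0, 1, history):
        print(row)
```

In the usage example, an individual followed for 12 time units with two covariate changes becomes three rows instead of one, which is how time-dependent predictors multiply the size of the data.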