Introducing H2OCoxPH
Patrick Aboyoun
Senior Data Scientist, Training
H2O.ai
linkedin.com/in/patrickaboyoun
Cox Proportional Hazards Model
• Common approach for analyzing time-to-event data
• Individuals in a population are at risk for an event of interest
• One of three outcomes for each individual
  • Event happens
  • Event may still happen in the future
  • Circumstances changed and event no longer possible
• Composed of a linear combination of predictors and a non-linear baseline hazard function
• Semi-parametric model
• Coefficients define influence of predictors
Cox Proportional Hazards Model
• The instantaneous rate of an event occurrence is expressed as
  $$\lambda_k(t \mid x_i) = \lambda_k(t)\, e^{x_i^T \beta}$$
• where
  • $\lambda_k(t)$ is the baseline hazard for stratum $k$
  • $x_i$ is the data vector for observation $i$
  • $\beta$ is the coefficient vector
• Semi-parametric formulation avoids distributional assumption on
underlying hazard function
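
One consequence of this form, and the reason the model is called "proportional hazards", is that the ratio of hazards for two observations in the same stratum does not depend on time. A short derivation, where the second observation index $j$ is introduced here only for illustration:

```latex
\frac{\lambda_k(t \mid x_i)}{\lambda_k(t \mid x_j)}
  = \frac{\lambda_k(t)\, e^{x_i^T \beta}}{\lambda_k(t)\, e^{x_j^T \beta}}
  = e^{(x_i - x_j)^T \beta}
```

The baseline hazard cancels, so raising a single predictor by one unit (all else equal) multiplies the hazard by $e^{\beta_p}$ for that predictor's coefficient $\beta_p$, whatever shape the baseline takes.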
Survival Analysis Terminology
• Hazard Function
  • Instantaneous rate at which the event happens, given that it hasn't happened yet
  • $\lambda(t) = \lim_{h \downarrow 0} \Pr(t \le T < t + h \mid T \ge t) / h$
• Survival Function
  • $S(t) = \Pr(T > t)$
• Cumulative Hazard Function
  • $\Lambda(t) = \int_0^t \lambda(u)\, du$
  • $\Lambda(t) = -\log S(t)$
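
The link between the cumulative hazard and the survival function can be derived in a few lines. This sketch assumes a continuous, positive event time $T$ with density $f$ (a symbol not used on the slides), so that $S(0) = 1$:

```latex
\lambda(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t)
\quad\Longrightarrow\quad
\Lambda(t) = \int_0^t \lambda(u)\, du = -\bigl[\log S(u)\bigr]_0^t = -\log S(t)
\quad\Longrightarrow\quad
S(t) = e^{-\Lambda(t)}
```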
Big Data and Cox Proportional Hazards Model
• Data sets with millions or tens of millions of individuals
• Predictors that change over time (time-dependent covariates) multiply the size of the data
  • Additional rows with start and stop values and the current state of the predictors
• Low-probability events require informed random sampling or large data sets to understand
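
To see why time-dependent covariates multiply the row count, here is a minimal base R sketch that expands one row per individual into one row per interval over which the predictors are constant. The data, the column names (`end_time`, `change_time`, `treated`), and the helper `to_intervals` are all hypothetical and chosen only for illustration.

```r
# Hypothetical "wide" data: one row per individual.
# change_time is when the illustrative predictor `treated` switches from 0 to 1
# (NA means it never changes); it is assumed to be earlier than end_time.
wide <- data.frame(
  id          = c(1, 2),
  end_time    = c(10, 7),
  event       = c(1, 0),   # 1 = event observed at end_time, 0 = not observed
  change_time = c(4, NA)
)

# Expand one individual into start/stop intervals.
to_intervals <- function(row) {
  if (is.na(row$change_time)) {
    # Predictor never changes: a single interval covers the whole follow-up.
    data.frame(id = row$id, start = 0, stop = row$end_time,
               event = row$event, treated = 0)
  } else {
    # Predictor changes once: split the follow-up at the change time.
    # The event can only be recorded on the final interval.
    data.frame(id      = row$id,
               start   = c(0, row$change_time),
               stop    = c(row$change_time, row$end_time),
               event   = c(0, row$event),
               treated = c(0, 1))
  }
}

long <- do.call(rbind, lapply(seq_len(nrow(wide)),
                              function(i) to_intervals(wide[i, ])))
print(long)
```

Two individuals become three rows here; with many predictors changing at different times, the expansion factor grows accordingly.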
R survival::coxph versus H2OCoxPH
• Model:
  • 112 Coefficients
• EC2 VM:
  • m5.2xlarge
  • 8 CPU
  • 32 GB RAM
H2OCoxPH Syntax
h2o.coxph(x,
event_column,
training_frame,
model_id = NULL,
start_column = NULL,
stop_column = NULL,
weights_column = NULL,
offset_column = NULL,
stratify_by = NULL,
ties = "efron",
init = 0,
lre_min = 9,
max_iterations = 20,
interactions = NULL,
interaction_pairs = NULL,
interactions_only = NULL,
use_all_factor_levels = FALSE)
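
A minimal usage sketch of the signature above, not taken from the slides. It assumes a local H2O cluster can be started and that the `survival` package is installed, and it borrows that package's `heart` data set (Stanford heart transplant data in start/stop format) purely as example input; the choice of predictors is arbitrary.

```r
library(h2o)

# Start (or connect to) a local H2O cluster.
h2o.init()

# Copy the example data into the cluster as an H2OFrame.
# survival::heart has columns start, stop, event, age, year, surgery, transplant, id.
heart_hf <- as.h2o(survival::heart)

# Fit a Cox model on (start, stop] intervals with Efron's method for ties.
fit <- h2o.coxph(
  x              = c("age", "surgery"),
  event_column   = "event",
  start_column   = "start",
  stop_column    = "stop",
  training_frame = heart_hf,
  ties           = "efron"
)

print(fit)
```

Leaving `start_column` unset would instead treat each row as a single observation running from time zero to `stop_column`.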
H2OCoxPH Additional Functions
• Model Extraction Functions
  • h2o.coef, coef
  • extractAIC
  • logLik
  • vcov
• Scoring Functions
  • h2o.predict
  • survfit
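
The functions listed above might be applied to a fitted model roughly as follows. This is a sketch under assumptions: it needs a running local H2O cluster, uses the `heart` data from the `survival` package as stand-in input, and fits a deliberately small single-predictor model; the exact shape of each return value is not asserted here.

```r
library(h2o)

h2o.init()

# Example data only; any H2OFrame with event and time columns would do.
heart_hf <- as.h2o(survival::heart)

fit <- h2o.coxph(
  x              = "age",
  event_column   = "event",
  start_column   = "start",
  stop_column    = "stop",
  training_frame = heart_hf
)

# Model extraction functions
print(coef(fit))        # estimated coefficients
print(vcov(fit))        # variance-covariance matrix of the coefficients
print(logLik(fit))      # partial log-likelihood
print(extractAIC(fit))  # AIC-style summary

# Scoring: returns an H2OFrame of predictions for the rows of newdata
pred <- h2o.predict(fit, newdata = heart_hf)
print(head(pred))
```

`survfit` is left out of this sketch because it additionally needs the `survival` package attached and a `newdata` argument.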
H2OCoxPH Example
DEMO
