A Statistician Walks into a Tech Company: R at a Rapidly Scaling Healthcare Startup

Delivered by Sandy Griffith (Biostatistician, Flatiron Health) at the 2016 New York R Conference on April 8th and 9th at Work-Bench.

Published in: Data & Analytics

  1. A Statistician Walks into a Tech Company
     R at a rapidly scaling healthcare technology startup
     Sandy Griffith
     Twitter: @sgrifter
     sgriffith@flatiron.com
     www.flatiron.com
  2. My story
     Academic biostatistics
     © 2016 Flatiron Health, Inc. Proprietary and confidential.
  3. My story
     Academic biostatistics
     Healthcare tech
  4. Our Mission
     Flatiron’s mission is to serve cancer patients and our partners by dramatically improving treatment and accelerating research.
  5. Flatiron Processes EHR Data At Scale
     [Diagram. Labels shown: Electronic Health Record; Outside Practice; Hospital; Lab; Structured Data; Unstructured Data; Demographics; Diagnosis; Visits; Labs; e-Prescribing; Pathology Report; Discharge Notes; Radiology Report; Physician Notes; Structured Data Processing; Unstructured Data Processing; Standard EHR Data; Research-Grade Data]
  6. Rapidly Scaling
     January 2015
     Flatiron: ~140
     Software Engineers: ~50
     Quantitative Sciences team: 1
  7. Now: We are a team of 262
     We include…
     All Flatiron data and tools are collaboratively built, implemented, and maintained by a cross-disciplinary team that includes oncology, engineering, and quantitative sciences
     We come from…
     9 Medical oncologists and nurses
     70 Software engineers
     10 Quantitative scientists
     5 Medical informaticists
     + more!
  8. Primary Language: time of hire
  9. Proficiency with R: time of hire
  10. A decision point early on
  11. A decision point early on
  12. Cultivate R culture
      1. Internal R Package
      2. User group
      3. Slack channel
      4. Trainings
      5. Hiring
  13. Cultivate R culture
      1. Internal R Package
      2. User group
      3. Slack channel
      4. Trainings
      5. Hiring
  14. Proficiency with R
      Time of hire / Now
  15. Now we have R users, but when should we use R?
      Three scenarios:
      1. R for prototyping → !R in production
      2. R as a long-term solution
      3. R and !R in parallel
  16. R for prototyping → !R in production
      Example: Linking external mortality data
      Prototype
      ● One-time linkage
      ● Small cohort (10s of thousands)
      ● RecordLinkage R package
      ● Probabilistic linkage method using EM algorithm
      Production
      ● Repeated daily at scale
      ● Large cohort (~5 million patients)
      ● Code maintained by different team
      ● Deterministic logic in SQL
  17. R for prototyping → !R in production
      Example: Linking external mortality data
      Why this made sense:
      ● Stable method -- No longer needed rapid iteration
      ● Tuning parameters
      ● Similar performance, more transparency
      ● No R users on team that would be maintaining code
  18. R as a long-term solution
      Example: Rmarkdown QA report
      Early version (Jan 2015)
      ● bash commands for extracting data run from R script using ETL tool
      ● R script run via command line
      ● parameters in metafiles manually updated
      ● Runs a series of Rmd files and renders HTML output
      Current Version (April 2016)
      ● linked to data pipeline maintained by software engineering
      ● metafile generated dynamically
      ● Plotly survival curves
      ● Flatly bootstrap theme
      ● Plan to continue using R indefinitely
  19. R as a long-term solution
      Example: Rmarkdown QA report
      Why this made sense:
      ● Mature product and team
      ● Quantitative science members remain embedded in team
      ● Strong support and collaboration with software engineering
      ● Requirements are dynamic -- continued need for rapid prototyping
  20. R and !R in parallel
      Example: Some external collaborations
      ● Specific research questions
      ● 2 people code independently in Python/SQL and R
      ● Compare results
      ● Language sometimes incidental, more about 2 different perspectives
      Why this made sense:
      ● High stakes or low error tolerance
      ● Complicated concepts
      ● Custom projects often involve novel problems
  21. Thank you
      ● Melissa Curtis
      ● Josh Kraut
      ● Kathi Seidl-Rathkopf
      ● Cindy Revol
      ● Rachael Sorg
      ● Jay Rughani
      ● Paul You
      ● Aracelis Torres
      ● Alphan Kirayoglu
      ● Ben Birnbaum
      ● Ann Jaskiw
      ● James Gippetti
      Join our Team! Drop me a note at sgriffith@flatiron.com, @sgrifter, or visit flatiron.com/careers
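
Slide 16 contrasts a probabilistic prototype with "deterministic logic in SQL" in production, but the slides do not show the actual rules. The following is only a minimal sketch of what a deterministic linkage rule could look like, written in Python with the standard-library `sqlite3` module. The table names, column names (`last_name`, `dob`, `ssn4`), the matching rule, and the sample rows are all hypothetical and are not taken from the talk.

```python
import sqlite3


def link_deaths(patients, deaths):
    """Link patients to external death records with one exact-match rule.

    Hypothetical rule (an assumption, not Flatiron's logic): two records
    match when normalized last name, date of birth, and the last four SSN
    digits all agree exactly.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE patients (patient_id TEXT, last_name TEXT, dob TEXT, ssn4 TEXT)"
    )
    con.execute(
        "CREATE TABLE deaths (last_name TEXT, dob TEXT, ssn4 TEXT, death_date TEXT)"
    )
    con.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", patients)
    con.executemany("INSERT INTO deaths VALUES (?, ?, ?, ?)", deaths)
    rows = con.execute(
        """
        SELECT p.patient_id, d.death_date
        FROM patients AS p
        JOIN deaths AS d
          ON UPPER(TRIM(p.last_name)) = UPPER(TRIM(d.last_name))
         AND p.dob = d.dob
         AND p.ssn4 = d.ssn4
        ORDER BY p.patient_id
        """
    ).fetchall()
    con.close()
    return rows


# Made-up rows for illustration only.
patients = [
    ("P1", "Smith ", "1950-01-02", "1234"),
    ("P2", "Jones", "1961-03-04", "5678"),
]
deaths = [
    ("SMITH", "1950-01-02", "1234", "2015-06-30"),
    ("Jones", "1961-03-04", "0000", "2014-01-01"),  # ssn4 differs, so no link
]
linked = link_deaths(patients, deaths)
```

A rule like this is easy to read and audit, which fits the "more transparency" point on slide 17. Unlike a probabilistic method, it cannot recover records that disagree on a field because of a typo.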

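
Slide 20 describes two people coding the same analysis independently and then comparing results. The slides do not say how the comparison was done, so the following is only a sketch of one possible check: each implementation exports a dictionary of numeric summary metrics, and a helper reports any metric that is missing from one side or differs beyond a tolerance. The function name, metric names, and values are hypothetical.

```python
import math


def compare_results(a, b, rel_tol=1e-9):
    """Return {metric: (a_value, b_value)} for every metric that disagrees.

    A metric disagrees when it is missing from one side or when the two
    numeric values differ by more than the relative tolerance.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            diffs[key] = (a.get(key), b.get(key))
        elif not math.isclose(a[key], b[key], rel_tol=rel_tol):
            diffs[key] = (a[key], b[key])
    return diffs


# Made-up summaries standing in for the R and the Python/SQL outputs.
r_results = {"n_patients": 1200, "median_followup_months": 14.5}
py_results = {"n_patients": 1198, "median_followup_months": 14.5}
discrepancies = compare_results(r_results, py_results)
```

An empty result means the two implementations agree on every exported metric. Any entry is a starting point for the two authors to reconcile their assumptions.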