DATA SCIENCE
TEAM COLLABORATION
FORGET ABOUT MEETING ME HALFWAY,
TAKE ME THE LAST MILE
#OpenDataScienceMeans #AnacondaCON Ian.Stokes-Rees @ijstokes
OGT molecular dynamics simulation
Protein “mouth” opening, 1 µs
CERN computing facility
Geneva, Switzerland
SUCCESS COMES FROM TEAM WORK
IAN: ENGINEER, PHYSICIST, BIOLOGIST?
• Ian Stokes-Rees, @ijstokes
• Product Marketing Manager
• Computational Scientist
• Passionate advocate of
Open Data Science
• Educator and evangelist for use of
Python and Anaconda
FIRST TASTE OF “BIG DATA” COMPUTING
• 100,000 acoustic tri-phone models
• 100 parameters per model
• 10 million parameters to estimate
• adaptation = real-time adjustment
• computation = tricky!
PhD on CERN LHCb COMPUTING TEAM
Distributed computing infrastructure
• 1000s of concurrent users
• 100s of federated computing centers
• no centralized control
• 1M+ servers with software installed
• 20+ year life span
• 20 GB of data per second
• 14 hours per day
• 7 days a week
• 7 months of the year
March 26, 2010 LHCb first physics at 3.5 TeV
HOW DO CERN PHYSICISTS DO THIS?
• Some smart people over there
• Who brought us the Web, HTTP, and HTML?
• Big Data
• Multi-PB per year
• Large collaborating teams
• 1000s of people accessing systems
• Computation critical
• Or there is no way to make sense of the data
• And discover new physics
December 2, 2016 LHCb proton-lead collisions
CERN ATLAS detector
Calorimeter end cap wiring harness
Millions of data feeds @ 40 MHz signal rate
HOW WOULD YOU DO IT?
Custom hardware: CMS L0 muon trigger ASIC
Giant compute and storage clusters
Wicked fast algorithms
written in Fortran and C
Python: the Swiss
army knife for
computational physics
PYTHON: LINGUA FRANCA FOR DATA SCIENCE
• Human readable
• Easy to learn
• Object oriented
• Cleanly wraps C and Fortran
• Amazing foundation of high
quality data science libraries
• Suitable for scripting,
algorithms, data processing
and applications
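The "cleanly wraps C" point can be illustrated with a minimal sketch using the standard-library `ctypes` module. It assumes a POSIX-like system (Linux or macOS) where the C math library can be located; the wrapper name `c_cos` is hypothetical and chosen only for illustration.

```python
import ctypes
import ctypes.util
import math

# Locate and load the C math library.
# Assumption: a POSIX-like system where find_library("m") resolves.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() directly from Python."""
    return libm.cos(x)


value = c_cos(0.0)
print(value, math.cos(0.0))
```

In practice, libraries such as NumPy and SciPy hide this kind of wrapping behind a Python interface, so most users never touch the C or Fortran layer directly.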
THE CALCULUS OF NEWTON AND LEIBNIZ
SOMETIMES ESOTERIC IS OK
HERMITS AND HIGH PRIESTS
NPS, Richard Proenneke 1985
MOLECULAR BIOLOGY:
FROM PROTONS TO PROTEINS
• It takes 3-9 months in the wet lab to
prepare protein samples
• Once prepared it is only a few days to
“image” those samples and produce
digitized representations
• However the “images” aren’t yet 3D
atomic models
• That takes from weeks to months to
complete, sitting behind a computer
• You may know it as protein folding
Nature, 2011 PMID: 21240259
Lazarus, Nam, Jiang, Sliz, Walker
HOW DO WE ACCELERATE
THE TIME TO INSIGHT?
SUCCESS COMES FROM TEAM WORK
WHAT DOES “HALF WAY” LOOK LIKE?
Today’s “good” data science environment:
• Provide high performance computing resources
• For example, Hadoop infrastructure
• Deploy a wide selection of the most popular
analysis software
• Training and documentation
• Technical support
FISH OUT OF WATER
• Why would we take an expert
biochemist and force them to be
• A software engineer?
• An IT system administrator?
• A statistician?
• What can we do to let them focus on
being a great biochemist?
FISH OUT OF WATER
• Why would we take an expert
business analyst and force them to be
• A software engineer?
• An IT system administrator?
• A statistician?
• What can we do to let them focus on
being a great business analyst?
SUCCESS COMES FROM TEAM WORK
TAKE ME THE LAST MILE
• DevOps engineer pre-configures scalable computation
• Laptop to server to cluster
• DevOps team is a partner, not a service provider
• Software engineer creates and customizes software
for the task, project or individual
• Avoiding generic, static software setups
• Data scientist composes workflow
• Analyst is provided simple high level interface
• With option to “drill down”
WHAT ABOUT THOSE PROTEINS?
• Normally it takes 10-200 hours of computing time to match a
“template” protein fragment to the imaging data
• There are 100k templates (known protein “folds”) to choose from
• “Be stupid” and just try them all – sometimes you’ll be surprised!
• I spent 18 months working with biochemists and IT sys admins across
the country to create a sensible parallel & distributed workflow
• 4-40 hours wall clock time to run 2k-20k hour parallel computation
• Real-time updates of results
• Web based interface to access summary and detailed data viz
• Analysis performed in Jupyter Notebook, allowing customization
• File-system based to enable “drill down” and direct access
• 6M hours per year (~700 years), peak parallelism 20k cores
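The "just try them all" approach above is embarrassingly parallel: each template can be scored independently and the results ranked afterwards. The following is a minimal sketch of that pattern, not the workflow actually used. The function `score_template` and its toy scoring rule are hypothetical stand-ins for the real 10-200 hour matching computation, and a local thread pool stands in for the distributed grid.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence, Tuple


def score_template(template_id: int, data: Sequence[float]) -> Tuple[int, float]:
    """Hypothetical stand-in for matching one template against imaging data.

    Toy rule: the closer the template id is to the mean of the data,
    the higher (less negative) the score.
    """
    target = sum(data) / len(data)
    return template_id, -abs(template_id - target)


def search_all(
    templates: Iterable[int], data: Sequence[float], workers: int = 4
) -> List[Tuple[int, float]]:
    """Score every template independently, then rank best-first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: score_template(t, data), templates))
    return sorted(results, key=lambda r: r[1], reverse=True)


ranked = search_all(range(100), [41.0, 42.0, 43.0])
best_id, best_score = ranked[0]
print(best_id, best_score)
```

Because no template depends on another, the same structure scales from a laptop pool to a cluster scheduler by swapping the executor, which is what makes a 2k-20k hour computation fit into hours of wall clock time.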
DATA SCIENCE PATTERN
• How is it done today?
• What is the opportunity for improvement?
• Prototype and evaluate – is it better? Rinse and repeat
• Standardize and automate the workflow/model
• Scale the workflow/model
• Preprocess and distribute the data
• Instrument execution and set quality metrics
• Establish easy access interface
• Create programmatic APIs
FIN
SUCCESS COMES FROM TEAM WORK
Remember the footnote? Collaborative cross-functional teams
BREAKING DATA SCIENCE OPEN
ANACONDA & COLLABORATION
STEP 1: ANACONDA
http://continuum.io/downloads
NOTEBOOKS FOR DATA SCIENCE COLLABORATION
Do you understand why notebooks are so popular?
There are many angles to this, but my take:
• Visual record of the data science process
• They tell a story, and support rich hyperlinked prose
• Data can be embedded
• Algorithms or analysis techniques are captured
• Interactive visualizations are inline
• Sharable
• Reproducible*
STEP 2: ANACONDA CLOUD
http://anaconda.org
STEP 2: (MY) ANACONDA CLOUD
http://anaconda.org/ijstokes
STEP 3: ANACONDA ENTERPRISE (TODAY)
STEP 3: ANACONDA ENTERPRISE (COMING SOON)
ANACONDA:
GIVING SUPERPOWERS TO THE PEOPLE
WHO CHANGE THE WORLD
TEAMS
THANK YOU! QUESTIONS?
Ian Stokes-Rees @ijstokes

Anaconda Data Science Collaboration

Editor's Notes

  1. I’m going to start today by telling you about my background as a computational scientist, an area where I spent a decade partnering with scientists in areas from particle physics to molecular biology. I worked with those scientists to develop the computational models, systems, and simulations that allowed them to advance the boundaries of human knowledge.
  2. So this is a personal story.
  3. About insights and discovery
  4. About numbers, computers, math, and science
  5. About the people who work together to achieve great things
  6. There is only one takeaway from this talk: success comes from team work. While that may seem like a truism, the reality is that for a long time “analytics” of various stripes has consisted of individuals working away in an assembly line fashion, taking inputs from the person before them, and outputting results to the next person. In my career I have used software such as Excel, Perl, and MATLAB, outputting spreadsheets, PDFs, and PowerPoint. I imagine many of you have been the recipient of the kind of work I’ve produced in the past: appreciative of its completeness and insights but unsure how to engage in a conversation to improve or adapt the results. Or worse, unable to recreate and extend the results quickly and easily the next time a similar situation arises.
  7. This is my electrical engineering class mudbowl team from 1996. See if you can spot me. I played football for 7 years and it shaped me as a person and my ideas about hard work, teams, leadership, and understanding how each person has an important role to play for success to be possible. I have spent the last 20 years of my life working on large scale data analysis and computational science problems and there has never been a time when there has been more opportunity for teams of people, each bringing their own skills and insights to the game, to be able to do amazing things together. So if there is a footnote to “Success comes from team work” it is this: Team work in data science means bringing together individuals with different backgrounds and abilities, who are able to collaborate in real-time, rapidly iterate their analysis, easily reproduce results, and scale their work from laptops to servers to clusters. I believe open data science is the only way to do that today.
  8. [Start with today and then move through a story to establish credibility, entertain, and build a case for collaborative data science with Anaconda.]
  9. 1997 to 1999, Master’s degree in large vocabulary speaker independent continuous speech recognition
  10. 1997 to 1999, Master’s degree in large vocabulary speaker independent continuous speech recognition
  11. Do you think it makes sense to build a long-running, mission-critical, high-performance, distributed computing system in an interpreted and dynamically typed language? I sure didn’t. I thought these physicists had spent too much time playing with anti-matter and they’d annihilated the common sense part of their brains.
  12. What do you have without a lingua franca? [Tower of Babel] It is necessary to have common idioms, tools, and systems to facilitate communication and collaboration.
  13. Newton and Leibniz were 17th century renaissance thinkers who concurrently established the foundations of calculus to describe and analyze dynamic systems. History suggests that Newton used his influence to be credited as the creator of calculus at the time; ultimately, however, it is Leibniz we have to thank for the foundations of calculus as we know it today. It was only with Leibniz’s clear notation and presentation of calculus that the world was able to benefit. In contrast, Newton’s calculus was esoteric and inaccessible.
  14. Data hermits work independently and have no accountability to anyone else. They can happily seclude themselves in a cottage off the grid and do their own thing in their own way. I will not deny it: sometimes this can be a path to innovation and enlightenment. But it can also be a path to isolation. Data high priests have established universal rule over data modeling and analysis. Their power comes from their control, and they exercise it behind closed doors. Few are admitted to this priesthood, as they guard their skills and responsibilities jealously, but in return deliver quantitative insights as the moons and seasons change. Of course these are both caricatures, but I am sure we’ve all seen aspects of the data hermit or the data high priest in people or organizations we’ve worked with.
  15. After I completed my PhD I spent a year at a French research institute working on models for parallel distributed option pricing before moving to Harvard Medical School and joining a structural biology lab that wanted to improve their computational techniques for protein structure determination. Here we're looking at a molecular dynamics simulation of the OGT enzyme common in mammals. It acts as a nutrient sensor and is involved with signaling metabolic behavior. OGT's role in metabolic regulation means that it is linked to diabetes, neurodegenerative diseases, and cancers in cases where it misbehaves. I was not directly involved in this work, but my colleagues who were spent, collectively, many years working to determine the 3D structure of OGT in order to better understand its behavior. My contribution, in this particular case, was only to construct the MD simulation and produce this animation.
  16. In other words, how can we process data faster, reduce the computational time, and improve the quality of the results?
  17. Again, the answer comes from the key take-away of this talk: “Success comes from team work.” By bringing together biochemists, data scientists, software engineers, and IT systems administrators, it is possible to tackle these challenges.
  18. The title of this talk is: "Data Science Team Collaboration: Forget about meeting me half way, Take me the last mile". What does “half way” look like? First, “half-way” is a great start, so don’t feel bad if the following represents your reality. [GO THROUGH SLIDE] But where does that leave our biochemist trying to go from purified protein samples to a 3D molecular model and stuck on the computing part?
  19. And of course you can swap “biochemist” for “business analyst” or any other person or role you can think of. [DON’T READ SLIDE AGAIN]
  20. “Teams” do not equal “team work”. Success doesn’t come from just a team of people with different skills; it comes from that team being able to work together collaboratively, in real-time, to iterate, each person applying their expertise.
  21. Then this is what it means to go “the last mile”.
  22. I heard Continuum’s founder, Travis Oliphant, give a talk at Supercomputing in 2012 where he described the vision for Continuum. It was a vision of collaborative, web-based, open data science. It was the embodiment of what I had spent the past decade doing on a one-off basis in computational physics, computational finance, and computational biology. I was hooked, so I left Harvard a few months later and joined Continuum to help make that vision a reality. You’ve heard a lot about Anaconda this week, and I hope you’ve taken time to speak with my colleagues who are providing demos of the many aspects of the product, platform, and larger ecosystem in the exhibit area. I’m going to finish my talk by providing you with the three step program to enable you to do collaborative data science with Anaconda.
  23. With millions of users, it’s the established way to put everyone onto the same page. Available for Windows, Mac, and Linux, with quarterly releases and rolling updates of the 200 amazing tools and libraries that are included in Anaconda. Without Anaconda it would take you days to weeks to re-create the same set of capabilities. It is the gateway to Open Data Science. It is designed for a single user on a single system.
  24. Notebooks supporting over 40 different language kernels, with the strongest support for Python and R