Practical Data Science: The WPC Healthcare Strategy for Delivering Meaningful Data Science Projects


Learn to use Jupyter to document your data science process in real time, in whatever programming language you want. Following this methodology lets you provide insights that help your organization make better decisions and solve its business problems.

Published in: Data & Analytics

  1. 2015 Healthcare Data Science. Practical Data Science: The WPC Healthcare Strategy for Delivering Meaningful Data Science Projects. Damian Mingle, @OPENDATASCI
  2. Representative Clients
  3. What’s the Problem? A Common Scenario: Johnny Data Scientist • Does not like working with others • Too much black magic, not enough explanation • His process is always different • Uses multiple languages • Hates producing presentations • Project status is constantly unclear • Doesn’t capture business needs • Models aren’t production quality
  4. Why Jupyter? Interactive Computing Environment • Notebook Web Application: Writing and running code interactively • Kernels: Over 40 programming languages • Notebook Documents: Self-contained documents that include live code, interactive widgets, plots, narrative text, equations, images, and video
  5. Why a Data Science Methodology? Data Science Projects Involve Risk • Strategically: Provides confidence to the business that Data Science projects can be delivered profitably • Tactically: Management can understand status assessments • Operationally: Empowers the Data Science team to do the right thing, the right way, the first time.
  6. Business Understanding: Uncover Important Factors at the Start • Determine business objectives • Assess situation • Determine data science goals • Produce project plan. Understand the Data Science project objectives and requirements from a business perspective. Then convert this knowledge into a Data Science problem definition and a preliminary plan designed to achieve the objectives.
  7. Exercise 1: https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
  8. Data Understanding: Become Familiar with the Data • Collect initial data • Describe data • Explore data • Verify data quality. Identify data quality problems, discover first insights into the data, and detect interesting subsets that suggest hypotheses about hidden information.
  9. Exercise 2: https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
  10. Data Preparation: Construct the Final Dataset • Select data • Clean data • Construct data • Integrate data • Format data. Data Science tasks in this phase involve selecting tables, records, and attributes, as well as transforming and cleaning the data.
  11. Exercise 3: https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
  12. Modeling: Various Modeling Techniques Are Selected • Select modeling technique • Generate test design • Build model • Assess model. In this phase, calibrating parameters is important. Some techniques may require the Data Scientist to go back to the data preparation phase.
  13. Exercise 4: https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
  14. Evaluation: Review Your Steps with Certainty • Evaluate results • Review process • Determine next steps. At the end of this phase, a decision on the use of the data science results should be reached.
  15. Exercise 5: https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
  16. Deployment: Make Use of the Model • Plan deployment • Plan monitoring and maintenance • Produce final report • Review project. This phase can be as simple as generating a report or as complex as implementing a repeatable data science process across the enterprise.
  17. Exercise 6: https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
  18. Data Scientist 2.0: Lead Your Organization Analytically • Use Jupyter to document your process in real time, using whatever language you want! • Establish a Data Science Methodology that is comprehensive • Provide insights that help the organization make better decisions to solve its business problems
  19. Have Questions? E-mail: dmingle@wpchealthcare.com Twitter: @damianmingle LinkedIn: DamianRMingle
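
The Data Understanding, Data Preparation, Modeling, and Evaluation phases in the slides can be illustrated with a small sketch that could be pasted into Jupyter notebook cells. This is not the material from the linked exercises. The records, column meanings, function names, and the simple threshold rule are all hypothetical, chosen only to show how each phase maps to one small, documented step using the Python standard library.

```python
import random
import statistics

# Hypothetical records (not from the slides):
# (age, length_of_stay_days, readmitted); None marks a missing value.
RAW = [
    (34, 2, 0), (71, 9, 1), (None, 4, 0), (58, 7, 1), (45, 3, 0),
    (80, 12, 1), (29, 1, 0), (66, None, 1), (52, 5, 0), (77, 10, 1),
    (41, 2, 0), (63, 8, 1),
]


def describe(rows):
    """Data Understanding: count rows and missing values per column."""
    n_cols = len(rows[0])
    missing = [sum(r[i] is None for r in rows) for i in range(n_cols)]
    return {"rows": len(rows), "missing_per_column": missing}


def prepare(rows):
    """Data Preparation: drop incomplete records (one simple cleaning choice)."""
    return [r for r in rows if None not in r]


def split(rows, test_fraction=0.3, seed=0):
    """Modeling (test design): shuffle reproducibly and hold out a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Modeling (build): threshold halfway between the class means of stay length."""
    pos = [r[1] for r in train if r[2] == 1]
    neg = [r[1] for r in train if r[2] == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(threshold, rows):
    """Evaluation: fraction of rows where 'stay > threshold' matches the label."""
    hits = sum((r[1] > threshold) == bool(r[2]) for r in rows)
    return hits / len(rows)


if __name__ == "__main__":
    print(describe(RAW))
    clean = prepare(RAW)
    train, test = split(clean)
    threshold = fit_threshold(train)
    print("threshold:", threshold, "test accuracy:", accuracy(threshold, test))
```

Keeping each phase in its own function (or notebook cell) makes the process repeatable and easy to narrate, which is the point the slides make about documenting the work as it happens. A real project would likely swap the toy threshold rule for a proper modeling library and a richer evaluation.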
