Building a Data-Driven World™
Open Data Science Conference
A Hybrid Approach to Data Science Project Management
Elaine Lee
elee@civisanalytics.com
@elaineklee
Open Data Science Conference #ODSC
A Common Problem With Many Faces
Organizations want to be data-driven, but many obstacles stand in their way:
• Communication does not trickle up to executives and key decision makers
• Silos between departments make it difficult to share and collaborate on analysis
• Data ingestion (ETL, or Extract-Transform-Load) is difficult and time-consuming
• Visual reporting is not meaningful, yet customizable
• Technology needs cannot be scaled up or down flexibly at a reasonable cost
• Learning resources about data science are inadequate or overwhelming
Mapping the Uninsured in America
Where should Enroll America direct its insurance signup efforts?
Civis Analytics | Proprietary and Confidential
We ran the first individualized presidential campaign
As a company, Civis traces its origins to the 2012 Obama for America analytics team. We built a scientific understanding of each voter. Our data science influenced every strategy and tactic: voter targeting, messaging, media buys, and fundraising. This meant the campaign could allocate resources where impact would be greatest.
Today, we leverage data science to help our clients in politics, non-profits, and the corporate world.
Introducing Civis
An easy-to-use, end-to-end, highly extensible data science platform in the cloud for teams who want to make great data-driven decisions that move their organizations forward.
The Civis Approach
Applied Data Science (Consulting)
• Tackles the toughest data science problems we can find
Data Science R&D (R&D)
• Generalizes and automates the solution for many scenarios
Software Engineering (Product)
• Integrates solutions into user-empowering software
• Highly collaborative departments
• All departments contribute to both our services arm and product development
The Civis Approach
Our unique team structure allows us to solve your biggest problems with custom solutions and the technology to scale them.
Strategies and philosophies
• Teams based on Civis’s product and consulting needs:
  • “Built around code”
• Semi-annual, day-long departmental off-sites to plan upcoming R&D initiatives
• Academia-influenced: evidence-based approaches to finding and reporting the best solutions
• Software development-influenced: standups, code review
• Favorite tools:
Data Science R&D teams: Modeling Methodology, Unstructured Data, Engineering
Tools
• Share and discuss data science news
• Receive feedback from colleagues using our tools
• Discuss implementation
• Lower communication costs compared to email
Data Science R&D
Tools
• Prototype new workflows
• Record and present results, like a log book
• Share preliminary results with members of other departments
Data Science R&D
Tools
• Department heads set milestones, check progress, and make project staffing decisions
• Collaboratively plan development of new functionality or organizational processes (e.g., recruiting)
Data Science R&D
R&D <-> ADS
Tools
Strategies
• Designate a “tag team” within R&D as the default R&D resource for client engagements
  • This is the Modeling Methodology team
• Members of other R&D teams may be staffed on engagements, depending on the expertise required
• An R&D team member always serves as the Consulted party in the RACI model
• Transparency about challenges is paramount
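As a minimal illustration of the RACI rule above, the sketch below encodes a RACI matrix (Responsible, Accountable, Consulted, Informed) as a Python dictionary and checks two constraints for every task: the common RACI convention of exactly one Accountable party, and the Civis rule that an R&D team member is Consulted. The engagement, task names, people, and department tags are all hypothetical, chosen only to exercise the checks.

```python
from enum import Enum


class Role(Enum):
    RESPONSIBLE = "R"
    ACCOUNTABLE = "A"
    CONSULTED = "C"
    INFORMED = "I"


# Hypothetical people, each tagged with a department so the R&D rule can be checked.
PEOPLE = {
    "ads_manager": "ADS",
    "ads_analyst": "ADS",
    "rd_modeler": "R&D",
    "client_lead": "Client",
}

# Hypothetical RACI matrix: task -> {person: set of roles}.
RACI = {
    "design survey": {
        "ads_manager": {Role.ACCOUNTABLE},
        "ads_analyst": {Role.RESPONSIBLE},
        "rd_modeler": {Role.CONSULTED},
        "client_lead": {Role.INFORMED},
    },
    "build model": {
        "ads_manager": {Role.ACCOUNTABLE},
        "ads_analyst": {Role.RESPONSIBLE},
        "rd_modeler": {Role.CONSULTED},
        "client_lead": {Role.INFORMED},
    },
}


def check_task(task, assignments, people):
    """Return a list of rule violations for one task (empty if it passes)."""
    problems = []
    accountable = [p for p, roles in assignments.items() if Role.ACCOUNTABLE in roles]
    if len(accountable) != 1:
        problems.append(f"{task}: expected exactly one Accountable, found {len(accountable)}")
    rd_consulted = [
        p for p, roles in assignments.items()
        if Role.CONSULTED in roles and people.get(p) == "R&D"
    ]
    if not rd_consulted:
        problems.append(f"{task}: no R&D team member is Consulted")
    return problems


def check_matrix(raci, people):
    """Collect violations across all tasks in the matrix."""
    problems = []
    for task, assignments in raci.items():
        problems.extend(check_task(task, assignments, people))
    return problems


if __name__ == "__main__":
    issues = check_matrix(RACI, PEOPLE)
    print("OK" if not issues else "\n".join(issues))
```

Keeping the department tag on each person, rather than on the role, lets the same check flag any engagement where R&D was left out of the Consulted column.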
R&D <-> ADS: A Case Study
Mapping the Uninsured in America
1. Assemble a project team of R&D data scientists and Applied Data Scientists.
2. Work with Enroll America to refine requirements and develop a plan of analysis, ultimately resulting in the design and execution of a phone survey of a sample of individuals, followed by a predictive model for the rest of the country.
3. The Applied Data Science Manager holds weekly calls with Enroll America and status meetings with the project team.
4. The project team delivers the predictions and analysis to Enroll America.
The project team completes a postmortem and determines that one activity could be automated: model building.
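The plan in step 2, surveying a sample and then modeling everyone else, can be sketched as follows. This is a minimal illustration under stated assumptions, not Civis's actual method: it simulates a hypothetical population with two made-up numeric features, treats the survey answer (uninsured or not) as a binary label, fits a plain logistic regression by gradient descent on the surveyed sample only, and then scores the unsurveyed individuals to estimate the uninsured share.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights (bias first) by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w, xi):
    """Predicted probability that one individual is uninsured."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def simulate_population(n, rng):
    """Hypothetical population: two standardized features and a simulated uninsured label."""
    people = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        # Made-up ground truth: uninsured probability rises with feature 0, falls with feature 1.
        label = 1 if rng.random() < sigmoid(-1.0 + 1.5 * x[0] - 0.5 * x[1]) else 0
        people.append((x, label))
    return people


if __name__ == "__main__":
    rng = random.Random(0)
    population = simulate_population(5000, rng)
    rng.shuffle(population)

    # Phone survey: observe labels only for a small sample.
    survey, rest = population[:500], population[500:]
    w = fit_logistic([x for x, _ in survey], [y for _, y in survey])

    # Predictive model: score everyone not surveyed and aggregate.
    est = sum(predict_proba(w, x) for x, _ in rest) / len(rest)
    true_rate = sum(y for _, y in rest) / len(rest)
    print(f"estimated uninsured share: {est:.3f} (simulated truth: {true_rate:.3f})")
```

Logistic regression appears here only because it is easy to write without dependencies; any classifier that outputs probabilities could fill the same role.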
R&D <-> Tech
Tools
Strategies
• Designate teams at the interface to triage issues and plan new development:
  • R&D: “Engineering” team
  • Tech: “Modeling” team
• Use module- or project-specific chatrooms to get quick answers to ad hoc questions
• Identify opportunities to form cross-functional teams, e.g.:
  • Developing apps using the Platform’s API
  • Knowledge sharing on best practices
R&D <-> Tech: A Case Study
Mapping the Uninsured in America
1. After the postmortem for the Enroll America engagement, R&D begins prototyping automated modeling functionality and discussing its implementation with the Tech department.
2. R&D’s Engineering team finishes the prototype and works with Tech’s Modeling team to integrate it into the Platform as a new feature.
3. During integration, ad hoc discussions on GitHub and HipChat address usability questions, e.g., resource usage and input/output specifications.
The integration team successfully builds the Build Model module and integrates it into the Platform.
Conclusion
Our approach to data science consulting and product development is enriched by the valuable perspectives of our employees, who come from a wide array of backgrounds. As a result, our project management strategies are a hybrid of more conventional techniques, such as those from academia and software development.

A Hybrid Approach to Data Science Project Management

  • 1. Building a Data-Driven WorldTM Open Data Science Conference A Hybrid Approach to Data Science Project Management Elaine Lee elee@civisanalytics.com @elaineklee
  • 2. 2Open Data Science Conference#ODSC Organizations want to be data-driven but many obstacles stand in their way: • Communication not trickling up to executives and key decision makers • Silos between departments, making it difficult to share and collaborate on analysis • Data ingestion (ETL or Extract-Transform-Load) is difficult and time-consuming • Lack of meaningful, yet customizable visual reporting • Inability to flexibly scale up or down technological needs at a reasonable cost • Inadequate or overwhelming learning resources about data science A Common Problem With Many Faces
  • 3. 3Open Data Science Conference#ODSC Where should Enroll America direct its insurance signup efforts? Mapping the Uninsured in America
  • 4. 4Civis Analytics | Proprietary and Confidential As a company, Civis traces its origins to the 2012 Obama for America analytics team. We built a scientific understanding of each voter. Our data science influenced every strategy and tactic: voter targeting, messaging, media buys, and fundraising. This meant the campaign could allocate resources where impact would be greatest. We ran the first individualized presidential campaign Civis Analytics | Proprietary and Confidential Open Data Science Conference#ODSC
  • 5. Today, we leverage data science to help our clients in politics, non-profits, and the corporate world.
  • 6. Introducing Civis: an easy-to-use, end-to-end, incredibly extendable data science platform in the cloud for teams who want to make great data-driven decisions to drive their organizations forward.
  • 7. The Civis Approach (Product, Consulting, R&D)
      • Applied Data Science: tackles the toughest data science problems we can find
      • Data Science R&D: generalizes and automates the solution for many scenarios
      • Software Engineering: integrates solutions into user-empowering software
      • Highly collaborative departments
      • All departments contribute to both our services arm and product development
  • 8. The Civis Approach: Our unique team structure allows us to solve your biggest problems with custom solutions and the technology to scale them.
  • 9. Data Science R&D: Strategies and philosophies
      • Teams based on Civis's product and consulting needs: Modeling Methodology, Unstructured Data, Engineering
      • "Built around code"
      • Semi-annual, day-long departmental off-sites to plan upcoming R&D initiatives
      • Academia-influenced: evidence-based approaches to finding and reporting the best solutions
      • Software development-influenced: standups, code review
      • Favorite tools: [logos not reproduced]
  • 10. Data Science R&D: Tools (HipChat, GitHub)
      • Share and discuss data science news
      • Receive feedback from colleagues using our tools
      • Discuss implementation
      • Lower communication costs compared to email
  • 11. Data Science R&D: Tools (Jupyter, Google Drive)
      • Prototype new workflows
      • Used like a log book to record and present results
      • Share preliminary results with members of other departments
  • 12. Data Science R&D: Tools (Google Drive, Asana)
      • Department heads set milestones, check progress, and make project staffing decisions
      • Collaboratively plan development on new functionality or organizational processes (e.g. recruiting)
  • 13. R&D <-> ADS: Tools and strategies
      • Designate a "tag team" on R&D as the default R&D resource for client engagements
          • This is the Modeling Methodology team
          • Other R&D teams' members may be staffed on engagements depending on the expertise required
      • An R&D team member always serves as the Consulted in the RACI model
      • Transparency about challenges is paramount
  • 14. R&D <-> ADS: A Case Study, Mapping the Uninsured in America
      1. Assemble a project team of R&D data scientists and Applied Data Scientists.
      2. Work with Enroll America to refine requirements and come up with a plan of analysis, ultimately resulting in the design and execution of a phone survey on a sample of individuals, followed by building a predictive model for the rest of the country.
      3. The Applied Data Science Manager has weekly calls with Enroll America and status meetings with the project team.
      4. The project team delivers the predictions and analysis to Enroll America.
      The project team completes a postmortem and determines that these activities could be automated: model building.
  • 15. R&D <-> Tech: Tools and strategies
      • Designate teams at the interface to triage issues and plan new development:
          • R&D: "Engineering" team
          • Tech: "Modeling" team
      • Use module- or project-specific chatrooms to get answers to ad hoc questions quickly
      • Identify opportunities to form cross-functional teams, e.g.:
          • Developing apps using the Platform's API
          • Knowledge sharing on best practices
  • 16. R&D <-> Tech: A Case Study, Mapping the Uninsured in America
      1. After the postmortem for the Enroll America engagement, R&D begins prototyping automated modeling functionality and discussing its implementation with the Tech department.
      2. R&D's Engineering team finishes the prototype and works with Tech's Modeling team to integrate it as a new feature in the Platform.
      3. During integration, ad hoc discussions occur on GitHub and HipChat to address usability questions, e.g. resource usage and input/output specifications.
      The integration team successfully builds and integrates the Build Model module in the Platform.
  • 17. Conclusion: Our approach to data science consulting and product development is enriched by the valuable perspectives of our employees, who come from a wide array of backgrounds, making our project management strategies a hybrid of more conventional techniques.

Editor's Notes

  1. Hi everyone, it’s great to be here. My name is Elaine Lee, and I am a Data Scientist in the R&D department at Civis Analytics. Civis is a Chicago-based data science consulting and software startup, and I’m excited to tell you a little bit about our company and the work that we do. In particular, I’ll be talking about how the R&D department juggles concurrent development of both our consulting services and our cloud-based data science platform. I’ll emphasize approaches borrowed from other, more established industries as they pertain to departmental projects as well as interdepartmental collaborations.
  2. Many of you are already familiar with data science and the potential it has to change the way things are done. However, data science has a high barrier to entry for some teams, from both a technical and an organizational standpoint. It can be difficult to wrap your head around the technical needs and quantitative concepts that go into data science. In addition, it can be hard to assemble the right team to do data science and to keep the work organized. Picture a team of data scientists working on the same project. Some of them have written R or Python scripts to process the data, do feature engineering, and build models on it. Some of them have taken the results of the models and produced charts and visualizations in Excel, Tableau, or D3. All the work is being kept in a few different places: Dropbox, Google Drive, GitHub, MySQL, … It is difficult for this hypothetical team to figure out what exactly has been done and, even worse, what efforts have been duplicated. It is also incredibly difficult to validate the analysis. Does this sound familiar to anyone? Many of us at Civis Analytics faced these challenges in our previous work, but fortunately we’ve made them a thing of the past! It didn’t happen overnight; we kept coming up with new ideas to improve the data science workflow by, well, working on a variety of consulting projects and researching new methods. Today I will talk about what some of these ideas are. In addition, I will tell the story of how one client engagement gave us a valuable exercise in collaboration and in the data science best practices we’ve internalized.
  3. Throughout my talk today, I will be using our project with Enroll America to illustrate many of my concepts. Enroll America was one of our first clients in 2013. They wanted our help identifying Americans without health insurance so they knew where to direct their outreach. This was a challenging problem because of its large scope (they wanted to do outreach throughout the country!), and it wasn’t obvious what was predictive of being uninsured. Why did Enroll America specifically seek us out to solve this problem?
  4. Let’s talk a little about the expertise Civis has for tackling problems like Enroll America’s. The founding members of Civis Analytics were part of Obama for America’s analytics team in his 2012 re-election campaign. There, we developed the beginnings of a framework for doing person-level analytics (which is highly relevant for Enroll America). With scientific rigor, we built models to understand all sorts of relevant voting-related behaviors in order to better identify and persuade supporters, which translated into optimizing how the campaign’s resources were used. The campaign spanned many months, and during that time lots of models were being built and refined; their results were constantly being sent to staff in the field to act upon. Developing an organized and repeatable workflow was crucial for minimizing costs, time spent (especially since the staff was small), and inadvertent human error, a particular risk when models are built at such a large scale.
  5. After the campaign ended in 2012, we re-examined the strategies we employed and the problems they solved. We realized that if we generalized them, we could solve similar problems for clients in the political, non-profit, and corporate worlds, which is exactly what Civis did. What you see here is a sample of clients, in addition to Enroll America, that we have helped to better target their advertising dollars, identify potential customers for greener sources of electricity, and determine public awareness of and sentiment toward their brand or cause. In the past year, we took it a step further and formed a partnership with Discovery Communications to inform more sophisticated audience targeting approaches, ratings forecasting, and marketing spend. We anticipate making more partnerships like this in the future. The examples I gave are all problems with a similar flavor to what Civis successfully solved in 2012: identifying and reaching the people you care about most.
  6. Our diverse client portfolio, innovative approaches, and proven track record have made Civis Analytics’ consulting services highly sought after in the predictive analytics space. However, we’re equally passionate about removing obstacles to doing data science. Our steady client pipeline enables us to formalize our approach in the form of a cloud-based data science application. Our software, Civis, or “the Platform”, supports the entire workflow of a typical data science project, from data warehousing to data processing to predictive modeling to reporting. This enables organizations to easily take control of their own data and unlock their insights.
  7. This is how we turn our client work experiences into software. We select novel problems brought to us by our clients and work with them to deliver a solution. This is primarily handled by our Applied Data Science department. Simultaneously, we’ve been conducting research and experimenting with different methods to solve the problem, with one eye toward determining how to generalize the solution. This is primarily done by the Data Science R&D department. Finally, solutions are integrated into our software platform by the Software Engineering, or Tech, department. Users of our software platform, both clients and our own Applied Data Scientists, provide valuable feedback, which is continuously incorporated. This unique, synergistic cycle enables us to deliver high-quality results to our customers.
  8. In our day-to-day work, all departments pitch in on both lines of business, ensuring fluency in all the company’s offerings and thus better decision making. We also collaborate across departments on all projects, big or small. Today I will be focusing on how my department, the DS R&D department, manages its workload and how it works with the Applied Data Science and Tech departments.
  9. The R&D department is the only department that is intimately aligned with both lines of business. We’re split into three teams. Modeling Methodology focuses on developing new modeling workflows. Unstructured Data specializes in data that can’t neatly be summarized in a flat file, like text data. Engineering is responsible for managing our production codebases of new features for our software product. Our department is “built around code”: “We're trying to build up knowledge and best practices, and being built around code lowers our communication costs, errors, redundancy, and facilitates us making software.” To plan our roadmap based on what we’ve learned from recent client engagements, we hold day-long, semi-annual department off-sites. When developing new methodologies, we use an academia-influenced approach: empirical and thorough, so that our recommended solution covers all the edge cases. When building out workflows, we follow guidelines common to most software development projects, including some ideas from the Agile methodology: we have daily standups to make sure everyone’s on the same page about the status of the codebases, and we do code reviews before any changes are shipped. Our standups are held on a per-repository basis, so they don’t waste anyone’s time. These are our favorite tools for doing our work. Let’s take a look at how we use them.
  10. HipChat and GitHub form the backbone of our communications. For those not familiar with these tools, HipChat is an instant messaging tool for organizations. GitHub is a web interface, built on top of the version control system git, for teams to collaborate on a codebase. These tools are crucial to our philosophy of being built around code. They enable members across the company to participate by asking questions and generally weighing in, and departmental members use them to discuss implementation. These tools are much faster than email because anyone who knows the answer can see a question and respond to it.
  11. When developing new methods, we like to use Jupyter and Google Drive. We use Jupyter for its IPython Notebook capabilities. It lets us run Python code, especially modules from our codebase, interactively, so we can chain components together to make new workflows. Jupyter also has presentation functionality, so we use it as a log book to record and present results in internal meetings. Sometimes we also use Google Drive to record and share results with members of other departments, such as Applied Data Scientists, who have a vested interest in the project but don’t require all the details.
  12. Finally, to take the “pulse” of the R&D department as a whole, department heads use Google Drive and Asana for big-picture planning. Asana is a project management tool that gives department heads a bird’s-eye view of what each team member is working on and how each project is progressing. Google Drive tools are used to collaborate on planning documents, whether plans for new functionality or revisions to organizational processes, such as rewriting our hiring exam.
  13. That was how we, the R&D department, work together. How do we work with the Applied Data Scientists, the data scientists in our consulting arm? To make project staffing seamless, we designate a tag team to serve as the first point of contact for client engagements. This is the Modeling Methodology team. However, other R&D data scientists may be staffed on a project depending on the expertise required. The R&D data scientist always serves as the Consulted in the RACI model. The RACI model is a popular project management model used in consulting. It emphasizes explicit roles for each team member to ensure accountability. R is for Responsible, a role held by the Applied Data Scientists. A is for Accountable: the Applied Data Science Manager or project manager. C is for Consulted: the R&D data scientist. And I is for Informed: the client. Lastly, we are open with Applied Data Scientists about R&D challenges in order to avoid schedule slips on the client engagement. The project plan is often tracked in Trello, a popular bulletin board app, with boards for each milestone’s requirements.
  14. Let’s revisit our client story, Mapping the Uninsured in America, to illustrate concretely how we work together. After Enroll America shared their problem with us, we assembled a project team of R&D data scientists and Applied Data Scientists to solve it. We worked with Enroll to refine the problem statement into a set of requirements, ultimately resulting in the design and execution of a phone survey on a sample of individuals, followed by building a model to capture the rest of the country. Then the project got under way. Throughout the project, the Applied Data Science Manager held weekly status calls with Enroll and with the project team to make sure we were on schedule. When there was a risk of a schedule slip, we occasionally staffed a couple of extra data scientists on the project to make sure we delivered results on time. For example, we brought in an extra data scientist toward the end of the project to help produce graphs and visualizations of the results. Finally, we finished our analysis and presented our predictions to Enroll America. Afterwards, we did a postmortem and realized that automated model building would have made us more efficient. This is because we conducted our experiment in waves and built similar models as the results came in, with the only difference being the input data. Also, the analysts were each working on individual components of the analysis and writing their own R scripts, which had a lot of overlap (such as the data processing steps), so a lot of time was wasted.
  15. So that’s how we work with the Applied Data Scientists on consulting projects. How do we work with the Tech department? Much as with the Applied Data Science department, we’ve designated a team to interface with the Tech department, and Tech has designated one to interface with us: the Engineering team on our side and the Modeling team on theirs. The Engineering team in Data Science consists of data scientists who speak software development, and the Modeling team in the Tech department consists of software engineers who speak data science. Most of our communication happens in module- or project-specific chatrooms and GitHub issue tickets, which get answers quickly. To promote truly inspired product development, we identify opportunities to form cross-functional teams, such as using the Platform’s API to develop new apps and teaching each other software development best practices in brown-bag sessions.
  16. Let’s revisit the Enroll America project for an example of how the R&D data scientists work with the software engineers. After the postmortem for the Enroll engagement, we began prototyping automated modeling functionality, communicating to the Tech department the motivation for it and including them in discussions about implementation and feasibility. Once we finished the prototype, ensuring that it passed all the tests and code review, the Engineering team in R&D worked with the Modeling team in Tech to integrate it as a new feature in the Platform. We used GitHub and HipChat to discuss questions that came up, such as resource usage, input/output specifications, and the data visualizations we wanted to provide to the end user. Together, the R&D department and the Tech department successfully built and integrated the Build Model module that exists today in the Platform.
  17. In summary, many of our approaches share a common theme: minimizing communication costs within the R&D department and with other departments. This is evidenced by our embrace of some free or open-source tools for collaboration and our general belief in transparency about challenges. We also emphasize collaborative opportunities between departments to strengthen our cohesiveness as a team, be it working on client engagements together or learning best practices in a seminar format. Many of our ideas come from the valuable perspectives of our employees, who come from a wide array of backgrounds. Thus, our project management strategies are a hybrid of techniques seen in more established industries such as software engineering, consulting, and academia. I hope the tips presented in my talk today have made doing data science more manageable for your team. Thank you for your time.
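Note 11 describes using Jupyter to run modules from the codebase interactively and chain components together into new workflows. The following is a minimal sketch of that chaining idea, assuming each component is a plain Python function that takes and returns a list of records. The `chain` helper and the step names (`drop_missing`, `add_age_bracket`) are hypothetical illustrations, not Civis code.

```python
from functools import reduce
from typing import Callable, List

Record = dict
Step = Callable[[List[Record]], List[Record]]


def chain(*steps: Step) -> Step:
    """Compose reusable components into one workflow, applied left to right."""
    def workflow(records: List[Record]) -> List[Record]:
        return reduce(lambda acc, step: step(acc), steps, records)
    return workflow


# Hypothetical components that might live in a shared codebase module.
def drop_missing(records: List[Record]) -> List[Record]:
    """Keep only records that have an age value."""
    return [r for r in records if r.get("age") is not None]


def add_age_bracket(records: List[Record]) -> List[Record]:
    """Derive a coarse feature from age without mutating the input records."""
    return [{**r, "age_bracket": "18-34" if r["age"] < 35 else "35+"} for r in records]


# In a notebook cell, the same components can be recombined into new workflows.
prepare = chain(drop_missing, add_age_bracket)
rows = prepare([{"age": 29}, {"age": None}, {"age": 51}])
print(rows)
```

Because every step shares one signature, a notebook user can reorder, drop, or add steps without touching the components themselves, which is what makes interactive prototyping of workflows cheap.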
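Note 13 assigns RACI roles on a client engagement: Applied Data Scientists are Responsible, the Applied Data Science Manager is Accountable, the R&D data scientist is Consulted, and the client is Informed. The following sketch encodes that assignment as data and checks it. The "exactly one Accountable" and "at least one Responsible" rules are common RACI conventions assumed here, and `Role` and `check_raci` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    RESPONSIBLE = "R"  # does the work
    ACCOUNTABLE = "A"  # owns the outcome and signs off
    CONSULTED = "C"    # gives input before decisions are made
    INFORMED = "I"     # kept up to date on progress


def check_raci(assignments: Dict[str, Role]) -> List[str]:
    """Return problems with a RACI assignment; an empty list means it looks sound."""
    problems = []
    accountable = [who for who, role in assignments.items() if role is Role.ACCOUNTABLE]
    # Assumed convention: a single owner avoids diffuse accountability.
    if len(accountable) != 1:
        problems.append(f"expected exactly one Accountable, found {len(accountable)}")
    if Role.RESPONSIBLE not in assignments.values():
        problems.append("no one is Responsible")
    return problems


# Roles for a client engagement as described in note 13.
engagement = {
    "Applied Data Scientists": Role.RESPONSIBLE,
    "Applied Data Science Manager": Role.ACCOUNTABLE,
    "R&D data scientist": Role.CONSULTED,
    "Client": Role.INFORMED,
}
print(check_raci(engagement))  # []
```

Keeping roles as explicit data rather than in people's heads makes gaps visible early, for example when a staffing change leaves a project without an Accountable owner.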
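The postmortem in note 14 found two sources of waste. First, the team rebuilt near-identical models as each survey wave arrived. Second, analysts maintained overlapping R scripts for data processing. The following sketch, written in Python rather than R, shows the kind of automation that finding motivated: one shared `preprocess` step and one `build_model` function, re-run on the pooled data after each wave, then applied to people who were never surveyed. The field names (`age_bracket`, `insured`) and the per-bracket uninsured-rate "model" are hypothetical simplifications. They are not the model Civis built for Enroll America or the Platform's Build Model module.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cleaned survey record: (age_bracket, insured)
Record = Tuple[str, bool]


def preprocess(raw_rows: Iterable[dict]) -> List[Record]:
    """Shared cleaning step, written once instead of once per analyst."""
    cleaned = []
    for row in raw_rows:
        bracket = str(row.get("age_bracket") or "").strip().lower()
        if not bracket or row.get("insured") is None:
            continue  # drop incomplete survey responses
        cleaned.append((bracket, bool(row["insured"])))
    return cleaned


def build_model(records: List[Record]) -> Dict[str, float]:
    """Fit a deliberately simple model: the uninsured rate within each age bracket."""
    counts = defaultdict(lambda: [0, 0])  # bracket -> [uninsured, total]
    for bracket, insured in records:
        counts[bracket][0] += int(not insured)
        counts[bracket][1] += 1
    return {b: uninsured / total for b, (uninsured, total) in counts.items()}


def run_waves(waves: Iterable[Iterable[dict]]) -> List[Dict[str, float]]:
    """Re-run the same pipeline as each survey wave arrives; only the input data changes."""
    pooled: List[Record] = []
    models = []
    for wave in waves:
        pooled.extend(preprocess(wave))
        models.append(build_model(pooled))
    return models


def score(model: Dict[str, float], people: Iterable[dict], default: float) -> List[float]:
    """Apply a fitted model to people who were never surveyed."""
    return [model.get(str(p.get("age_bracket") or "").strip().lower(), default) for p in people]


waves = [
    [{"age_bracket": "18-34", "insured": False}, {"age_bracket": "18-34", "insured": True}],
    [{"age_bracket": "35-64", "insured": True}, {"age_bracket": " 18-34 ", "insured": False},
     {"age_bracket": "65+", "insured": None}],
]
models = run_waves(waves)
print(models[-1])
print(score(models[-1], [{"age_bracket": "35-64"}, {"age_bracket": "65+"}], default=0.1))
```

The design choice is that a new wave changes only the arguments to `run_waves`, never the code. The cleaning logic also lives in one place, so the analysts no longer maintain overlapping copies of it.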