SlideShare a Scribd company logo
1 of 20
Agile Data Science
INFORMS BIG DATA CONFERENCE
6/23/2013
!
Presented by Joel S Horwitz
Follow me @JSHorwitz
Alpine Data Labs
Agile Manifesto
We are uncovering better ways of developing software by doing it and
helping others do it. Through this work we have come to value:
!
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
!
That is, while there is value in the items on the right, we value the items on
the left more.
We are uncovering better ways of developing software models by doing it
and helping others do it. Through this work we have come to value:
!
Individuals and interactions over processes and tools
Working software models over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
!
That is, while there is value in the items on the right, we value the items on
the left more.
Agile Manifesto
Business Analytics Technologist
Linear workflows in non-agile culture
????
Business Owner
Business Analyst
Data Owner
Data Scientist
Business Stakeholders
< / >
001011
= 5
= 10
Agile is about continuous interactions
Analytics
• Feature Creation
• Model / Scoring
• Evaluation
Technologist
• Wrangling
• Interpretation
• Pipelines
Business
• Presentation
• Deployment
• Productize
Minimally Viable Data Products (MVDP)
crossfilter.js or D3.JS
Tableau
Dashboards
Product
Recommendations
Also, Product
Recommendations
People
Recommendations
One model, many use cases.
Agile Data Science Feedback Loop
Instrumentation
/ Data Collection
Analysis &
Design
Results &
Interpretation
Cultivating
Data Intuition
What do you need?
1. Business Champion
2.Integrated Environment
3. Analytics Ninjas
1. Business Champion(s): Defining the Problem
Executive Sponsor(s) who have a vested interest… ready to take action from results, has an impact to their business goals.
Chief Technology Officer
SVP of Product
Chief Marketing Officer
VP of Sales
Problem statement… Monthly active user count
growth was stagnating. Evidence of where to look…
acquisition funnel, user engagement, and customer loyalty.
Question to answer… How
do I connect each part of
the acquisition funnel?
We started with the acquisition funnel… web visits, downloads, installs, and activations.
2. Environment: Before
• Many data silos and technology limitations
• Data definitions not well defined.
• Multiple data formats.
• No place to build MVDPs at scale.
2. Environment: After
Open Source Technologies
Flume, Sqoop, Oozie, and others
Analytics Sandbox
Relational
Database
Minimally Viable Data Products
Selectively include data
that is Relevant
How was our analytics platform deployed and
connected to the network of systems.
First there was FTP… dump raw log files from web analytics, content delivery network, and
application log files.
!
Second there was MS SQL… Most of the data was in Microsoft SQL databases.
!
Third there was Hadoop (with Hive)… Engineering and Development backup to Hadoop.
!
Now there is web based analytics…
What is Hadoop?
• Overview: Apache Hadoop is a framework for running applications on large cluster built of
commodity hardware.
• Storage: Hadoop Distributed File System (HDFS™) is the primary storage system used by
Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on
compute nodes throughout a cluster to enable reliable, extremely rapid computations.
• Applications:  Apache Hive is a large scale Data Warehouse system and Apache Mahout is a
machine learning system.
3. Analytics Ninjas
Build a TIGER Team… subject matter expert, modeler, technologist, and storyteller
Curiosity
Willingness to learn
Resourceful
Risk Takers
!
Schedule Standups… daily is best depending on the project it can vary
Take notes!
Open dialogue (peer review)
Share knowledge
!
Centralize and Version Control Your work… files, code, knowledge, and data lineage are key to success
Create a Wiki
Work backwards from the end goal to the analysis to the data
Share, share, and share! Stand on the shoulder of Giants! Why start new, there is plenty of boilerplate to go around.
What is Analytics?
Analytics is the application of computer technology, operational research, and statistics to solve problems in business
and industry.
!
Historically, Analytics was heavily used in banking for portfolio assessment using social status, geographical location, net
value, and many other factors.
!
Today, Analytics is applied to a vast number of industries and is re-emerging due to the phenomenal explosion of data
from our connected world.
!
Big Data consists of data sets that grow so large and complex that they become awkward to work with using on-hand
database management tools
!
McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per
year and the European public sector €250 billion.
Analytics Example: Platform Monetization
Search Lifetime Value
• 3rd Party Distribution
• SEM
• Organic
Traffic Quality Analysis
• PPC
• PPD
• PPI
• PPA
1. Business Champion
• Sales & Product
2.Integrated Environment
• Web analytics / app data and SQL
Database.
3. Analytics Ninjas
• Search engine marketers, business
analysts, and statisticians.
Examples of Analytics: Campaign Management
1. Business Champion
• Sales, Product, Marketing, and Project
Managers.
2.Integrated Environment
• Data Warehouse, SQL DB, Hadoop, and
Tableau
3. Analytics Ninjas
• Web analytics, product managers,
business analysts, and business
intelligence.
Examples of Analytics: Customer Sentiment
Keywords by Ratings
1. Business Champion
• Product and Marketing
2.Integrated Environment
• Mobile analytics, app logfiles,
and Hadoop.
3. Analytics Ninjas
• Web analytics, product
managers, business analysts,
mobile developers.
What to avoid
Its all about plugins for web analytics… Google Analytics, App Stores (iOS, Android, others), Social (Twitter, Facebook
FQL, others?).
Analysis lifecycle…
1. Give me data… import data and ETL it into submission.
2. What does this data mean… Crunch, Blend, Join, Pivot, Predict, Count, Map, or whatever
3. Show and Tell… Static (powerpoint, excel, etc.) or dynamic (trends, filtering, drilldown). (No more dashboards)...
known knowns = data puking
(dashboards)
Interestingness rocks! Tell me where to look (I’m
feeling lucky…)
get people to "make love to data" to make
actual good use of it
Viva la revolucion! Data democracy!
Thank you! Any Questions?
Want to jump start your Agile Data Science project? Head over to
http://start.alpinenow.com
Follow me on @JSHorwitz

More Related Content

What's hot

8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science Mahesh Kumar CV
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsDomino Data Lab
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI dayMohammed Barakat
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceNiko Vuokko
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...Edureka!
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learningGiuseppe Manco
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroData ScienceTech Institute
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsSrinath Perera
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big DataRevolution Analytics
 
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku  - for Data Geek Paris@Criteo - Close the Data CircleDataiku  - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku - for Data Geek Paris@Criteo - Close the Data CircleDataiku
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big DataDataWorks Summit
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 

What's hot (20)

8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
 
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku  - for Data Geek Paris@Criteo - Close the Data CircleDataiku  - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 

Viewers also liked

Key performance indicators in professional service firms
Key performance indicators in professional service firmsKey performance indicators in professional service firms
Key performance indicators in professional service firmstransentis consulting
 
Target Operating Model Research
Target Operating Model ResearchTarget Operating Model Research
Target Operating Model ResearchGenpact Ltd
 
Target Operating Model Definition
Target Operating Model DefinitionTarget Operating Model Definition
Target Operating Model Definitionstuart1403
 
CRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologyCRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologySergey Shelpuk
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...Damian R. Mingle, MBA
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsMichał Łopuszyński
 
Operating Model
Operating ModelOperating Model
Operating Modelrmuse70
 

Viewers also liked (11)

Key performance indicators in professional service firms
Key performance indicators in professional service firmsKey performance indicators in professional service firms
Key performance indicators in professional service firms
 
Target Operating Model Research
Target Operating Model ResearchTarget Operating Model Research
Target Operating Model Research
 
Target Operating Model Definition
Target Operating Model DefinitionTarget Operating Model Definition
Target Operating Model Definition
 
CRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologyCRISP-DM: a data science project methodology
CRISP-DM: a data science project methodology
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
 
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
 
Crisp-DM
Crisp-DMCrisp-DM
Crisp-DM
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Operating Model
Operating ModelOperating Model
Operating Model
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 

Similar to Agile data science

How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with DatabricksGrega Kespret
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyInside Analysis
 
Liberating data power of APIs
Liberating data power of APIsLiberating data power of APIs
Liberating data power of APIsBala Iyer
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
Intro to Artificial Intelligence w/ Target's Director of PM
 Intro to Artificial Intelligence w/ Target's Director of PM Intro to Artificial Intelligence w/ Target's Director of PM
Intro to Artificial Intelligence w/ Target's Director of PMProduct School
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterInside Analysis
 
The Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business NowThe Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business NowInside Analysis
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
Datapedia Analysis Report
Datapedia Analysis ReportDatapedia Analysis Report
Datapedia Analysis ReportAbanoub Amgad
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social MediaSeth Grimes
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US InformationJulian Tong
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analyticssunnypatil1778
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptxpriti jadhao
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 

Similar to Agile data science (20)

How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data Quickly
 
Liberating data power of APIs
Liberating data power of APIsLiberating data power of APIs
Liberating data power of APIs
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Intro to Artificial Intelligence w/ Target's Director of PM
 Intro to Artificial Intelligence w/ Target's Director of PM Intro to Artificial Intelligence w/ Target's Director of PM
Intro to Artificial Intelligence w/ Target's Director of PM
 
Open Data Canvas 0.1
Open Data Canvas 0.1Open Data Canvas 0.1
Open Data Canvas 0.1
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
The Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business NowThe Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business Now
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Datapedia Analysis Report
Datapedia Analysis ReportDatapedia Analysis Report
Datapedia Analysis Report
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social Media
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
 
semana1.pptx
semana1.pptxsemana1.pptx
semana1.pptx
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 

Recently uploaded

办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Agile data science

  • 1. Agile Data Science INFORMS BIG DATA CONFERENCE 6/23/2013 ! Presented by Joel S Horwitz Follow me @JSHorwitz Alpine Data Labs
  • 2. Agile Manifesto We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: ! Individuals and interactions over processes and tools Working software over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan ! That is, while there is value in the items on the right, we value the items on the left more.
  • 3. We are uncovering better ways of developing software models by doing it and helping others do it. Through this work we have come to value: ! Individuals and interactions over processes and tools Working software models over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan ! That is, while there is value in the items on the right, we value the items on the left more. Agile Manifesto
  • 4. Business Analytics Technologist Linear workflows in non-agile culture ???? Business Owner Business Analyst Data Owner Data Scientist Business Stakeholders < / > 001011 = 5 = 10
  • 5. Agile is about continuous interactions Analytics • Feature Creation • Model / Scoring • Evaluation Technologist • Wrangling • Interpretation • Pipelines Business • Presentation • Deployment • Productize
  • 6. Minimally Viable Data Products (MVDP) crossfilter.js or D3.JS Tableau Dashboards Product Recommendations Also, Product Recommendations People Recommendations One model, many use cases.
  • 7. Agile Data Science Feedback Loop Instrumentation / Data Collection Analysis & Design Results & Interpretation Cultivating Data Intuition
  • 8. What do you need? 1. Business Champion 2.Integrated Environment 3. Analytics Ninjas
  • 9. 1. Business Champion(s): Defining the Problem Executive Sponsor(s) who have a vested interest… ready to take action from results, has an impact to their business goals. Chief Technology Officer SVP of Product Chief Marketing Officer VP of Sales Problem statement… Monthly active user count growth was stagnating. Evidence of where to look… acquisition funnel, user engagement, and customer loyalty. Question to answer… How do I connect each part of the acquisition funnel? We started with the acquisition funnel… web visits, downloads, installs, and activations.
  • 10. 2. Environment: Before • Many data silos and technology limitations • Data definitions not well defined. • Multiple data formats. • No place to build MVDPs at scale.
  • 11. 2. Environment: After Open Source Technologies Flume, Sqoop, Oozie, and others Analytics Sandbox Relational Database Minimally Viable Data Products Selectively include data that is Relevant
  • 12. How was our analytics platform deployed and connected to the network of systems. First there was FTP… dump raw log files from web analytics, content delivery network, and application log files. ! Second there was MS SQL… Most of the data was in Microsoft SQL databases. ! Third there was Hadoop (with Hive)… Engineering and Development backup to Hadoop. ! Now there is web based analytics…
  • 13. What is Hadoop? • Overview: Apache Hadoop is a framework for running applications on large cluster built of commodity hardware. • Storage: Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. • Applications:  Apache Hive is a large scale Data Warehouse system and Apache Mahout is a machine learning system.
  • 14. 3. Analytics Ninjas Build a TIGER Team… subject matter expert, modeler, technologist, and storyteller Curiosity Willingness to learn Resourceful Risk Takers ! Schedule Standups… daily is best depending on the project it can vary Take notes! Open dialogue (peer review) Share knowledge ! Centralize and Version Control Your work… files, code, knowledge, and data lineage are key to success Create a Wiki Work backwards from the end goal to the analysis to the data Share, share, and share! Stand on the shoulder of Giants! Why start new, there is plenty of boilerplate to go around.
  • 15. What is Analytics? Analytics is the application of computer technology, operational research, and statistics to solve problems in business and industry. ! Historically, Analytics was heavily used in banking for portfolio assessment using social status, geographical location, net value, and many other factors. ! Today, Analytics is applied to a vast number of industries and is re-emerging due to the phenomenal explosion of data from our connected world. ! Big Data consists of data sets that grow so large and complex that they become awkward to work with using on-hand database management tools ! McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per year and the European public sector €250 billion.
  • 16. Analytics Example: Platform Monetization Search Lifetime Value • 3rd Party Distribution • SEM • Organic Traffic Quality Analysis • PPC • PPD • PPI • PPA 1. Business Champion • Sales & Product 2.Integrated Environment • Web analytics / app data and SQL Database. 3. Analytics Ninjas • Search engine marketers, business analysts, and statisticians.
  • 17. Examples of Analytics: Campaign Management 1. Business Champion • Sales, Product, Marketing, and Project Managers. 2.Integrated Environment • Data Warehouse, SQL DB, Hadoop, and Tableau 3. Analytics Ninjas • Web analytics, product managers, business analysts, and business intelligence.
  • 18. Examples of Analytics: Customer Sentiment Keywords by Ratings 1. Business Champion • Product and Marketing 2.Integrated Environment • Mobile analytics, app logfiles, and Hadoop. 3. Analytics Ninjas • Web analytics, product managers, business analysts, mobile developers.
  • 19. What to avoid Its all about plugins for web analytics… Google Analytics, App Stores (iOS, Android, others), Social (Twitter, Facebook FQL, others?). Analysis lifecycle… 1. Give me data… import data and ETL it into submission. 2. What does this data mean… Crunch, Blend, Join, Pivot, Predict, Count, Map, or whatever 3. Show and Tell… Static (powerpoint, excel, etc.) or dynamic (trends, filtering, drilldown). (No more dashboards)... known knowns = data puking (dashboards) Interestingness rocks! Tell me where to look (I’m feeling lucky…) get people to "make love to data" to make actual good use of it Viva la revolucion! Data democracy!
  • 20. Thank you! Any Questions? Want to jump start your Agile Data Science project? Head over to http://start.alpinenow.com Follow me on @JSHorwitz