SlideShare a Scribd company logo
Agile Data Science
INFORMS BIG DATA CONFERENCE
6/23/2013
!
Presented by Joel S Horwitz
Follow me @JSHorwitz
Alpine Data Labs
Agile Manifesto
We are uncovering better ways of developing software by doing it and
helping others do it. Through this work we have come to value:
!
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
!
That is, while there is value in the items on the right, we value the items on
the left more.
We are uncovering better ways of developing software models by doing it
and helping others do it. Through this work we have come to value:
!
Individuals and interactions over processes and tools
Working software models over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
!
That is, while there is value in the items on the right, we value the items on
the left more.
Agile Manifesto
Business Analytics Technologist
Linear workflows in non-agile culture
????
Business Owner
Business Analyst
Data Owner
Data Scientist
Business Stakeholders
< / >
001011
= 5
= 10
Agile is about continuous interactions
Analytics
• Feature Creation
• Model / Scoring
• Evaluation
Technologist
• Wrangling
• Interpretation
• Pipelines
Business
• Presentation
• Deployment
• Productize
Minimally Viable Data Products (MVDP)
crossfilter.js or D3.JS
Tableau
Dashboards
Product
Recommendations
Also, Product
Recommendations
People
Recommendations
One model, many use cases.
Agile Data Science Feedback Loop
Instrumentation
/ Data Collection
Analysis &
Design
Results &
Interpretation
Cultivating
Data Intuition
What do you need?
1. Business Champion
2.Integrated Environment
3. Analytics Ninjas
1. Business Champion(s): Defining the Problem
Executive Sponsor(s) who have a vested interest… ready to take action from results, has an impact to their business goals.
Chief Technology Officer
SVP of Product
Chief Marketing Officer
VP of Sales
Problem statement… Monthly active user count
growth was stagnating. Evidence of where to look…
acquisition funnel, user engagement, and customer loyalty.
Question to answer… How
do I connect each part of
the acquisition funnel?
We started with the acquisition funnel… web visits, downloads, installs, and activations.
2. Environment: Before
• Many data silos and technology limitations
• Data definitions not well defined.
• Multiple data formats.
• No place to build MVDPs at scale.
2. Environment: After
Open Source Technologies
Flume, Sqoop, Oozie, and others
Analytics Sandbox
Relational
Database
Minimally Viable Data Products
Selectively include data
that is Relevant
How was our analytics platform deployed and
connected to the network of systems.
First there was FTP… dump raw log files from web analytics, content delivery network, and
application log files.
!
Second there was MS SQL… Most of the data was in Microsoft SQL databases.
!
Third there was Hadoop (with Hive)… Engineering and Development backup to Hadoop.
!
Now there is web based analytics…
What is Hadoop?
• Overview: Apache Hadoop is a framework for running applications on large cluster built of
commodity hardware.
• Storage: Hadoop Distributed File System (HDFS™) is the primary storage system used by
Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on
compute nodes throughout a cluster to enable reliable, extremely rapid computations.
• Applications:  Apache Hive is a large scale Data Warehouse system and Apache Mahout is a
machine learning system.
3. Analytics Ninjas
Build a TIGER Team… subject matter expert, modeler, technologist, and storyteller
Curiosity
Willingness to learn
Resourceful
Risk Takers
!
Schedule Standups… daily is best depending on the project it can vary
Take notes!
Open dialogue (peer review)
Share knowledge
!
Centralize and Version Control Your work… files, code, knowledge, and data lineage are key to success
Create a Wiki
Work backwards from the end goal to the analysis to the data
Share, share, and share! Stand on the shoulder of Giants! Why start new, there is plenty of boilerplate to go around.
What is Analytics?
Analytics is the application of computer technology, operational research, and statistics to solve problems in business
and industry.
!
Historically, Analytics was heavily used in banking for portfolio assessment using social status, geographical location, net
value, and many other factors.
!
Today, Analytics is applied to a vast number of industries and is re-emerging due to the phenomenal explosion of data
from our connected world.
!
Big Data consists of data sets that grow so large and complex that they become awkward to work with using on-hand
database management tools
!
McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per
year and the European public sector €250 billion.
Analytics Example: Platform Monetization
Search Lifetime Value
• 3rd Party Distribution
• SEM
• Organic
Traffic Quality Analysis
• PPC
• PPD
• PPI
• PPA
1. Business Champion
• Sales & Product
2.Integrated Environment
• Web analytics / app data and SQL
Database.
3. Analytics Ninjas
• Search engine marketers, business
analysts, and statisticians.
Examples of Analytics: Campaign Management
1. Business Champion
• Sales, Product, Marketing, and Project
Managers.
2.Integrated Environment
• Data Warehouse, SQL DB, Hadoop, and
Tableau
3. Analytics Ninjas
• Web analytics, product managers,
business analysts, and business
intelligence.
Examples of Analytics: Customer Sentiment
Keywords by Ratings
1. Business Champion
• Product and Marketing
2.Integrated Environment
• Mobile analytics, app logfiles,
and Hadoop.
3. Analytics Ninjas
• Web analytics, product
managers, business analysts,
mobile developers.
What to avoid
Its all about plugins for web analytics… Google Analytics, App Stores (iOS, Android, others), Social (Twitter, Facebook
FQL, others?).
Analysis lifecycle…
1. Give me data… import data and ETL it into submission.
2. What does this data mean… Crunch, Blend, Join, Pivot, Predict, Count, Map, or whatever
3. Show and Tell… Static (powerpoint, excel, etc.) or dynamic (trends, filtering, drilldown). (No more dashboards)...
known knowns = data puking
(dashboards)
Interestingness rocks! Tell me where to look (I’m
feeling lucky…)
get people to "make love to data" to make
actual good use of it
Viva la revolucion! Data democracy!
Thank you! Any Questions?
Want to jump start your Agile Data Science project? Head over to
http://start.alpinenow.com
Follow me on @JSHorwitz

More Related Content

What's hot

8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
Mahesh Kumar CV
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
Domino Data Lab
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
Mark West
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
Gregory Piatetsky-Shapiro
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
Sri Ambati
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
Mohammed Barakat
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Dr.Sotarat Thammaboosadee CIMP-Data Governance
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Niko Vuokko
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Ilkay Altintas, Ph.D.
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
Edureka!
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Sampath Kumar
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
Giuseppe Manco
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Data ScienceTech Institute
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
Srinath Perera
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
Revolution Analytics
 
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku  - for Data Geek Paris@Criteo - Close the Data CircleDataiku  - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
Jason Geng
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
DataWorks Summit
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 

What's hot (20)

8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
 
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-ShapiroKeynote -  An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
 
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku  - for Data Geek Paris@Criteo - Close the Data CircleDataiku  - for Data Geek Paris@Criteo - Close the Data Circle
Dataiku - for Data Geek Paris@Criteo - Close the Data Circle
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 

Viewers also liked

Key performance indicators in professional service firms
Key performance indicators in professional service firmsKey performance indicators in professional service firms
Key performance indicators in professional service firms
transentis consulting
 
Target Operating Model Research
Target Operating Model ResearchTarget Operating Model Research
Target Operating Model Research
Genpact Ltd
 
Target Operating Model Definition
Target Operating Model DefinitionTarget Operating Model Definition
Target Operating Model Definition
stuart1403
 
CRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologyCRISP-DM: a data science project methodology
CRISP-DM: a data science project methodology
Sergey Shelpuk
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
Russell Jurney
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
Dardarian78
 
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Damian R. Mingle, MBA
 
Crisp-DM
Crisp-DMCrisp-DM
Crisp-DM
Aldo Quelopana
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
Michał Łopuszyński
 
Operating Model
Operating ModelOperating Model
Operating Model
rmuse70
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
Silicon Valley Data Science
 

Viewers also liked (11)

Key performance indicators in professional service firms
Key performance indicators in professional service firmsKey performance indicators in professional service firms
Key performance indicators in professional service firms
 
Target Operating Model Research
Target Operating Model ResearchTarget Operating Model Research
Target Operating Model Research
 
Target Operating Model Definition
Target Operating Model DefinitionTarget Operating Model Definition
Target Operating Model Definition
 
CRISP-DM: a data science project methodology
CRISP-DM: a data science project methodologyCRISP-DM: a data science project methodology
CRISP-DM: a data science project methodology
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
 
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
Practical Data Science the WPC Healthcare Strategy for Delivering Meaningful ...
 
Crisp-DM
Crisp-DMCrisp-DM
Crisp-DM
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Operating Model
Operating ModelOperating Model
Operating Model
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 

Similar to Agile data science

Unit-I_Big data life cycle.pptx, sources of Big Data
Unit-I_Big data life cycle.pptx, sources of Big DataUnit-I_Big data life cycle.pptx, sources of Big Data
Unit-I_Big data life cycle.pptx, sources of Big Data
RajendraKankrale1
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
Grega Kespret
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Inside Analysis
 
Liberating data power of APIs
Liberating data power of APIsLiberating data power of APIs
Liberating data power of APIs
Bala Iyer
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Yael Garten
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Intro to Artificial Intelligence w/ Target's Director of PM
 Intro to Artificial Intelligence w/ Target's Director of PM Intro to Artificial Intelligence w/ Target's Director of PM
Intro to Artificial Intelligence w/ Target's Director of PM
Product School
 
Open Data Canvas 0.1
Open Data Canvas 0.1Open Data Canvas 0.1
Open Data Canvas 0.1
Nicolas Terpolilli
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
Inside Analysis
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
NagarajanG35
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
Dr. Ananth Krishnamoorthy
 
The Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business NowThe Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business Now
Inside Analysis
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
Dr. Ananth Krishnamoorthy
 
Datapedia Analysis Report
Datapedia Analysis ReportDatapedia Analysis Report
Datapedia Analysis Report
Abanoub Amgad
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social Media
Seth Grimes
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Devon Ziegenfuss
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Julian Tong
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
sunnypatil1778
 
semana1.pptx
semana1.pptxsemana1.pptx
semana1.pptx
AidaVivancoLuna1
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
priti jadhao
 

Similar to Agile data science (20)

Unit-I_Big data life cycle.pptx, sources of Big Data
Unit-I_Big data life cycle.pptx, sources of Big DataUnit-I_Big data life cycle.pptx, sources of Big Data
Unit-I_Big data life cycle.pptx, sources of Big Data
 
How Celtra Optimizes its Advertising Platform with Databricks
How Celtra Optimizes its Advertising Platformwith DatabricksHow Celtra Optimizes its Advertising Platformwith Databricks
How Celtra Optimizes its Advertising Platform with Databricks
 
Understanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data QuicklyUnderstanding What’s Possible: Getting Business Value from Big Data Quickly
Understanding What’s Possible: Getting Business Value from Big Data Quickly
 
Liberating data power of APIs
Liberating data power of APIsLiberating data power of APIs
Liberating data power of APIs
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Intro to Artificial Intelligence w/ Target's Director of PM
 Intro to Artificial Intelligence w/ Target's Director of PM Intro to Artificial Intelligence w/ Target's Director of PM
Intro to Artificial Intelligence w/ Target's Director of PM
 
Open Data Canvas 0.1
Open Data Canvas 0.1Open Data Canvas 0.1
Open Data Canvas 0.1
 
The Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value ThereafterThe Right Data Warehouse: Automation Now, Business Value Thereafter
The Right Data Warehouse: Automation Now, Business Value Thereafter
 
Data science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptxData science in business Administration Nagarajan.pptx
Data science in business Administration Nagarajan.pptx
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
The Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business NowThe Analytic Platform: Empowering the Business Now
The Analytic Platform: Empowering the Business Now
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Datapedia Analysis Report
Datapedia Analysis ReportDatapedia Analysis Report
Datapedia Analysis Report
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social Media
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
 
semana1.pptx
semana1.pptxsemana1.pptx
semana1.pptx
 
Morden EcoSystem.pptx
Morden EcoSystem.pptxMorden EcoSystem.pptx
Morden EcoSystem.pptx
 

Recently uploaded

一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 

Recently uploaded (20)

一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 

Agile data science

  • 1. Agile Data Science INFORMS BIG DATA CONFERENCE 6/23/2013 ! Presented by Joel S Horwitz Follow me @JSHorwitz Alpine Data Labs
  • 2. Agile Manifesto We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: ! Individuals and interactions over processes and tools Working software over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan ! That is, while there is value in the items on the right, we value the items on the left more.
  • 3. We are uncovering better ways of developing software models by doing it and helping others do it. Through this work we have come to value: ! Individuals and interactions over processes and tools Working software models over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan ! That is, while there is value in the items on the right, we value the items on the left more. Agile Manifesto
  • 4. Business Analytics Technologist Linear workflows in non-agile culture ???? Business Owner Business Analyst Data Owner Data Scientist Business Stakeholders < / > 001011 = 5 = 10
  • 5. Agile is about continuous interactions Analytics • Feature Creation • Model / Scoring • Evaluation Technologist • Wrangling • Interpretation • Pipelines Business • Presentation • Deployment • Productize
  • 6. Minimally Viable Data Products (MVDP) crossfilter.js or D3.JS Tableau Dashboards Product Recommendations Also, Product Recommendations People Recommendations One model, many use cases.
  • 7. Agile Data Science Feedback Loop Instrumentation / Data Collection Analysis & Design Results & Interpretation Cultivating Data Intuition
  • 8. What do you need? 1. Business Champion 2.Integrated Environment 3. Analytics Ninjas
  • 9. 1. Business Champion(s): Defining the Problem Executive Sponsor(s) who have a vested interest… ready to take action from results, has an impact to their business goals. Chief Technology Officer SVP of Product Chief Marketing Officer VP of Sales Problem statement… Monthly active user count growth was stagnating. Evidence of where to look… acquisition funnel, user engagement, and customer loyalty. Question to answer… How do I connect each part of the acquisition funnel? We started with the acquisition funnel… web visits, downloads, installs, and activations.
  • 10. 2. Environment: Before • Many data silos and technology limitations • Data definitions not well defined. • Multiple data formats. • No place to build MVDPs at scale.
  • 11. 2. Environment: After Open Source Technologies Flume, Sqoop, Oozie, and others Analytics Sandbox Relational Database Minimally Viable Data Products Selectively include data that is Relevant
  • 12. How was our analytics platform deployed and connected to the network of systems. First there was FTP… dump raw log files from web analytics, content delivery network, and application log files. ! Second there was MS SQL… Most of the data was in Microsoft SQL databases. ! Third there was Hadoop (with Hive)… Engineering and Development backup to Hadoop. ! Now there is web based analytics…
  • 13. What is Hadoop? • Overview: Apache Hadoop is a framework for running applications on large cluster built of commodity hardware. • Storage: Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. • Applications:  Apache Hive is a large scale Data Warehouse system and Apache Mahout is a machine learning system.
  • 14. 3. Analytics Ninjas Build a TIGER Team… subject matter expert, modeler, technologist, and storyteller Curiosity Willingness to learn Resourceful Risk Takers ! Schedule Standups… daily is best depending on the project it can vary Take notes! Open dialogue (peer review) Share knowledge ! Centralize and Version Control Your work… files, code, knowledge, and data lineage are key to success Create a Wiki Work backwards from the end goal to the analysis to the data Share, share, and share! Stand on the shoulder of Giants! Why start new, there is plenty of boilerplate to go around.
  • 15. What is Analytics? Analytics is the application of computer technology, operational research, and statistics to solve problems in business and industry. ! Historically, Analytics was heavily used in banking for portfolio assessment using social status, geographical location, net value, and many other factors. ! Today, Analytics is applied to a vast number of industries and is re-emerging due to the phenomenal explosion of data from our connected world. ! Big Data consists of data sets that grow so large and complex that they become awkward to work with using on-hand database management tools ! McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per year and the European public sector €250 billion.
  • 16. Analytics Example: Platform Monetization Search Lifetime Value • 3rd Party Distribution • SEM • Organic Traffic Quality Analysis • PPC • PPD • PPI • PPA 1. Business Champion • Sales & Product 2.Integrated Environment • Web analytics / app data and SQL Database. 3. Analytics Ninjas • Search engine marketers, business analysts, and statisticians.
  • 17. Examples of Analytics: Campaign Management 1. Business Champion • Sales, Product, Marketing, and Project Managers. 2.Integrated Environment • Data Warehouse, SQL DB, Hadoop, and Tableau 3. Analytics Ninjas • Web analytics, product managers, business analysts, and business intelligence.
  • 18. Examples of Analytics: Customer Sentiment Keywords by Ratings 1. Business Champion • Product and Marketing 2.Integrated Environment • Mobile analytics, app logfiles, and Hadoop. 3. Analytics Ninjas • Web analytics, product managers, business analysts, mobile developers.
  • 19. What to avoid Its all about plugins for web analytics… Google Analytics, App Stores (iOS, Android, others), Social (Twitter, Facebook FQL, others?). Analysis lifecycle… 1. Give me data… import data and ETL it into submission. 2. What does this data mean… Crunch, Blend, Join, Pivot, Predict, Count, Map, or whatever 3. Show and Tell… Static (powerpoint, excel, etc.) or dynamic (trends, filtering, drilldown). (No more dashboards)... known knowns = data puking (dashboards) Interestingness rocks! Tell me where to look (I’m feeling lucky…) get people to "make love to data" to make actual good use of it Viva la revolucion! Data democracy!
  • 20. Thank you! Any Questions? Want to jump start your Agile Data Science project? Head over to http://start.alpinenow.com Follow me on @JSHorwitz