What does a typical data science project look like? Explore the current business analytics landscape: get past the jargon into actual business cases. Ania Wieczorek, co-founder of Bowery Analytics, talked about how data science is the newest hot trend in the world of business and what it really means. She took the audience through a real case and explained what the project lifecycle looks like from a business perspective. We also discussed the specific steps a typical data science project goes through, the outputs you will see, and the jargon being used.
9. AGENDA
Intro to data science
Case Study
Data Science Project Lifecycle
Data Science Projects vs. Traditional Projects
Resources
BOWERY ANALYTICS LLC
10. NOT COVERED
1. Data mining algorithms and techniques.
2. Big data technologies.
3. Data visualization technologies.
11. WHAT IS DATA SCIENCE?
An area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information.
(Jeffrey Stanton, Syracuse University School of Information)
12. WHY NOW?
90% of all data was created in the last two years.
2.5 million TB (2.5 quintillion bytes) of data is created every day.
13. HOW DID WE GET HERE? THE EVOLUTION OF BIG DATA
1.0 (1950-2000): Enterprise data warehouse. Static and low volume. Focus on operational efficiency.
2.0 (2000-2012): Google and Amazon (the Internet). High volume, high velocity. Data tools as a product ("people you may know", "products you like").
3.0 (2012-now): Data generated at every event (IoT). Cognitive analytics (Echo, Home). Every company offering data products.
The different eras of Big Data
16. THE PARK: MORTON ARBORETUM
Objective: Reduce membership churn for the park using predictive analytics.
Data: Explore the park's member profile and transaction data and make predictions.
Tools: Leverage data analytics tools and predictive algorithms.
18. DATA TRANSFORMATION
Multiple steps:
Clean the data (remove blanks and empty spaces, fix data formatting)
Add additional attributes to the data:
Binary flag: churn / no churn
Distance from Morton by ZIP code
Age group
Average income, leveraging census data
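A sketch of these transformation steps in pandas; the table, column names, and values are hypothetical, not the Arboretum's actual schema.

```python
import pandas as pd

# Hypothetical member table; columns are illustrative only.
members = pd.DataFrame({
    "member_id": [1, 2, 3, 4],
    "name": [" Ann ", "Bob", None, "Dee"],
    "birth_year": [1950, 1985, 1972, 2001],
    "renewed": [True, False, True, False],
})

# Clean: trim stray whitespace and drop rows missing a name.
members["name"] = members["name"].str.strip()
members = members.dropna(subset=["name"])

# Additional attribute: binary churn flag (1 = did not renew).
members["churn"] = (~members["renewed"]).astype(int)

# Additional attribute: age group via binning on birth year.
members["age_group"] = pd.cut(
    members["birth_year"],
    bins=[1900, 1960, 1990, 2025],
    labels=["senior", "middle", "young"],
)
```

Distance-by-ZIP and census income would join in the same way, via a lookup table keyed on ZIP code.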
23. SUPERVISED VS. UNSUPERVISED
1. Do our customers fit into distinct groups? (Unsupervised)
2. Can we identify which customers will quit after their contract expires? (Supervised)
24. MODELING – QUICK INTRO
•Estimate a numerical value: what is the probability of this customer leaving? (Supervised/Regression)
•Will the customer leave (yes/no)? (Supervised/Classification)
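Both framings are supervised; the difference is the form of the answer. A minimal scikit-learn sketch on synthetic data, where the same fitted classifier returns both a numeric probability estimate and a yes/no label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature: visits in the last year; target: 1 = churned, 0 = stayed.
# Values are invented for illustration.
X = np.array([[1], [2], [3], [10], [12], [15]])
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

# Numeric estimate: probability that a member with 2 visits churns.
prob_leaving = model.predict_proba([[2]])[0, 1]

# Yes/no answer: the hard classification for the same member.
will_leave = model.predict([[2]])[0]
```

On this toy data a low-visit member gets a high churn probability and a "yes" label, while a frequent visitor gets the opposite.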
25. MODELING
•Dealing with missing values
•Dummy transformations (churn/no churn variables)
•Identify train and test data (60% and 40%)
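These three steps in a minimal pandas/scikit-learn sketch; the column names and values are invented:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative transactions table (hypothetical schema).
df = pd.DataFrame({
    "visits": [5, None, 12, 3, 8, None, 7, 2, 9, 4],
    "membership": ["basic", "family", "basic", "family", "basic",
                   "basic", "family", "basic", "family", "basic"],
    "churn": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],
})

# 1. Missing values: impute visits with the median.
df["visits"] = df["visits"].fillna(df["visits"].median())

# 2. Dummy transformation: one-hot encode the categorical column.
df = pd.get_dummies(df, columns=["membership"])

# 3. Train/test split, 60/40 as on the slide.
train, test = train_test_split(df, test_size=0.4, random_state=42)
```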
28. DATA SCIENCE APPLICATIONS
1. Customer attrition (eCommerce and subscriptions)
2. Predicting who is quitting (HR)
3. Reducing heavy-equipment downtime (oil and natural gas)
4. Image recognition and profiling (government)
5. Precision medicine (healthcare)
6. Topic modeling and document management (legal and contracts)
29. SOME DATA SCIENCE ALGORITHMS
Classification: How likely is the customer to respond to our campaign?
Regression (Estimation): How much will she use the service?
Similarity: Can we find customers similar to my best customers?
Clustering: Do my customers form natural groups?
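The clustering question ("do my customers form natural groups?") in a minimal scikit-learn sketch, with two synthetic groups planted so the result is easy to check:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic customer features (visits per year, spend per visit);
# two well-separated groups are planted for illustration.
rng = np.random.default_rng(0)
casual = rng.normal([2, 10], 1.0, size=(20, 2))
loyal = rng.normal([20, 60], 1.0, size=(20, 2))
X = np.vstack([casual, loyal])

# K-means with k=2 recovers the two planted groups.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

In a real project the number of groups is unknown, so k is usually chosen by inspecting several values rather than fixed in advance.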
30. CONT.
Co-Occurrence: "You might also like…"
Description (Profiling): What does normal behavior look like?
Causal Modeling: Why are my customers leaving?
31. TOOLS AND SKILLS
Business Understanding: Microsoft Visio, Excel (power user and macros), Microsoft Word, SQL.
Data Understanding: QlikView, Tableau, Power BI, ggplot using R, embedded web plots using Plotly and D3.js.
Data Preparation: statistics, R, Python, Scala.
Modeling, Evaluation, and Deployment (the three phases share one stack): R, Python, Scala, Azure ML, RStudio, IBM Bluemix, Google TensorFlow, AWS ML, Hadoop, the Apache Spark ecosystem.
32. TYPICAL JOB ROLES
Data Analysts and Visualization (e.g., Tableau Developer): analysts who are part of a data science team; Tableau dashboard developers.
Data Mining and Infrastructure Setup (AWS and Hadoop/NoSQL developer/administrator): set up Hadoop infrastructure; write Java code to run Hadoop jobs; administer AWS instances.
Data Scientist: data modeling using predictive analytics.
Data Science Manager: team lead; architect.
33. PREPARING FOR THE PROJECT
Do we have the data?
What kind of data do we have?
Do we have the team?
Do we have buy-in from the stakeholders?
34. GOOD DATA = GOOD MODELS
35. COMMON MISTAKES
•Treating a data science project like a software project (it is not one)
•Overestimating the significance of the data
•Picking the wrong vendor, team, and skills
36. RESOURCES AND TOOLS
1. An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
2. Business Analytics for Managers, by Wolfgang Jank
3. An Introduction to Machine Learning, by Miroslav Kubat
4. Understanding Statistics Using R, by Randall Schumacker and Sara Tomek
5. www.rstudio.com, the primary development environment for R
6. Visual Studio Community Edition 2015 and above, which comes with an integrated R environment
We are going to look at each of these concerns and identify how they map to a project plan and the skills needed to execute it.
90% of all data was created in the last two years; 2.5 quintillion bytes of data is created every day.
1.0: The US Census was one of the first data warehouses built.
2.0: Google and Amazon pioneered the use of data products: Google with AdWords, and Amazon with "You may also like…" product recommendations.
3.0: Data products everywhere: Caterpillar selling usage data; GE launching operational products for the oil and natural gas industry to help reduce downtime, drawing on 50 million data variables from 10 million sensors installed on its machines. GE predicts the market for industrial data applications will be close to $220 billion by 2020.
https://sloanreview.mit.edu/case-study/ge-big-bet-on-data-and-analytics/
What kind of data do we have? This is a continuation of data exploration.
Top left: churn/no churn in relation to visits.
Top right: churn/no churn in relation to events attended.
Bottom left: churn/no churn in relation to distance from the park.
Bottom right: churn/no churn in relation to net amount spent in the park.
This is the first hypothesis we developed in a series.
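The same exploration step can be run as a table instead of plots: compare averages by churn status. The numbers and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical member data: churn flag plus two of the explored features.
df = pd.DataFrame({
    "churn": [1, 1, 0, 0, 0],
    "visits": [1, 2, 8, 10, 12],
    "distance_mi": [30, 25, 5, 8, 3],
})

# Mean visits and distance for churned (1) vs. retained (0) members.
summary = df.groupby("churn")[["visits", "distance_mi"]].mean()
```

A large gap between the two group means is what the plots make visible at a glance.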
Explain why we need supervised and unsupervised methods.
False positives: records that were actually negative but marked as positive. You lose money here by targeting the wrong people.
False negatives: records that were actually positive but marked as negative. You lose opportunity here by not targeting the right people.
You want both of these numbers to be very low.
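These two counts can be computed directly from actual and predicted labels; the labels below are made up for illustration.

```python
# 1 = churned (positive), 0 = stayed (negative); invented example labels.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 1, 1, 0]

# False positive: actually negative, predicted positive (wasted targeting).
false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

# False negative: actually positive, predicted negative (missed opportunity).
false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
```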
Business Understanding: What kind of problem are we trying to solve? Is it classification? Is it regression?
Data Understanding: What are the strengths and limitations of the data? Do we have enough historical data to accomplish what we need? Is the data reliable? We may have a lot of data, but is all of it actually reliable? We collected all the transaction data, but are we getting any reliable insights from it? For example, credit card data is labeled fraud vs. legitimate, and these labels might serve as targets. Medicare fraud is a distinct example: there are no specific labels identifying which charges are fraudulent. It is common to apply several data mining techniques and eventually combine them.
Data Preparation: Converting to tabular format, inferring missing values, type conversion; making sure the data looks good.
Modeling
Evaluation: Too many false alarms? What would be the cost of all the false alarms? Data scientists need to think about the comprehensibility of the model to stakeholders. Test and production environments: do final controlled experiments in live systems. Also, be wary of what kind of data is passed to the model. Did the data change after the model was built?
Deployment can be as simple as a set of rules or as complex as live fraud detection.
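A sketch of the "set of rules" end of that spectrum: the decision logic of a churn model exported as plain rules. The thresholds and function name are invented for illustration.

```python
def churn_risk(visits_last_year: int, distance_miles: float) -> str:
    """Flag a member for the retention campaign using two simple rules."""
    # Rule 1: very infrequent visitors are the highest churn risk.
    if visits_last_year < 3:
        return "high"
    # Rule 2: members living far from the park are a moderate risk.
    if distance_miles > 25:
        return "medium"
    return "low"
```

Rules like these are easy to deploy (they can even live in a spreadsheet or SQL view), at the cost of the accuracy a full model provides.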
#6 – Talk about the project we are working on right now at the financial client.