Copyright © 2014 EMC Corporation. All Rights Reserved.
Data Analytics Lifecycle
Module 2: Data Analytics Lifecycle
Upon completion of this lesson, you should be able to:
• Apply the Data Analytics Lifecycle to a case study scenario
• Frame a business problem as an analytics problem
• Identify the four main deliverables in an analytics project
How to Approach Your Analytics Problems
Your Thoughts?
• How do you currently approach your analytics problems?
• Do you follow a methodology or some kind of framework?
• How do you plan for an analytic project?
Value of Using the Data Analytics Lifecycle
• Focus your time
• Ensure rigor and completeness
• Enable better transition to members of the cross-functional analytic teams
    Repeatable
    Scale to additional analysts
    Support validity of findings
“A journey of a thousand miles begins with a single step” (Lao Tzu)
Need for a Process to Guide Data Science Projects
1. Well-defined processes can help guide any analytic project
2. Focus of Data Analytics Lifecycle is on Data Science projects, not business intelligence
3. Data Science projects tend to require a more consultative approach, and differ in a few ways
    More due diligence in Discovery phase
    More projects which lack shape or structure
    Less predictable data
Key Roles for a Successful Analytic Project
Role | Description
Business User | Someone who benefits from the end results and can consult and advise project team on value of end results and how these will be operationalized
Project Sponsor | Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team
Project Manager | Ensure key milestones and objectives are met on time and at expected quality.
Business Intelligence Analyst | Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective
Data Engineer | Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox
Database Administrator (DBA) | Database Administrator who provisions and configures database environment to support the analytical needs of the working team
Data Scientist | Provide subject matter expertise for analytical techniques, data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met
Data Analytics Lifecycle
[Figure: the six phases of the Data Analytics Lifecycle]
1. Discovery
2. Data Prep
3. Model Planning
4. Model Building
5. Communicate Results
6. Operationalize
Questions shown between the phases:
• Do I have enough information to draft an analytic plan and share for peer review?
• Do I have enough good quality data to start building the model?
• Do I have a good idea about the type of model to try? Can I refine the analytic plan?
• Is the model robust enough? Have we failed for sure?
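The phases and the questions between them can be sketched as a small state machine. This is an illustrative sketch, not part of the course material: the names `PHASES`, `GATES`, and `next_phase` are hypothetical, the pairing of each question with a phase is assumed from the order in which they appear on the slide, and stepping back exactly one phase when a question is answered "no" is also an assumption.

```python
PHASES = [
    "Discovery",
    "Data Prep",
    "Model Planning",
    "Model Building",
    "Communicate Results",
    "Operationalize",
]

# Question asked before leaving each phase (pairing assumed from slide order).
GATES = {
    "Discovery": "Do I have enough information to draft an analytic plan and share for peer review?",
    "Data Prep": "Do I have enough good quality data to start building the model?",
    "Model Planning": "Do I have a good idea about the type of model to try? Can I refine the analytic plan?",
    "Model Building": "Is the model robust enough? Have we failed for sure?",
}


def next_phase(current: str, gate_passed: bool) -> str:
    """Return the phase to work on next.

    Moves forward when the gate question is answered "yes"; otherwise steps
    back one phase (or stays in Discovery). Phases without a gate question
    always move forward, and Operationalize is the last phase.
    """
    i = PHASES.index(current)
    if current in GATES and not gate_passed:
        return PHASES[max(i - 1, 0)]
    return PHASES[min(i + 1, len(PHASES) - 1)]
```

For example, `next_phase("Data Prep", gate_passed=False)` returns `"Discovery"`, reflecting that a team without enough good quality data goes back to revisit its plan and data sources.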
Data Analytics Lifecycle
Phase 1: Discovery
• Learn the Business Domain
    Determine amount of domain knowledge needed to orient you to the data and interpret results downstream
    Determine the general analytic problem type (such as clustering, classification)
    If you don’t know, then conduct initial research to learn about the domain area you’ll be analyzing
• Learn from the past
    Have there been previous attempts in the organization to solve this problem?
    If so, why did they fail? Why are we trying again? How have things changed?
• Resources
    Assess available technology
    Available data – sufficient to meet your needs
    People for the working team
    Assess scope of time for the project in calendar time and person-hours
    Do you have sufficient resources to attempt the project? If not, can you get more?
• Frame the problem… Framing is the process of stating the analytics problem to be solved
    State the analytics problem, why it is important, and to whom
    Identify key stakeholders and their interests in the project
    Clearly articulate the current situation and pain points
    Objectives – identify what needs to be achieved in business terms and what needs to be done to meet the needs
    What is the goal? What are the criteria for success? What’s “good enough”?
    What is the failure criterion (when do we just stop trying or settle for what we have)?
    Identify the success criteria, key risks, and stakeholders (such as RACI)
Tips for Interviewing the Analytics Sponsor
• Even if you are “given” an analytic problem, you should work with clients to clarify and frame the problem
    You’re typically handed solutions; you need to identify the problem and their desired outcome
Sponsor Interview Tips
• Prepare for the interview – draft your questions, review with colleague, team
• Use open-ended questions, don’t ask leading questions
• Probe for details, follow up
• Don’t fill every silence – give them time to think
• Let them express their ideas, don’t put words in their mouth, let them share their feelings
• Ask clarifying questions, ask why – is that correct? Am I on target? Is there anything else?
• Use active listening – repeat it back to make sure you heard it correctly
• Don’t express your opinions
• Be mindful of your body language and theirs – use eye contact, be attentive
• Minimize distractions
• Document what you heard and review it back with the sponsor
Tips for Interviewing the Analytics Sponsor
Interview Questions
• What is the business problem you’re trying to solve?
• What is your desired outcome?
• Will the focus and scope of the problem change if the following dimensions change:
    • Time – analyzing 1 year or 10 years’ worth of data?
    • People – how would this project change this?
    • Risk – conservative to aggressive
    • Resources – none to unlimited (tools, tech, …)
    • Size and attributes of Data
• What data sources do you have?
• What industry issues may impact the analysis?
• What timelines are you up against?
• Who could provide insight into the project? Consulted?
• Who has final say on the project?
Data Analytics Lifecycle
Phase 1: Discovery
• Formulate Initial Hypotheses
    IH, H1, H2, H3, … Hn
    Gather and assess hypotheses from stakeholders and domain experts
    Preliminary data exploration to inform discussions with stakeholders during the hypothesis forming stage
• Identify Data Sources – Begin Learning the Data
    Aggregate sources for previewing the data and provide high-level understanding
    Review the raw data
    Determine the structures and tools needed
    Scope the kind of data needed for this kind of problem
Using a Sample Case Study to Track the Phases in the Data Analytics Lifecycle
Mini Case Study: Churn Prediction for Yoyodyne Bank
Situation Synopsis
• A retail bank, Yoyodyne Bank, wants to improve the Net Present Value (NPV) and retention rate of customers
• They want to establish an effective marketing campaign targeting customers to reduce the churn rate by at least five percent
• The bank wants to determine whether those customers are worth retaining. In addition, the bank also wants to analyze reasons for customer attrition and what they can do to keep them
• The bank wants to build a data warehouse to support Marketing and other related customer care groups
How to Frame an Analytics Problem
Mini Case Study: Churn Prediction for Yoyodyne Bank

Sample Business Problems
• How can we improve on x?
• What’s happening real-time? Trends?
• How can we use analytics to differentiate ourselves?
• How can we use analytics to innovate?
• How can we stay ahead of our biggest competitor?
Qualifiers
Will the focus and scope of the problem change if the following dimensions change:
• Time
• People – how would x change this?
• Risk – conservative/aggressive
• Resources – none/unlimited
• Size of Data?
Analytical Approach
Define an analytical approach, including key terms, metrics, and data needed.

Yoyodyne Bank
Business Problem
How can we improve Net Present Value (NPV) and retention rate of the customers?
Qualifiers
• Time: Trailing 5 months
• People: Working team and business users from the Bank
• Risk: the project will fail if we cannot determine valid predictors of churn
• Resources: EDW, analytic sandbox, OLTP system
• Data: Use 24 months for the training set, then analyze 5 months of historical data for those customers who churned
Analytical Approach
How do we identify churn/no churn for a customer?
Pilot study followed by full scale analytical model
Data Analytics Lifecycle
Phase 2: Data Preparation
• Prepare Analytic Sandbox
    Work space for the analytic team
    10x+ vs. EDW
• Perform ELT
    Determine needed transformations
    Assess data quality and structuring
    Derive statistically useful measures
    Determine and establish data connections for raw data
    Execute Big ELT and/or Big ETL
• Useful Tools for this phase:
    • For Data Transformation & Cleansing: SQL, Hadoop, MapReduce, Alpine Miner
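The difference between ELT and ETL is the order of the steps: in ELT the raw data is loaded first and transformed afterwards inside the database, so the untouched raw records stay available to the analytic team. Below is a minimal sketch of that ordering, assuming Python's built-in `sqlite3` module as a stand-in for an analytic sandbox. The table names, column names, and sample rows are hypothetical and are not taken from the case study.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untouched, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for an analytic sandbox

    # Extract + Load: store the raw data exactly as it arrived.
    conn.execute("CREATE TABLE raw_transactions (customer_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_transactions VALUES (?, ?)", raw_rows)

    # Transform: clean and aggregate in the database; the raw table is left intact.
    conn.execute(
        """
        CREATE TABLE customer_summary AS
        SELECT TRIM(customer_id) AS customer_id,
               COUNT(*) AS n_transactions,
               SUM(CAST(amount AS REAL)) AS total_amount
        FROM raw_transactions
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
        GROUP BY TRIM(customer_id)
        """
    )
    summary = conn.execute(
        "SELECT customer_id, n_transactions, total_amount "
        "FROM customer_summary ORDER BY customer_id"
    ).fetchall()
    raw_count = conn.execute("SELECT COUNT(*) FROM raw_transactions").fetchone()[0]
    conn.close()
    return summary, raw_count


# Hypothetical raw input with padded IDs and missing amounts.
sample_rows = [(" c1 ", "10.5"), ("c1", "4.5"), ("c2", ""), ("c2", "3"), ("c3", None)]
summary, raw_count = run_elt(sample_rows)
```

The summary table holds the derived measures (transaction count and total per customer), while `raw_count` confirms that all five raw rows, including the incomplete ones, are still present for later inspection.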
• Familiarize yourself with the data thoroughly
    List your data sources
    What’s needed vs. what’s available
• Data Conditioning
    Clean and normalize data
    Discern what you keep vs. what you discard
• Survey & Visualize
    Overview, zoom & filter, details-on-demand
    Descriptive Statistics
    Data Quality
• Useful Tools for this phase:
    • Descriptive Statistics on candidate variables for diagnostics & quality
    • Visualization: R (base package, ggplot and lattice), GnuPlot, Ggobi/Rggobi, Spotfire, Tableau
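Descriptive statistics on a candidate variable double as a data quality check: the missing count and the range often reveal problems before any modeling starts. The following is a small sketch using only Python's standard `statistics` module. The function name `profile_column` and the sample values are hypothetical, and the slides themselves point to R and visualization tools for this step.

```python
import statistics


def profile_column(values):
    """Summarize one candidate variable: missing count plus basic descriptive statistics."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
    }
    if present:
        summary.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
            # Sample standard deviation needs at least two observations.
            stdev=statistics.stdev(present) if len(present) > 1 else 0.0,
        )
    return summary


# Hypothetical monthly transaction counts with one missing value.
profile = profile_column([2, 4, 4, None, 6])
```

Here `profile["missing"]` is 1 and the remaining four values have a mean of 4, which is the kind of quick diagnostic that helps decide what to keep and what to discard.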
Data Analytics Lifecycle
Phase 3: Model Planning
• Determine Methods
    Select methods based on hypotheses, data structure and volume
    Ensure techniques and approach will meet business objectives
• Techniques & Workflow
    Candidate tests and sequence
    Identify and document modeling assumptions
• Useful Tools for this phase: R/PostgreSQL, SQL Analytics, Alpine Miner, SAS/ACCESS, SPSS/ODBC
• Data Exploration
• Variable Selection
    Inputs from stakeholders and domain experts
    Capture essence of the predictors, leverage a technique for dimensionality reduction
    Iterative testing to confirm the most significant variables
• Model Selection
    Conversion to SQL or database language for best performance
    Choose technique based on the end goal
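One simple way to start the iterative testing of candidate variables is to rank them by the strength of their linear relationship with the outcome. The sketch below does this with a hand-written Pearson correlation. It is a screening heuristic chosen for illustration, not the dimensionality reduction technique the slide refers to, and the variable names and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_variables(candidates, target):
    """Rank candidate variables by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate variables for four customers; 1 = churned, 0 = stayed.
candidates = {
    "debit_uses": [1, 2, 8, 9],
    "branch_id": [3, 3, 3, 3],
    "balance": [5, 1, 4, 2],
}
ranked = rank_variables(candidates, target=[1, 1, 0, 0])
```

In this toy data `debit_uses` ranks first, while the constant `branch_id` column scores zero. A ranking like this is only a starting point: input from stakeholders and domain experts still decides which variables are worth testing further.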
Sample Research: Churn Prediction in Other Verticals
Mini Case Study: Churn Prediction for Yoyodyne Bank
• After conducting research on churn prediction, you have identified many methods for analyzing customer churn across multiple verticals (those in bold are taught in this course)
• At this point, a Data Scientist would assess the methods and select the best model for the situation
Market Sector | Analytic Techniques/Methods Used
Wireless Telecom | DMEL method (data mining by evolutionary learning)
Retail Business | Logistic regression, ARD (automatic relevance determination), decision tree
Daily Grocery | MLR (multiple linear regression), ARD, and decision tree
Wireless Telecom | Neural network, decision tree, hierarchical neurofuzzy systems, rule evolver
Retail Banking | Multiple regression
Wireless Telecom | Logistic regression, neural network, decision tree
Data Analytics Lifecycle
Phase 4: Model Building
• Develop data sets for testing, training, and production purposes
    Need to ensure that the model data is sufficiently robust for the model and analytical techniques
    Smaller test sets for validating approach, training set for initial experiments
• Get the best environment you can for building models and workflows… fast hardware, parallel processing
• Useful Tools for this phase: R, PL/R, SQL, Alpine Miner, SAS Enterprise Miner
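Separating training data from test data can be sketched in a few lines. The example below is a minimal illustration using Python's standard `random` module. The function name `split_dataset`, the 20% test fraction, and the fixed seed are assumptions made for the example, and it covers only the training and test sets, not a production data set.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (training, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Ten hypothetical customer records, identified here only by an index.
train_rows, test_rows = split_dataset(list(range(10)))
```

Fixing the seed means the same split is produced on every run, which makes initial experiments repeatable and easier to review with the rest of the working team.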
Data Analytics Lifecycle
Phase 5: Communicate Results
Did we succeed? Did we fail?
• Interpret the results
• Compare to the initial hypotheses (IHs) from Phase 1
• Identify key findings
• Quantify business value
• Summarize findings, depending on audience
Mini Case Study: Churn Prediction for Yoyodyne Bank
For the Yoyodyne case study, what would be some possible results and key findings?
Data Analytics Lifecycle
Phase 6: Operationalize
• Run a pilot
• Assess the benefits
• Provide final deliverables
• Implement the model in the production environment
• Define process to update, retrain, and retire the model, as needed
Analytic Plan
Mini Case Study: Churn Prediction for Retail Banking
Components of Analytic Plan | Retail Banking: Yoyodyne Bank
Phase 1: Discovery, Business Problem Framed | How do we identify churn/no churn for a customer?
Initial Hypotheses | Transaction volume and type are key predictors of churn rates.
Data | 5 months of customer account history.
Phase 3: Model Planning, Analytic Technique | Logistic regression to identify most influential factors predicting churn.
Phase 5: Result & Key Findings | Once customers stop using their accounts for gas and groceries, they will soon erode their accounts and churn. If customers use their debit card fewer than 5 times per month, they will leave the bank within 60 days.
Business Impact | If we can target customers who are high-risk for churn, we can reduce customer attrition by 25%. This would save $3 million in lost customer revenue and avoid $1.5 million in new customer acquisition costs each year.
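The analytic plan names logistic regression as the technique. As a rough illustration of what that involves, the sketch below fits a one-variable logistic model by batch gradient descent in plain Python. Everything specific here is an assumption: the toy data is invented (it is not Yoyodyne data and does not reproduce the findings in the plan), the single predictor is monthly debit-card uses, and the learning rate and epoch count were picked for this small example. A real project would more likely use one of the tools listed for the model building phase.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.05, epochs=10000):
    """Fit p(churn) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def churn_probability(x, w, b):
    """Predicted probability of churn for a customer with x debit-card uses per month."""
    return sigmoid(w * x + b)


# Invented toy data: monthly debit-card uses and whether the customer churned (1) or not (0).
debit_uses = [1, 2, 3, 4, 5, 6, 8, 10, 12, 15]
churned = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
w, b = fit_logistic(debit_uses, churned)
```

With this toy data the fitted weight `w` is negative, so `churn_probability(2, w, b)` is higher than `churn_probability(12, w, b)`: fewer debit-card uses go with a higher estimated churn risk. The sign and size of each fitted weight is what lets logistic regression point to the most influential factors.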
Key Outputs from a Successful Analytic Project, by Role
For each role: Description, and What the Role Needs in the Final Deliverables

Business User
    Description: Someone who benefits from the end results and can consult and advise project team on value of end results and how these will be operationalized
    Needs in the final deliverables:
    • Sponsor Presentation addressing:
        • Are the results good for me?
        • What are the benefits of the findings?
        • What are the implications of this for me?

Project Sponsor
    Description: Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team
    Needs in the final deliverables:
    • Sponsor Presentation addressing:
        • What’s the business impact of doing this?
        • What are the risks? ROI?
        • How can this be evangelized within the organization (and beyond)?

Project Manager
    Description: Ensure key milestones and objectives are met on time and at expected quality.

Business Intelligence Analyst
    Description: Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective
    Needs in the final deliverables:
    • Show the analyst presentation
    • Determine if the reports will change

Data Engineer
    Description: Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox
    Needs in the final deliverables:
    • Share the code from the analytical project
    • Create technical document on how to implement it.

Database Administrator (DBA)
    Description: Database Administrator who provisions and configures database environment to support the analytical needs of the working team
    Needs in the final deliverables:
    • Share the code from the analytical project
    • Create technical document on how to implement it.

Data Scientist
    Description: Provide subject matter expertise for analytical techniques, data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met
    Needs in the final deliverables:
    • Show the analyst presentation
    • Share the code
4 Core Deliverables to Meet Most Stakeholder Needs
1. Presentation for Project Sponsors
    • “Big picture” takeaways for executive level stakeholders
    • Determine key messages to aid their decision-making process
    • Focus on clean, easy visuals for the presenter to explain and for the viewer to grasp
2. Presentation for Analysts
    • Business process changes
    • Reporting changes
    • Fellow Data Scientists will want the details and are comfortable with technical graphs (such as ROC curves, density plots, histograms)
3. Code for technical people
4. Technical specs of implementing the code
Analyst Wish List for a Successful Analytics Project
Data & Workspaces
• Access to all the data, including aggregated OLAP data, BI tools, raw data, structured and various states of unstructured data as needed
• Up-to-date data dictionary to describe the data
• Area for staging and production data sets
• Ability to move data back and forth between workspaces and staging areas
• Analytic sandbox with strong compute power to experiment and play with the data
Tools
• Statistical/mathematical/visual software of choice for a given situation and problem set, such as SAS, Matlab, R, Java tools, Tableau, Spotfire
• Collaboration: an online platform or environment for collaboration and communicating with team members
• Tool or place to log errors with systems, environments or data sets
Concepts in Practice
Greenplum’s Approach to Analytics
[Figure: EDC PLATFORM, with a Data Analytics cycle: analyze data, restructure data, repeat]
[Figure: grid with axes Past / Future and Facts / Interpretation, containing the questions: What happened where and when? How and why did it happen? What will happen? How can we do better?]
Magnetic: Attract all kinds of data
Agile: Flexible and elastic data structures
Deep: Rich data repository and algorithmic engine
Source: MAD Skills: New Analysis Practices for Big Data, March 2009
“The pessimist – complains about the wind
The optimist – expects it to change
The leader – adjusts the sails”
John Maxwell (Leadership Author)
Check Your Knowledge
Your Thoughts?
• In which phase would you expect to invest most of your project time and why? Where would you expect to spend the least time?
• What are the benefits of doing a pilot program before a full scale rollout of a new analytical methodology? Discuss this in the context of the mini case study.
• What kinds of tools would be used in the following phases, and for which kinds of use scenarios?
    Phase 2: Data Preparation
    Phase 4: Model Building
• Now that you have completed the analytical project at Yoyodyne, you have an opportunity to repurpose this approach for an online eCommerce company. What phases of the lifecycle do you need to focus on to identify ways to do this?
Summary
Key points covered in this lesson:
• The Data Analytics Lifecycle was applied to a case study scenario
• A business problem was framed as an analytics problem
• The four main deliverables in an analytics project were identified
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
dickonsondorris
 
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
Copyright © Cengage Learning.  All rights reserved. CHAPTE.docxCopyright © Cengage Learning.  All rights reserved. CHAPTE.docx
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
dickonsondorris
 
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docxCopyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
dickonsondorris
 
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docxCopyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
dickonsondorris
 
Copyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docxCopyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docx
dickonsondorris
 
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docxCopyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
dickonsondorris
 
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R  6.docxCopyright © 2018 Pearson Education, Inc. C H A P T E R  6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
dickonsondorris
 
Copyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docxCopyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docx
dickonsondorris
 
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R  3.docxCopyright © 2018 Pearson Education, Inc.C H A P T E R  3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
dickonsondorris
 
Copyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docxCopyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docx
dickonsondorris
 
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docxCopyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
dickonsondorris
 
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health  Lippincott Williams.docxCopyright © 2017 Wolters Kluwer Health  Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
dickonsondorris
 
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docxCopyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
dickonsondorris
 
Copyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docxCopyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docx
dickonsondorris
 
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docxCopyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
dickonsondorris
 
Copyright © 2016 Pearson Education, Inc. .docx
Copyright © 2016 Pearson Education, Inc.                    .docxCopyright © 2016 Pearson Education, Inc.                    .docx
Copyright © 2016 Pearson Education, Inc. .docx
dickonsondorris
 

More from dickonsondorris (20)

Copyright © eContent Management Pty Ltd. Health Sociology Revi.docx
Copyright © eContent Management Pty Ltd. Health Sociology Revi.docxCopyright © eContent Management Pty Ltd. Health Sociology Revi.docx
Copyright © eContent Management Pty Ltd. Health Sociology Revi.docx
 
Copyright © Pearson Education 2010 Digital Tools in Toda.docx
Copyright © Pearson Education 2010 Digital Tools in Toda.docxCopyright © Pearson Education 2010 Digital Tools in Toda.docx
Copyright © Pearson Education 2010 Digital Tools in Toda.docx
 
Copyright © Jen-Wen Lin 2018 1 STA457 Time series .docx
Copyright © Jen-Wen Lin 2018   1 STA457 Time series .docxCopyright © Jen-Wen Lin 2018   1 STA457 Time series .docx
Copyright © Jen-Wen Lin 2018 1 STA457 Time series .docx
 
Copyright © John Wiley & Sons, Inc. All rights reserved..docx
Copyright © John Wiley & Sons, Inc. All rights reserved..docxCopyright © John Wiley & Sons, Inc. All rights reserved..docx
Copyright © John Wiley & Sons, Inc. All rights reserved..docx
 
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docxCopyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
Copyright © by The McGraw-Hill Companies, Inc. The Aztec Accou.docx
 
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
Copyright © Cengage Learning.  All rights reserved. CHAPTE.docxCopyright © Cengage Learning.  All rights reserved. CHAPTE.docx
Copyright © Cengage Learning. All rights reserved. CHAPTE.docx
 
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docxCopyright © by Holt, Rinehart and Winston. All rights reserved.docx
Copyright © by Holt, Rinehart and Winston. All rights reserved.docx
 
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docxCopyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
Copyright © 2020 by Jones & Bartlett Learning, LLC, an Ascend .docx
 
Copyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docxCopyright © 2019, American Institute of Certified Public Accou.docx
Copyright © 2019, American Institute of Certified Public Accou.docx
 
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docxCopyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
Copyright © 2018 Pearson Education, Inc. All Rights ReservedChild .docx
 
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R  6.docxCopyright © 2018 Pearson Education, Inc. C H A P T E R  6.docx
Copyright © 2018 Pearson Education, Inc. C H A P T E R 6.docx
 
Copyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docxCopyright © 2018 Capella University. Copy and distribution o.docx
Copyright © 2018 Capella University. Copy and distribution o.docx
 
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R  3.docxCopyright © 2018 Pearson Education, Inc.C H A P T E R  3.docx
Copyright © 2018 Pearson Education, Inc.C H A P T E R 3.docx
 
Copyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docxCopyright © 2018 by Steven Levitsky and Daniel.docx
Copyright © 2018 by Steven Levitsky and Daniel.docx
 
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docxCopyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
Copyright © 2017, 2014, 2011 Pearson Education, Inc. All Right.docx
 
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health  Lippincott Williams.docxCopyright © 2017 Wolters Kluwer Health  Lippincott Williams.docx
Copyright © 2017 Wolters Kluwer Health Lippincott Williams.docx
 
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docxCopyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
Copyright © 2016, 2013, 2010 Pearson Education, Inc. All Right.docx
 
Copyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docxCopyright © 2017 by University of Phoenix. All rights rese.docx
Copyright © 2017 by University of Phoenix. All rights rese.docx
 
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docxCopyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
Copyright © 2016 John Wiley & Sons, Inc.Copyright © 20.docx
 
Copyright © 2016 Pearson Education, Inc. .docx
Copyright © 2016 Pearson Education, Inc.                    .docxCopyright © 2016 Pearson Education, Inc.                    .docx
Copyright © 2016 Pearson Education, Inc. .docx
 

Recently uploaded

A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Ashish Kohli
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
ArianaBusciglio
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 

Recently uploaded (20)

A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
Assignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docxAssignment_4_ArianaBusciglio Marvel(1).docx
Assignment_4_ArianaBusciglio Marvel(1).docx
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 

Copyright © 2014 EMC Corporation. All Rights Reserved..docx

  • 1. Data Analytics Lifecycle (Module 2). Copyright © 2014 EMC Corporation. All Rights Reserved.
  • 2. Upon completion of this lesson, you should be able to:
      • Apply the Data Analytics Lifecycle to a case study scenario
      • Frame a business problem as an analytics problem
      • Identify the four main deliverables in an analytics project
  • 3. How to Approach Your Analytics Problems
      • How do you currently approach your analytics problems?
      • Do you follow a methodology or some kind of framework?
      • How do you plan for an analytic project?
      Your Thoughts?
  • 4. Value of Using the Data Analytics Lifecycle
      • Focus your time
      • Ensure rigor and completeness
      • Enable better transition to members of the cross-functional analytic teams
          • Repeatable
          • Scale to additional analysts
          • Support validity of findings
      "A journey of a thousand miles begins with a single step" (Lao Tzu)
  • 5. Need for a Process to Guide Data Science Projects
      1. Well-defined processes can help guide any analytic project
      2. Focus of Data Analytics Lifecycle is on Data Science projects, not business intelligence
      3. Data Science projects tend to require a more consultative approach, and differ in a few ways
          • More due diligence in Discovery phase
          • More projects which lack shape or structure
          • Less predictable data
  • 6. Key Roles for a Successful Analytic Project (Role: Description)
      • Business User: Someone who benefits from the end results and can consult and advise project team on value of end results and how these will be operationalized
      • Project Sponsor: Person responsible for the genesis of the project, providing the impetus for the project and core business problem, generally provides the funding and will gauge the degree of value from the final outputs of the working team
      • Project Manager: Ensure key milestones and objectives are met on time and at expected quality
      • Business Intelligence Analyst: Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective
      • Data Engineer: Deep technical skills to assist with tuning SQL queries for data management, extraction and support data ingest to analytic sandbox
      • Database Administrator (DBA): Database Administrator who provisions and configures database environment to support the analytical needs of the working team
      • Data Scientist: Provide subject matter expertise for analytical techniques, data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met
  • 9. Data Analytics Lifecycle
      [Figure: lifecycle diagram with six phases: 1 Discovery, 2 Data Prep, 3 Model Planning, 4 Model Building, 5 Communicate Results, 6 Operationalize]
      Questions shown on the diagram:
      • Do I have enough information to draft an analytic plan and share for peer review?
      • Do I have enough good quality data to start building the model?
      • Do I have a good idea about the type of model to try? Can I refine the analytic plan?
      • Is the model robust enough? Have we failed for sure?
  • 11. Data Analytics Lifecycle, Phase 1: Discovery
  • 13. Learn the Business Domain
      • Determine amount of domain knowledge needed to orient you to the data and interpret results downstream
      • Determine the general analytic problem type (such as clustering, classification)
      • If you don't know, then conduct initial research to learn about the domain area you'll be analyzing
  • 14. Learn from the past
      • Have there been previous attempts in the organization to solve this problem?
      • If so, why did they fail? Why are we trying again? How have things changed?
  • 15. Data Analytics Lifecycle, Phase 1: Discovery
  • 17. Resources
      • Assess available technology
      • Available data: sufficient to meet your needs
      • People for the working team
      • Assess scope of time for the project in calendar time and person-hours
      • Do you have sufficient resources to attempt the project? If not, can you get more?
  • 18. Data Analytics Lifecycle, Phase 1: Discovery
  • 20. Frame the problem
      Framing is the process of stating the analytics problem to be solved.
      • State the analytics problem, why it is important, and to whom
      • Identify key stakeholders and their interests in the project
      • Clearly articulate the current situation and pain points
      • Objectives: identify what needs to be achieved in business terms and what needs to be done to meet the needs
      • What is the goal? What are the criteria for success? What's "good enough"?
      • What is the failure criterion (when do we just stop trying or settle for what we have)?
      • Identify the success criteria, key risks, and stakeholders (such as RACI)
  • 22. Tips for Interviewing the Analytics Sponsor
      • Even if you are "given" an analytic problem you should work with clients to clarify and frame the problem
          • You're typically handed solutions, you need to identify the problem and their desired outcome
      Sponsor Interview Tips
      • Prepare for the interview: draft your questions, review with colleague, team
      • Use open-ended questions, don't ask leading questions
      • Probe for details, follow-up
      • Don't fill every silence: give them time to think
      • Let them express their ideas, don't put words in their mouth, let them share their feelings
      • Ask clarifying questions, ask why: is that correct? Am I on target? Is there anything else?
      • Use active listening: repeat it back to make sure you heard it correctly
      • Don't express your opinions
      • Be mindful of your body language and theirs: use eye contact, be attentive
      • Minimize distractions
      • Document what you heard and review it back with the sponsor
  • 25. Tips for Interviewing the Analytics Sponsor: Interview Questions
      • What is the business problem you're trying to solve?
      • What is your desired outcome?
      • Will the focus and scope of the problem change if the following dimensions change:
          • Time: analyzing 1 year or 10 years worth of data?
          • People: how would this project change this?
          • Risk: conservative to aggressive
          • Resources: none to unlimited (tools, tech, …)
          • Size and attributes of Data
      • What data sources do you have?
      • What industry issues may impact the analysis?
      • What timelines are you up against?
      • Who could provide insight into the project? Consulted?
      • Who has final say on the project?
  • 27. Data Analytics Lifecycle, Phase 1: Discovery
  • 30. Formulate Initial Hypotheses
      • IH, H1, H2, H3, … Hn
      • Gather and assess hypotheses from stakeholders and domain experts
      • Preliminary data exploration to inform discussions with stakeholders during the hypothesis forming stage
  • 31. Identify Data Sources: Begin Learning the Data
      • Aggregate sources for previewing the data and provide high-level understanding
      • Review the raw data
      • Determine the structures and tools needed
      • Scope the kind of data needed for this kind of problem
  • 32. Using a Sample Case Study to Track the Phases in the Data Analytics Lifecycle (Mini Case Study: Churn Prediction for Yoyodyne Bank)
      Situation Synopsis
      • Retail Bank, Yoyodyne Bank wants to improve the Net Present Value (NPV) and retention rate of customers
      • They want to establish an effective marketing campaign targeting customers to reduce the churn rate by at least five percent
      • The bank wants to determine whether those customers are worth retaining. In addition, the bank also wants to analyze reasons for customer attrition and what they can do to keep them
      • The bank wants to build a data warehouse to support Marketing and other related customer care groups
  • 33. How to Frame an Analytics Problem
      • Sample Business Problems
          • How can we improve on x?
          • What's happening real-time? Trends?
          • How can we use analytics to differentiate ourselves?
          • How can we use analytics to innovate?
          • How can we stay ahead of our biggest competitor?
      • Qualifiers: Will the focus and scope of the problem change if the following dimensions change:
          • Time
          • People: how would x change this?
          • Risk: conservative/aggressive
          • Resources: none/unlimited
          • Size of Data?
      • Analytical Approach: Define an analytical approach, including key terms, metrics, and data needed.
  • 36. How to Frame an Analytics Problem: Yoyodyne Bank
      • Business problem: How can we improve Net Present Value (NPV) and retention rate of the customers?
      • Qualifiers
          • Time: Trailing 5 months
          • People: Working team and business users from the Bank
          • Risk: the project will fail if we cannot determine valid predictors of churn
          • Resources: EDW, analytic sandbox, OLTP system
          • Data: Use 24 months for the training set, then analyze 5 months of historical data for those customers who churned
      • Analytical approach: How do we identify churn/no churn for a customer? Pilot study followed by full scale analytical model
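The Yoyodyne data qualifier (24 months for the training set, a trailing 5 months for analysis) could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the `(customer_id, date)` record shape, and above all the churn rule (active in the training window, no activity in the trailing window) are hypothetical and do not come from the slides, which leave "How do we identify churn/no churn for a customer?" as an open question.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map a date to a running month number so windows become integer ranges."""
    return d.year * 12 + (d.month - 1)

def split_windows(records, as_of, train_months=24, trailing_months=5):
    """Split (customer_id, date) activity records into two time windows.

    The trailing window is the last `trailing_months` calendar months up to and
    including the month of `as_of`; the training window is the `train_months`
    calendar months immediately before it. Older records are ignored.
    """
    end = month_index(as_of)
    trailing_start = end - trailing_months + 1
    train_start = trailing_start - train_months
    train, trailing = [], []
    for customer_id, d in records:
        m = month_index(d)
        if train_start <= m < trailing_start:
            train.append((customer_id, d))
        elif trailing_start <= m <= end:
            trailing.append((customer_id, d))
    return train, trailing

def label_churn(train, trailing):
    """Hypothetical churn rule (an assumption, not from the slides):
    a customer churned if active in the training window but absent from the trailing window."""
    active_before = {c for c, _ in train}
    active_after = {c for c, _ in trailing}
    return {c: (c not in active_after) for c in active_before}

# Usage with made-up activity records
records = [
    ("a", date(2013, 1, 15)),  # training window
    ("a", date(2014, 5, 1)),   # trailing window, so "a" is retained
    ("b", date(2013, 6, 1)),   # training window only, so "b" is labeled churned
    ("c", date(2011, 1, 1)),   # older than both windows, ignored
]
train, trailing = split_windows(records, as_of=date(2014, 6, 30))
labels = label_churn(train, trailing)
```

In practice the churn definition would be agreed with the business users during Discovery; the rule above is just one candidate to test in a pilot study.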
  • 39. Data Analytics Lifecycle Phase 2: Data Preparation
  • 40. [Lifecycle diagram: Discovery, Data Prep, Model Planning, Model Building, Communicate Results, Operationalize. Checkpoint questions: Do I have enough information to draft an analytic plan and share for peer review? Do I have enough good quality data to start building the model? Do I have a good idea about the type of model to try? Can I refine the analytic plan? Is the model robust enough? Have we failed for sure?]
  • 41. • Prepare Analytic Sandbox: Work space for the analytic team; 10x+ vs. EDW • Perform ELT: Determine needed transformations;
  • 42. Assess data quality and structuring; Derive statistically useful measures; Determine and establish data connections for raw data; Execute Big ELT and/or Big ETL • Useful Tools for this phase: • For Data Transformation & Cleansing: SQL, Hadoop, MapReduce, Alpine Miner
  • 43. Data Analytics Lifecycle Phase 2: Data Preparation
  • 44. • Familiarize yourself with the data thoroughly: List your data sources; What's needed vs. what's available
  • 45. • Data Conditioning: Clean and normalize data; Discern what you keep vs. what you discard
  • 46. • Survey & Visualize: Overview, zoom & filter, details-on-demand; Descriptive Statistics; Data Quality • Useful Tools for this phase: • Descriptive Statistics on candidate variables for diagnostics & quality • Visualization: R (base package, ggplot and lattice), GnuPlot,
  • 47. GGobi/Rggobi, Spotfire, Tableau Data Analytics Lifecycle Phase 3: Model Planning
  • 48. • Determine Methods: Select methods based on hypotheses, data structure and volume;
  • 49. Ensure techniques and approach will meet business objectives
  • 50. • Techniques & Workflow: Candidate tests and sequence; Identify and document modeling assumptions • Useful Tools for this phase: R/PostgreSQL, SQL Analytics, Alpine Miner,
  • 51. SAS/ACCESS, SPSS/ODBC
  • 52. Data Analytics Lifecycle Phase 3: Model Planning
  • 53. • Data Exploration • Variable Selection: Inputs from stakeholders and domain experts; Capture essence of the predictors, leverage a technique for dimensionality reduction; Iterative testing to confirm the most significant variables
  • 54. • Model Selection: Conversion to SQL or database language for best performance; Choose technique based on the end goal
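As a rough illustration of the variable selection step above, candidate predictors can be screened by how strongly each one correlates with the churn flag. This is a simple screening sketch, not the dimensionality reduction technique the slide alludes to (the deck does not name one), and the candidate variable names and values are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(candidates, target):
    """Sort candidate variables by absolute correlation with the target."""
    scores = {name: abs(pearson(values, target))
              for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: 1 = churned, 0 = stayed.
churned = [1, 1, 1, 0, 0, 0]
candidates = {
    "debit_uses_per_month": [1, 3, 2, 9, 12, 8],
    "branch_visits_per_month": [2, 1, 3, 2, 1, 3],
    "account_age_years": [4, 4, 4, 4, 4, 4],
}
ranking = rank_predictors(candidates, churned)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A ranking like this is only a starting point; the slide's other two inputs (stakeholder knowledge and iterative testing) decide which variables actually stay in the model.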
  • 55. Sample Research: Churn Prediction in Other Verticals. Market Sector: Analytic Techniques/Methods Used • Wireless Telecom: DMEL method (data mining by evolutionary learning) • Retail Business: Logistic regression, ARD (automatic relevance determination), decision tree
  • 56. • Daily Grocery: MLR (multiple linear regression), ARD, and decision tree • Wireless Telecom: Neural network, decision tree, hierarchical neurofuzzy systems, rule evolver • Retail Banking: Multiple regression • Wireless Telecom: Logistic regression, neural network, decision tree
  • 57. • After conducting research on churn prediction, you have identified many methods for analyzing customer churn across multiple verticals (those in bold are taught in this course) • At this point, a Data Scientist would assess the methods and select the best model for the situation
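Logistic regression appears in the research table above, so a minimal sketch may help show what that method does with churn data. The code below fits a one-feature model by plain batch gradient descent using only the standard library; the feature (monthly debit card uses, scaled by 10), the data, and the learning rate are all assumptions chosen for illustration, and a real project would more likely use R or another statistics package.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(x, weights, bias):
    """Estimated probability that the customer churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Illustrative data only: churners (label 1) use their debit card rarely.
monthly_debit_uses = [1, 2, 3, 4, 2, 1, 8, 10, 12, 9, 15, 7]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rows = [[uses / 10.0] for uses in monthly_debit_uses]  # simple scaling

weights, bias = fit_logistic(rows, labels)
low_usage = predict_proba([1 / 10.0], weights, bias)
high_usage = predict_proba([15 / 10.0], weights, bias)
print(f"weight={weights[0]:.2f}, P(churn | 1 use)={low_usage:.2f}, "
      f"P(churn | 15 uses)={high_usage:.2f}")
```

The negative weight on usage is what "identify the most influential factors" looks like in practice: the sign and size of each coefficient indicate how a variable moves the churn probability.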
  • 58. Data Analytics Lifecycle Phase 4: Model Building
  • 59. • Develop data sets for testing, training, and production purposes: Need to ensure that the model data is sufficiently robust for the model and analytical techniques;
  • 60. Smaller test sets for validating approach, training set for initial experiments • Get the best environment you can for building models and workflows: fast hardware, parallel processing
  • 61. • Useful Tools for this phase: R, PL/R, SQL, Alpine Miner, SAS Enterprise Miner
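The first bullet of this phase (separate data sets for training and testing) can be sketched as a reproducible holdout split. This is a minimal example under assumptions: the 25% test fraction and the seed are arbitrary, the records are invented, and the "fewer than 5 debit uses" rule is borrowed from the deck's sample key finding purely as a stand-in model to score on the held-out set.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, records):
    """Share of records where the prediction matches the churn label."""
    correct = sum(1 for _, uses, label in records if predict(uses) == label)
    return correct / len(records)


def threshold_rule(debit_uses_per_month):
    """Stand-in model: predict churn (1) below 5 debit uses per month."""
    return 1 if debit_uses_per_month < 5 else 0


# Illustrative records: (customer_id, debit uses per month, churned flag).
uses = [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 2, 3, 15, 11, 4, 5, 13, 1, 9, 6]
flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
records = [(i, u, f) for i, (u, f) in enumerate(zip(uses, flags))]

train, test = train_test_split(records)
holdout_accuracy = accuracy(threshold_rule, test)
print(len(train), len(test), holdout_accuracy)
```

Keeping the test set untouched during the initial experiments is what lets the later "Is the model robust enough?" checkpoint be answered honestly.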
  • 62. Data Analytics Lifecycle Phase 5: Communicate Results
  • 63. Did we succeed? Did we fail?
  • 64. • Interpret the results • Compare to the initial hypotheses (IHs) from Phase 1 • Identify key findings • Quantify business value • Summarize findings, depending on audience
  • 65. For the Yoyodyne case study, what would be some possible results and key findings? Data Analytics Lifecycle Phase 6: Operationalize
  • 66. • Run a pilot • Assess the benefits
  • 67. • Provide final deliverables • Implement the model in the production environment
  • 68. • Define process to update, retrain, and retire the model, as needed Analytic Plan
  • 69. Components of Analytic Plan: Retail Banking, Yoyodyne Bank • Phase 1: Discovery, Business Problem Framed: How do we identify churn/no churn for a customer? • Initial Hypotheses: Transaction volume and type are key predictors of churn rates. • Data: 5 months of customer account history.
  • 70. • Phase 3: Model Planning, Analytic Technique: Logistic regression to identify the most influential factors predicting churn. • Phase 5: Result & Key Findings: Once customers stop using their accounts for gas and groceries, they will soon erode their accounts and churn. If customers use their debit card fewer than 5 times per month, they will leave the bank within 60 days.
  • 71. • Business Impact: If we can target customers who are high-risk for churn, we can reduce customer attrition by 25%. This would save $3 million in lost customer revenue and avoid $1.5 million in new customer acquisition costs each year. Mini Case Study: Churn Prediction for Retail Banking
  • 72. Key Outputs from a Successful Analytic Project, by Role. For each role: Description, then What the Role Needs in the Final Deliverables.
  • 73. Business User: Someone who benefits from the end results and can consult and advise the project team on the value of end results and how these will be operationalized. Needs: • Sponsor Presentation addressing: • Are the results good for me? • What are the benefits of the findings? • What are the implications of this for me?
  • 74. Project Sponsor: Person responsible for the genesis of the project, providing the impetus for the project and core business problem; generally provides the funding and will gauge the degree of value from the final outputs of the working team. Needs: • Sponsor Presentation addressing: • What's the business impact of doing this? • What are the risks? ROI? • How can this be evangelized within the organization (and beyond)?
  • 75. Project Manager: Ensure key milestones and objectives are met on time and at expected quality. Business Intelligence Analyst: Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective. Needs: • Show the analyst presentation • Determine if the reports will change
  • 76. Data Engineer: Deep technical skills to assist with tuning SQL queries for data management and extraction, and to support data ingest to the analytic sandbox. Needs: • Share the code from the analytical project • Create technical document on how to implement it.
  • 77. Database Administrator (DBA): Database Administrator who provisions and configures the database environment to support the analytical needs of the working team. Needs: • Share the code from the analytical project • Create technical document on how to implement it. Data Scientist: Provide subject matter expertise for analytical techniques, data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met. Needs: • Show the analyst presentation • Share the code
  • 78. 4 Core Deliverables to Meet Most Stakeholder Needs 1. Presentation for Project Sponsors
  • 79. • "Big picture" takeaways for executive-level stakeholders • Determine key messages to aid their decision-making process • Focus on clean, easy visuals for the presenter to explain and for the viewer to grasp 2. Presentation for Analysts • Business process changes • Reporting changes • Fellow Data Scientists will want the details and are comfortable with technical graphs (such as ROC curves, density plots, histograms)
  • 80. 3. Code for technical people 4. Technical specs for implementing the code Analyst Wish List for a Successful Analytics Project
  • 81. Data & Workspaces • Access to all the data, including aggregated OLAP data, BI tools, raw data, structured and various states of unstructured data as needed • Up-to-date data dictionary to describe the data • Area for staging and production data sets • Ability to move data back and forth between workspaces and staging areas • Analytic sandbox with strong compute power to experiment and play with the data
  • 82. Tools • Statistical/mathematical/visual software of choice for a given situation and problem set, such as SAS, Matlab, R, Java tools, Tableau, Spotfire • Collaboration: an online platform or environment for collaborating and communicating with team members • Tool or place to log errors with systems, environments or data sets
  • 83. Concepts in Practice: Greenplum's Approach to Analytics EDC PLATFORM Data Analytics
  • 85. here and when? How and why did it happen? • Magnetic: Attract all kinds of data • Agile: Flexible and elastic data structures • Deep: Rich data repository and algorithmic engine Source: MAD Skills: New Analysis Practices for Big Data, March 2009
  • 86. "The pessimist complains about the wind. The optimist expects it to change. The leader adjusts the sails." John Maxwell (Leadership Author)
  • 87. Check Your Knowledge • In which phase would you expect to invest most of your project time, and why? Where would you expect to spend the least time?
  • 88. • What are the benefits of doing a pilot program before a full-scale rollout of a new analytical methodology? Discuss this in the context of the mini case study. • What kinds of tools would be used in the following phases, and for which kinds of use scenarios? Phase 2: Data Preparation; Phase 4: Model Building • Now that you have completed the analytical project at Yoyodyne, you have an opportunity to repurpose this approach for an online eCommerce company. What phases of the lifecycle do you need to focus on to identify ways to do this?
  • 89. Your Thoughts? Summary Key points covered in this lesson: • The Data Analytics Lifecycle was applied to a case study scenario
  • 90. • A business problem was framed as an analytics problem • The four main deliverables in an analytics project were identified