Khosrow Hassibi, PhD
CWRU EECS Seminar Series
• "Data Science" (DS) is nothing new except for the term itself and the recent level of interest in it.
• Data mining
• Predictive analytics / Advanced analytics
• Machine learning
| Machine Learning (ML) | Traditional Statistics (TS) |
|---|---|
| Goal: “learning” from data of all sorts | Goal: analyzing and summarizing data |
| No rigid pre-assumptions about the problem and data distributions in general | Tight assumptions about the problem and data distributions |
| More liberal in techniques and approaches | Conservative in techniques and approaches |
| Generalization is pursued empirically through training, validation, and test datasets | Generalization is pursued using statistical tests on the training dataset |
| Not shy of using heuristics in search of a “good solution” | Uses tight initial assumptions about the data and the problem, typically in search of an optimal solution under those assumptions |
| Redundancy in features (variables) is okay, and often helpful. Preferable to use algorithms designed to handle a large number of features | Often requires independent features. Preferable to use fewer input features |
| Does not promote data reduction prior to learning. Promotes a culture of abundance: “the more data, the better” | Promotes data reduction as much as possible before modeling (sampling, fewer inputs, …) |
| Has been faced with solving more complex problems in learning, reasoning, perception, knowledge representation, … | Mainly focused on traditional data analysis |
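The ML row on generalization (training, validation, and test datasets) can be sketched in a few lines. This is a minimal illustration only: the function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions, not something the slides specify.

```python
import random

def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                   # used to fit the model
    valid = shuffled[n_train:n_train + n_valid]  # used to tune and compare models
    test = shuffled[n_train + n_valid:]          # held out for the final estimate
    return train, valid, test

train, valid, test = split_dataset(range(100))
```

The test set is touched only once, after model selection, so its error estimate is not biased by tuning choices made on the validation set.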
[Figure: Competitive advantage versus analytic sophistication. Basic analytics are descriptive and reactive; advanced analytics are predictive and proactive. Stages in order of increasing sophistication, each with the question it answers:]
• Standard reports: What happened?
• Ad hoc reports: What happened specifically?
• OLAP drill down: Where exactly is the problem?
• Dashboard and visualization: What is happening overall?
• Alerts: What actions are needed?
• Statistical analysis: Why is this happening?
• Forecasting: What is the trend?
• Predictive modeling: What will happen?
• Optimization: What is the best that can happen given constraints?
• Speed of processing response: advanced analytics, real-time
• Data preparation: advanced analytics
• Three progressive levels of analytics sophistication

| Organization Category | Information Management Proficiency | Analytics Proficiency | Data Culture |
|---|---|---|---|
| Aspirational | Low | Low | Line of business driven |
| Experienced | Medium | Medium | Moving toward enterprise driven |
| Transformed | High | High | Enterprise driven |

Source: MIT Sloan, “Analytics: The Widening Divide,” 2011.
• Big Data Scenarios in Transformed or Experienced Analytics Environments

| Data Scenario | Big Data? | Storage | Analysis | Business Value |
|---|---|---|---|---|
| 1 | No | Standard | Standard | Known |
| 2 | Yes | Possible | Nonstandard | Somewhat known |
| 3 | Yes | Possible | Not possible | Not known |
| 4 | Yes | Not possible | n/a | Not known |
| # | Platform | Architecture | Storage |
|---|---|---|---|
| 1 | Workstation | Multicore | Local |
| 2 | Enterprise Server | SMP | Shared |
| 3 | Cluster or Grid | CCSS | Shared |
| 4 | General MPP Database | SN | Distributed data |
| 5 | Hadoop | SN | Distributed data |
| 6 | MPP Analytics Appliance | SN | Distributed data |
| 7 | MPP In-Memory Analytics Appliance | SNIM | Distributed in-memory data (volatile) |

SMP: Symmetric Multi-Processing
SN: Shared Nothing Distributed Computing
CCSS: Cluster Computing with Shared Storage
SNIM: Shared Nothing In-Memory Distributed Computing
• Myth #1:
• Myth #2:
• Myth #3:
• Myth #4:
[Figure: aggregations, joins, sorts, transformations]
The warehouse/data lake hosts data from different sources, which together provide a comprehensive view of customer information.

[Figure: data sources feeding the warehouse/data lake]
• Product holdings
• Banking tenure
• Account balances
• Checking account data
• Savings account data
• Other accounts data
• Demographic/firmographic data
• Web data
• ATM transactions
• Discount brokerage data
• Online/bill pay data
• Call center data
• Marketing response data
• …
[Figure, built up over three slides: model development and deployment. Elements shown: data store/warehouse, development ADS, model development, model, production ADS, model deployment, scores. The example model is a decision tree with the tests Income > $40K, Debt < 10% of income, and Debt = 0%, Yes/No branches, and leaves labeled good credit risks and bad credit risks.]
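The credit-risk tree on these slides can be written out as a small scoring function and applied to a production ADS to produce scores. A rough sketch, with loud caveats: the three tests come from the slide, but how the branches connect to the good/bad leaves is not recoverable from the extracted text, so the wiring below is an assumption. The names `credit_risk`, `score`, and the sample rows are hypothetical.

```python
def credit_risk(income, debt):
    """Classify one applicant with a small hand-written decision tree.

    The three tests come from the slide; how they are wired to the
    leaves is an assumption made for this illustration.
    """
    if income > 40_000:
        # Higher-income branch: tolerate some debt relative to income.
        return "good" if debt < 0.10 * income else "bad"
    # Lower-income branch: only debt-free applicants count as good risks.
    return "good" if debt == 0 else "bad"

def score(production_ads):
    """Apply the model to every row of a (hypothetical) production ADS."""
    return [
        {"id": row["id"], "risk": credit_risk(row["income"], row["debt"])}
        for row in production_ads
    ]

# Made-up production rows, one per customer.
production_ads = [
    {"id": 1, "income": 65_000, "debt": 3_000},
    {"id": 2, "income": 65_000, "debt": 9_000},
    {"id": 3, "income": 30_000, "debt": 0},
    {"id": 4, "income": 30_000, "debt": 500},
]
scores = score(production_ads)
```

In practice the tree would be learned from the development ADS rather than written by hand; only the deployment step (applying a fixed model to production rows) is shown here.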
[Figure: Model development with reusable ADS. Labels: analytic server for model development; aggregations, joins, sorts, transformations.]
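The data preparation steps named on the slide (aggregations, joins, sorts, transformations) are what turn source tables into an analytic data set (ADS) with one row per customer. A minimal sketch in plain Python, assuming two hypothetical source tables; the column names, the `build_ads` helper, and the derived features are illustrative and not taken from the slides.

```python
from collections import defaultdict
import math

customers = [  # hypothetical customer master records
    {"customer_id": 1, "tenure_months": 48},
    {"customer_id": 2, "tenure_months": 6},
]
transactions = [  # hypothetical transaction-level records
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 40.0},
]

def build_ads(customers, transactions):
    # Aggregation: roll transactions up to one summary per customer.
    totals = defaultdict(lambda: {"n_txn": 0, "total_amount": 0.0})
    for txn in transactions:
        agg = totals[txn["customer_id"]]
        agg["n_txn"] += 1
        agg["total_amount"] += txn["amount"]

    ads = []
    for cust in customers:
        # Join: attach the aggregates to the customer record (left join).
        agg = totals.get(cust["customer_id"], {"n_txn": 0, "total_amount": 0.0})
        row = {**cust, **agg}
        # Transformation: derive model-ready inputs from raw columns.
        row["avg_amount"] = row["total_amount"] / row["n_txn"] if row["n_txn"] else 0.0
        row["log_tenure"] = math.log1p(row["tenure_months"])
        ads.append(row)

    # Sort: order rows by customer key so repeated builds are comparable.
    return sorted(ads, key=lambda r: r["customer_id"])

ads = build_ads(customers, transactions)
```

Keeping the build in one function is one way to make the ADS reusable: the same code can produce the development ADS and, later, the production ADS.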
Horizontals: Applications

[Figure: analytics applications grouped by business function (Customer, Marketing, Sales, Operations, Finance). Applications shown:]
• Customer lifestyle / life-stage
• Lifetime value
• Customer satisfaction
• Survey analysis
• Customer acquisition
• Campaign effectiveness
• Customer retention
• Cross sell / up sell
• Propensity to buy
• Market segmentation
• Identity theft
• Failure/defect detection
• Fraud prevention
• Six Sigma
• Process yield optimization
• Demand forecasting
• Risk management
• Financial forecast
• Pricing analysis
• Customer segmentation
• Product recommendations
[Figure: Customer lifetime value over the subscription life (months), from when the customer joins (or rejoins) until the customer leaves. Cost and revenue are plotted on a scale from -1,000 to 1,000. Stages labeled on the curve: create interest, acquisition cost, recurring revenue, direct cost to serve, cross-sell/upsell, renewal, migration, churn, bad debt, win back. Source: McKinsey & Co]
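The lifetime value curve can be read as a running sum: an up-front acquisition cost followed by monthly revenue net of the direct cost to serve. A simplified sketch under that assumption; the numbers are made up to show the shape of the curve, and effects such as cross-sell, churn, and bad debt are left out.

```python
from itertools import accumulate

def lifetime_value(acquisition_cost, monthly_revenue, monthly_cost_to_serve, months):
    """Return the cumulative net value after each month of the subscription."""
    monthly_margin = monthly_revenue - monthly_cost_to_serve
    # Month 0 carries the up-front acquisition cost; later months add margin.
    cash_flows = [-acquisition_cost] + [monthly_margin] * months
    return list(accumulate(cash_flows))

def break_even_month(curve):
    """First month in which cumulative value is non-negative, or None."""
    return next((m for m, v in enumerate(curve) if v >= 0), None)

# Hypothetical numbers, chosen only to illustrate the shape of the curve.
curve = lifetime_value(acquisition_cost=500, monthly_revenue=60,
                       monthly_cost_to_serve=10, months=24)
```

With these inputs the customer starts at -500, breaks even in month 10, and is worth 700 after 24 months; a customer who leaves before month 10 is a net loss.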
[Figure: offer optimization with many offers, many channels, and millions of customers.]
• Offers: Offer 1 through Offer 6
• Channels: call centres; face-to-face (retail / dealer / sales force); web presence / email; SMS / MMS / WAP; direct mail / bill inserts / bill messages
• Constraints: product targets; contact centre capacity; contact frequency / permissions / preferences; minimum list sizes
• Risks: saturation; wrong timing; missed opportunity
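Choosing which offer goes to which customer over which channel, subject to capacity and contact-frequency constraints, is an assignment problem. The sketch below uses a simple greedy heuristic, which is a stand-in chosen for brevity and not necessarily the method the speaker has in mind; the function `assign_offers`, the expected values, and the capacities are all hypothetical.

```python
def assign_offers(candidates, channel_capacity, max_contacts_per_customer=1):
    """Greedily pick the highest-value contacts that respect the constraints.

    candidates: iterable of (customer, offer, channel, expected_value).
    channel_capacity: dict mapping channel -> maximum number of contacts.
    """
    remaining = dict(channel_capacity)
    contacts = {}
    chosen = []
    for customer, offer, channel, value in sorted(
        candidates, key=lambda c: c[3], reverse=True
    ):
        if remaining.get(channel, 0) <= 0:
            continue  # channel (e.g. contact centre) is at capacity
        if contacts.get(customer, 0) >= max_contacts_per_customer:
            continue  # contact-frequency limit reached for this customer
        chosen.append((customer, offer, channel))
        remaining[channel] -= 1
        contacts[customer] = contacts.get(customer, 0) + 1
    return chosen

# Expected values would normally come from predictive models; these are made up.
candidates = [
    ("alice", "Offer 1", "email", 4.0),
    ("alice", "Offer 2", "call", 9.0),
    ("bob", "Offer 2", "call", 7.0),
    ("bob", "Offer 3", "email", 5.0),
    ("carol", "Offer 1", "call", 6.0),
]
plan = assign_offers(candidates, {"call": 1, "email": 2})
```

Here carol receives nothing because the single call slot went to alice, which is the "missed opportunity" risk in miniature. A greedy pass is not guaranteed to be optimal; at scale this is usually posed as a constrained optimization problem.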
[Figure: real-time scoring pipeline. Raw inputs pass through feature computation, which draws on profiles (memory), to give cooked inputs; the predictive model (ANNs) turns these into a score.]
• Real-time transactional fraud detection using neural networks
• OCR of machine-print cursive text using neural networks (typically using hundreds of thousands of weights)
* See Hinton lecture at Google
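The fraud-detection pipeline (raw inputs, feature computation with profiles, cooked inputs, predictive model, score) can be sketched end to end. This is only an illustration under stated assumptions: the `Profile` class, the two features, and the hand-picked weights are hypothetical, and a single logistic unit stands in for the neural network.

```python
import math

class Profile:
    """Running per-account memory of past transaction amounts."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.avg_amount = None  # no history yet

    def update(self, amount):
        if self.avg_amount is None:
            self.avg_amount = amount
        else:
            # Exponentially weighted average keeps a compact history.
            self.avg_amount = self.decay * self.avg_amount + (1 - self.decay) * amount

def compute_features(raw, profile):
    """Turn a raw transaction into 'cooked' inputs using the profile."""
    # Fall back to the current amount (or 1.0) when there is no usable history.
    baseline = profile.avg_amount or raw["amount"] or 1.0
    return [raw["amount"] / baseline, 1.0 if raw["foreign"] else 0.0]

def score(features, weights=(1.5, 2.0), bias=-4.0):
    """Stand-in for the predictive model: a single logistic unit.

    The weights are made up for illustration, not trained.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

profile = Profile()
for amount in (20.0, 25.0, 22.0):
    profile.update(amount)

typical = score(compute_features({"amount": 21.0, "foreign": False}, profile))
unusual = score(compute_features({"amount": 400.0, "foreign": True}, profile))
```

The profile is what makes real-time scoring feasible: each transaction is compared against a small summary of the account's history instead of the full transaction log.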
Recent Book: “High-Performance Data Mining and Big Data Analytics”
My Blog
What is Data Science and How to Succeed in it
