Predictive Analytics
Peter Bruce
THE INSTITUTE FOR STATISTICS EDUCATION
at Statistics.com
peter.bruce@statistics.com
About Statistics.com
• 100+ courses, introductory and advanced
• Traditional statistics, data mining, machine
learning, text mining, clinical
trials, optimization, use of R
• All online
• Typically 4 weeks, scheduled dates
• No need to be online at particular times or days
• Private discussion forum with instructors, who are noted authors and experts
A man walks into a Target® store…
Predictive Analytics
• In marketing, used for model-driven, targeted sales efforts
• Also used to predict: will a loan default, what is the diagnosis (given symptoms), is a tax return fraudulent, …
Market Research
• Traditionally surveys, analysis, information
gathering, strategy
• Moving online increases the amount of
data, speeds its flow, and makes it more
accessible
Washington Post (web)
• 35 different reports tracking traffic daily
• Midday report “are we on track for visitors?”
• Number of visitors from key domains: .gov, .mil, .senate, or .house
Daily Mail (UK web)
• A traditional ingredient is stories about
animals – tracked on web
• “The animals that do best are
monkeys, dogs, and cats, in that order…”
Martin Clark (editor)
Back to Target
Predictive Analytics
• Goes beyond the obvious, capturing
complexity
• Implemented for real-time behavior and
decisions
Pregnant?
• Obvious retail clues – maternity clothes, baby
food, baby clothes, crib …
• These may be too late
• Earlier clues not so obvious –
lotions, supplements, and, esp., combinations
and changes in purchase patterns
• Data mining algorithms can capture these less
obvious, more complex signals
Training the Model
• Bridal registry
• Women of similar demographics who are not on the bridal registry
• Together, the training set
– Known outcome
– Purchase data over time
Hypothetical Data
Cust #  zinc10  zinc90  mag10  mag90  cotton10  cotton90  Registry?
1011    1       1       1      1      0         1         0
1012    1       0       1      0      1         0         1
1013    1       1       0      1      1         0         1
1014    0       1       1      0      1         1         0
1015    1       1       0      1      1         0         0
1016    0       0       1      0      1         0         1
Classification Algorithms
• K-nearest neighbors (involves 3 notions)
– Distance measure
– Centroid
– Majority vote or average
K-NN
Cust #  zinc10  zinc90  mag10  mag90  cotton10  cotton90  Registry?
1       1       1       1      1      0         1         0
2       1       1       1      1      0         1         0
3       1       1       0      1      0         1         0
4       0       1       1      0      1         1         0
5       1       1       0      1      1         0         1
6       0       0       1      0      1         0         1
NEW     1       0       1      1      0         1         ?
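The K-NN table above can be worked through in a few lines of Python. This is a minimal sketch, not the presenter's implementation: the slides do not say which distance measure or which value of k is used, so the Hamming distance (count of differing 0/1 fields) and k = 3 are assumptions, and the names `hamming` and `knn_predict` are hypothetical.

```python
from collections import Counter

# Rows from the K-NN slide: six purchase indicators, then the registry label.
TRAINING = [
    ((1, 1, 1, 1, 0, 1), 0),
    ((1, 1, 1, 1, 0, 1), 0),
    ((1, 1, 0, 1, 0, 1), 0),
    ((0, 1, 1, 0, 1, 1), 0),
    ((1, 1, 0, 1, 1, 0), 1),
    ((0, 0, 1, 0, 1, 0), 1),
]
NEW_CUSTOMER = (1, 0, 1, 1, 0, 1)


def hamming(a, b):
    """Count the positions where two 0/1 records differ (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(training, record, k=3):
    """Return the majority label among the k training records closest to `record`."""
    nearest = sorted(training, key=lambda row: hamming(row[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAINING, NEW_CUSTOMER, k=3))  # prints 0
```

Under these assumptions the three nearest neighbors are customers 1, 2, and 3 (distances 1, 1, and 2), all labeled 0, so the new customer is classified as 0.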
Classification Algorithms, cont.
• Logistic Regression
• CART
• Discriminant Analysis
• Neural Network
• Naïve Bayes
The Overfit Problem
[Figure: chart with axes Revenue and Expenditure]
Complex function - overfit
[Figure: chart with axes Revenue and Expenditure, with a complex fitted function]
Therefore: Validate the Model
• Partition the original data
– Training
– Validation
• Fit the model to the training data
• Assess performance using the validation data
Performance Metrics
• Continuous
– RMSE
• Categorical (often binary)
– % accurate (confusion matrix)
– Lift
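For the continuous case, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values. A minimal sketch follows; the function name `rmse` and the sample numbers are illustrative only and do not come from the slides.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values: errors of 3 and 4 give sqrt((9 + 16) / 2).
print(round(rmse([0, 0], [3, 4]), 4))  # prints 3.5355
```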
Confusion Matrix and Cutoff Control
Training Data scoring - Summary Report
Cutoff probability value for success (updatable): 0.5
Classification Confusion Matrix
                Predicted Class
Actual Class    1       0
1               43      8
0               6       247
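The counts in the confusion matrix above translate directly into summary rates. The sketch below computes them and shows how a cutoff turns a predicted probability into a class. The `classify` helper is hypothetical, and treating a probability exactly equal to the cutoff as class 1 is an assumption, since the slide does not say how ties are handled.

```python
# Counts from the slide's confusion matrix (cutoff 0.5).
TP, FN = 43, 8    # actual class 1: predicted 1, predicted 0
FP, TN = 6, 247   # actual class 0: predicted 1, predicted 0

total = TP + FN + FP + TN
accuracy = (TP + TN) / total      # share of all records classified correctly
sensitivity = TP / (TP + FN)      # share of actual 1s predicted as 1
specificity = TN / (TN + FP)      # share of actual 0s predicted as 0


def classify(probability, cutoff=0.5):
    """Assign class 1 when the predicted probability reaches the cutoff (assumed rule)."""
    return 1 if probability >= cutoff else 0


print(total)                  # prints 304
print(round(accuracy, 3))     # prints 0.954
print(round(sensitivity, 3))  # prints 0.843
print(round(specificity, 3))  # prints 0.976
```

Lowering the cutoff moves records from predicted class 0 to predicted class 1, which usually raises sensitivity at the cost of more false positives.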
Lift
• In classifying “pregnant” vs. “not-pregnant”, labeling everyone as “not-pregnant” yields very high overall accuracy
• Need a metric that reflects the greater importance of the “pregnant” category, which is rare
• Lift is the model’s improvement over random selection
Decile Lift Chart
[Figure: decile-wise lift chart (validation dataset); x-axis: Deciles (1 to 10); y-axis: Decile mean / Global mean]
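The ratio plotted in a decile-wise lift chart (decile mean divided by global mean) can be computed as sketched below. The function name `decile_lift` and the 20-record data set are made up for illustration. The sketch assumes at least 10 records and at least one positive outcome.

```python
def decile_lift(scores, actuals):
    """Return 10 ratios: mean outcome within each score decile / global mean outcome."""
    # Sort outcomes by model score, highest score first.
    ranked = [a for _, a in sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    global_mean = sum(ranked) / n
    lifts = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        lifts.append((sum(chunk) / len(chunk)) / global_mean)
    return lifts


# Made-up example: 20 records, 4 positives, mostly concentrated at the top scores.
scores = [i / 20 for i in range(20, 0, -1)]
actuals = [1, 1, 1, 0, 0, 1] + [0] * 14
lifts = decile_lift(scores, actuals)
print(lifts[:3])  # prints [5.0, 2.5, 2.5]
```

A first-decile value of 5 means the top-scored 10% of records contain five times as many positives as a randomly chosen 10% would on average.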
Validate the Model
• Compare one model to another
• Avoid overfit
• Solution: apply model to hold-out sample
– Assess performance of different models
– Fine tune parameters of individual models
Partitioning
• Randomly split the initial data into 2 or 3
groups
– Training
– Validation
– Test
• Repeated use of validation data to compare and fine-tune models leads to overfitting to the validation data, in addition to the training data
– “Test” partition is used only once, at the end
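A random three-way partition can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the function name `partition` are assumptions for illustration. The slides do not specify split sizes.

```python
import random


def partition(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split records into training, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = int(fractions[0] * n)
    n_valid = int(fractions[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains after the first two
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # prints 60 20 20
```

Models are fitted on `train`, compared and tuned on `valid`, and the chosen model is scored on `test` a single time at the end.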
Software
• SAS Enterprise Miner $$$$
• IBM SPSS Modeler (Clementine) $$$$
• XLMiner (Excel add-in) $
• Statistica Data Miner $$
• Salford Systems $$
• RapidMiner $$ (free open-source version available)
• R (open source, free)
Data Mining - More
• Clustering (segmentation)
• Profiling (explanatory models)
• Time series
• Affinity (recommender systems)
• Text analytics (NLP, sentiment analysis)
Skill Shortage
• McKinsey “Big Data” report
– Supply gap of 140,000-190,000 people with “deep analytical talent”
• Emergence of “Analytics” master’s programs (Northwestern, NC State, …)
