Practical Predictive Analytics


A fun presentation on how to get started with data mining and predictive analytics. It covers the CRISP-DM process and the most popular predictive analytics tools, including RapidMiner, R, Predixion, Weka, Statistica, SAS, IBM, Microsoft, Oracle, Spotfire, Tableau, and Alteryx.


  1. Practical Predictive Analytics
      Jen Underwood, Founder & Principal Consultant, Impact Analytix, LLC
      © 2013 Impact Analytix, LLC
  2. About the Speaker
      Impact Analytix, LLC is a boutique BI and predictive analytics consulting firm that values projects that truly make a difference.
      Jen Underwood, Founder & Principal Consultant
      • ~20 years of business intelligence industry experience
      • Former Global Microsoft BI and Analytics Technical Product Manager and seasoned BI implementer
      • Passionate technology evangelist and volunteer: TDWI, PASS, SharePoint Conference, and Microsoft TechEd
      • Bachelor of Business Administration, University of Wisconsin-Milwaukee
      • Postgraduate Certificate in Computer Science (Data Mining), University of California, San Diego
  3. “Decision making and the techniques and technologies to support and automate it will be the next competitive battleground for organizations. Those who are using business rules, data mining, analytics and optimization today are the shock troops of this next wave of business innovation.”
      - Tom Davenport, Competing on Analytics
  4. Analytic Maturity + Big Data Explosion + Tools
      • “Data Scientist: The Sexiest Job of the 21st Century” (October 2012)
      • “The sexiest job of the 21st Century? Data Analyst” (June 2013)
  5. Mega-Trends
      1. Rapid growth of big data
      2. Cloud computing and advanced cloud services
      3. On-demand services
      4. Virtualization
      5. Increasing consumerization of IT
      Source:
  6. Big Data Hype Cycle
  7. What is Predictive Analytics?
      • An area of statistics that captures relationships between explanatory variables and predicted variables from past occurrences, and uses those relationships to make predictions
      • Data mining: automatically discovering interesting patterns in data
      • Can be applied to unknowns in the past, present, or future
      • Accuracy and usability depend on the level of analysis and the quality of the assumptions
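
To make the first bullet concrete, here is a minimal Python sketch of the simplest possible case: one explanatory variable, one predicted variable, and an ordinary least-squares line learned from past occurrences and then used to predict a new value. The data values and the helper names `fit_line` and `predict` are hypothetical, chosen only for illustration; real predictive tools offer far richer algorithms.

```python
"""Sketch: learn a relationship from past occurrences, then predict.

Assumptions: a single explanatory variable, ordinary least squares,
and made-up toy data (not taken from the presentation).
"""
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) that minimize squared error (OLS)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Apply the learned line to a new value of the explanatory variable."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: explanatory variable (e.g. marketing spend)
# and predicted variable (e.g. units sold) from past periods.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(spend, units)
print(model)
print(round(predict(model, 6.0), 2))
```

Here the fitted slope is 1.97 and the intercept 1.09, so a spend of 6.0 predicts about 12.91 units, valid only as far as the linear assumption holds.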
  8. Why Use Predictive Analytics?
      • Improve decision making by learning from past experience
      • Waaaaaaaaay too much data and too many variables to analyze manually or with traditional statistical techniques
      • Traditional analytics and statistical methods can break down on complex non-linear relationships and multi-variable combinations
      • Strategic competitive advantage
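
The point about non-linear, multi-variable combinations can be illustrated with a toy XOR-style data set (hypothetical): the outcome depends on the combination of two variables, so any rule that looks at one variable at a time does no better than a coin flip, while a two-level rule of the kind a decision tree can learn classifies every row correctly. This is a sketch under those assumptions, not a claim about any particular statistical method.

```python
"""Sketch: why one-variable-at-a-time analysis can miss an interaction.

Assumption: toy XOR-style data (hypothetical) where the outcome depends
on the combination of two binary variables, not on either one alone.
"""
from itertools import product

# Every combination of two binary inputs, repeated to mimic many records.
rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)] * 25


def accuracy(rule):
    """Fraction of rows where rule(a, b) matches the outcome."""
    return sum(rule(a, b) == y for a, b, y in rows) / len(rows)


# Every rule that looks at only a single variable.
single_var = {
    "a": accuracy(lambda a, b: a),
    "not a": accuracy(lambda a, b: 1 - a),
    "b": accuracy(lambda a, b: b),
    "not b": accuracy(lambda a, b: 1 - b),
}


def tree_rule(a, b):
    """A two-level rule, like a depth-2 decision tree: split on a, then b."""
    if a == 0:
        return 1 if b == 1 else 0
    return 1 if b == 0 else 0


print(max(single_var.values()))  # 0.5
print(accuracy(tree_rule))       # 1.0
```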
  9. How Predictive Analytics is Used Today: Literally Everywhere
      • Seek profitable customers
      • Credit scoring
      • Anticipate customer churn
      • Predict sales & inventory
      • Direct marketing
      • Detect fraud
  10. How Predictive Analytics is Used Today
      • Supply chain: Simulate and optimize supply chain flows; reduce inventory and stock-outs. Examples: Dell, Wal-Mart, Amazon, UPS
      • Customer selection: Identify customers with the greatest profit potential; retain their loyalty. Examples: Harrah’s, Capital One, Barclays
      • Pricing: Identify the price that will maximize yield, or profit. Examples: Progressive, Marriott
      • Human capital: Select the best employees for particular tasks or jobs, at particular compensation levels. Examples: New England Patriots, Oakland A’s, Boston Red Sox
      • Product and service quality: Detect quality problems early and minimize them. Examples: Honda, Intel
      • Financial performance: Better understand the drivers of financial performance and the effects of nonfinancial factors. Examples: MCI, Verizon
      • Research and development: Improve quality, efficacy, and, where applicable, safety of products and services. Examples: Novartis, Amazon, Yahoo
      • Fraud and crime: Identify fraud, criminal activity, tax evasion, and stolen credit cards. Examples: Bank of America, IRS, FBI
      Source:
  11. Who? Highlighted Predictive Software Players
      (Many other vendors)
  12. Who? In-Database Predictive Analytics
      (Many other vendors)
  13. Who is Hot in 2013!
      Source: KDnuggets 2013 Poll Results
      * Notably influenced by vendors’ ability to get their users to vote, but the results are still very revealing
  14. Who is Hot in 2013!
      • Predixion Software: up 622%, to 2.7% share (0.4% in 2012, 0.5% in 2011)
      • Revolution Analytics: up 105%, to 2.8% share (1.4% in 2012, 1.4% in 2011)
      • Salford: up 98%, to 2.2% share (1.1% in 2012, 10.6% in 2011)
      • SAP: up 64%, to 1.4% share (0.9% in 2012, not asked in 2011)
      • RapidMiner: up 47%, to 39.2% share (26.7% in 2012, 27.7% in 2011)
      • Tableau: up 43%, to 6.3% share (4.4% in 2012, 2.6% in 2011)
      • Microsoft SQL Server: up 39%, to 7.0% share (5.0% in 2012, 4.9% in 2011)
      • R: up 22%, to 37.4% share (30.7% in 2012, 23.3% in 2011)
      Source:
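
The "up X%" figures in the poll results are relative changes in share between the 2012 and 2013 polls, assuming that is how they were computed. The short Python sketch below recomputes a few of them from the shares on the slide. Because the published shares are rounded, small shares recompute poorly: the rounded 0.4% 2012 share for Predixion gives about 575% rather than the published 622%. The helper name `pct_change` is hypothetical.

```python
"""Sketch: recompute the poll's growth figures from the rounded shares.

Assumption: "up X%" is the relative change in share from 2012 to 2013.
"""


def pct_change(old_share, new_share):
    """Relative change, in percent, from old_share to new_share."""
    return (new_share - old_share) / old_share * 100


# (tool, 2012 share %, 2013 share %, published growth %) from the slide
poll = [
    ("RapidMiner", 26.7, 39.2, 47),
    ("R", 30.7, 37.4, 22),
    ("Tableau", 4.4, 6.3, 43),
    ("Predixion Software", 0.4, 2.7, 622),
]

for tool, old, new, published in poll:
    print(f"{tool}: recomputed {pct_change(old, new):.0f}%, published {published}%")
```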
  15. How Anyone Can Get Started
      • You don’t need big data, and you don’t need to be a data scientist, PhD, or statistician
      • Specialty courses and a basic understanding of statistics are important
      • Download free software, buy a couple of books, take a course, get a mentor, and start learning hands-on with your own data
  16. KDnuggets: THE place to go
  17. The Basic Steps
      Follow the CRISP-DM process:
      • Choose an appropriate business question or problem
      • Gather, understand, and prepare the data
        - Predictive algorithms use “flattened” input data sets
        - Cleanse and prepare the data; add bins, classes, and various experimental transforms
        - Create ETL, an analytics view in the database, or an exported text file
      • Load the prepared data into an analytics tool, or use in-database algorithms
      • Explore the data set, identify predictive influencers, evaluate various data mining models, and further transform and iteratively experiment with variables
      • Deploy the predictive model and integrate it into reporting or application logic
        - Encode rules, PMML, programming, DMX, and in-database prediction queries
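
A “flattened” input data set has one row per case (for example, per customer) rather than one row per transaction. The Python sketch below shows the idea under hypothetical assumptions: the raw data is a list of purchase transactions, and the derived columns (order count, total spend, and a spend bin with arbitrary 50 and 150 cut-offs) are illustrative choices, not a prescribed schema. In practice this step usually lives in ETL or a database view, as the slide notes.

```python
"""Sketch: flatten transactional records into one row per case.

Assumptions (hypothetical): the case is a customer, the raw rows are
(customer_id, amount) transactions, and the bin cut-offs are arbitrary.
"""
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, amount)
transactions = [
    ("c1", 20.0), ("c1", 35.0), ("c2", 5.0),
    ("c3", 120.0), ("c3", 80.0), ("c3", 15.0),
]


def spend_bin(total):
    """Bucket total spend into a small number of classes."""
    if total < 50:
        return "low"
    if total < 150:
        return "medium"
    return "high"


def flatten(rows):
    """Aggregate many transaction rows into one feature row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer, amount in rows:
        totals[customer] += amount
        counts[customer] += 1
    return [
        {
            "customer": c,
            "orders": counts[c],
            "total_spend": totals[c],
            "spend_bin": spend_bin(totals[c]),
        }
        for c in sorted(totals)
    ]


for row in flatten(transactions):
    print(row)
```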
  18. Preparing and Sampling Data
      • Don’t overlook the critical importance of properly choosing, preparing, and sampling data to train and develop high-performing models
      • Consider missing values, outliers, consistent labeling, bin sizing, transforms, which algorithms will be used, and which data types they support
      • Use principal component analysis to reduce the number of variables and help avoid “over-fitting” predictive models
      • Think “MONEYBALL” for your business
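
Several of the preparation concerns above (missing values, outliers, bin sizing, and sampling) can be sketched in a few lines of standard-library Python. The choices here are assumptions, not the only correct ones: missing values are filled with the median, outliers are capped at the 1.5 × IQR fences, bins are equal-width, and the sample is a seeded random 70/30 train/test split. The data and helper names are hypothetical. Principal component analysis is left to a numerical library.

```python
"""Sketch: typical cleanup and sampling before training a model.

Assumptions (hypothetical data and choices): missing values are None and
are filled with the median; outliers are capped at the 1.5 * IQR fences,
one common convention among several; bins are equal-width; and the sample
is a seeded random 70/30 train/test split.
"""
import random
from statistics import median, quantiles

# Hypothetical ages with two missing values and one likely entry error (250).
ages = [34, None, 29, 41, 38, None, 45, 31, 250, 36]


def impute_median(values):
    """Replace None with the median of the non-missing values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def cap_outliers(values):
    """Clamp values to the interquartile-range fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, low), high) for v in values]


def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum would fall in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into train and test samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


clean = cap_outliers(impute_median(ages))
bins = equal_width_bins(clean, k=3)
train, test = train_test_split(list(zip(clean, bins)))
print(clean)
print(bins)
print(len(train), len(test))
```

Note that the order matters: capping before binning keeps the single bad value from stretching every bin.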
  19. Preparing and Sampling Data
  20. Most Common Predictive Modeling Tasks
      • Classification (the most popular task): predicting an item’s class, e.g. with a “decision tree”
      • Clustering: finding natural groups or clusters in data
      • Association: what occurs together (“market basket” analysis)
      • Deviation detection: finding changes or outliers
      • Estimation and time series: predicting a continuous value
      • Link analysis: finding relationships
      • Web and text mining: extracting information from unstructured data
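
Association (“market basket”) analysis can be illustrated with item pairs and the usual support, confidence, and lift measures. The Python sketch below is a simplification under stated assumptions: the baskets are hypothetical, only two-item rules are considered, and pairs below a minimum support are skipped. Real tools (for example Apriori-style miners) also handle larger itemsets and prune more cleverly.

```python
"""Sketch: market-basket association rules for item pairs.

Assumptions: hypothetical baskets, 2-item rules only, and the standard
support / confidence / lift definitions.
"""
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift), best confidence first."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. P(rhs) alone
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data every basket with beer also has diapers, so "beer -> diapers" has confidence 1.0 and a lift of 1.25 over the base rate of diapers.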
  21. Evaluating Predictive Models
      • Scoring: typically mean squared error or percent correctly classified
      • Keep the business problem in mind
      • Errors, true positives, and false positives: what does an error really mean?
      • Estimation, classification, gain/lift, ROI, and ROC curves
      • Common issues:
        - Over-fitting and under-fitting
        - Models that do not truly represent the process
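
The scoring measures above are simple to compute by hand. The Python sketch below shows mean squared error for an estimation model, and true/false positive and negative counts, percent correctly classified, and lift for a binary classifier. Assumptions: the actuals, scores, and 0.5 decision threshold are hypothetical; 1 marks the “positive” class (for example, a customer who churns); and lift is measured on the top-scored fraction of cases.

```python
"""Sketch: common scores for evaluating predictive models.

Assumptions: hypothetical data; binary labels with 1 as the positive
class; a 0.5 threshold; lift measured on the top-scored fraction.
"""


def mean_squared_error(actual, predicted):
    """Average squared difference, for estimation models."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def confusion(actual, predicted):
    """Counts of true/false positives and negatives."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == 1 and p == 1 for a, p in pairs),
        "fp": sum(a == 0 and p == 1 for a, p in pairs),
        "fn": sum(a == 1 and p == 0 for a, p in pairs),
        "tn": sum(a == 0 and p == 0 for a, p in pairs),
    }


def percent_correct(actual, predicted):
    """Percent of cases classified correctly."""
    return 100 * sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def lift_at(actual, scores, fraction):
    """Positive rate in the top-scored fraction divided by the overall rate."""
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    top = ranked[: max(1, round(len(ranked) * fraction))]
    return (sum(top) / len(top)) / (sum(actual) / len(actual))


actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(confusion(actual, predicted))
print(percent_correct(actual, predicted))
print(lift_at(actual, scores, 0.3))
print(mean_squared_error([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))
```

Here the classifier gets 80% correct, yet one false negative may matter far more than one false positive, which is why the business meaning of each error type is worth spelling out.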
  22. Integrating Predictive Models
      • Analytic tool-specific integration options
      • In-database predictive UDFs (user-defined functions)
      • DMX predictive queries
      • PMML to exchange models
      • Programming with APIs
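
As one hedged illustration of the in-database route: once a model has been reduced to ordered if/then rules (a decision tree can be read this way), those rules can be rendered as a SQL CASE expression and run as a prediction query inside the database. The sketch below uses Python’s built-in `sqlite3` module as a stand-in database. The table `customers`, its columns, the rules, and the labels are all hypothetical, and this is not DMX, PMML, or any vendor’s UDF API.

```python
"""Sketch: deploy a rule-based model as an in-database prediction query.

Assumptions (all hypothetical): the model is already a list of ordered
if/then rules, and the scoring table is customers(id, tenure_months,
support_calls). SQLite stands in for the production database.
"""
import sqlite3

# Ordered rules: (SQL condition, predicted label); first match wins.
# Conditions and labels are trusted constants, so no escaping is done.
RULES = [
    ("tenure_months < 6 AND support_calls >= 3", "high_churn_risk"),
    ("tenure_months < 6", "medium_churn_risk"),
]
DEFAULT_LABEL = "low_churn_risk"


def rules_to_case(rules, default):
    """Render ordered rules as a SQL CASE expression."""
    whens = "\n".join(f"    WHEN {cond} THEN '{label}'" for cond, label in rules)
    return f"CASE\n{whens}\n    ELSE '{default}'\nEND"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, tenure_months INTEGER, support_calls INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3, 4), (2, 4, 0), (3, 24, 5)],
)

query = (
    f"SELECT id, {rules_to_case(RULES, DEFAULT_LABEL)} AS prediction "
    "FROM customers ORDER BY id"
)
predictions = conn.execute(query).fetchall()
print(predictions)
```

The design choice here is that scoring runs where the data lives, so no rows leave the database; the trade-off is that only models expressible as rules fit this pattern.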
  23. Demos
  24. RapidMiner
  25. R and Rattle
  26. Microsoft Excel Data Mining Add-Ins
  27. Predixion
  28. Pentaho Weka
  29. StatSoft Statistica
  30. SAS Enterprise Miner
  31. SAS JMP
  32. IBM SPSS Modeler
  33. Oracle Data Miner
  34. SAP Lumira Predictive Analytics
  35. Spotfire Miner
  36. Tableau
  37. Alteryx
  38. Additional Resources
      • KDnuggets: http://www.kdnuggets.com
      • RapidMiner: http://rapid-i.com
      • R Statistical Computing: http://www.r-project.org
      • Revolution Analytics: http://www.revolutionanalytics.com
      • Microsoft
      • Teradata: http://www.teradata.com
      • Tableau: http://www.tableausoftware.com
      • Spotfire: http://spotfire.tibco.com
      • SAS
      • SPSS
      • Open Source Data Mining