Best Practices for Big Data Analytics with Machine Learning by Datameer



Don't forget! You can watch the full Datameer recording here:
http://info.datameer.com/Online-Slideshare-Big-Data-Analytics-Machine-Learning-OnDemand.html

Learn, through industry use cases, how to empower users to identify patterns and relationships for recommendations using big data analytics.

Published in: Technology, Education


  1. 1. Best Practices for Big Data Analytics with Machine Learning © 2013 Datameer, Inc. All rights reserved.
  2. 2. About our Speakers Dr. Alex Guazzelli Zementis Vice President, Analytics (@DrAlexGuazzelli) Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.
  3. 3. About our Speakers Karen Hsu Datameer Senior Director, Product Marketing (@Karenhsumar) •  Over 15 years of enterprise software experience •  Co-authored 4 patents •  Worked in a variety of engineering, marketing and sales roles •  Bachelor of Science degree in Management Science and Engineering from Stanford University •  Came from Informatica •  Worked with start-ups Informatica purchased to bring data solutions to market •  Data quality •  Master data management •  B2B •  Data security solutions
  4. 4. Agenda •  Considerations •  Best Practices •  Demonstration •  Q&A
  5. 5. Considerations
  6. 6. Considerations Target Users: Business, IT, Data Scientist Questions: Descriptive, Predictive, Prescriptive
  7. 7. Target Users Business Professional ▪  Visual Dependencies Clustering Decision Trees + More!
  8. 8. Target Users IT ▪  Flexible, powerful
  9. 9. Target Users Data Scientist ▪  Algorithms ▪  SAS, SPSS, R
  10. 10. Questions: Descriptive, Predictive, Prescriptive ▪  Descriptive machine learning… –  Tells you what has happened
  11. 11. Questions: Descriptive, Predictive, Prescriptive ▪  Predictive machine learning… –  Answers the question “What will happen?”
  12. 12. Questions: Descriptive, Predictive, Prescriptive ▪  Prescriptive machine learning… –  What will happen, when it will happen, why it will happen –  Predict what will happen and prescribe how to take advantage of this future
  13. 13. Best Practices
  14. 14. Lean Analytics: Identify Use Case, 1. Integrate, 2. Prepare, 3. Analyze, 4. Visualize, Deploy
  15. 15. Data Preparation: Union, Cleanse, Join, Bin, Normalize, Profile, Transform, Enrich, Outliers, Missing Values, Invalid Values
  16. 16. Descriptive Analytics Drag & Drop Smart Analytics
  17. 17. Predictive Analytics Predictive analytics is able to discover hidden patterns in historical data that the human expert may not see. It is in fact the result of mathematics applied to data. As such, it benefits from clever mathematical techniques as well as good data. Predictive Analytics helps you discover patterns in the past, which can signal what is ahead. Descriptive vs. Predictive Analytics ▪  Descriptive Analytics answers “What happened?” ▪  Predictive Analytics answers “What will happen next?”
  18. 18. Example: Predicting Churn Matt - Churned 2 days ago Scott - “Liked” our company last week John - ??
  19. 19. Churn-related features Matt: 3 complaints in last 6 months; Opened 2 support tickets in last 4 weeks; Spent a total of $1,234 buying merchandise; Spent a total of $123 in services; Purchased 2 items in last 4 weeks; Is 34 years old; Is a male; Lives in Los Angeles; ... Scott: No complaints in last 6 months; Opened 1 support ticket in last 4 weeks; Spent a total of $9,876 buying merchandise; Spent a total of $987 in services; Purchased 12 items in last 4 weeks; Is 54 years old; Is a male; Lives in Chicago; ...
  20. 20. Big Data An ever-expanding ocean of data containing people and sensor data (lots and lots of it): ▪  Transaction records ▪  Social media ▪  Climate information ▪  Mobile GPS signals ▪  Healthcare ▪  Smart Grid ▪  Digital Breadcrumbs Breadth and Depth: 90% of the data today was created in the last 2 years
  21. 21. Churn-related “Big Data” features Matt: 12 friends listed as customers; 2 complaints from friends in last 6 months; Average age of friends is 41 years old; 2 friends churned in last 30 days; No purchases for same items as friends; 1 website visit in last 7 days; 2 website pages opened during last visit; Opened 3 newsletters in last 6 months; ... Scott: 34 friends listed as customers; 1 complaint from friends in last 6 months; Average age of friends is 62 years old; No friends churned in last 30 days; Purchased same 2 items as friends in last 2 months; 3 website visits in last 7 days; 5 website pages opened during last visit; Opened 12 newsletters in last 6 months; ...
  22. 22. Building a predictive model ... Churn-related features → Model Training → Predictive Model → Churned / Not-churned. Techniques: Neural Networks, Linear/Logistic Regression, Support Vector Machines, Scorecards, Decision Trees, Clustering, Association Rules, K-Nearest Neighbors, Naive Bayes Classifiers, ... Neural network diagram: Data → Input Layer → Hidden Layer → Output Layer → Prediction
  23. 23. Why not several models? Model Ensemble: Raw Inputs → Data Pre-Processing → Model 1, Model 2, ..., Model n → Voting → Prediction. Scores from all models are computed. Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
  24. 24. End Goal: Predicting churn ... Model Deployment and Execution in Big Data: Churn-related Features → Predictive Churn Model → Churn Risk Score
  25. 25. From Model Building to Model Deployment (Traditionally ...) Scientist’s Desktop (SAS, R, IBM SPSS, Perl, Python) → Lost in Translation → Production Environment (Java, .NET, C, SQL). SAS, R, IBM SPSS …: Great for model building but not for scoring, even more so when it comes to Hadoop
  26. 26. From Model Building to Model Deployment (with PMML) Model Building: ▪  Angoss ▪  BigML ▪  FICO Model Builder ▪  IBM SPSS ▪  KNIME ▪  KXEN ▪  MicroStrategy ▪  Open Data ▪  Pervasive DataRush ▪  RapidMiner ▪  R / Rattle ▪  SAS ▪  SAP Business Objects ▪  Salford Systems ▪  StatSoft STATISTICA ▪  SQL Server ▪  TIBCO Spotfire ▪  Custom Code, etc. → PMML (models) → Model Deployment and Execution: Datameer Server with the Universal PMML Plug-in (UPPI). Deploy in minutes ...
  27. 27. Predictive Model Markup Language ▪  PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications. ▪  It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models. ▪  PMML eliminates the need for custom model deployment and ensures reliability. Models and Data Transformations: PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing)
  28. 28. UPPI: Supported Techniques ▪  Neural Networks (neural gas, radial-basis and backpropagation) ▪  Support Vector Machines (for classification and regression) ▪  Naive Bayes Classifier (for continuous and categorical inputs) ▪  Rule Set Models ▪  Clustering Models (2-step clustering, distribution and center-based) ▪  Decision Trees (for classification and regression) ▪  General Regression Models (Cox, General and Generalized Linear Models) ▪  Regression Models (Linear, Logistic and Polynomial Regression Models) ▪  Scorecards (with support for Reason Codes) ▪  Restricted Boltzmann Machines ▪  Association Rules ▪  Multiple Models (with the possibility of having models spread over multiple PMML files) ▪  Model Ensemble (including Random Forest Models and Boosted Trees) ▪  Model Segmentation ▪  Model Chaining ▪  Model Composition ▪  Model Cascade © Zementis, Inc. - Confidential
  29. 29. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
  30. 30. Descriptive Analytics
  31. 31. Descriptive Analytics ▪  Answers: What caused people to churn? ▪  Clustering ▪  Column Dependencies ▪  Decision Tree
  32. 32. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
  33. 33. Predictive Analytics
  34. 34. Predictive Analytics ▪  Who will churn?
  35. 35. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
  36. 36. Prescriptive Analytics
  37. 37. Prescriptive Analytics ▪  Who will churn? Why will they churn? ▪  What can we do to support that outcome?
  38. 38. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
  39. 39. Q&A
  40. 40. Next Steps: More about Datameer and Big Data: www.datameer.com More about Zementis: www.zementis.com Contact us: Alex Guazzelli aguazzeli@zementis.com Karen Hsu khsu@datameer.com
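
Slide 23 names three ways to combine the scores of several models: Majority Voting, Weighted Voting and Weighted Average. The following is a minimal Python sketch of those three schemes, not the presenters' implementation. The function names, the example labels and the weights are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models.

    Ties go to the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label with the largest total weight behind it."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores (e.g. churn probabilities) into one score."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical outputs of three models for one customer.
labels = ["churn", "stay", "churn"]
print(majority_vote(labels))
print(weighted_vote(["churn", "stay", "stay"], [0.6, 0.2, 0.1]))
print(weighted_average([0.8, 0.4], [3, 1]))
```

Majority and weighted voting suit models that emit class labels, while the weighted average suits models that emit numeric scores such as a churn probability.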

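
Slides 19 and 24 describe turning churn-related features into a churn risk score. As a rough sketch of that idea, the Python below scores the two example customers from slide 19 with a logistic regression formula, one of the techniques listed on slide 22. The coefficients and the bias are hypothetical values chosen for illustration; a real model would learn them during training, and the field names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    complaints_6m: int        # complaints in last 6 months
    tickets_4w: int           # support tickets opened in last 4 weeks
    merchandise_spend: float  # total spent buying merchandise, in dollars
    items_4w: int             # items purchased in last 4 weeks


# Hypothetical coefficients, NOT taken from the presentation.
WEIGHTS = {
    "complaints_6m": 0.8,
    "tickets_4w": 0.5,
    "merchandise_spend": -0.0002,
    "items_4w": -0.3,
}
BIAS = -1.0


def churn_risk(customer):
    """Return a churn risk score between 0 and 1 (logistic function)."""
    z = BIAS + sum(w * getattr(customer, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Feature values from slide 19.
matt = Customer(complaints_6m=3, tickets_4w=2, merchandise_spend=1234.0, items_4w=2)
scott = Customer(complaints_6m=0, tickets_4w=1, merchandise_spend=9876.0, items_4w=12)

print(f"Matt:  {churn_risk(matt):.3f}")
print(f"Scott: {churn_risk(scott):.3f}")
```

With these made-up coefficients, complaints and support tickets raise the score while spending and recent purchases lower it, so Matt scores higher than Scott.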