Best practices machine learning final
Published in: Technology, Education
  • Before I go into the demonstrations, I want to orient you to the environment in which we’ll run them: the Hortonworks Sandbox with Datameer on top. See Datameer (Administration -> Hadoop Cluster), which is running on the Hadoop cluster. See the administration view in Hortonworks (Pig, …). Go to the job browser (take out "hue" from the username) to see the jobs, including the running Datameer jobs (point out the maps and reduces). You can get all of this from the Hortonworks site and Datameer.
  • Neural networks are known for good prediction quality, but they are hard to understand: it is difficult to see why their predictions come out the way they do. Now we can look at what the neural network did in order to understand it better.
  • Transcript

    • 1. Best Practices for Big Data Analytics with Machine Learning © 2013 Datameer, Inc. All rights reserved.
    • 2. About our Speakers: Dr. Alex Guazzelli, Zementis, Vice President, Analytics (@DrAlexGuazzelli). Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.
    • 3. About our Speakers: Karen Hsu, Datameer, Senior Director, Product Marketing (@Karenhsumar) • Over 15 years of enterprise software experience • Co-authored 4 patents • Bachelor of Science degree in Management Science and Engineering from Stanford University • Worked in a variety of engineering, marketing and sales roles • Came from Informatica • Worked with start-ups Informatica purchased to bring data solutions to market: data quality, master data management, B2B, data security solutions
    • 4. Agenda • Considerations • Best Practices • Demonstration • Q&A
    • 5. Considerations
    • 6. Considerations • Target Users: Business, IT, Data Scientist • Questions: Descriptive, Predictive, Prescriptive
    • 7. Target Users: Business Professional ▪ Visual (Dependencies, Clustering, Decision Trees + more)
    • 8. Target Users: IT ▪ Flexible, powerful
    • 9. Target Users: Data Scientist ▪ Algorithms ▪ SAS, SPSS, R
    • 10. Questions: Descriptive ▪ Descriptive machine learning… – Tells you what has happened
    • 11. Questions: Predictive ▪ Predictive machine learning… – Answers the question of what will happen
    • 12. Questions: Prescriptive ▪ Prescriptive machine learning… – What will happen, when it will happen, why it will happen – Predict what will happen and prescribe how to take advantage of this future
    • 13. Best Practices
    • 14. Lean Analytics: Identify Use Case → 1. Integrate → 2. Prepare → 3. Analyze → 4. Visualize → Deploy
    • 15. Data Preparation
    • 16. Descriptive Analytics
    • 17. Predictive Analytics • Predictive analytics can discover hidden patterns in historical data that a human expert may not see. It is the result of mathematics applied to data, so it benefits from clever mathematical techniques as well as good data. • Predictive analytics helps you discover patterns in the past, which can signal what is ahead. • Descriptive vs. Predictive Analytics: Descriptive Analytics answers “What happened?” Predictive Analytics answers “What will happen next?”
    • 18. Example: Predicting Churn • Matt - Churned 2 days ago • Scott - “Liked” our company last week • John - ??
    • 19. Churn-related features • Matt: 3 complaints in last 6 months; opened 2 support tickets in last 4 weeks; spent a total of $1,234 buying merchandise; spent a total of $123 in services; purchased 2 items in last 4 weeks; is 34 years old; is a male; lives in Los Angeles; ... • Scott: no complaints in last 6 months; opened 1 support ticket in last 4 weeks; spent a total of $9,876 buying merchandise; spent a total of $987 in services; purchased 12 items in last 4 weeks; is 54 years old; is a male; lives in Chicago; ...
    • 20. Big Data • An ever-expanding ocean of data containing people and sensor data (lots and lots of it): transaction records, social media, climate information, mobile GPS signals, healthcare, smart grid, digital breadcrumbs • Breadth and depth • 90% of today's data was created in the last 2 years
    • 21. Churn-related “Big Data” features • Matt: 12 friends listed as customers; 2 complaints from friends in last 6 months; average age of friends is 41 years old; 2 friends churned in last 30 days; no purchases for same items as friends; 1 website visit in last 7 days; 2 website pages opened during last visit; opened 3 newsletters in last 6 months; ... • Scott: 34 friends listed as customers; 1 complaint from friends in last 6 months; average age of friends is 62 years old; no friends churned in last 30 days; purchased same 2 items as friends in last 2 months; 3 website visits in last 7 days; 5 website pages opened during last visit; opened 12 newsletters in last 6 months; ...
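Before any model can use the features on slides 19 and 21, they have to be turned into numbers. The following is a minimal sketch of one way to do that in Python. The class name `ChurnFeatures`, the field names, and the choice of which features to include are assumptions made for illustration; only the values for Matt and Scott come from the slides.

```python
from dataclasses import dataclass, astuple


@dataclass
class ChurnFeatures:
    """A hypothetical record holding a subset of the slide's features."""
    complaints_6m: int          # complaints in last 6 months
    support_tickets_4w: int     # support tickets opened in last 4 weeks
    merchandise_spend: float    # total spent on merchandise, in dollars
    services_spend: float       # total spent on services, in dollars
    items_purchased_4w: int     # items purchased in last 4 weeks
    age: int
    friends_as_customers: int   # "Big Data" feature from slide 21
    friends_churned_30d: int    # "Big Data" feature from slide 21
    website_visits_7d: int      # "Big Data" feature from slide 21
    newsletters_opened_6m: int  # "Big Data" feature from slide 21

    def to_vector(self):
        """Flatten the record into a list of floats for a model."""
        return [float(value) for value in astuple(self)]


# Values taken from slides 19 and 21 ("no complaints" is encoded as 0).
matt = ChurnFeatures(3, 2, 1234.0, 123.0, 2, 34, 12, 2, 1, 3)
scott = ChurnFeatures(0, 1, 9876.0, 987.0, 12, 54, 34, 0, 3, 12)
```

Categorical features such as gender and city are left out here; they would need an encoding step (for example one-hot encoding) before they could join the vector.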
    • 22. Building a predictive model • Churn-related features ... (Churned / Not-churned) → Model Training → Predictive Model • Techniques: Neural Networks, Linear/Logistic Regression, Support Vector Machines, Scorecards, Decision Trees, Clustering, Association Rules, K-Nearest Neighbors, Naive Bayes Classifiers, ... • Neural network: Data → Input Layer → Hidden Layer → Output Layer → Prediction
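To make the "Model Training" box on slide 22 concrete, here is a minimal sketch of training one of the listed techniques, logistic regression, with plain stochastic gradient descent. It is not the method used in the presentation. The function names, the learning rate, the epoch count, and the six training rows are invented for illustration; each row is (complaints in last 6 months, items purchased in last 4 weeks), with label 1 for churned and 0 for not churned.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Return the estimated churn probability for one feature row."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic(rows, labels, lr=0.01, epochs=3000):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = predict(weights, bias, row) - label
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, row)]
    return weights, bias


# Made-up toy data: (complaints_6m, items_purchased_4w).
ROWS = [(3, 2), (2, 1), (4, 0), (0, 12), (1, 8), (0, 10)]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = not churned

weights, bias = train_logistic(ROWS, LABELS)
```

Calling `predict(weights, bias, (3, 2))` then gives a churn probability for a customer with Matt's complaint and purchase counts. Real training would use far more rows and features, and a held-out set to check the model.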
    • 23. Why not several models? • Model Ensemble: Raw Inputs → Data Pre-Processing → Model 1, Model 2, ..., Model n → Voting → Prediction • Scores from all models are computed and then combined: Majority Voting, Weighted Voting, Weighted Average, etc.
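The three combination schemes named on slide 23 can be sketched in a few lines of Python. The function names and the example labels and weights are assumptions for illustration; the slide does not specify how the voting is implemented.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label with the largest total weight behind it."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores (e.g. churn risk) into one weighted mean."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical outputs from three models for one customer.
model_labels = ["churn", "churn", "stay"]
model_weights = [0.2, 0.2, 0.7]  # e.g. derived from validation accuracy
```

With these inputs, `majority_vote(model_labels)` picks "churn" because two of three models agree, while `weighted_vote(model_labels, model_weights)` picks "stay" because the single dissenting model carries more weight than the other two combined.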
    • 24. End Goal: Predicting churn • Model Deployment and Execution in Big Data: Churn-related Features ... → Predictive Churn Model → Churn Risk Score
    • 25. From Model Building to Model Deployment (Traditionally ...) • Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python • Production Environment: Java, .NET, C, SQL • Lost in Translation • SAS, R, IBM SPSS …: great for model building but not for scoring, even more so when it comes to Hadoop
    • 26. From Model Building to Model Deployment (with PMML) • Model Building: Angoss, BigML, FICO Model Builder, IBM SPSS, KNIME, KXEN, Microstrategy, Open Data, Pervasive DataRush, RapidMiner, R / Rattle, SAS, SAP Business Objects, Salford Systems, StatSoft STATISTICA, SQL Server, TIBCO Spotfire, Custom Code, etc. • PMML (models) • Model Deployment and Execution: Datameer Server with the Universal PMML Plug-in (UPPI) • Deploy in minutes ...
    • 27. Predictive Model Markup Language • PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications. • It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models. • PMML eliminates the need for custom model deployment and ensures reliability. • Models and Data Transformations: PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing)
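To show what "an XML-based language used to define models" looks like in practice, here is a small hand-written PMML fragment and a toy Python scorer for it. This is a sketch under assumptions: the field names and coefficients are made up, the fragment has not been validated against the PMML schema, and the `score` function handles only this one model shape. A real deployment would use a full PMML engine such as the UPPI plug-in mentioned in the slides rather than code like this.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML: a regression model whose output is
# passed through a logit normalization to give a score between 0 and 1.
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative churn model, made-up coefficients"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints_6m" optype="continuous" dataType="double"/>
    <DataField name="items_4w" optype="continuous" dataType="double"/>
    <DataField name="churn_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints_6m"/>
      <MiningField name="items_4w"/>
      <MiningField name="churn_risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-1.0">
      <NumericPredictor name="complaints_6m" coefficient="0.8"/>
      <NumericPredictor name="items_4w" coefficient="-0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}


def score(pmml_text, record):
    """Score one record against the single regression table in the PMML."""
    root = ET.fromstring(pmml_text.strip())
    model = root.find("p:RegressionModel", NS)
    table = model.find("p:RegressionTable", NS)
    y = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        y += float(predictor.get("coefficient")) * record[predictor.get("name")]
    if model.get("normalizationMethod") == "logit":
        y = 1.0 / (1.0 + math.exp(-y))
    return y


matt_risk = score(PMML_DOC, {"complaints_6m": 3, "items_4w": 2})
scott_risk = score(PMML_DOC, {"complaints_6m": 0, "items_4w": 12})
```

The point of the format is that the model lives in the XML, not in the scoring code: swapping in different coefficients or predictors changes the score without touching the Python.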
    • 28. UPPI: Supported Techniques • Neural Networks (neural gas, radial-basis and backpropagation) • Support Vector Machines (for classification and regression) • Naive Bayes Classifier (for continuous and categorical inputs) • Rule Set Models • Clustering Models (2-step clustering, distribution and center-based) • Decision Trees (for classification and regression) • General Regression Models (Cox, General and Generalized Linear Models) • Regression Models (Linear, Logistic and Polynomial Regression Models) • Scorecards (with support for Reason Codes) • Restricted Boltzmann Machines • Association Rules • Multiple Models (with the possibility of having models spread over multiple PMML files): Model Ensemble (including Random Forest Models and Boosted Trees), Model Segmentation, Model Chaining, Model Composition, Model Cascade © Zementis, Inc. - Confidential
    • 29. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
    • 30. Descriptive Analytics
    • 31. Descriptive Analytics ▪ Answers: What caused people to churn? ▪ Clustering ▪ Column Dependencies ▪ Decision Tree
    • 32. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
    • 33. Predictive Analytics
    • 34. Predictive Analytics ▪ Who will churn?
    • 35. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
    • 36. Prescriptive Analytics
    • 37. Prescriptive Analytics ▪ Who will churn? Why will they churn? ▪ What can we do to support that outcome?
    • 38. Demonstration Flow: Descriptive (Karen) → Predictive Modeling (Alex) → Predictive Production (Karen) → Prescriptive (Karen)
    • 39. Q&A
    • 40. Next Steps • More about Datameer and Big Data: www.datameer.com • More about Zementis: www.zementis.com • Contact us: Alex Guazzelli, aguazzeli@zementis.com; Karen Hsu, khsu@datameer.com