Before I go into the demonstrations, I want to orient you to the environment we’ll use for them: a Hortonworks sandbox with Datameer running on top. In Datameer (Administration -> Hadoop Cluster), you can see that it is running on the Hadoop cluster. In Hortonworks, see Administration (Pig, …). Then go to the job browser (take "hue" out of the username), look at the jobs, and note the Datameer jobs that are running (point out the maps and reduces). You can get all of this from the Hortonworks and Datameer sites.
Neural networks are known for their good predictive quality, but they are hard to interpret: it is difficult to understand why a given prediction is made. Now we can look at why a neural network produced its predictions, which helps us understand these models better.
About our Speakers
Dr. Alex Guazzelli
Zementis Vice President, Analytics (@DrAlexGuazzelli)
Dr. Alex Guazzelli has co-authored the first book on PMML, the
Predictive Model Markup Language. At Zementis, Dr. Guazzelli is
responsible for developing core technology and analytical
solutions for Big Data and real-time scoring. Most recently, Dr.
Guazzelli started teaching a class on standards for predictive
analytics at UC San Diego Extension.
About our Speakers
Datameer Senior Director, Product Marketing (@Karenhsumar)
Over 15 years of enterprise software experience
Co-authored 4 patents
Bachelor of Science degree in Management Science and Engineering from Stanford University
Worked in a variety of engineering, marketing and sales roles
Came from Informatica
Worked with start-ups that Informatica purchased to bring data solutions to market:
Master data management
Data security solutions
▪ SAS, SPSS, R
▪ Descriptive machine learning…
– Tells you what has happened
▪ Predictive machine learning…
– Answers the question “What will happen?”
▪ Prescriptive machine learning…
– What will happen, when it will happen, and why it will happen
– Predicts what will happen and prescribes how to take advantage of this future
Predictive analytics is able to discover hidden patterns in historical data that the
human expert may not see. It is in fact the result of mathematics applied to data.
As such, it benefits from clever mathematical techniques as well as good data.
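To make "mathematics applied to data" concrete, here is a minimal sketch that fits a logistic regression by batch gradient descent on a small set of historical records. The two features (complaints in the last 6 months, support tickets in the last 4 weeks), the toy labels, the learning rate, and the epoch count are all hypothetical choices for illustration, not the presenters' method.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows: Sequence[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for one historical record.
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the average gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that a customer with features x churns."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical history: [complaints in last 6 months, support tickets in last 4 weeks]
history = [[3, 2], [4, 3], [2, 2], [0, 1], [0, 0], [1, 0]]
churned = [1, 1, 1, 0, 0, 0]  # toy labels: 1 = churned, 0 = stayed
w, b = fit_logistic(history, churned)
```

The learned weights are the "hidden pattern": they encode how strongly each feature pushed past customers toward churning.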
Predictive Analytics helps
you discover patterns in the
past, which can signal what
Descriptive vs. Predictive Analytics
Descriptive Analytics answers “What happened?”
Predictive Analytics answers “What will happen next?”
Example: Predicting Churn
Matt - Churned 2 days ago
Scott - “Liked” our company last week
John - ??
3 complaints in last 6 months
Opened 2 support tickets in last 4 weeks
Spent a total of $1,234 buying merchandise
Spent a total of $123 in services
Purchased 2 items in last 4 weeks
Is 34 years old
Is a male
Lives in Los Angeles
No complaints in last 6 months
Opened 1 support ticket in last 4 weeks
Spent a total of $9,876 buying merchandise
Spent a total of $987 in services
Purchased 12 items in last 4 weeks
Is 54 years old
Is a male
Lives in Chicago
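Before any model can learn from profiles like the two above, they have to become numeric feature vectors. The sketch below encodes both profiles, using one-hot columns for the categorical attributes. The field names, the `GENDERS` vocabulary, and the column order are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CustomerProfile:
    complaints_6m: int        # complaints in last 6 months
    tickets_4w: int           # support tickets opened in last 4 weeks
    merchandise_spend: float  # total spent on merchandise ($)
    services_spend: float     # total spent on services ($)
    items_4w: int             # items purchased in last 4 weeks
    age: int
    gender: str
    city: str


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """1.0 in the position matching value, 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def to_feature_vector(p: CustomerProfile, genders: Sequence[str],
                      cities: Sequence[str]) -> List[float]:
    numeric = [float(p.complaints_6m), float(p.tickets_4w), p.merchandise_spend,
               p.services_spend, float(p.items_4w), float(p.age)]
    return numeric + one_hot(p.gender, genders) + one_hot(p.city, cities)


first = CustomerProfile(3, 2, 1234.0, 123.0, 2, 34, "male", "Los Angeles")
second = CustomerProfile(0, 1, 9876.0, 987.0, 12, 54, "male", "Chicago")
GENDERS = ["female", "male"]               # hypothetical vocabulary
CITIES = sorted({first.city, second.city})  # ["Chicago", "Los Angeles"]
vectors = [to_feature_vector(p, GENDERS, CITIES) for p in (first, second)]
```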
An ever-expanding ocean of data containing people and sensor data (lots and lots of it):
Mobile GPS signals
Breadth and Depth
90% of the data today was created in the last 2 years
Churn-related “Big Data” features
12 friends listed as customers
2 complaints from friends in last 6 months
Average age of friends is 41 years old
2 friends churned in last 30 days
No purchases for same items as friends
1 website visit in last 7 days
2 website pages opened during last visit
Opened 3 newsletters in last 6 months
34 friends listed as customers
1 complaint from friends in last 6 months
Average age of friends is 62 years old
No friends churned in last 30 days
Purchased same 2 items as friends in last 2 months
3 website visits in last 7 days
5 website pages opened during last visit
Opened 12 newsletters in last 6 months
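Features such as "2 friends churned in last 30 days" are aggregations over other customers' records. A minimal sketch of deriving four of them is shown below; the `Customer` record layout, the 182-day approximation of "6 months", and the toy dates are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class Customer:
    customer_id: str
    age: int
    churn_date: Optional[date] = None
    complaint_dates: List[date] = field(default_factory=list)


def social_features(friend_ids: List[str], customers: Dict[str, Customer],
                    today: date) -> dict:
    """Aggregate churn-related signals over a customer's friends."""
    # Only friends who are themselves customers contribute.
    friends = [customers[f] for f in friend_ids if f in customers]
    last_30d = today - timedelta(days=30)
    last_6m = today - timedelta(days=182)  # rough stand-in for "6 months"
    return {
        "friends_listed_as_customers": len(friends),
        "complaints_from_friends_6m": sum(
            1 for f in friends for d in f.complaint_dates if d >= last_6m),
        "avg_friend_age": (sum(f.age for f in friends) / len(friends)
                           if friends else None),
        "friends_churned_30d": sum(
            1 for f in friends if f.churn_date is not None and f.churn_date >= last_30d),
    }


# Toy data: "zz" is a friend who is not a customer.
customers = {
    "a": Customer("a", 40, churn_date=date(2024, 6, 20),
                  complaint_dates=[date(2024, 3, 1)]),
    "b": Customer("b", 42, complaint_dates=[date(2023, 1, 1)]),
}
feats = social_features(["a", "b", "zz"], customers, today=date(2024, 6, 30))
```

On a Hadoop cluster the same aggregations would typically run as distributed jobs over much larger tables; the logic per customer stays the same.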
Building a predictive model ...
Support Vector Machines
Naive Bayes Classifiers
Why not several models?
Scores from all
End Goal: Predicting churn ...
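One simple way to use "scores from all" models is to average them. In the sketch below, the three scorers are hypothetical placeholders standing in for trained models (for example an SVM and a Naive Bayes classifier); unweighted averaging is an assumption, and weighted averaging or voting are common alternatives.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Scorer = Callable[[Sequence[float]], float]


def ensemble_score(models: Dict[str, Scorer],
                   x: Sequence[float]) -> Tuple[float, Dict[str, float]]:
    """Score x with every model; return the unweighted average and each score."""
    scores = {name: model(x) for name, model in models.items()}
    return mean(list(scores.values())), scores


# Hypothetical stand-ins for trained models; x = [complaints, support tickets].
models = {
    "model_a": lambda x: min(1.0, 0.25 * x[0]),  # driven by complaints
    "model_b": lambda x: min(1.0, 0.3 * x[1]),   # driven by support tickets
    "model_c": lambda x: 0.5,                    # uninformative baseline
}
combined, individual = ensemble_score(models, [3, 2])
```

Keeping the individual scores alongside the average makes it easy to see which model drove a given churn prediction.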
Model Deployment and Execution in
From Model Building to Model Deployment
SAS, R, IBM SPSS …
Great for model building
but not for scoring, even
more so when it comes
From Model Building to Model Deployment (with
FICO Model Builder
Deploy in minutes ...
R / Rattle
SAP Business Objects
Custom Code, etc.
Predictive Model Markup Language
PMML is an XML-based language used to define statistical and data mining
models and to share these between compliant applications.
It is a mature standard developed by the DMG (Data Mining Group) to avoid
proprietary issues and incompatibilities and to deploy models.
PMML eliminates the need for custom model deployment and ensures reliability.
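Because the model is a standard XML document, any compliant consumer can score it without custom code for each model. The sketch below parses a small PMML `RegressionModel` with Python's `xml.etree.ElementTree` and scores one record. The churn model, its field names, and its coefficients are hypothetical, and a production consumer would also validate the `MiningSchema`, handle missing values, and support many more model types.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical churn model; coefficients are illustrative only.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="Hypothetical churn model for illustration"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints" optype="continuous" dataType="double"/>
    <DataField name="tickets" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints"/>
      <MiningField name="tickets"/>
      <MiningField name="churn_score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-2.0">
      <NumericPredictor name="complaints" coefficient="0.5"/>
      <NumericPredictor name="tickets" coefficient="0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score(pmml_text: str, record: dict) -> float:
    """Evaluate the first RegressionTable of a PMML RegressionModel."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    y = float(table.get("intercept", "0"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        exponent = int(pred.get("exponent", "1"))
        y += float(pred.get("coefficient")) * record[pred.get("name")] ** exponent
    # "logit" normalization maps the linear score to (0, 1).
    if model.get("normalizationMethod", "none") == "logit":
        y = 1.0 / (1.0 + math.exp(-y))
    return y


churn_probability = score(PMML_DOC, {"complaints": 3, "tickets": 2})
```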
PMML defines a standard not only for representing data-mining models but also for data handling and data transformations (pre- and post-processing).
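As an example of pre-processing expressed in PMML itself, a `DerivedField` with `NormContinuous` rescales an input by piecewise-linear interpolation between `LinearNorm` points. The sketch below applies one such transformation; the `age_scaled` field and the 18 to 78 range are hypothetical, the fragment would normally sit inside a full `<PMML>` document alongside a `DataDictionary`, and values outside the range are extrapolated from the nearest segment (the "asIs" outlier treatment).

```python
import bisect
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

# Hypothetical pre-processing step: scale age so that 18 -> 0 and 78 -> 1.
TRANSFORMS = """<TransformationDictionary xmlns="http://www.dmg.org/PMML-4_4">
  <DerivedField name="age_scaled" optype="continuous" dataType="double">
    <NormContinuous field="age">
      <LinearNorm orig="18" norm="0"/>
      <LinearNorm orig="78" norm="1"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>"""


def norm_continuous(norm_elem: ET.Element, value: float) -> float:
    """Piecewise-linear interpolation between LinearNorm (orig, norm) points."""
    points = sorted((float(ln.get("orig")), float(ln.get("norm")))
                    for ln in norm_elem.findall("pmml:LinearNorm", NS))
    origs = [p[0] for p in points]
    # Find the segment containing value; clamp to the first/last segment
    # so out-of-range values are extrapolated.
    i = max(0, min(bisect.bisect_right(origs, value) - 1, len(points) - 2))
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)


def apply_derived_fields(transform_xml: str, record: dict) -> dict:
    """Compute every NormContinuous DerivedField for one input record."""
    root = ET.fromstring(transform_xml)
    derived = {}
    for df in root.findall("pmml:DerivedField", NS):
        nc = df.find("pmml:NormContinuous", NS)
        if nc is not None:
            derived[df.get("name")] = norm_continuous(nc, float(record[nc.get("field")]))
    return derived


scaled = apply_derived_fields(TRANSFORMS, {"age": 34})
```

Shipping the transformation inside the PMML file means the scoring engine applies exactly the same pre-processing the model was trained with.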