Agile deployment predictive analytics on hadoop

Agile Deployment of Predictive Analytics on Hadoop

Faster Insights through Open Standards
Hadoop Summit 2012

© 2012 Datameer, Inc. All rights reserved.


Today's Session

Ulrich Rueckert, Data Scientist, Datameer
Michael Zeller, CEO, Zementis

After this session, you will be able to…

1.  Effectively deliver predictive solutions combining:
a.  R, KNIME & Others [Model Development]
b.  Zementis Universal PMML Plug-in [Model Deployment & Execution]
c.  Datameer [Scalable Hadoop Infrastructure]

2.  Identify PMML as a vendor-neutral & open standard to:
a.  Incorporate predictive models from virtually any commercial vendor or open source tool
b.  Apply such models on Big Data

3.  Leverage a lightweight, agile deployment process for predictive analytics to:
a.  Accelerate time-to-market
b.  Lower cost and complexity
c.  Reuse existing predictive assets


Who is Datameer?

§  “Business Intelligence on top of Hadoop”
§  Established 2009 by Hadoop and enterprise software veterans
§  Offices in Silicon Valley, New York and Germany

§  Some customers:


Who is Zementis?

§  Focus on Operational Predictive Analytics
§  Offices in San Diego and Hong Kong
§  Predictive Analytics Software Technology:
•  ADAPA® Decision Engine (Predictive Models and Rules)
•  ADAPA Add-in for Excel
•  PMML Converter
•  Universal PMML Plug-in (UPPI)

§  Global Partner Network


Big Data and Analytics

§  People and Sensor Data
•  Transaction records
•  Social media
•  Climate information
•  Mobile GPS signals
•  Healthcare
•  Smart Grid

90% of the data today created in the last 2 years

§  Benefits from Analytics
•  Descriptive Analytics answers “What happened?”
•  Predictive Analytics answers “What will happen next?”


Operational Predictive Analytics

[Figure: Score Distribution, 1st Lien Stand-Alone Loans. x-axis: Score (50 to 1000); y-axis: % Within Class (0% to 14%); series: Goods and Bads, each with a polynomial trend line.]

[Figure: % of Delinquent Loans per Month. x-axis: Months (Jan to Nov); y-axis: % of Delinquent Loans (0 to 90); legend: 700, 750, 800, 850, 900, 950.]


From Model Building to Deployment

[Diagram: on the Model Building side, several tools each export PMML (models). On the Model Deployment side (Integration / Execution), the PMML is loaded through UPPI into the Datameer Server.]

Simple Deployment & Execution
1.  Upload PMML file(s) in DAS
2.  PMML turns into custom function
3.  Seamlessly score data in Datameer
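
The idea behind step 2 can be pictured with a minimal Python sketch that turns a PMML file into a callable scoring function. This is not the UPPI or Datameer implementation. The model, its field names (age, income), its coefficients, and the helper pmml_to_function are all hypothetical, and the sketch only handles a single-table PMML RegressionModel.

```python
import xml.etree.ElementTree as ET

# PMML 4.1 namespace as published by the DMG.
NS = "{http://www.dmg.org/PMML-4_1}"

# Hypothetical model: field names and coefficients are made up for illustration.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical response score"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="age" coefficient="0.01"/>
      <NumericPredictor name="income" coefficient="0.00002"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def pmml_to_function(pmml_text):
    """Return a function that scores one record (a dict) with the PMML model."""
    root = ET.fromstring(pmml_text)
    table = root.find(f"{NS}RegressionModel/{NS}RegressionTable")
    if table is None:
        raise ValueError("this sketch only supports a PMML RegressionModel")
    intercept = float(table.get("intercept"))
    terms = [(p.get("name"), float(p.get("coefficient")))
             for p in table.findall(f"{NS}NumericPredictor")]

    def score(record):
        # Linear regression: intercept plus coefficient * value for each input.
        return intercept + sum(coef * record[name] for name, coef in terms)

    return score


score = pmml_to_function(PMML_DOC)
rows = [{"age": 40, "income": 50000}, {"age": 25, "income": 30000}]
scores = [score(row) for row in rows]
```

Once the model is a plain function, applying it to every row of a data set is an ordinary map operation, which is the kind of work a Hadoop-based platform can run in parallel.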


PMML
Predictive Model Markup Language

•  PMML is an XML-based language used to define
statistical and data mining models and to share these
between compliant applications.

•  Mature standard developed by the DMG (Data Mining
Group) to avoid proprietary issues and incompatibilities
and to deploy models.

•  Supported by all leading data mining tools, commercial
and open-source.

•  Allows for the clear separation of tasks: Model
development vs. model deployment.

•  Eliminates the need for custom code and proprietary
model deployment solutions.

•  Uniform deployment platform ensures scalability and
reliability of model execution.

PMML book available on Amazon.com
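
Because PMML is plain XML, any consuming application can inspect a model without knowing which tool produced it. The following minimal Python sketch, using only the standard library, reads the data dictionary and the model type from a small PMML 4.1 document. The document content (field names, tree split) and the helper describe_pmml are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# PMML 4.1 namespace as published by the DMG.
NS = "{http://www.dmg.org/PMML-4_1}"

# Top-level PMML elements that are not models.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}

# Hypothetical model: a one-split decision tree on a made-up "income" field.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical campaign response model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="responded" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="responded" usageType="predicted"/>
    </MiningSchema>
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="income" operator="greaterThan" value="50000"/>
      </Node>
      <Node score="no">
        <SimplePredicate field="income" operator="lessOrEqual" value="50000"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""


def describe_pmml(pmml_text):
    """Summarize the PMML version, the declared fields, and the model elements."""
    root = ET.fromstring(pmml_text)
    fields = [(f.get("name"), f.get("optype"), f.get("dataType"))
              for f in root.findall(f"{NS}DataDictionary/{NS}DataField")]
    # Strip the namespace prefix from each top-level tag and keep only models.
    tags = [child.tag.replace(NS, "") for child in root]
    models = [tag for tag in tags if tag not in NON_MODEL]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = describe_pmml(PMML_DOC)
```

The same function works unchanged on a file exported from any PMML-compliant tool, as long as it uses the same PMML version namespace.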

PMML: Predictive Model Management
Integrating across all systems and processes

[Diagram: PMML at the center, connecting the Business Process, Applications (CRM, ERP, Excel, etc.), and cloud platforms (IBM SmartCloud, Amazon EC2).]


PMML: One Standard, One Process

[Diagram: Divisions, Service Providers, and External Vendors all connect through PMML to Applications.]

Demo Setup

§  End-to-end Model Development Lifecycle
§  PMML Standard as the Glue

[Diagram: lifecycle cycle built around the Universal PMML Plug-In. Labels: Understand Client's Data; Data Analysis; Model Design and Test; Build Model(s) to Unlock Hidden Value; Demonstrate Model Performance; Development; Model Deployment; Real-time Process Improvement and ROI.]


Demo: Annual Marketing Campaign

§  Which customers should we target?
§  Split 2011 results in training and test set
§  Learn model on training set
§  Apply model on test set
§  Fine-tune model until evaluation shows success
§  Apply final model on 2012 customer list

[Diagram: 2011 Campaign Results are split into a Subset for Training and a Subset for Testing. Training yields a Prediction Model, Model Evaluation on the test subset leads to a Fine-Tuned Prediction Model, and that model is applied to the 2012 Customer List to produce Campaign Candidates.]
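
The demo workflow above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model used in the demo: the data is synthetic, the single feature (prior spend) is hypothetical, and the "model" is just a spend threshold chosen to maximize accuracy on the training set.

```python
import random


def split(rows, test_percent=30, seed=0):
    """Shuffle the rows and split them into (training set, test set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    test_count = len(shuffled) * test_percent // 100
    return shuffled[test_count:], shuffled[:test_count]


def predict(threshold, spend):
    """Toy model: a customer is predicted to respond if spend >= threshold."""
    return spend >= threshold


def accuracy(threshold, rows):
    """Fraction of (spend, responded) rows the threshold model gets right."""
    hits = sum(predict(threshold, spend) == responded for spend, responded in rows)
    return hits / len(rows)


def learn_threshold(train):
    """Learn the model: pick the observed spend with the best training accuracy."""
    candidates = sorted({spend for spend, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic 2011 campaign results: (prior spend, responded?). Made-up data.
results_2011 = [(spend, spend >= 500) for spend in range(0, 1000, 50)]
# Synthetic 2012 customer list: prior spend only, response not yet known.
customers_2012 = [120, 480, 510, 930]

train, test = split(results_2011)            # split 2011 results
threshold = learn_threshold(train)           # learn model on training set
test_accuracy = accuracy(threshold, test)    # apply model on test set
candidates = [s for s in customers_2012 if predict(threshold, s)]  # score 2012 list
```

In practice the fine-tuning step would repeat the learn and evaluate calls with different model settings until test_accuracy is acceptable, and only then score the 2012 list.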


Summary

Avoid Vendor Lock-in
•  Open Standards vs. Proprietary Code
•  Best-of-Breed Tool Set

Hadoop-based Scoring Paradigm
•  Minimize Data Movement
•  Massively Parallel Execution
•  Scale with Business Demand

Ease of Use, Fast ROI
•  Leverage Datameer UI
•  Deploy in Minutes vs. Months
•  No Coding Skills Required

Online Resources

§  Learn More About PMML
§  Data Mining Group website http://www.dmg.org
§  Join LinkedIn PMML Discussion Group http://www.linkedin.com/groupRegistration?gid=2328634
§  Articles, on-line videos, blogs http://www.zementis.com/community.htm

§  Product Info
§  On Demand Webinar http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/

§  UPPI for Datameer http://www.zementis.com/DAS-plugin.htm

