Agile deployment predictive analytics on hadoop
- 1. Agile Deployment of Predictive Analytics on Hadoop
Faster Insights through Open Standards
Hadoop Summit 2012
© 2012 Datameer, Inc. All rights reserved.
- 2. Today's Session
Ulrich Rueckert, Data Scientist, Datameer
Michael Zeller, CEO, Zementis
After this session, you will be able to…
1. Effectively deliver predictive solutions combining:
a. R, KNIME & Others [Model Development]
b. Zementis Universal PMML Plug-in [Model Deployment & Execution]
c. Datameer [Scalable Hadoop Infrastructure]
2. Identify PMML as a vendor-neutral & open standard to:
a. Incorporate predictive models from virtually any commercial vendor or open source tool
b. Apply such models on Big Data
3. Leverage a lightweight, agile deployment process for predictive analytics to:
a. Accelerate time-to-market
b. Lower cost and complexity
c. Reuse existing predictive assets
- 3. Who is Datameer?
§ “Business Intelligence on top of Hadoop”
§ Established 2009 by Hadoop and enterprise software veterans
§ Offices in Silicon Valley, New York and Germany
§ Some customers:
- 4. Who is Zementis?
§ Focus on Operational Predictive Analytics
§ Offices in San Diego and Hong Kong
§ Predictive Analytics Software Technology:
• ADAPA® Decision Engine (Predictive Models and Rules)
• ADAPA Add-in for Excel
• PMML Converter
• Universal PMML Plug-in (UPPI)
§ Global Partner Network
- 5. Big Data and Analytics
§ People and Sensor Data
• Transaction records
• Social media
• Climate information
• Mobile GPS signals
• Healthcare
• Smart Grid
Callout: 90% of the data today was created in the last 2 years
§ Benefits from Analytics
• Descriptive Analytics answers "What happened?"
• Predictive Analytics answers "What will happen next?"
- 6. Operational Predictive Analytics
[Figure: Score Distribution, 1st Lien Stand-Alone Loans. Series: Goods and Bads, each with a polynomial trend line. x-axis: Score (50 to 1000). y-axis: % Within Class (0% to 14%).]
[Figure: % of Delinquent Loans per Month. One series per score band (700, 750, 800, 850, 900, 950). x-axis: Months (Jan to Nov). y-axis: % of Delinquent Loans (0 to 90).]
- 7. From Model Building to Deployment
[Diagram: Model Building produces PMML (models); Model Deployment, Integration / Execution takes place on the Datameer Server through UPPI.]
Simple Deployment & Execution
1. Upload PMML file(s) in DAS
2. PMML turns into custom function
3. Seamlessly score data in Datameer
- 8. PMML
Predictive Model Markup Language
• PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
• Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
• Supported by all leading data mining tools, commercial and open-source.
• Allows for the clear separation of tasks: model development vs. model deployment.
• Eliminates the need for custom code and proprietary model deployment solutions.
• Uniform deployment platform ensures scalability and reliability of model execution.
[Diagram label: Transformations]
PMML book available on Amazon.com
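To make the "XML-based language" point concrete, here is a minimal sketch of what consuming a PMML file can look like. It is not the Zementis UPPI or ADAPA implementation; it only handles a single linear `RegressionModel`, and the model itself (fields `age`, `income`, `score` and all coefficients) is a hypothetical example invented for illustration. A real scoring engine also has to cover transformations, missing-value handling, and the many other model types in the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 document: a linear regression with two inputs.
# Field names and coefficients are made up for this sketch.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical response model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="income" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned namespace, so lookups must be qualified.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Read the intercept and per-field coefficients from a RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model: intercept + sum(coefficient * field value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_linear_model(PMML_DOC)
result = score({"age": 40.0, "income": 10.0}, intercept, coefficients)
```

Because the model is plain data rather than code, the tool that wrote the file and the tool that scores with it need to agree only on the PMML schema, which is the separation of development and deployment the slide describes.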
- 9. PMML: Predictive Model Management
Integrating across all systems and processes
[Diagram: PMML linking Business Process; Applications (CRM, ERP, Excel, etc.); IBM SmartCloud; Amazon EC2.]
- 10. PMML: One Standard, One Process
[Diagram: Divisions, Service Providers, and External Vendors connect through PMML to Applications.]
- 11. Demo Setup
§ End-to-end Model Development Lifecycle
§ PMML Standard as the Glue
[Diagram: model development lifecycle. Labels: Data Analysis (Understand Client's Data); Model Development (Build Model(s) to Unlock Hidden Value); Design and Test (Demonstrate Model Performance); Model Deployment (Universal PMML Plug-In); Real-time Process Improvement and ROI.]
- 12. Demo: Annual Marketing Campaign
§ Which customers should we target?
§ Split 2011 results into a training set and a test set
§ Learn model on training set
§ Apply model on test set
§ Fine-tune model until evaluation shows success
§ Apply final model on 2012 customer list
[Diagram labels: 2011 Campaign Results; Subset for Training; Subset for Testing; Prediction Model; Model Evaluation; Fine-Tuned Prediction Model; 2012 Customer List; Campaign Candidates.]
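The demo workflow above can be sketched in a few lines. This is an illustration only, not the Datameer or Zementis demo: the data is synthetic, the field names `spend` and `responded` are hypothetical, and a one-feature threshold rule stands in for a real model built in R or KNIME. The fine-tuning loop is reduced to a single evaluation on the held-out subset.

```python
import random


def split_records(records, n_test, seed=0):
    """Shuffle a copy of the records and return (training, test) subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:-n_test], shuffled[-n_test:]


def predict(threshold, record):
    """Toy model: predict a response when spend reaches the threshold."""
    return record["spend"] >= threshold


def accuracy(threshold, records):
    """Fraction of records whose prediction matches the known outcome."""
    hits = sum(predict(threshold, r) == r["responded"] for r in records)
    return hits / len(records)


def learn_threshold(training):
    """Pick the observed spend value that classifies the training set best."""
    candidates = sorted({r["spend"] for r in training})
    return max(candidates, key=lambda t: accuracy(t, training))


# Synthetic stand-in for the 2011 campaign results (hypothetical data).
campaign_2011 = [{"spend": s, "responded": s >= 50} for s in range(0, 100, 5)]

# Split into training and test sets, learn on one, evaluate on the other.
training, testing = split_records(campaign_2011, n_test=6)
threshold = learn_threshold(training)
test_accuracy = accuracy(threshold, testing)

# Apply the final model to the (hypothetical) 2012 customer list.
customers_2012 = [{"id": "A", "spend": 95}, {"id": "B", "spend": 10}]
campaign_candidates = [c["id"] for c in customers_2012 if predict(threshold, c)]
```

In practice `test_accuracy` is the number that decides whether to go back and adjust the model before applying it to the new customer list; keeping the test subset out of training is what makes that number meaningful.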
- 13. Summary
Avoid Vendor Lock-in
• Open Standards vs. Proprietary Code
• Best-of-Breed Tool Set
Hadoop-based Scoring Paradigm
• Minimize Data Movement
• Massively Parallel Execution
• Scale with Business Demand
Ease of Use, Fast ROI
• Leverage Datameer UI
• Deploy in Minutes vs. Months
• No Coding Skills Required
- 14. Online Resources
§ Learn More About PMML
• Data Mining Group website: http://www.dmg.org
• Join LinkedIn PMML Discussion Group: http://www.linkedin.com/groupRegistration?gid=2328634
• Articles, online videos, blogs: http://www.zementis.com/community.htm
§ Product Info
• On Demand Webinar: http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
• UPPI for Datameer: http://www.zementis.com/DAS-plugin.htm