Agile Deployment of Predictive Analytics on Hadoop

  • 1. Agile Deployment of Predictive Analytics on Hadoop: Faster Insights through Open Standards. Hadoop Summit 2012. © 2012 Datameer, Inc. All rights reserved.
  • 2. Today's Session
    Presenters: Ulrich Rueckert (Data Scientist, Datameer) and Michael Zeller (CEO, Zementis)
    After this session, you will be able to:
    1. Effectively deliver predictive solutions combining:
       a. R, KNIME & others [model development]
       b. Zementis Universal PMML Plug-in [model deployment & execution]
       c. Datameer [scalable Hadoop infrastructure]
    2. Identify PMML as a vendor-neutral, open standard to:
       a. Incorporate predictive models from virtually any commercial vendor or open source tool
       b. Apply such models on Big Data
    3. Leverage a lightweight, agile deployment process for predictive analytics to:
       a. Accelerate time-to-market
       b. Lower cost and complexity
       c. Reuse existing predictive assets
  • 3. Who is Datameer?
    • "Business Intelligence on top of Hadoop"
    • Established 2009 by Hadoop and enterprise software veterans
    • Offices in Silicon Valley, New York and Germany
    • Some customers: [customer logos shown on slide]
  • 4. Who is Zementis?
    • Focus on Operational Predictive Analytics
    • Offices in San Diego and Hong Kong
    • Predictive Analytics Software Technology:
      • ADAPA® Decision Engine (Predictive Models and Rules)
      • ADAPA Add-in for Excel
      • PMML Converter
      • Universal PMML Plug-in (UPPI)
    • Global Partner Network
  • 5. Big Data and Analytics
    • People and Sensor Data
      • Transaction records
      • Social media
      • Climate information
      • Mobile GPS signals
      • Healthcare
      • Smart Grid
    • Callout: 90% of today's data was created in the last 2 years
    • Benefits from Analytics
      • Descriptive Analytics answers "What happened?"
      • Predictive Analytics answers "What will happen next?"
  • 6. Operational Predictive Analytics
    [Figure: Score Distribution, 1st Lien Stand-Alone Loans. Series: Goods and Bads, each with a polynomial trend line. x-axis: Score (50 to 1000); y-axis: % Within Class (0% to 14%).]
    [Figure: % of Delinquent Loans per Month, one series per score band (700 to 950). x-axis: Months (Jan to Nov); y-axis: % of Delinquent Loans (0 to 90).]
  • 7. From Model Building to Deployment
    [Diagram: three stages labeled Model Building, Model Deployment, and Integration / Execution; PMML models are passed to the UPPI on the Datameer Server.]
    Simple Deployment & Execution
    1. Upload PMML file(s) in DAS
    2. PMML turns into custom function
    3. Seamlessly score data in Datameer
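The idea on slide 7 that an uploaded PMML file "turns into a custom function" can be illustrated with a minimal Python sketch. This is not the UPPI or Datameer API: the helper `pmml_to_function`, the field names (`age`, `income_k`, `score`) and the coefficients are all hypothetical, and the sketch handles only a linear PMML `RegressionModel` with numeric predictors.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document; field names and coefficients are made up.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income_k" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income_k"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income_k" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}


def pmml_to_function(pmml_text):
    """Step 2: turn a PMML linear regression into a plain scoring function."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }

    def score(record):
        # Linear model: intercept + sum(coefficient * field value)
        return intercept + sum(c * record[name] for name, c in coefficients.items())

    return score


# Step 1 (upload) is represented here by simply having the PMML text in hand.
score = pmml_to_function(PMML_DOC)

# Step 3: apply the function to every record of a (tiny, made-up) data set.
rows = [{"age": 40, "income_k": 80}, {"age": 20, "income_k": 40}]
scores = [score(row) for row in rows]
```

The scoring function is built once from the model file and then applied record by record, which is the shape of work that distributes naturally across a Hadoop cluster.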
  • 8. PMML: Predictive Model Markup Language
    • PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
    • Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
    • Supported by all leading data mining tools, commercial and open-source.
    • Allows for the clear separation of tasks: model development vs. model deployment.
    • Eliminates the need for custom code and proprietary model deployment solutions.
    • Uniform deployment platform ensures scalability and reliability of model execution.
    [Slide graphics: a "Transformations" label; a "PMML book available on" note.]
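Because PMML is plain XML, any XML parser can read a model's declared fields and model type regardless of which tool exported it. The sketch below assumes a small hypothetical document (a two-field `TreeModel` with made-up field names) and a hypothetical helper `describe`. It only inspects the document and does not evaluate the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document describing a tiny decision tree.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="hypothetical campaign response tree"/>
  <DataDictionary numberOfFields="2">
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="responded" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="visits"/>
      <MiningField name="responded" usageType="predicted"/>
    </MiningSchema>
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="visits" operator="greaterThan" value="3"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

NS = "{http://www.dmg.org/PMML-4_1}"


def describe(pmml_text):
    """Summarize a PMML document: version, declared fields, model element."""
    root = ET.fromstring(pmml_text)
    fields = {f.get("name"): f.get("dataType") for f in root.iter(NS + "DataField")}
    # In this small document the model is the only top-level child that is
    # neither the Header nor the DataDictionary.
    non_model = {NS + "Header", NS + "DataDictionary"}
    model = next(child for child in root if child.tag not in non_model)
    return {
        "version": root.get("version"),
        "fields": fields,
        "model": model.tag[len(NS):],
        "function": model.get("functionName"),
    }


summary = describe(PMML_TEXT)
```

The same `describe` call would work unchanged on a regression or neural network document, since the header, data dictionary, and model element follow one shared schema.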
  • 9. PMML: Predictive Model Management
    Integrating across all systems and processes
    [Diagram labels: Business Process; PMML; IBM SmartCloud; Amazon EC2; Applications (CRM, ERP, Excel, etc.)]
  • 10. PMML: One Standard, One Process
    [Diagram labels: Divisions; Service Providers; External Vendors; PMML; Applications]
  • 11. Demo Setup
    • End-to-end Model Development Lifecycle
    • PMML Standard as the Glue
    [Lifecycle diagram labels: Data Analysis (Understand Client's Data); Model Development (Design, Build and Test Model(s) to Unlock Hidden Value); Demonstrate Model Performance; Model Deployment (Universal PMML Plug-In); Real-time Process Improvement and ROI]
  • 12. Demo: Annual Marketing Campaign
    • Which customers should we target?
    • Split 2011 results into training and test sets
    • Learn model on training set
    • Apply model on test set
    • Fine-tune model until evaluation shows success
    • Apply final model on 2012 customer list
    [Diagram labels: 2011 Campaign Results; 2012 Customer List; Subset for Training; Subset for Testing; Prediction Model; Model Evaluation; Fine-Tuned Prediction Model; Campaign Candidates]
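The demo workflow on slide 12 can be sketched end to end in plain Python. Everything here is an assumption for illustration: the data is synthetic, the single feature `visits` is made up, the "model" is just a per-feature response rate, and fine-tuning is reduced to picking a probability cutoff. The talk itself builds models in tools such as R or KNIME and scores them in Datameer.

```python
import random


def make_campaign_results(n, seed):
    """Synthetic 2011 campaign results: frequent visitors respond more often."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visits = rng.randint(0, 10)
        responded = rng.random() < (0.1 + 0.08 * visits)
        rows.append({"visits": visits, "responded": responded})
    return rows


def learn(train):
    """Learn the observed response rate for each value of `visits`."""
    counts, hits = {}, {}
    for row in train:
        v = row["visits"]
        counts[v] = counts.get(v, 0) + 1
        hits[v] = hits.get(v, 0) + row["responded"]
    return {v: hits[v] / counts[v] for v in counts}


def predict(rates, row, cutoff):
    """Target a customer if the learned response rate reaches the cutoff."""
    return rates.get(row["visits"], 0.0) >= cutoff


def accuracy(rates, rows, cutoff):
    correct = sum(predict(rates, r, cutoff) == r["responded"] for r in rows)
    return correct / len(rows)


# Split the 2011 results into training and test sets.
results_2011 = make_campaign_results(1000, seed=1)
random.Random(0).shuffle(results_2011)
split = int(0.7 * len(results_2011))
train, test = results_2011[:split], results_2011[split:]

# Learn the model on the training set.
rates = learn(train)

# Apply it to the test set and fine-tune the cutoff on the evaluation result.
CUTOFFS = [0.3, 0.4, 0.5, 0.6, 0.7]
best_cutoff = max(CUTOFFS, key=lambda c: accuracy(rates, test, c))

# Apply the final model to the 2012 customer list.
customers_2012 = [{"visits": v} for v in (0, 2, 5, 9, 10)]
candidates = [c for c in customers_2012 if predict(rates, c, best_cutoff)]
```

Tuning the cutoff on the test set mirrors the slide's loop of "fine-tune until evaluation shows success". A stricter setup would hold out a third subset for the final evaluation so the reported accuracy is not biased by the tuning.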
  • 13. Summary
    • Avoid Vendor Lock-in
      • Open Standards vs. Proprietary Code
      • Best-of-Breed Tool Set
    • Hadoop-based Scoring Paradigm
      • Minimize Data Movement
      • Massively Parallel Execution
      • Scale with Business Demand
    • Ease of Use, Fast ROI
      • Leverage Datameer UI
      • Deploy in Minutes vs. Months
      • No Coding Skills Required
  • 14. Online Resources
    • Learn More About PMML
      • Data Mining Group website
      • Join LinkedIn PMML Discussion Group
      • Articles, on-line videos, blogs
    • Product Info
      • On Demand Webinar
      • UPPI for Datameer