
Kamanja: Driving Business Value through Real-Time Decisioning Solutions

This is a first presentation of Kamanja, a new open-source real-time software product, which integrates with other big-data systems. See also links: http://www.meetup.com/SF-Bay-ACM/events/223615901/ and http://Kamanja.org to download, for docs or community support. For the YouTube video, see https://www.youtube.com/watch?v=g9d87rvcSNk (you may want to start at minute 33).

Published in: Data & Analytics


  1. 1. © 2015 ligaDATA, Inc. All Rights Reserved. Driving Business Value Through Real-Time Decisioning Solutions. July 2015. Download, Forums, Docs, Events: http://Kamanja.org
  2. 2. Outline: Motivation; Case Study: Modeling Department; Review Mining and Big Data Tools; Solution: Predictive Model Markup Language (PMML); Reviewing Big Data Space and Real Time; Kamanja Integration (Open Source PMML); Use Cases, Demo, Architecture
  3. 3. Audience Survey (show of hands). Data Mining Experience: __% Read or heard about; __% Class or competition; __% Put a model into production; __% Have put 10+ models in production; __% Have put 75+ models in production. Big Data Experience: __% Read or heard about; __% Class or exploration project; __% Put a system into production; __% System with 3+ OSS in prod; __% System with 6+ OSS or PB+ in production. Extensive Data Mining AND Big Data Experience: __% with 10+ models AND 3+ OSS; __% with 75+ models AND 6+ OSS / PB+. Overlap on extensive experience is rare. This is what Kamanja helps with.
  4. 4. Case Study of a Modeling Department: Financial Fraud Detection. CONTEXT • 3 modelers, 2 data infrastructure people in department • Over 3 dozen predictive models in production, high $$$$ and visibility • Separate Operations group deploying models PROBLEM • Models were getting stale • “Spinning Plates” between short term solutions • 2 months for a full model training investigation • 2 months to put a model into production (OUCH): had to completely re-code the preprocessing and model scoring; Operations had one process to deploy a regression and a different process to deploy a decision tree
  5. 5. Case Study of a Modeling Department: Financial Fraud Detection. INDUSTRY REVIEW to answer: • How common is it to use many algorithms or tools in a project? • What is an easier way to deploy models?
  6. 6. http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html In the industry, many algorithms and tools are used. Need to simplify DEPLOYMENT.
  7. 7. http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html Independent use of tools
  8. 8. http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html Tools used in combination
  9. 9. PMML Diagram: Predictive Model Markup Language. Training Project Phase: training & test data (batch) go into a Data Mining Tool (the PMML Producer), which writes the full model specification to a PMML file (File, Save As PMML). Production Scoring Project Phase: the Scoring Engine (Kamanja, the PMML Consumer) reads the PMML file and the scoring data (real time streaming); the output data has a new score field.
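The producer/consumer split in the diagram can be illustrated with a minimal sketch of a PMML consumer. This is not Kamanja's implementation: the field names (`amount`, `prior_alerts`, `risk`), the coefficients, and the helper functions `load_regression` and `score` are all invented for illustration. The sketch assumes a PMML 4.2 `RegressionModel` with numeric predictors only, and uses Python's standard `xml.etree.ElementTree` to read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML file, as a data mining tool (the producer) might save it.
# Field names and coefficients are made up for this example.
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="prior_alerts" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="prior_alerts"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.125"/>
      <NumericPredictor name="prior_alerts" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_regression(pmml_text):
    """Consumer side: read the intercept and coefficients from the PMML text."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Return a copy of the input record with a new 'score' field added."""
    scored = dict(record)
    scored["score"] = intercept + sum(
        coef * record[name] for name, coef in coefficients.items()
    )
    return scored


intercept, coefficients = load_regression(PMML_DOC)
print(score({"amount": 4.0, "prior_alerts": 2.0}, intercept, coefficients))
```

The point of the design is that the scoring code never changes when the model does: retraining produces a new PMML file, and the consumer simply loads it.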
  10. 10. Given industry fragmentation, PMML is a solution. PMML Producers (18 companies): • R (Rattle, PMML)* • RapidMiner • KNIME* PMML Consumers (12 companies): • Zementis • IBM SPSS • KNIME • Microstrategy • SAS • Kamanja* (Open Source) • Spark (MLlib)* Also listed: • Weka* • SAS Enterprise Miner (* = Open Source). Supported model types. PREDICTIVE: Naïve Bayes, Neural Net, Regression, Rules, Scorecard, Sequence, SVM, Time Series, Trees. DESCRIPTIVE / OTHER: Association Rules, Cluster, K-Nearest Neighbors, Text Models, model ensembles & composition (e.g. Gradient Boosting).
  11. 11. Case Study of a Modeling Department: Financial Fraud Detection. SOLUTION OBJECTIVES 1) Support a wider variety of algorithms and software (increase accuracy) 2) Decrease time spent putting models into production (increase analysis time)
  12. 12. Case Study of a Modeling Department: Financial Fraud Detection. SOLUTION OBJECTIVES 1) Support a wider variety of algorithms and software (increase accuracy) 2) Decrease time spent putting models into production (increase analysis time) SOLUTION 1) Train models in SAS Enterprise Miner, R (PMML Producers) 2) Score models with a RESTful call to a PMML Consumer (Zementis). Predictive Model Markup Language (PMML) is a type of XML. RESULT 1) By supporting more software & algorithms: MORE ACCURATE! 2) Time to PUT MODELS INTO PRODUCTION went down from 8 weeks to 2-5 days! Greatly increased throughput of training new models!
  13. 13. Outline: Motivation; Case Study: Modeling Department; Review Mining and Big Data Tools; Solution: Predictive Model Markup Language (PMML); Other Uses of Real Time Decisioning; Reviewing Big Data Space and Real Time; Kamanja Integration (Open Source PMML); Use Cases, Demo, Architecture
  14. 14. Other Uses of PMML or Real-Time Decisioning. Complex Event Processing (CEP) • Possibly hundreds of concurrent data streams • Apply rule logic, select, aggregate • Select action on elements in stream. Enterprise Applications, During … • customer call or chat: recommendations to improve service • card transaction: offer credit increase • web application: pre-approval • web transaction: recommend other product(s) • MOOC: customize training speed for the student (Custom Java Model)
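The CEP bullets (select, aggregate, apply rule logic, choose an action) can be sketched as a small generator over an event stream. Everything here is hypothetical: the event fields, the `decide` function, the spending threshold, and the `offer_credit_increase` action name are assumptions made for illustration, not part of Kamanja.

```python
from collections import defaultdict


def decide(events, threshold=1000.0):
    """Yield (customer, action) pairs as events stream through.

    Select: only card transactions are considered.
    Aggregate: a running spend total is kept per customer.
    Rule: once the total reaches the threshold, emit an action and reset.
    """
    totals = defaultdict(float)
    for event in events:
        if event["type"] != "card_transaction":
            continue  # select: ignore other event types
        customer = event["customer"]
        totals[customer] += event["amount"]  # aggregate
        if totals[customer] >= threshold:  # rule logic
            yield customer, "offer_credit_increase"
            totals[customer] = 0.0


stream = [
    {"type": "card_transaction", "customer": "a", "amount": 600.0},
    {"type": "chat", "customer": "b"},
    {"type": "card_transaction", "customer": "a", "amount": 500.0},
    {"type": "card_transaction", "customer": "b", "amount": 200.0},
]
for customer, action in decide(stream):
    print(customer, action)
```

Because `decide` is a generator, it emits each action as soon as the triggering event arrives instead of waiting for the stream to end, which is the property that matters for decisions made during a transaction.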
  15. 15. Real Time Computing OSS Technology Stack: Integration with Kamanja. Kamanja (PMML/Java/Scala Consumer). High Level Languages / Abstractions: MLlib* (PMML Producers). Compute Fabric: Cloud, EC2, Internal Cloud. Security: Kerberos. Real Time Streaming: Kafka, MQ, Spark*. Data Store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others). Resource Management: Zookeeper, Yarn*, Mesos*.
  16. 16. Real Time Open Source Systems (OSS): Kamanja and Spark are good complements. Clarify with a feature list; use case to work together.
  17. 17. Higher Requirements in Financial Services or Health Care, Compared to social media or web apps. Legal Compliance to meet exacting technical standards • Losing (or duplicating) a bank transaction • Losing a medical record • Executives or employees can GO TO JAIL. What is different about these industries? • Regulatory requirements require 100% data protection • Security • Auditability • Lineage • ZERO data loss
  18. 18. Driving Digital Adoption: the Bank’s Call Center. Business Objectives • Reduce operating cost of Calling Centers • Increase customer adoption of digital channels • Interact with Customer at point of transaction. IT Objectives • Implement cost effective and scalable platform • Satisfy financial services security and compliance requirements • Integrate with existing core systems. Results • Migrated from mass generalized communications to real time personalized alerts • Increased messaging effectiveness: 400% lift in conversion to digital • Full integration with Mainframe • Leverage streaming transaction and customer account data
  19. 19. Medical Company use of Kamanja: Lines of Business. 1) Run Medco models, supplying clients with intelligence based upon model findings (using multi-tenant deployment when appropriate). 2) Run Customer models on Medco hardware (on Medco-owned customer private net). 3) Consult/partner with Medco customers, providing software solutions to be run on Customer net.
  20. 20. Medical Company use of Kamanja • Clinicians (knowledge experts) develop heuristic based rule set models • The initial model was COPD (Chronic Obstructive Pulmonary Disease) risk assessment • Models are expressed with a Domain Specific Language (DSL) they developed • DSL models are transformed to PMML for Kamanja • Models consume current + prior related messages over a “look back period”, and save the “assertions” of a patient in the database (beyond standard PMML) • Medco plans to integrate the DSL with their ontology data modeling effort • Goal is to generate new models as their “medical world” ontology evolves
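The “look back period” idea, where a model sees the current message plus prior related messages, can be sketched as a sliding window keyed by patient. This is a simplified illustration under stated assumptions: the `LookBackModel` class, its day-number timestamps, and the "at least N events in the window" rule are hypothetical and do not represent the clinicians' actual rule sets or Kamanja's storage layer.

```python
from collections import defaultdict, deque


class LookBackModel:
    """Flag a patient when enough related events fall inside the window."""

    def __init__(self, look_back_days, min_events):
        self.look_back_days = look_back_days
        self.min_events = min_events
        # patient id -> days (as integers) of events still inside the window
        self.history = defaultdict(deque)

    def on_message(self, patient_id, day):
        window = self.history[patient_id]
        window.append(day)
        # Drop prior messages that are older than the look-back period.
        while window and day - window[0] > self.look_back_days:
            window.popleft()
        return len(window) >= self.min_events


model = LookBackModel(look_back_days=30, min_events=2)
print(model.on_message("p1", day=1))    # only one event so far
print(model.on_message("p1", day=20))   # two events within 30 days
print(model.on_message("p1", day=100))  # earlier events have expired
```

Messages are assumed to arrive in time order per patient; an out-of-order stream would need the window sorted before expiry is applied.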
  21. 21. Representative Kamanja Solution Architecture (diagram labels). DATA SOURCES: Mainframe, DB2, Server, DW, Customer Preferences. INTEGRATION: CDC, Kafka Inbound Queue. Kamanja: Security Management, Error Management, Metadata Service/Cache, Storage Service/Cache, Message Construction, Decision Engine, Output Handler, Transform, Compute, MakeObj, Parallel DAG Executor, DAG Optimizer, DAG Generator, Change Listener, Output Generator, Output Distributor. DECISION GATEWAYS: DBs, Apps, Notification Engine, Kafka Outbound Queue. Storage and coordination: HBase (History), HDFS (Long-term storage), Zookeeper (Resource Management).
  22. 22. Performance Characteristics. Performance • Throughput on the order of a million messages/second • Uses commodity hardware. Scalability • Linear scalability vertically and horizontally • Data partitioning support • Runtime multi-model optimizations to support thousands of models • Consistent performance on hundreds of models and thousands of rules. Built for IoT data volumes.
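Data partitioning is what typically makes horizontal scaling possible: each message key is mapped to one partition, so all messages for a given key land on the same node. The sketch below shows generic hash partitioning only; the slides do not describe Kamanja's actual partitioning scheme, and the `partition_for` and `assign` helpers and the account keys are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a hash that is stable across runs.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here for repeatability.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign(keys, num_partitions):
    """Group keys by the partition that would process them."""
    buckets = {p: [] for p in range(num_partitions)}
    for key in keys:
        buckets[partition_for(key, num_partitions)].append(key)
    return buckets


accounts = ["acct-%d" % i for i in range(10)]
for partition, members in assign(accounts, 4).items():
    print(partition, members)
```

Plain modulo hashing reshuffles most keys when the partition count changes; systems that resize often usually layer consistent hashing or a fixed partition count on top of this idea.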
  23. 23. Kamanja Execution Flow on a Node (diagram labels): Kafka Queue, Input Adapter, Kamanja Engine (Data Transformer, Model Runtime, Output Dispatcher), Output Adapter, Next Process; Storage Adapter to Data History (Cassandra, HBase); Metadata (expanded in next slides).
  24. 24. Metadata API Elements. Functions (in PMML, Scala as User Defined Functions (UDFs)); Models (PMML Rule Set, e.g. fraud, attrition); Messages (from input queue, real time records); Containers (e.g. a record or lookup table to provide context, priors); Types (e.g. array of patients, doctors, types of containers); Concepts (PMML-created fields, preprocessing, scores).
  25. 25. Metadata API Subsystems (diagram labels): Metadata; Metadata API; REST API; Scoring Engine Manager (within a model); Model Manager (activate, control a DAG of many models); PMML Producer, or application; Admin App, used by DevOps; Activate PMML Model or DAG; Configuration (Cluster, Engine, Model Compilation); Kamanja Engine.
  26. 26. Kamanja Runtime Model Execution. Diagram labels: Model Runtime, Transformer, Msg, Storage Adapter(s), Data History / Metadata (HBase, Cassandra, ..), Metadata Instance, Model Factory, Model Object. Steps: 1) Message received by runtime engine 2) Metadata is checked to see what model is interested in the message 3) Model object is instantiated 4) Model is executed on the Message object 5) Messages committed to history 6) Output of the model is returned to the engine
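The six runtime steps above can be sketched in a few lines. This is a toy illustration, not Kamanja's engine: the `ThresholdModel` class, the `METADATA` registry that maps message types to model factories, and the in-memory `history` list standing in for the storage adapter are all assumptions made for the example.

```python
class ThresholdModel:
    """Hypothetical model object: raises an alert on large amounts."""

    def __init__(self, limit):
        self.limit = limit

    def execute(self, message):
        return {"model": "threshold", "alert": message["amount"] > self.limit}


# Hypothetical metadata: message type -> factories for interested models.
METADATA = {"transaction": [lambda: ThresholdModel(limit=500.0)]}


def process(message, metadata, history):
    """Run one message through the steps listed on the slide."""
    outputs = []
    # Step 1: the message has been received by the runtime engine.
    # Step 2: check metadata for models interested in this message type.
    for factory in metadata.get(message["type"], []):
        model = factory()                       # Step 3: instantiate model
        outputs.append(model.execute(message))  # Step 4: execute on message
    history.append(message)                     # Step 5: commit to history
    return outputs                              # Step 6: return output


history = []
print(process({"type": "transaction", "amount": 900.0}, METADATA, history))
print(process({"type": "chat", "text": "hello"}, METADATA, history))
```

Keeping the "which models care about this message" lookup in metadata rather than in code is what allows models to be added or activated without redeploying the engine.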
  27. 27. What happens when a node goes down? If the node that crashes is a Kamanja Slave node • The Kamanja Leader Node rebalances over all Kamanja nodes • Each message is processed EXACTLY ONCE • A Bank needs to process a transaction ONCE AND ONLY ONCE • Look at the state of every message through each step. If the Kamanja Leader node goes down • The next node on the list becomes the Leader, then rebalances. COMPARE TO: • Spark and Storm would execute each message AT LEAST ONCE (but may process a message 2, 3 or 4 times…) • The expectation is for the application to handle possible duplicates.
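To make the difference concrete, here is a sketch of what "the application handles possible duplicates" usually means under at-least-once delivery: the application records which message IDs it has already applied and ignores redeliveries. This illustrates the application-side burden only; it is not a description of how Kamanja achieves exactly-once processing internally, and the `Ledger` class and message IDs are hypothetical.

```python
class Ledger:
    """Apply each transaction once, even if it is delivered several times."""

    def __init__(self):
        self.balance = 0.0
        self.seen = set()  # IDs of messages that were already applied

    def apply(self, msg_id, amount):
        if msg_id in self.seen:
            return False  # duplicate delivery: ignore it
        self.seen.add(msg_id)
        self.balance += amount
        return True


ledger = Ledger()
# "t1" is delivered twice, as can happen with at-least-once delivery.
deliveries = [("t1", 100.0), ("t2", -40.0), ("t1", 100.0)]
for msg_id, amount in deliveries:
    ledger.apply(msg_id, amount)
print(ledger.balance)
```

In a real deployment the `seen` set and the balance would have to be persisted atomically together; otherwise a crash between the two updates reintroduces the duplicate or loses the transaction.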
  28. 28. Kamanja Integration Points • Provided under an enterprise-friendly license (no GPL license “virus” to infect the entire system) • Adaptors for any data flow: Kafka, IBM MQ, HBase, Cassandra, InfluxDB, Zookeeper, Spark • User Defined Functions: provide a JAR file or Scala function • Custom Java Model: can skip PMML, leverage Adaptors and UDFs, import generated Java code
  29. 29. Deploy Predictive Models and Rules in 1/100th the time it takes today • Kamanja is an open source, real time decisioning engine • Hardened to meet the strictest requirements of Financial Services and Healthcare, and scalable to handle IoT • Kamanja enables Developers and Data Scientists to reduce time to deploy Rules and Predictive Models • Kamanja integrates with your Big Data ecosystem
  30. 30. Planned Kamanja Differentiation • Model management, enable DevOps for models: DevOps (automated testing, validation, deployment and rollback); A/B testing to competitively roll out model updates; scheduling • Enterprise Level Security and Multiple-Tenancy: integration using Kerberos; role based security for model management; security at field level for models, “need to know/access” • Multi tenancy: partition internal groups in different tenancies; data isolation, resource management, SLA support • Data Integration: built-in integrations for social data and third party data; can consume 100s of different event and document types
  31. 31. Planned Kamanja Differentiation • Performance and Scale: dynamic scaling (enlarge and shrink as needed, based on load); leap in performance by generating native code (vs. Java); cost aware execution in cloud environment • Extensive integrations with enterprise queue, storage and indexing: MQ, HBase, Cassandra, RDBMS, Elastic Search, Zookeeper • Domain specific libraries and model templates to speed up preprocessing, business logic and algorithms
  32. 32. Try out Kamanja. Download, Forums, Docs, Events: http://Kamanja.org
