
Powering Real-time Decision Engines in Finance and Healthcare using Open Source Software



Financial services and healthcare companies could be the biggest beneficiaries of big data. Their real-time decision engines can be vastly improved by the latest advances in big data analytics. However, these companies face challenges in adopting open source software (OSS). This presentation covers how, in collaboration with financial services and healthcare institutions, we built an OSS project to deliver a real-time decisioning engine for their respective applications. I will address two key issues. First, I will describe the strategy behind our hiring process to attract millennial big data developers and the results of this endeavor. Second, I will recount our collaboration with our large clients and the milestones we achieved during that process. I will explain the big data analysis goals that these clients presented to us and how we accomplished them. In particular, I will discuss how we leveraged open source to deliver a real-time decisioning software product called Kamanja to these institutions. An advantage of developing applications in Kamanja is that it is already integrated with Hadoop, with Kafka for real-time data streaming, and with HBase and Cassandra for NoSQL data storage. I will talk about how these companies benefited from Kamanja and some of the challenges we had in designing this software. I will present quantifiable improvements in key metrics driven by Kamanja, along with interesting unsolved problems that need to be addressed for faster and wider adoption of OSS by these companies.

Published in: Data & Analytics


  1. © 2015 ligaDATA, Inc. All Rights Reserved. Powering Real-time Decisioning for Financial & Healthcare using Open Source. August 2015. Community @
  2. In 2014 the bank embarked on transforming how it leverages its data using Open Source & Big Data technologies.
  3. To achieve this goal with the bank, we needed to:
     1. Create a framework to adopt Open Source Software
     2. Find a catalyst to attract and retain the talent
  4. Open Source in Financial Services. Marissa Mayer of Yahoo won't have to go in front of the Senate to explain why 100,000 records were lost; Barbara Desoer of Citibank would. What is different about Financial Services?
     • Regulatory requirements demand 100% data protection
     • Security & data governance
     • Auditability
     • Lineage
     • ZERO data loss
     • Integration with legacy ecosystem
     • Skillset
     Good enough for Internet companies isn't good enough!
  5. OSS Adoption Chasm: a modified "Crossing the Chasm" view for OSS. Why have Financial Services not adopted OSS more aggressively?
     • Creators: technology organizations, rich resources, solving a problem, creating a competitive advantage
     • Contributors: technology organizations, taking a risk while solving a problem
     • Users: lower technology skillset, low risk tolerance, solving a problem
  6. Bank Open Source Software (BOSS). Establish the BOSS framework for the consumption of and contribution to open source software (OSS) at scale in the Bank. BOSS optimises Consumption, enables Contribution and Creation.
     Maturing capability (diagram labels: Bank Current Focus, Step Change, Pioneering, Target):
     • Consumption: evaluation & consumption of OSS
     • Contribution: contribution to OSS by enhancing existing open source projects (documentation, fixes, enhancements)
     • Creation: initiation of a new OSS project, championing and facilitating OSS community development and consumption
     How the framework was defined:
     • Input from internal and external stakeholders influenced the BOSS framework definition
     • OSS advisory board to steer and drive
     • Pre-approved license types per use case (consumption and contribution)
     • Invest in enabling technology: GitHub, Black Duck, Sonatype
     • No new governance steps; leverage and streamline existing controls instead of creating new ones
  7. BOSS: Collective Thought Process. The BOSS framework is designed based on guidance and feedback received from key representatives within the Bank (internal) and from leading open source contributors and fellow banks (external).
     • Business Units: Retail, Investment, Cards
     • Control Functions: Legal, Risk, Security, Sourcing
     • Technology: Data, Design, Infra
  8. Attracting and Retaining Talent. Millennial developers:
     • Grew up using OSS
     • Are unaware of closed source software
     • Want to engage, share and contribute
     Real-time decisioning using Kamanja was selected as a capability big enough and important enough to build a Center of Excellence around.
  9. Deriving Decisions from Big Data
     • REAL-TIME: individual events; decisioning, detection; in-context and online
     • BATCH: cross section of events; analytics, MI; offline, longer cycle
  10. Customer-centric product design requires real-time decisions. End-to-end capability:
     • Real-time: tracking & analyzing (processing) streams of information (real-time) about things that happen (events)
     • Deriving an opportunity or threat
     • Actions: triggers, scoring, notifications, alerts, transactional updates
  11. LigaDATA introduced Kamanja, an open source real-time decisioning project, hardened for Financial Services & Healthcare requirements and scalable to IoT-level data volumes, enabling low-latency use cases.
     Use cases: Customer Retention, Risk Analysis, Customer Contact, Cyber Crime, Fraud, Security & Compliance, Audit & Governance, Marketing, Telephony Interception, Real-Time Offer
  12. Uses of Real-Time Decisioning
     Complex Event Processing (CEP):
     • A few to possibly hundreds of concurrent data streams
     • Apply rule logic, select, aggregate
     • Decide action on elements in stream
     Enterprise applications, during a:
     • customer call or chat: recommendations to improve service
     • card transaction: offer credit increase
     • web application: pre-approval
     • web transaction: recommend other product(s)
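The CEP bullets on this slide (select, aggregate, apply rule logic, decide an action per stream element) can be sketched in a few lines of plain Python. This is an illustrative sketch only and does not use Kamanja's API; the event fields, the 60-second window, and the spend threshold are assumptions made up for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed sliding-window length
MAX_SPEND = 1000.0    # assumed threshold for the aggregate rule


def decide(events):
    """Yield (event, action) pairs for a time-ordered stream of events."""
    recent = defaultdict(deque)  # card id -> (timestamp, amount) pairs in the window
    for event in events:
        if event["type"] != "card_transaction":      # select
            continue
        window = recent[event["card"]]
        window.append((event["ts"], event["amount"]))
        # Expire entries that fell out of the window; the entry just added always stays.
        while window[0][0] <= event["ts"] - WINDOW_SECONDS:
            window.popleft()
        total = sum(amount for _, amount in window)  # aggregate
        action = "alert" if total > MAX_SPEND else "approve"  # rule logic -> action
        yield event, action


# Hypothetical events, invented for illustration.
events = [
    {"type": "card_transaction", "card": "A", "ts": 0, "amount": 600.0},
    {"type": "login", "card": "A", "ts": 5},
    {"type": "card_transaction", "card": "A", "ts": 30, "amount": 500.0},
    {"type": "card_transaction", "card": "A", "ts": 120, "amount": 500.0},
]
actions = [action for _, action in decide(events)]
```

Keeping per-key state in a bounded window is what lets the decision be made on each element as it arrives, rather than after a batch has been collected.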
  13. Case Study of a Modeling Department. Monitor $80B of consumer bank transactions per year to detect fraud (between 1,400 banks).
     Pain point: ~2 months to deploy (the model group was different from the deployment group).
     Industry review to answer:
     • How common is it to use many algorithms or tools in a project?
     • What is an easier way to deploy models?
  14. Independent use of tools
  15. Tools used in combination
  16. PMML Diagram. PMML: Predictive Model Markup Language.
     • Training project phase: training & test data (batch) feed a data mining tool (the PMML producer), which writes a PMML file (File, Save As PMML) containing the full model specification
     • Production scoring project phase: the scoring engine (Kamanja, the PMML consumer) reads the PMML file and scores real-time streaming data; the output data has a new score field
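The producer/consumer split on this slide can be illustrated with a toy consumer. The sketch below is not Kamanja or JPMML: it hand-parses only a linear `RegressionModel` using Python's standard library, and the field names and coefficients in the PMML document are invented for the example.

```python
import xml.etree.ElementTree as ET

# A tiny PMML file as a producer might export it (values are made up).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="toy fraud-risk model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="hour" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="hour"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.01"/>
      <NumericPredictor name="hour" coefficient="-0.02"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}


def load_model(text):
    """Read the intercept and coefficients from the model specification."""
    root = ET.fromstring(text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {
        node.get("name"): float(node.get("coefficient"))
        for node in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefs


def score(model, record):
    """Return a copy of the record with a new 'score' field added."""
    intercept, coefs = model
    scored = dict(record)
    scored["score"] = intercept + sum(c * record[name] for name, c in coefs.items())
    return scored


model = load_model(PMML_DOC)
scored = score(model, {"amount": 100.0, "hour": 10.0})
```

Because the model lives entirely in the PMML file, the scoring side needs no knowledge of the tool that trained it, which is the point the slide makes about separating the training and production phases.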
  17. Given industry fragmentation, PMML is a solution.
     • PMML Producers (18 companies): R (Rattle, PMML), RapidMiner, KNIME
     • PMML Consumers (12 companies): Zementis, SAS, IBM SPSS, KNIME, Microstrategy, Kamanja, JPMML, Spark (MLlib) (Open Source), Weka, SAS Enterprise Miner
     • Predictive models: Naïve Bayes, Neural Net, Regression, Rules, Scorecard, Sequence, SVM, Time Series, Trees
     • Descriptive / other models: Association Rules; Cluster, K-Nearest Neighbors; Text Models; model ensembles & composition (e.g. Gradient Boosting)
  18. Real Time Computing: OSS Technology Stack Integration with Kamanja
     • High level languages / abstractions: Kamanja (PMML/Java/Scala consumer); MLlib* (PMML producers)
     • Compute fabric: Cloud, EC2, Internal Cloud
     • Security: Kerberos
     • Real time streaming: Kafka, MQ, Spark*, ligaDATA
     • Data store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others)
     • Resource management: Zookeeper, Yarn*, Mesos*
  19. Performance Characteristics. Built for IoT data volumes.
     Performance:
     • Throughput on the order of a million messages/second
     • Uses commodity hardware
     Scalability:
     • Linear horizontal scalability
     • Data partitioning support
     • Runtime multi-model optimizations to support thousands of models
     • Consistent performance on hundreds of models and thousands of rules
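Data partitioning is what makes horizontal scaling possible: if every message for a given key lands on the same partition, each node can keep that key's state locally. The sketch below shows one common approach, stable hash partitioning; it is not Kamanja's partitioner, and the `customer` key and the partition count are assumptions for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index that is stable across processes and runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group messages into partitions by their (hypothetical) customer key."""
    partitions = [[] for _ in range(num_partitions)]
    for message in messages:
        index = partition_for(message["customer"], num_partitions)
        partitions[index].append(message)
    return partitions


messages = [{"customer": f"c{i % 5}", "seq": i} for i in range(20)]
partitions = route(messages, 4)
```

`zlib.crc32` is used instead of Python's built-in `hash` because the built-in is randomized per process for strings, which would send the same key to different partitions on different nodes.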
  20. Medical Company use of Kamanja
     • Clinicians (knowledge experts) develop heuristic-based rule set models
     • The initial model was a COPD (Chronic Obstructive Pulmonary Disease) risk assessment
     • Supports referenced Beneficiary, HL7, Inpatient Claim, and Outpatient Claim messages
     • Models are expressed in a domain specific language (DSL) they developed
     • DSL models are transformed to PMML for Kamanja
     • Models consume current + prior related messages over a "look back period"
       - Save the "assertions" of a patient in the database (beyond standard PMML)
       - "State" can evolve over time
     • The "Medical Company" plans to integrate the DSL with their ontology data modeling effort
     • Goal is to generate new models as their "medical world" ontology evolves
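The "look back period" idea, where a model sees the current message plus prior related messages and saves assertions that persist as state, can be sketched as follows. This is a minimal illustration, not the company's DSL or Kamanja's storage layer: the in-memory dictionaries stand in for a database, and the 365-day window, the `RESP` code, and the two-claim rule are invented for the example and carry no clinical meaning.

```python
from collections import defaultdict

LOOK_BACK_DAYS = 365  # assumed look-back period

history = defaultdict(list)    # patient id -> prior messages (stand-in for a database)
assertions = defaultdict(set)  # patient id -> assertions saved so far


def process(message):
    """Evaluate one message against the patient's recent history and saved state."""
    patient = message["patient"]
    cutoff = message["day"] - LOOK_BACK_DAYS
    # Keep only prior messages inside the look-back period, then add the current one.
    history[patient] = [m for m in history[patient] if m["day"] >= cutoff]
    history[patient].append(message)
    # Hypothetical rule: two or more respiratory-coded claims within the window.
    hits = sum(1 for m in history[patient] if m["code"] == "RESP")
    if hits >= 2:
        assertions[patient].add("elevated_risk")  # state persists across messages
    return sorted(assertions[patient])


first = process({"patient": "p1", "day": 0, "code": "RESP"})
second = process({"patient": "p1", "day": 400, "code": "RESP"})
third = process({"patient": "p1", "day": 500, "code": "RESP"})
```

The first claim has aged out of the window by day 400, so the rule only fires on the third message; once saved, the assertion remains part of the patient's state even if later windows no longer satisfy the rule.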
  21. Try out. Community @