#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Greenplum

In this session we explore a case study of a large-scale government fraud detection program that prevents billions of dollars in fraudulent payments each year. The program uses the beta release of the GemFire+Greenplum Connector, which is planned for release in GemFire 9. Topics include an overview of the system architecture and a review of the new GemFire+Greenplum Connector features that simplify use cases requiring a blend of massively parallel database capabilities and accelerated in-memory data processing.

Published in: Technology

  1. Large Scale Fraud Analytics: GemFire Greenplum Connector (G2C). © 2016 Pivotal Software, Inc. All rights reserved.
  2. Background
      • Government fraud revenue retention program
      • Detecting & retaining ~$5B annually
          – Primary focus on identity theft
          – Processes up to 8 million cases per day
          – Current & historic data size ~60 TB (compressed)
      • Modifying architecture to integrate GemFire for scalable Java-based business logic, web service integration, and event-driven design
  3. Fraud Systems Simplified
      • Prepare: Ingest; Restructure (ETL)
      • Score: Model Evaluation
      • Disposition: Business Logic; Prioritization
      • Respond: Investigation; Stop Payments
      • Diagram labels: Business Logic Engine; ETL; Reporting; In-db Analytics; Application Services
  4. Case Study Architecture – Scaling Up
      • Diagram labels: GemFire; Greenplum; Spring Boot App Services; Informatica w/ PWX (ETL); Business Objects (Reporting); Legacy Logic Implementation; Logic Engine; In-db Analytics; Greenplum
      • Prepare: Ingest; Restructure (ETL)
      • Score: Model Evaluation
      • Disposition: Business Logic; Prioritization
      • Respond: Investigation; Stop Payments
  5. Pivotal Greenplum (GPDB)
      • Postgres Community OSS
          – Original fork of 8.2.15
          – Massively parallel processing database
      • Master coordinates queries across segment databases
      • Supports in-database model evaluation
          – MADlib, PL/R, SAS
      • Diagram labels: GPDB Logical; GPDB Physical; GPDB Software; Master; Segments
  6. Initial Implementation
      • Fraud model results evaluated by business logic engine
      • Flat file data extraction
          – Significant custom code to construct required object model
          – Table → CSV → POJO
      • Shared element in an otherwise distributed system
          – Performance considerations
      • Diagram labels: GPDB; Legacy Logic Implementation
  7. Architecture Adjustments
      • New requirements introduced external integrations
          – Drives desire for web services
      • Desire to improve performance & simplify codebase
      • Expanding business logic
          – Logic engine run as a GemFire function
      • Diagram labels: GemFire; GPDB; Legacy Logic Implementation; Spring Boot (App Services)
  8. GemFire Greenplum Connector
  9. Context (diagram)
      • Diagram labels: Greenplum; ANSI SQL; Analytical; Parallel Configurable Data Load; GemFire; App 1; App 2; Native API; REST / HTTP; Transactional; Custom Apps; Transactional data write behind; Data Science, Analytics & ML
  10. GemFire Greenplum Connector (G2C)
      • Extension package for GemFire
      • Provides simple import and export of data between GemFire regions & Greenplum tables
          – Parallel data motion leveraging Greenplum’s external table interface
      • Simple mapping between table rows and PdxInstance
          – Flat object relational mapping
          – Set of predefined type conversions
          – Configurable GemFire data collocation
  11. G2C data path (diagram)
      • Diagram labels: Greenplum; Master; Segments; GemFire; G2C Data Interfaces; JDBC / ODBC; Data Node; Data Node; Control Logic
  12. Sample Data Import / Export
      GpdbService is the primary entry point for explicitly invoked data motion:
        1. Import - loads the full table contents from Greenplum
        2. Export - sends region contents to Greenplum

          Cache cache = CacheFactory.getAnyInstance();
          GpdbService gpdb = GpdbService.getInstance(cache);
          long count;
          count = gpdb.importRegion(region);  // callout 1: import
          count = gpdb.exportRegion(region);  // callout 2: export
  13. Basic Cache Configuration
      Configured via the GemFire extension framework:
        1) Each region maps to a JNDI data source backed by Greenplum
        2) Link an entity type and table
        3) Declare a field to be used as the key
            • Compound keys supported
        4) Define a mapping between the table columns and the entity fields
            • Default auto-configuration
            • Optional name and column attributes for naming convention changes
            • Class used to control type conversion
            • Set of built-in types

          <region name="Parent">
            <region-attributes refid="PARTITION">
              <partition-attributes/>
            </region-attributes>
            <gpdb:store datasource="datasource">  <!-- callout 1 -->
              <gpdb:types>
                <gpdb:pdx name="io.pivotal...entity.Parent" table="parent">  <!-- callout 2 -->
                  <gpdb:id field="id" />  <!-- callout 3 -->
                  <gpdb:fields>  <!-- callout 4 -->
                    <gpdb:field name="name" />
                    <gpdb:field name="id" column="id" />
                    <gpdb:field name="income" class="java.math.BigDecimal" />
                  </gpdb:fields>
                </gpdb:pdx>
              </gpdb:types>
            </gpdb:store>
          </region>
  14. Configuring Collocation
      Parent-child foreign key relationships are supported through collocation:
        1. Compound key configurations result in a HashMap-based key in GemFire
        2. The provided partition resolver works with compound keys

          <region name="Child">
            <...>
              <partition-resolver>
                <class-name>
                  io.pivotal.gemfire.gpdb.IdPartitionResolver
                </class-name>
                <parameter name="field">
                  <string>parentId</string>
                </parameter>
            </...>
            <gpdb:id>
              <gpdb:field ref="parentId" />
              <gpdb:field ref="id" />
            </gpdb:id>
            <gpdb:fields>
              <gpdb:field name="parentId"/>
              <gpdb:field name="id" />
            </...>
  15. Configuring Automatic Synchronization
      • Data exported to Greenplum via asynchronous eventing
          – Time and batch size triggers available
      • Causes each GemFire member to independently interact with Greenplum
          – Configure GPDB resource queues accordingly

          <region name="Child">
            <...>
            <gpdb:store datasource="datasource">
              <gpdb:synchronize mode="automatic" time-interval="3000" persistent="false" />
              <gpdb:types>
                <...>
  16. Case Study G2C Configuration Details
      • Existing required domain objects
          – Multiple many-to-one groupings
      • Wide tables / objects (500+ fields)
      • Data collocation configured on caseId
      • Source tables wrapped in views
      • Object model diagram: CaseWrapper, ModelScores, Documents, PriorHistory, OtherData…, LogicResults; each lists caseId, …; multiplicity markers: * * * * 1
  17. Simple Loading – Single Table per Object (sequence diagram)
      • Participants: :LoadTrigger; :GPDBService; :Region; :AsyncEventListener; :LogicEngine; results:Region
      • Messages: Import(); put(); processEvents(); process(); put()
  18. Complex Loading – Multiple Tables per Object (sequence diagram)
      • Participants: :LoadTrigger; :MergeLoader; :GPDBService; :Region; :LogicEngine; results:Region
      • Messages: executeFunction(); Import(); put(); process(); put(); assemble()
      • Includes a "par" (parallel) fragment
  19. Impacts & Results
      • Simplified implementation & code reduction
      • Maintained or improved data motion rates
          – Case study CPU bound
          – Additional improvements in the backlog
      • Improved system throughput
  20. Questions?
  21. Join the Apache Geode Community!
      • Check out: http://geode.incubator.apache.org
      • Subscribe: user-subscribe@geode.incubator.apache.org
      • Download: http://geode.incubator.apache.org/releases/
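
The collocation idea from the Configuring Collocation slide can be illustrated with a small plain-Java sketch. It is not the connector's implementation and does not use the GemFire API: `routingObject`, `bucketFor`, `childKey`, and the bucket count are hypothetical names and values chosen for illustration. The sketch assumes that a compound key is a map of field names to values and that routing on the `parentId` field alone places every child in the same bucket as a parent keyed by that id.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of routing compound keys by one field; not the G2C implementation. */
public class CollocationSketch {

    /** Hypothetical stand-in for a partition resolver configured with a single routing field. */
    public static Object routingObject(Map<String, Object> compoundKey, String field) {
        Object value = compoundKey.get(field);
        if (value == null) {
            throw new IllegalArgumentException("key has no field: " + field);
        }
        return value;
    }

    /** Map a routing object to one of bucketCount buckets (always non-negative). */
    public static int bucketFor(Object routingObject, int bucketCount) {
        return Math.floorMod(routingObject.hashCode(), bucketCount);
    }

    /** Build a HashMap-based compound key with the two fields used on the slide. */
    public static Map<String, Object> childKey(long parentId, long id) {
        Map<String, Object> key = new HashMap<>();
        key.put("parentId", parentId);
        key.put("id", id);
        return key;
    }

    public static void main(String[] args) {
        int buckets = 113; // illustrative bucket count
        int parentBucket = bucketFor(42L, buckets); // parent keyed by its id alone (assumption)
        Map<String, Object> first = childKey(42L, 1L);
        Map<String, Object> second = childKey(42L, 2L);
        // Both children route on parentId, so they share the parent's bucket.
        System.out.println(bucketFor(routingObject(first, "parentId"), buckets) == parentBucket);
        System.out.println(bucketFor(routingObject(second, "parentId"), buckets) == parentBucket);
    }
}
```

Routing on the foreign key rather than on the full compound key is what keeps related entries together; hashing the whole map would scatter children of the same parent across buckets.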

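The Complex Loading slide shows an assemble() step that builds one object from several tables, and the case study collocates data on caseId. The slides do not show how MergeLoader does this, so the following is only a plain-Java sketch of the general grouping technique. The `CaseWrapper` fields, the `row` helper, and the sample values are hypothetical, and rows are modeled as simple maps rather than PdxInstance objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of assembling one wrapper per caseId from several row sets. */
public class MergeSketch {

    /** Hypothetical aggregate; the slides name CaseWrapper but do not list its fields. */
    public static final class CaseWrapper {
        public final long caseId;
        public final Map<String, List<Map<String, Object>>> rowsByTable = new LinkedHashMap<>();

        CaseWrapper(long caseId) {
            this.caseId = caseId;
        }
    }

    /** Group the rows of every table under the wrapper that shares their caseId. */
    public static Map<Long, CaseWrapper> assemble(Map<String, List<Map<String, Object>>> tables) {
        Map<Long, CaseWrapper> cases = new LinkedHashMap<>();
        for (Map.Entry<String, List<Map<String, Object>>> table : tables.entrySet()) {
            for (Map<String, Object> row : table.getValue()) {
                long caseId = ((Number) row.get("caseId")).longValue();
                cases.computeIfAbsent(caseId, CaseWrapper::new)
                     .rowsByTable
                     .computeIfAbsent(table.getKey(), name -> new ArrayList<>())
                     .add(row);
            }
        }
        return cases;
    }

    /** Illustrative helper that builds a two-column row. */
    public static Map<String, Object> row(long caseId, String field, Object value) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("caseId", caseId);
        row.put(field, value);
        return row;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, List<Map<String, Object>>> tables = new LinkedHashMap<>();
        tables.put("ModelScores", List.of(row(1L, "score", 0.91), row(2L, "score", 0.12)));
        tables.put("Documents", List.of(row(1L, "docType", "example")));
        Map<Long, CaseWrapper> cases = assemble(tables);
        System.out.println(cases.get(1L).rowsByTable.keySet());
        System.out.println(cases.get(2L).rowsByTable.keySet());
    }
}
```

When all rows for a caseId live on the same member, a grouping step like this can run locally on each member without moving data between nodes, which is presumably why collocation on caseId matters for the multi-table case.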