
Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016



Learn how Equifax leverages Couchbase in their Couchbase Connect 2016 presentation.

Published in: Software


  1. Confidential and Proprietary. Connecting the Dots with Couchbase. Nov 2016. Jay Duraisamy, Gijun Lee
  2. Presenters
     Jay Duraisamy, VP Technology
     • Currently leads the Data Platforms group within Equifax, a core platform organization that supports the US Consumer Information Solutions group ($1.3 billion). The group is responsible for petabyte-scale infrastructure (5 PB and growing) for both offline (MPP and Big Data) and online (including NoSQL).
     • 18 years of industry experience building teams and platforms, leveraging expertise in software architecture and design philosophies. Has worked as a developer, lead and architect on B2B, B2C and Big Data technologies. Graduate degree from the Indian Institute of Technology and an MBA from Goizueta Business School, Emory University, Atlanta.
     • Enjoys jogging, reading and spending time with his twin daughters.
     Gijun Lee, Application Developer IV
     • Currently works on the Equifax B2B data platform that supports US consumer data analytics and processing, both offline and online. Recently developed a B2B REST service, written in Java on Couchbase, that serves the financial history of U.S. consumers.
     • 16 years of design and development experience in financial applications, including infrastructure and online/offline data analytics and processing in C/C++ and Java on Linux/Unix platforms. Strong interest in NoSQL and Hadoop platforms in the Big Data space. Master of Science in Computer Science from the University of Arkansas at Fayetteville.
     • Enjoys hiking, watching movies and travelling with family.
  3. Equifax & The Business of Big Data
     • An information technology company that operates in 24 countries.
     • A consumer credit company that has grown into a leading provider of insights and knowledge that help its customers make informed decisions.
     • The company organizes, assimilates and analyzes data on more than 820 million consumers and more than 91 million businesses worldwide, and its database includes employee data contributed by more than 5,000 employers.
     • Big Data before Big Data
     • First MPP/grid computing system in 2003, currently in production
     • Focused on high-throughput systems that deliver terabytes of data and insights to financial institutions and banks
     • Petabytes in scale
     • Talent that can distinguish between, and benefit from, low-latency and high-throughput trade-offs.
     Slide labels: Big Data; Why NoSQL?
  4. Big Data Online & The Teamwork
  5. Technology Requirements
     Potential timeline (figure): Plan, Evaluate, Build, Integrate, Launch, spanning Q4 '15, Q1 '16 and Q2 '16.
     Next steps
     1. Keep in mind the tight SLA (5 ms) and the timeline for a Q2 '16 launch
     2. Evaluate technologies: Redis, Mongo and Couchbase
     3. Grade the technical support from the partners during the evaluation
     4. Choose the winning technology partner and negotiate the software agreement
     5. Build, integrate, deploy and run
     Key Value Store
     • Key to retrieve data, no complex queries
     • NoSQL document: complex data objects with no normalization
     Ever Growing Data
     • The current use case is a little over a terabyte, but plan for other use cases
     • Scale to the multiple terabytes of data expected in the future
     High Performance & Availability
     • System uptime and replication for fault tolerance
     • DR capabilities
     Others
     • Application development friendly
     • Integration with Hadoop, Spark and Elasticsearch
  6. The Winner is …
     • In memory and disk
     • Key value store (ForestDB)
     • Distributed document database
     • Automatic replication
     • Integrated caching
     • Primary and secondary indexes
     • Spatial querying
     • LDAP integration and admin auditing
     • Master-master and master-slave replication
     • Memcached protocol and RESTful HTTP API
     • N1QL, a SQL-like query language
     • Multi-dimensional scaling
     • Cross data center replication (XDCR) with filtering
  7. NuDB – Architecture
  8. NuDB Development
     Storage Format
     • 24-month trended credit data in JSON
     • App-specific metadata
     • Compression with base64 encoding
     Interface
     • JSON-based HTTP POST
     • Retrieve, Update, Add and Delete operations
     • Spring MVC to marshal request and response
     App Server
     • App server in Tomcat shields Couchbase as the backend
     • Simple drop-in installation
     • DAO to decouple database transactions
     Data Ingestion
     • Online live system: ingest data faster with little downtime
     • RxJava, multi-threaded parallel loader
     • Programmed in Java
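The storage format on this slide (compressed JSON, base64 encoded) can be illustrated with a minimal sketch. The slide does not name the compression algorithm, so GZIP is an assumption here, and the class and method names are hypothetical rather than taken from NuDB.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: GZIP is an assumed algorithm, the slide only says "compression".
final class CompressedDocumentCodec {
    private CompressedDocumentCodec() {}

    // Compress the JSON text, then base64-encode the bytes so they can be stored as a string.
    static String encode(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the GZIP trailer has been written.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse of encode: base64-decode, then decompress back to the JSON text.
    static String decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"months\":[{\"balance\":1000},{\"balance\":1000},{\"balance\":1000}]}";
        String stored = encode(json);
        System.out.println(stored.length() + " chars stored for " + json.length() + " chars of JSON");
        System.out.println(decode(stored).equals(json));
    }
}
```

Base64 inflates the compressed bytes by about a third, so the net saving depends on how repetitive the document is; trended monthly data with repeated field names is the kind of input that tends to compress well.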
  9. NuDB Deployment
     • Cluster: 8-node cluster with 2x replication, 100% of data cached in memory, RAID 10 mirroring
     • App Server: 2 Linux ETL servers as app servers with failover, load balancing with F5
     • Ingestion: monthly import and export via the Control-M scheduler while the cluster is live, with no impact to production
     • Monitoring: system-generated transactions to monitor health; aggregated transaction times are monitored
     • Sampling: regular extraction of transactions to UAT for verification and validation
     • DR: XDCR handles cluster replication. No coding required
  10. NuDB – Lessons Learned
     Data Compression
     • Compression-friendly internal data format
     • Compression saved 70% in document size
     • Compression helps offset the additional storage needed for replication
     • Compression helps data import, which is an I/O-bound operation, with a 50% increase in CPU clock time
     RxJava
     • The Hadoop-based import tool was replaced by RxJava
     • RxJava utilizes resources better
     • 300 million documents (1 TB) loaded in 40 minutes with 2 Java processes
     • 50 million transactions exported in 10 minutes with 1 Java process
     Views and Consoles
     • Need to identify the most recently updated transactions
     • The initial design used Kafka asynchronously; it was switched to Couchbase views
     • The operations team uses views to analyze data. No additional coding required
     • Couchbase health is monitored via the console
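The idea behind a multi-threaded parallel loader can be sketched as follows. RxJava is a third-party library, so this sketch swaps in plain `java.util.concurrent` to show the same pattern of splitting documents into batches and writing them from several threads. It is not the presenters' loader: `DocumentStore`, `load`, the thread count and the batch size are all hypothetical, and the store is a stand-in for whatever DAO fronts the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in sketch: uses java.util.concurrent instead of RxJava (assumption, see lead-in).
final class ParallelLoaderSketch {
    private ParallelLoaderSketch() {}

    // Hypothetical stand-in for the DAO that writes one document to the database.
    interface DocumentStore {
        void upsert(String key, String value);
    }

    // Splits docs into batches, writes the batches on a fixed thread pool,
    // and returns the number of documents written.
    static int load(List<Map.Entry<String, String>> docs, DocumentStore store,
                    int threads, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                List<Map.Entry<String, String>> batch = docs.subList(start, end);
                pending.add(pool.submit(() -> {
                    for (Map.Entry<String, String> doc : batch) {
                        store.upsert(doc.getKey(), doc.getValue());
                    }
                    return batch.size();
                }));
            }
            int loaded = 0;
            for (Future<Integer> result : pending) {
                loaded += result.get(); // blocks until that batch has finished
            }
            return loaded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A thread-safe store is required because batches run concurrently; in a quick experiment a `ConcurrentHashMap` can play that role via `map::put`.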
  11. Performance and Stress Testing
     Stress Testing
     • 8 external servers with 2 threads per server
     • 15 hours of continuous transactions
     • An estimated 115 million transactions
     • Average transaction time is 60 ms
     • The only failure observed was due to logs filling the disk after 15 hours
     Performance
     • 2,133 ops/sec in debug mode
     • 500K to 1.6 million ops/sec with the Couchbase Pillowfight load test tool
     • The system can support an estimated 250 million transactions per day
     Sample Stats (figure)
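The figures on this slide can be cross-checked with simple arithmetic. The sketch below converts a transaction count over a duration into a per-second rate; the class and method names are hypothetical, and the only inputs are the numbers quoted on the slide.

```java
// Hypothetical helper for sanity-checking the slide's throughput figures.
final class ThroughputCheck {
    private ThroughputCheck() {}

    // Average rate in transactions per second for a count spread over a number of hours.
    static double perSecond(long transactions, double hours) {
        return transactions / (hours * 3600.0);
    }

    public static void main(String[] args) {
        // Stress test: an estimated 115 million transactions over 15 hours.
        System.out.printf("stress test: %.0f tx/sec%n", perSecond(115_000_000L, 15));
        // Capacity estimate: 250 million transactions per day.
        System.out.printf("daily capacity: %.0f tx/sec%n", perSecond(250_000_000L, 24));
    }
}
```

The stress test works out to roughly 2,130 transactions per second, which appears consistent with the 2,133 ops/sec quoted for debug mode, and 250 million transactions per day corresponds to a sustained average of about 2,894 per second.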
  12. In the News
  13. Questions?