The Evolution of Big Data Pipelines at Intuit

Published in: Technology

  1. 1. The Evolution of Big Data Pipelines At Intuit June 30, 2016 #hadoopsummit #HS16SJ
  2. 2. Your Speakers: Lokesh Rajaram, Senior Software Engineer, Intuit (likes Photography); Rekha Joshi, Principal Software Engineer, Intuit (currently likes Chopped)
  3. 3. The Plan
  4. 4. Unicellular Amoeba Multicellular Humans
  5. 5. Cannot Evolve? Disappear.. Gone!
  6. 6. Evolution of Big Data
  7. 7. Our Mission To improve our customers’ financial lives so profoundly … they can’t imagine going back to the old way!
  8. 8. Consumers Small Businesses Accounting Professionals Who we serve
  9. 9. The Numbers Are Growing: 42M, 2.3M, 7M. File their own taxes with TurboTax; run their small businesses with QuickBooks; manage their personal finances with Mint. 65+ Applications, 25% of US GDP
  10. 10. Intuit - An Evolution Case Study. Eras: Era of DOS, Era of Windows, Era of Web, Era of the Cloud, Mobile First. Timeline: 1980s, 1990s, 2000s, 2010, 2016. • Employees: 150 • Customers: 1.3M • Revenue: $33M • Employees: 4,500 • Customers: 5.6M • Revenue: $1.04B • Employees: 7,700 • Customers: 37M • Revenue: $4.2B. Data: Compliant data, Regulatory data, Transactional data, Batch data, Real time data, Complex, secure data
  11. 11. Data Is The Decision Maker
  12. 12. Evolution of Big Data Pipelines – The Need: Secure Cloud Environment, Single Cohesive Data Pipeline, A/B Testing, Personalization, Streaming, Profile Store, Fraud Detection, Support Varied Use Cases and more
  13. 13. Evolution of Big Data Pipelines: Thin Slices - Minimum Viable Product
  14. 14. Evolution of Big Data Pipelines – The Recipe Taking the Data In Transforming Data Handling The Indigestion With Scale
  15. 15. Evolution of Big Data Pipelines – The Recipe: No Snowflake Solutions, Getting Vested Stakeholders' Agreement, Establishing The Standards
  16. 16. Evolution of Big Data Pipelines – The Recipe Breaking The Silos Moving Organization In One Direction
  17. 17. Evolution of Big Data Pipelines – The Recipe ● Making The Configuration Knobs Work ● At Scale (Latency, Throughput) ● Schema, PII, Metadata, Changes, Audit, Governance ● Controlled Access ←→ Innovation ● Error Monitoring ● Cluster Deployment
  18. 18. Organization Evolution  Data Evolution
  19. 19. SDK User-entered data Apache Kafka Collector: User-entered and clickstream data Real-time processing Personalization Engine Profile Store Big Data Pipeline Slice View
  20. 20. Big Data Pipeline Components
  21. 21. Monitoring The Pipeline: AWS resource alarms, Custom App Metrics, JVM and App Metrics, Custom process alerts, Logging and alert
  22. 22. Evolution In Stages
  23. 23. Evolution - Stage 0: Disparate And Chaotic Disparate Databases
  24. 24. Data Pipeline (an example) • Collect event stream data into one location • Handle ~200k events/sec • Payload ~3-5 KB • Enrich each message and load it into Hive within the defined SLA
  25. 25. Evolution - Stage 1 Event Stream Oozie Sqoop Netezza Loader Hive QL operations Storm Samza Flume
  26. 26. Evolution - Stage 2 Event Stream { ReST }
  27. 27. Evolution - Stage 3 (HA & DR) SDK { ReST } SDK { ReST } Mirroring
  28. 28. Challenges & Opportunities
  29. 29. Set of Changes • Network upgrades • Increase pipe • Broker • MirrorMaker • Host TCP
  30. 30. Evolution - Stage 4 (Streaming + Batch) SDK { ReST } SDK { ReST } Mirroring
  31. 31. Evolution - Stage 5 (Cloud only) SDK { ReST } Kafka Connectors
  32. 32. Evolution - Stage 5 (Cloud only - Future state)
  33. 33. Pipeline Essentials SDK { ReST } SDK { ReST }
  34. 34. Traffic Rate Monitoring
  35. 35. Trust by Verification • Test all Observable Endpoints • Functional • Data Loss • Data Parity • Measure for SLA • Baseline Tests
  36. 36. Interested in Joining? goo.gl/BLPfyR
  37. 37. Thank You!
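
To put the figures from slide 24 (~200k events/sec, ~3-5 KB payload) in perspective, the following is a rough back-of-the-envelope sketch of the raw ingest bandwidth they imply. It is not from the deck: the function name, the use of decimal units (1 KB = 1000 bytes), and the assumption that the rate is sustained all day rather than a peak are illustrative only, and replication, compression, and enrichment overhead are ignored.

```python
# Back-of-the-envelope ingest bandwidth for the example pipeline on slide 24.
# All names here are hypothetical; only the two input figures come from the slide.

EVENTS_PER_SEC = 200_000        # "~200k events/sec" (slide 24)
PAYLOAD_KB_RANGE = (3, 5)       # "~3-5 KB" payload (slide 24)
BYTES_PER_KB = 1000             # assumption: decimal kilobytes
SECONDS_PER_DAY = 86_400


def ingest_rate_mb_per_sec(events_per_sec: int, payload_kb: float) -> float:
    """Raw ingest bandwidth in decimal MB/s, before replication or compression."""
    return events_per_sec * payload_kb * BYTES_PER_KB / 1_000_000


low, high = (ingest_rate_mb_per_sec(EVENTS_PER_SEC, kb) for kb in PAYLOAD_KB_RANGE)
print(f"raw ingest: {low:.0f}-{high:.0f} MB/s")

# Assumption: the rate is sustained for a full day (the slide does not say
# whether 200k events/sec is an average or a peak).
low_tb_day = low * SECONDS_PER_DAY / 1_000_000
high_tb_day = high * SECONDS_PER_DAY / 1_000_000
print(f"per day if sustained: {low_tb_day:.1f}-{high_tb_day:.1f} TB")
```

Under these assumptions the raw stream is on the order of hundreds of MB per second, which gives some context for the network, broker, and host TCP changes listed on slide 29.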