Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit TurboTax Case Study

Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is a well-known consumer software brand that serves 385K+ concurrent users at its peak. In this session, we start by looking at how user behavioral data and tax domain events are captured in real time on the event bus and analyzed to drive real-time personalization across various TurboTax data pipelines. We then look at analytics solutions that consume these events, built with Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena, and AWS Lambda functions. Finally, we look at how SageMaker is used to build the TurboTax model that predicts whether a customer is at risk or needs help.

  1. Modernizing Analytics & AI for today’s needs: Intuit TurboTax Case Study. Ravi Pillala, Chief Data Architect & Distinguished Engineer. 7/21/2022
  2. ©2021 Intuit Inc. All rights reserved.
  3. Who we serve: Consumers, Small businesses, Self-employed
  4. Unique consumer and small business assets at scale
  5. RETURNING TURBOTAX CUSTOMER: Liam. Married 2 years ago; last year he claimed his daughter, Candace, as a dependent. This year his ex-wife will claim their daughter. Recently left his job at Toyota to work for Honda. Had been renting, but just bought a condo. Goal: To be confident he can file easily with TurboTax, given all the changes in his life.
  6. First time filers: Liam. Goal: To be confident she can file easily with TurboTax to get the maximum refund possible.
  7. Intuit Confidential and Proprietary
  8. Powering Prosperity with AI and Data-driven platforms
  10. Key areas to focus: Behavioral Analytics (Event Collection Standards, Customer intents, Personalization); Analytics at Scale (Analytics tech stack, Separate storage & processing, Real-time analytics); Data Discovery/Understanding (Data Documentation, Tools to explore data, Data Stewardship, Centralized Governance, Data Lineage)
  11. Event Collection (From → To)
  12. From: Behavior Analytics - Event Collection
  13. Legacy Clickstream Architecture: Data available for consumption after 4 hours to 1 day
  14. Legacy Payload: Unreadable and Difficult to Use (WHERE? WHAT? WHO?)
      fid : 75F773438B1D0E25-3DDB5C9586B1731B
      cc : USD
      ch : support
      c1 : TT_S_SQ_COOKIE
      c2 : 1588699149406
      c4 : fecb4198593190599779
      c5 : Customer Care
      c6 : sh-view
      c7 : Help System<mytt
      c14 : View>LCQ>4331716>>2>IL
      c19 : ViewWidget
      c34 : en-US
      c36 : websdk-prod
      c44 : HPArticle<MYTT:undefined<expert_approved_ugc:false
      v3 : display:viewWidget
      pageName : MYTT/sh-view
      v47 : https://ttlc.intuit.com/questions/4331716
  15. To: Behavior Analytics - Event Collection (Amplitude, Adobe, Braze)
  16. Rainbow Properties
      What (logical): action, object, object_detail
      What (behavioral): ui_action, ui_object, ui_object_detail, ui_access_point
      Domain: org, purpose, scope
      Where: screen, scope_area
      Who: ivid, pseudonym_id
  17. Event Collection Standards (ECS) - Standard Event Tracking Example (WHO, WHAT)
      org : cg
      purpose : prod
      scope : turbotax
      event sender name : oihs/contact-us-plugin/widget
      event sender purpose : care
      event sender scope : contactus
      event sender screen : questionStep
      event : content : engaged
      object : content
      action : engaged
      search term : I haven't received my refund yet and I need to know what's the problem.
      ui action : clicked
      ui object : button
      ui object detail : Continue
      workflow id : 7fa8d4d6-6fb5-41c0-b2d5-971742227b6c
      topic name : cg-turbotax-clickstream
      timestamp : 2020-05-04T06:27:41.799Z
      userId : 20abd451b935d4c27ad417a258f15ccba
      *** This example only includes a specific subset of attributes ***
  18. Key areas to focus: Behavioral Analytics (Event Collection Standards, Customer intents, Personalization); Analytics at Scale (Analytics tech stack, Separate storage & processing, Real-time analytics); Data Discovery/Understanding (Data Documentation, Tools to explore data, Data Stewardship, Centralized Governance, Data Lineage)
  19. Intuit analytics journey before modernization: Reporting silos → MPP appliance → Hadoop data lake → New MPP appliance → Migrated to Cloud
  20. Lift and Shift to AWS
      Diagram labels: MPP; Data Lake; Data Sources (Applications, Behavioral, 3rd Party); Hive Metastore; Data; EC2; EBS; EMR Cluster; Batch; Stream; Processing; Data Workers
      Scale: Tables: 50K; Data: 2.5PB; ETLs: 10K; Queries: 500K; Users: 2000; ETL Users: 60
  21. Phase 2: Migrating to Redshift (Modernizing analytics)
      Diagram labels: ETL Processing; Data Lake; Data Sources (Applications, Behavioral, 3rd Party); Hive Metastore; Data; EMR Cluster; Batch; Stream; Data Workers; AWS Glue; Redshift ETL; Athena; Redshift Reporting; Dashboards
      Scale: Tables: 10K; Data: 400TB; ETLs: 3K; Queries: 130K; Users: 2000; ETL Users: N/A
  22. Modernized analytics platform with Redshift: Amazon Redshift managed storage; Data sharing; Amazon Redshift Spectrum; Concurrency scaling; Elasticity
  23. Key areas to focus: Behavioral Analytics (Event Collection Standards, Customer intents, Personalization); Analytics at Scale (Analytics tech stack, Separate storage & processing, Real-time analytics); Data Discovery/Understanding (Data Documentation, Tools to explore data, Data Stewardship, Centralized Governance, Data Lineage)
  24. Processors and Pipelines
      ● Serial processors (e.g., reusable intermediate topic)
      ● Parallel processors (e.g., fleet deployment)
      ● Processor = Business Logic & Code
      ● Pipeline = Deployment & Infrastructure
  25. Tech Stack Overview. Diagram labels: Processor CI/CD Layer; UX Layer; Control Layer; Runtime Layer; Infrastructure Layer; Application Layer; Pipeline CI/CD Layer; Customer Experience; Behind-the-scenes
  26. Key areas to focus: Behavioral Analytics (Event Collection Standards, Customer intents, Personalization); Analytics at Scale (Analytics tech stack, Separate storage & processing, Real-time analytics); Data Discovery/Understanding (Data Documentation, Tools to explore data, Data Stewardship, Centralized Governance, Data Lineage)
  27. Our Data Ecosystem is big, complex and messy...
  28. Our Data Ecosystem is big, complex and messy... We have a lot of data, which is great, but it is very hard to discover and figure out what to use.
      Scale: 200,000+ Tables; 3,000+ Schemas; 200+ Data Sources
      Diagram labels: DATA LAKE; DATA WAREHOUSE(S); DATA MARTS; CURATED DATA; RAW DATA; ANALYST PROCESSED DATA; SOURCES; SELECT RAW DATA; DATA MARTS; REPORTING TABLES & more; Internal; External/3P
      Name on slide: Pradeep
  29. Our Users: What are the Core Personas and why is data important to them?
      ● I am a DATA SCIENTIST building ML models and often use data produced by BU/FG Analysts. I would like to know the owner, data quality and reliability of the data I want to use.
      ● I am a BU DEVELOPER trying to see if data produced by the new service launched is being ingested accurately into the lake for downstream consumption.
      ● I am a DATA ENGINEER building pipelines for data marts and trying to choose the right data for my use-case and get alerted when metadata changes occur so I can ensure my pipelines continue to work properly.
      ● I am a BUSINESS ANALYST trying to build Dashboards to report on KPIs for a new product Feature launched. I need to find data that I can trust and use for my analysis.
      ● I am an ENTITY DATA STEWARD curating Data Map entities in my domain for downstream use. I need to query the raw data to produce the entities.
      Name on slide: Veena
  30. Understanding user problems we need to solve: What is making data discovery and exploration hard for our data workers?
      ● Where can I find the data?
      ● What does the data mean?
      ● Can I trust the data?
      ● How is the data connected?
      ● How can I get access to data?
      ● Which datasource to use?
      ● When to use what tool?
      ● Why are my queries slow?
      Categories on slide: DISCOVERY, EXPLORATION
  31. Ideal State: What do users need for a great Data Discovery & Exploration experience? A tool that helps our data workers to
      ● easily find relevant data that is well-documented, reliable and trustworthy, by providing quality metrics like data freshness and completeness, the ability to quickly reach out to the owner for clarifications, and a view of similar data and joins to solve the use-case
      ● seamlessly request access, run queries against blazing-fast, performant engines, and reuse & share their work
      Name on slide: Veena
  32. solve it!
  33. OUR APPROACH
      ● Data Map: Organize and govern data across Intuit
      ● Data Discovery: Build a rich data discovery (catalog) experience for all our data in the lake & warehouses
      ● Data Exploration: Buy a superior data exploration tool for all our data - powered by MDR
  34. Data Discovery app
  35. Data Exploration
  36. Key areas to focus: Behavioral Analytics (Event Collection Standards, Customer intents, Personalization); Analytics at Scale (Analytics tech stack, Separate storage & processing, Real-time analytics); Data Discovery/Understanding (Data Documentation, Tools to explore data, Data Stewardship, Centralized Governance, Data Lineage)
  37. Work In Progress!!!
  38. AWS Glue/Lake Formation: Data Lake Design
  39. Data Lake & Data Mesh: Tax, Work, Commerce, Finance
  40. Q&A
  41. Intuit’s Journey (axis label: DATA VOLUME PER CUSTOMER)
      Eras: 1980s, ERA OF DOS; 1990s, ERA OF WINDOWS; 2000s, ERA OF WEB; 2010s, ERA OF MOBILE AND CLOUD; 2020 to Present*, ERA OF ARTIFICIAL INTELLIGENCE
      Milestones, in slide order:
      ● Intuit Founded
      ● Customers: 1.3M; Revenue: $33M; Digital Footprint: MBs
      ● Customers: 5.6M; Revenue: $1B; Digital Footprint: GBs
      ● Customers: 29M; Revenue: $3.5B; Digital Footprint: TBs
      ● Customers: 102M; Revenue: $9.6B; Digital Footprint: PBs
      ● 2019: Analytical Platform on AWS
      ● 2021: Analytics powered by Redshift
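
The standard event tracking example on slide 17 could be modeled roughly as follows. This is only a minimal sketch: the class name `ClickstreamEvent`, the snake_case field names, the `validate` method, and the rule that `event` is composed from object and action are all assumptions made for illustration. The deck shows only a subset of attributes and does not show the actual ECS schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
import json


# Hypothetical model of a subset of the slide-17 attributes.
# Field names are illustrative; the real ECS schema is not shown in the deck.
@dataclass(frozen=True)
class ClickstreamEvent:
    org: str                # domain: organization, e.g. "cg"
    purpose: str            # domain: e.g. "prod"
    scope: str              # domain: e.g. "turbotax"
    event_object: str       # what (logical): the "object" attribute
    event_action: str       # what (logical): the "action" attribute
    ui_action: str          # what (behavioral)
    ui_object: str
    ui_object_detail: str
    timestamp: str          # ISO 8601, UTC with a trailing "Z"
    user_id: str            # who

    @property
    def event(self) -> str:
        # Assumption: the event name is "<object>:<action>".
        return f"{self.event_object}:{self.event_action}"

    def validate(self) -> None:
        """Raise ValueError if a field is empty or the timestamp is malformed."""
        for name, value in asdict(self).items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"missing or empty field: {name}")
        # fromisoformat() rejects a bare "Z" before Python 3.11, so normalize it.
        datetime.fromisoformat(self.timestamp.replace("Z", "+00:00"))

    def to_json(self) -> str:
        payload = asdict(self)
        payload["event"] = self.event
        return json.dumps(payload, sort_keys=True)


example = ClickstreamEvent(
    org="cg",
    purpose="prod",
    scope="turbotax",
    event_object="content",
    event_action="engaged",
    ui_action="clicked",
    ui_object="button",
    ui_object_detail="Continue",
    timestamp="2020-05-04T06:27:41.799Z",
    user_id="20abd451b935d4c27ad417a258f15ccba",
)
example.validate()
```

Compared with the legacy payload on slide 14 (`c5`, `c14`, `v47`, ...), named fields like these let a consumer read who, what, and where without a lookup table, which appears to be the point of the From → To comparison.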

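
Slide 24's split between processors (business logic) and pipelines (deployment and wiring), with serial processors chained through a reusable intermediate topic, might be sketched like this. Everything here is a toy stand-in: `InMemoryBus`, `run_pipeline`, the topic names, and the keyword-based `tag_help_intent` rule are hypothetical and are not Intuit's platform, Kafka, or the SageMaker model mentioned in the abstract.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterator, Optional

Event = dict
# A processor is pure business logic: one event in, one event out (or None to drop it).
Processor = Callable[[Event], Optional[Event]]


class InMemoryBus:
    """Toy stand-in for an event bus; each topic is a FIFO queue."""

    def __init__(self) -> None:
        self.topics: Dict[str, Deque[Event]] = defaultdict(deque)

    def publish(self, topic: str, event: Event) -> None:
        self.topics[topic].append(event)

    def drain(self, topic: str) -> Iterator[Event]:
        queue = self.topics[topic]
        while queue:
            yield queue.popleft()


def run_pipeline(bus: InMemoryBus, source: str, sink: str, processor: Processor) -> int:
    """Pipeline = wiring only: read from source, apply the processor, write to sink."""
    forwarded = 0
    for event in bus.drain(source):
        result = processor(event)
        if result is not None:
            bus.publish(sink, result)
            forwarded += 1
    return forwarded


def only_clicks(event: Event) -> Optional[Event]:
    # Keep only click events; everything else is dropped.
    return event if event.get("ui_action") == "clicked" else None


def tag_help_intent(event: Event) -> Event:
    # Hypothetical keyword rule, used purely for illustration.
    needs_help = "refund" in event.get("search_term", "").lower()
    return {**event, "needs_help": needs_help}


bus = InMemoryBus()
bus.publish("clickstream", {"ui_action": "clicked", "search_term": "I haven't received my refund yet"})
bus.publish("clickstream", {"ui_action": "viewed", "search_term": "refund status"})
bus.publish("clickstream", {"ui_action": "clicked", "search_term": "change my address"})

# Two serial stages; "clicks" is the reusable intermediate topic.
stage_one = run_pipeline(bus, "clickstream", "clicks", only_clicks)
stage_two = run_pipeline(bus, "clicks", "intents", tag_help_intent)
results = list(bus.drain("intents"))
```

Because each processor is a plain function, the same logic could be redeployed behind different wiring, for example several copies reading one topic in parallel, without touching the business code.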