Data Warehousing Trends

  1. Data Tech Talk ft. external speaker Chris Riccomini, Author, Engineer & Manager, … 1. Talk: Data Warehousing Trends 2. Open Dialogue: Q & A
  2. Data Warehousing Trends 2021/12/09 · Chris Riccomini
  3. Hi, I’m Chris ● Engineer & Manager WePay, LinkedIn, PayPal ● Open Source Apache Samza, Apache Airflow, Debezium ● Author The Missing README ● Investor & Advisor Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ... @criccomini
  4. The Trends ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz https://twitter.com/criccomini/status/1451557884769169412 https://preset.io/blog/reshaping-data-engineering/
  6. You are here
  7. Realtime DWHs
  8. Batch ETL
  9. Pubsub
  10. Realtime DWH
  11. Realtime DWH
  12. Why Realtime DWHs? ● Debugging ○ Investigate application errors ○ Audit log shows how things changed ● Operational ○ Monitoring ○ Scripts that pull from DWH ● Security/Compliance ○ Audit log ● Customer data products ○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns) ○ Data clean rooms
  13. Realtime DWHs Technical Advantages ● Handles hard deletes ● No schema requirements (timestamps) ● Replay from Kafka ● Data integration
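
To make the "handles hard deletes" and "replay" points concrete, here is a minimal sketch of applying a change log to a warehouse-side replica. It is an illustration, not the speaker's pipeline: the events are made up, their shape is only loosely modeled on a Debezium-style envelope (`op` of `c`, `u`, `d`), and `apply_changes` and `CHANGE_LOG` are hypothetical names. A plain list stands in for a Kafka topic.

```python
# Hypothetical change events; a list stands in for a Kafka topic.
# op: "c" = create, "u" = update, "d" = delete (assumed convention).
CHANGE_LOG = [
    {"op": "c", "key": 1, "after": {"id": 1, "status": "pending"}},
    {"op": "c", "key": 2, "after": {"id": 2, "status": "pending"}},
    {"op": "u", "key": 1, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "key": 2, "after": None},  # hard delete in the source DB
]


def apply_changes(log, table=None, start_offset=0):
    """Replay change events from start_offset onto a copy of the table."""
    table = {} if table is None else dict(table)
    for event in log[start_offset:]:
        if event["op"] == "d":
            # The delete is an explicit event, so the replica drops the row.
            table.pop(event["key"], None)
        else:
            # Creates and updates both overwrite the row with the new image.
            table[event["key"]] = event["after"]
    return table


# Replaying the full log from offset 0 rebuilds the replica from scratch.
replica = apply_changes(CHANGE_LOG)
```

A batch job that selects rows by a modification timestamp would never see row 2 disappear, because a deleted row has no timestamp left to query. The change log carries the delete as its own event, and the rows need no timestamp column at all.
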
  14. Realtime DWH Drawbacks ● Operationally complex ● Depends on source DB support (for CDC) ● Inline transformation is harder ● Fixing bad data is harder
  15. Data Mesh
  16. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
  17. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0 wat.
  18. “A product is any item or service you sell to serve a customer's need or want.” Data is a Product https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
  19. ● Customers ○ Data scientists ○ Business analysts ○ Finance ○ Sales ○ Product managers ○ Engineers ○ External customers ● Products ○ Recommender systems ○ Billing ○ Fraud ○ Reports ○ Dashboards Data is a Product https://cnr.sh/essays/what-the-heck-data-mesh
  20. ● Versioned ● Compatible ● Documented ● Monitored ● Self-serve ● Secure (AuthN, AuthZ) Treat Data Models like APIs https://cnr.sh/essays/what-the-heck-data-mesh
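
One way to read "versioned" and "compatible" is as a check that runs before a new version of a data model ships, much like an API compatibility check. The sketch below is an assumption about what such a check could look like, not something from the talk: schemas are plain dicts of field name to type name, and `breaking_changes`, `V1`, `V2`, and `V3` are hypothetical.

```python
def breaking_changes(old_schema, new_schema):
    """List changes that would break consumers reading the old schema."""
    problems = []
    for field, old_type in old_schema.items():
        if field not in new_schema:
            problems.append(f"removed field: {field}")
        elif new_schema[field] != old_type:
            problems.append(
                f"type changed: {field} {old_type} -> {new_schema[field]}"
            )
    # Added fields are treated as compatible: old readers can ignore them.
    return problems


# Hypothetical versions of a transactions data model.
V1 = {"txn_id": "string", "amount_cents": "int"}
V2 = {"txn_id": "string", "amount_cents": "int", "currency": "string"}
V3 = {"txn_id": "string", "amount": "float"}

additive = breaking_changes(V1, V2)   # only adds a field
breaking = breaking_changes(V1, V3)   # drops amount_cents
```

A check like this could gate a deploy: an empty list means the new version can go out under the same major version, while a non-empty list signals that consumers need a migration path first.
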
  21. ● Microservice ● DevOps We’ve Done This Before https://cnr.sh/essays/what-the-heck-data-mesh
  22. Headless BI
  23. Metrics then ● BI tools to create and visualize metrics ○ Looker ○ Mode ○ Tableau ○ Data Studio ● Answer internal business questions ○ How is a product's health? ○ What does revenue look like?
  24. Metrics now ● Metrics matter for external business workflows ○ Predicting when a customer might churn ○ Notifying users when they reach their capacity limit ○ DS wants to create models to optimize certain metrics ○ Computing customer bills ● BI tools aren’t meant for this ○ Walled garden ○ Have to re-implement the same metrics in different systems
  25. “For data consumption, we heard complaints from decision makers that different teams reported different numbers for very simple business questions, and there was no easy way to know which number was correct.” "...the teams that own metrics would be able to define them once, in a way that’s consistent across dashboards, automation tools, sales reporting, and so on. Let’s call this ‘Headless BI’." Headless BI https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70 https://basecase.vc/blog/headless-bi
  26. Headless BI ● Programmatically manage business metrics ○ Automated ○ Centralized ○ Documented ○ Validated ○ Metadata/lineage ○ Backfills ○ Cost ○ Privacy ○ Access ○ Deprecation ○ Retention https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
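
The "define them once" idea can be sketched as a small metric registry that every consumer calls, instead of each tool re-implementing the SQL. This is a toy illustration under stated assumptions, not how any particular headless BI product works: the `METRICS` registry, the `compute_metric` helper, and the `orders` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical central registry: one definition per metric, with metadata.
METRICS = {
    "revenue": {
        "description": "Total amount of paid orders, in cents",
        "owner": "finance",
        "sql": "SELECT SUM(amount_cents) FROM orders WHERE status = 'paid'",
    }
}


def compute_metric(conn, name):
    """Evaluate a registered metric; every consumer goes through here."""
    row = conn.execute(METRICS[name]["sql"]).fetchone()
    return row[0]


# In-memory SQLite stands in for the warehouse; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1000, "paid"), (2, 2500, "paid"), (3, 400, "refunded")],
)

# A dashboard and a billing job both read the same definition.
dashboard_value = compute_metric(conn, "revenue")
billing_value = compute_metric(conn, "revenue")
```

Because both consumers resolve `revenue` through the registry, a change to the definition (for example, excluding a new order status) reaches the dashboard and the billing job at the same time.
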
  27. Headless BI https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
  28. Q&A ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz
  29. Appendix
  30. Analytics Engineering Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base. https://www.getdbt.com/what-is-analytics-engineering
  31. Analytics Engineering ● Job ○ Building ○ Testing ○ Cataloging ● Tools ○ dbt ○ Airflow ● Customers ○ Data science ○ Data analysts ○ BI ○ Reporting
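
The "testing" part of the job can be pictured as queries that return the rows violating an expectation, in the spirit of the uniqueness and not-null tests that dbt offers. The sketch below is plain Python over SQLite rather than dbt itself; the `customers` table, the `TESTS` dict, and `failing_rows` are hypothetical names chosen for illustration.

```python
import sqlite3


def failing_rows(conn, sql):
    """Run a test query; any returned row is a violation."""
    return conn.execute(sql).fetchall()


# Made-up data with one duplicate id and one missing email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (2, "c@example.com")],
)

# Each test is a query that selects the offending rows.
TESTS = {
    "id_is_unique": "SELECT id FROM customers GROUP BY id HAVING COUNT(*) > 1",
    "email_not_null": "SELECT id FROM customers WHERE email IS NULL",
}

results = {name: failing_rows(conn, sql) for name, sql in TESTS.items()}
failed = sorted(name for name, rows in results.items() if rows)
```

Run in continuous integration, a non-empty `failed` list would block the change, which is the software engineering practice the slide refers to.
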
  32. Data Catalogs ● Flavor of the month ○ Amundsen ○ DataHub ○ Metaphor ○ Marquez ○ Atlan ○ Collibra ○ Alation ● Use cases ○ Discoverability ○ Operations ○ Governance
  33. “Reverse ETL syncs data from a system of records like a warehouse to a system of actions like CRM, MAP, and other SaaS apps to operationalize data.” Reverse ETL https://blog.getcensus.com/what-is-reverse-etl/
  34. Reverse ETL https://blog.getcensus.com/what-is-reverse-etl/
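
As a rough sketch of the warehouse-to-SaaS sync described in the quote, the code below diffs warehouse rows against the records a CRM currently holds and applies only the differences. Everything here is assumed for illustration: the field names, the `plan_sync` and `apply_sync` helpers, and the dict standing in for a CRM. A real sync would call the destination's API and handle rate limits and failures.

```python
def plan_sync(warehouse_rows, crm_records):
    """Return the upserts needed to make the CRM match the warehouse."""
    upserts = []
    for row in warehouse_rows:
        desired = {"plan": row["plan"], "lifetime_value": row["lifetime_value"]}
        if crm_records.get(row["email"]) != desired:
            upserts.append((row["email"], desired))
    return upserts


def apply_sync(upserts, crm_records):
    """Write the planned upserts into the (stand-in) CRM."""
    for email, fields in upserts:
        crm_records[email] = fields


# Made-up warehouse output and current CRM state.
warehouse_rows = [
    {"email": "a@example.com", "plan": "pro", "lifetime_value": 1200},
    {"email": "b@example.com", "plan": "free", "lifetime_value": 0},
]
crm = {"a@example.com": {"plan": "free", "lifetime_value": 300}}

first_pass = plan_sync(warehouse_rows, crm)
apply_sync(first_pass, crm)
second_pass = plan_sync(warehouse_rows, crm)  # nothing left to send
```

Planning the diff before writing keeps the sync idempotent: running it again with unchanged warehouse data sends nothing to the CRM.
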
  35. Photos ● https://unsplash.com/photos/Lbvi0GGJWY4 ● https://unsplash.com/photos/8vTAAFYhFfQ ● https://www.flickr.com/photos/jurgenappelo/5201275209
  36. Realtime DWHs