Mukul Sood
Agenda
• Problem Statement
• Specs
• Design Decisions/Criteria
• Target State Architecture Diagram
• Conceptual Data Model
• Technical Rationalization
Problem Statement
The business struggles to get insights within the expected SLA because there is no unified data fabric across its multiple data sources.
These sources include:
• On-premise enterprise data warehouse and databases
• Media publishers on the web and other channels
• Other 3rd-party sources
Specs
• Data volume: 40k events/s, 100 GB+ of records daily
• High variability in data volume: spikes and bursts
• Real-time, interactive analysis of very large Media activity data sets unified with Publisher data
• On-premise Teradata DW holds Publisher data: frequent adds/inserts, infrequent updates
• BI Reporting Layer data sources: Big Data OLAP, Teradata Data Marts
• 24/7 availability with assured data quality
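As a rough sizing check on the figures above, the arithmetic can be sketched as follows. It assumes (the slide does not say) that 40k events/s is a sustained average and that 1 GB means 10^9 bytes; the variable names are illustrative only.

```python
# Back-of-envelope sizing from the Specs slide.
# Assumption (not stated in the deck): 40k events/s is a sustained average.
EVENTS_PER_SECOND = 40_000
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
DAILY_BYTES = 100 * 10**9           # "100GB+" taken as a lower bound, decimal GB

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
avg_ingest_mb_per_s = DAILY_BYTES / SECONDS_PER_DAY / 10**6

print(f"events per day: {events_per_day:,}")
print(f"average ingest rate: {avg_ingest_mb_per_s:.2f} MB/s")
```

Because the slide also calls out spikes and bursts, peak rates would sit above these averages, so they are a floor for capacity planning rather than a target.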
Design Decisions/Criteria
• Lambda Architecture, ETL + ELT, Hybrid Cloud
• Separation of concerns into layers:
    • Extract
    • Load, Processing
    • Reporting
• Solution evolves as technology changes
• Key guiding principles:
    • Agility
    • Performance
    • Scalability
    • Reliability
    • Security
    • Maintainability
    • Cost
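A Lambda Architecture answers queries by combining a batch view (complete but stale) with a speed view (recent events only). The merge can be sketched as below; the function names, article IDs, and counts are made up for illustration and are not taken from the deck.

```python
from collections import Counter

def build_batch_view(historical_events):
    """Batch layer: recompute totals from the full event history."""
    return Counter(e["article_id"] for e in historical_events)

def update_speed_view(speed_view, event):
    """Speed layer: incrementally count events the batch run has not seen yet."""
    speed_view[event["article_id"]] += 1

def query_reads(article_id, batch_view, speed_view):
    """Serving layer: merge both views at query time."""
    return batch_view[article_id] + speed_view[article_id]

# Illustrative data only.
history = [{"article_id": "a1"}, {"article_id": "a1"}, {"article_id": "a2"}]
batch_view = build_batch_view(history)

speed_view = Counter()
for recent in [{"article_id": "a1"}, {"article_id": "a3"}]:
    update_speed_view(speed_view, recent)

print(query_reads("a1", batch_view, speed_view))  # 3
```

When the next batch run absorbs the recent events, the speed view for that period can be discarded, which keeps the incremental state small.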
Target State Architecture
Data Sources
• Media Website1
• Media Website2
• Publisher3
• Teradata EDW – Data Marts (On Premise)

Cloud

Extract
• Python scripts
• Alteryx, Fivetran, Stitch

Load
• AWS S3
• Google Bucket
• Azure Blob?

Transform
• AWS Athena
• Google BigQuery
• Azure HDInsight?

Reporting/Visualization
• Tableau
• Qlik
• Power BI?

Diagram annotations
• Rest API endpoints
• Full Load, one time – script: SQL client reads data, writes Parquet
• Incremental – script: SQL client determines change, writes Arrow
• Serverless DW, very fast OLAP
• Azure HDInsight is not serverless
• Facts, Dimensions, Star schema
• Power BI: no native connector for Athena, BQ

Pipelines
• Load Pipelines: Full Load, Incremental
• Transform Pipelines
• Orchestration Layer – Airflow – pipeline interaction with Load, Transform
• Kafka Event Layer
• Spark Streaming
• Spark Batch
• Load Job to OLAP
• OLAP – Facts, Measures processing
• Jenkins, Docker: CI/CD for Pipelines
• ETL – Full (one time), Incremental (CDC)
• Enrich, Cook, Aggregate EDW Data with OLAP
• Publish Data Extracts, Cubes to Tableau, Qlik
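The split between a one-time full load and ongoing incremental loads can be sketched with a high-water-mark query. This is only one way to "determine change"; the deck does not say which method is used. The sketch uses an in-memory SQLite database as a stand-in for the Teradata source, assumes a hypothetical `publisher` table with an `updated_at` column, and returns rows instead of writing Parquet or Arrow so that it needs nothing beyond the standard library.

```python
import sqlite3

def extract_incremental(conn, last_watermark):
    """Return rows changed since last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT publisher_id, name, updated_at FROM publisher "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

# Stand-in source table (hypothetical schema, illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO publisher VALUES (?, ?, ?)",
    [(1, "Pub A", "2024-01-01T00:00:00"), (2, "Pub B", "2024-01-02T00:00:00")],
)

# Full load (one time): an empty watermark sorts before every timestamp.
full_rows, watermark = extract_incremental(conn, "")

# Later, a new publisher appears at the source.
conn.execute("INSERT INTO publisher VALUES (3, 'Pub C', '2024-01-03T00:00:00')")
delta_rows, watermark = extract_incremental(conn, watermark)

print(len(full_rows), len(delta_rows), watermark)
```

A watermark on a timestamp column suits the "frequent inserts, infrequent updates" profile only if updates also touch that column; log-based CDC would be the alternative when they do not.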
Conceptual Data Model
Dimensions
• Dim_Date
• Dim_Author
• Dim_Article
• Dim_Publisher
• Dim_Contract
• Dim_Contract_Terms Conditions

Facts
• Fact_Activity
• Fact_Activity_Aggregate_Daily
• Fact_Revenue
• Fact_Cost

Diagram annotations
• Pre-computed aggregate per author, article, publisher; could serve multiple queries
• Very high volume table, date partition
• Could be per Publisher and per Author
• Activity description
• Number of articles read, number of downloads, publisher name, number of shares, duration, dollars
• SCD Type 2
• Fast Changing Dimension
• Conformed Dimensions
• Revenue could be derived from contract revenue and activity
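The idea of a pre-computed daily aggregate per author, article, and publisher can be sketched with SQLite. The column names and sample rows below are assumptions made for illustration; the deck names the tables and some measures but does not give a column-level schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_activity (
    date_key     TEXT,     -- would join to Dim_Date
    author_id    INTEGER,  -- would join to Dim_Author
    article_id   INTEGER,  -- would join to Dim_Article
    publisher_id INTEGER,  -- would join to Dim_Publisher
    reads        INTEGER,
    downloads    INTEGER,
    shares       INTEGER
);
INSERT INTO fact_activity VALUES
    ('2024-01-01', 10, 100, 1, 1, 0, 0),
    ('2024-01-01', 10, 100, 1, 1, 1, 0),
    ('2024-01-01', 11, 101, 1, 1, 0, 1),
    ('2024-01-02', 10, 100, 1, 1, 0, 0);

-- Roll the event-grain fact up to one row per day, author, article, publisher.
CREATE TABLE fact_activity_aggregate_daily AS
SELECT date_key, author_id, article_id, publisher_id,
       SUM(reads) AS reads, SUM(downloads) AS downloads, SUM(shares) AS shares
FROM fact_activity
GROUP BY date_key, author_id, article_id, publisher_id;
""")

daily = conn.execute(
    "SELECT date_key, article_id, reads, downloads, shares "
    "FROM fact_activity_aggregate_daily ORDER BY date_key, article_id"
).fetchall()
for row in daily:
    print(row)
```

Dashboards that only need daily totals can then read the small aggregate table and leave the very high volume event table for drill-down queries.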
Technical Rationalization, Factors
Factors
• Data Volume, Variety, Veracity, Frequency
• Measures, Grain, Slice, Processing volume
• Query types, SLAs, Dashboards, Metrics
• Cost – Compute, Storage, Network, hosted infra
• Flexible, Scalable, Maintainable, Reliable, Secure

Rationale
• Hybrid approach: Cloud + On-Prem
• Cloud for high-volume external data, on premise for Enterprise DW
• Appropriate partitions, tuning
• Very fast MPP columnar OLAP: Athena, BigQuery
• Serverless DW to minimize infra work
• Use CI/CD practices: Jenkins, Docker for deploys
• ETL, ELT Layer – Orchestration (Airflow) + Pipeline Tool (Stitch) + Storage (S3, Google Bucket) + Transform (Spark) + OLAP (BigQuery, Athena) + EDW
• External data sources for media activity: very high volume, bursty, spiky
• Internal data source for Enterprise entities, source of truth
• Analyze query requirements; Data Model
• Use of partitions, keys, Fact and Dim tables
• Various enrichments, cooked data, aggregations in OLAP, EDW
• Full Load one time, Incremental Load ongoing
• Storage layer as Data Lake
• Data movement within cloud infra (Storage to Transform)
• Faster, easier, maintainable pipeline development
• Establish Data Quality Assurance; use a test framework such as dbt
• BI Reporting Layer: Tableau, Qlik
• Data source – OLAP, EDW, Data Extracts
• Star, Snowflake schema to support ad hoc and canned SQL queries
• Cloud-based native connectors for Athena, BigQuery; EDW, Data Extracts data
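The data quality point can be illustrated with the kind of checks a test framework such as dbt provides out of the box (uniqueness, not-null, and referential relationships). The sketch below is plain Python over SQLite, not dbt itself; the tables, columns, and sample rows are hypothetical. It follows the convention that a check passes when its query returns no rows.

```python
import sqlite3

def failing_rows(conn, sql):
    """A check passes when its query returns no rows."""
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_publisher (publisher_id INTEGER, name TEXT);
CREATE TABLE fact_activity (publisher_id INTEGER, reads INTEGER);
INSERT INTO dim_publisher VALUES (1, 'Pub A'), (2, 'Pub B');
INSERT INTO fact_activity VALUES (1, 5), (2, 3), (9, 1);  -- 9 has no dimension row
""")

checks = {
    "dim_publisher.publisher_id is unique":
        "SELECT publisher_id FROM dim_publisher "
        "GROUP BY publisher_id HAVING COUNT(*) > 1",
    "dim_publisher.name is not null":
        "SELECT publisher_id FROM dim_publisher WHERE name IS NULL",
    "fact_activity.publisher_id exists in dim_publisher":
        "SELECT f.publisher_id FROM fact_activity f "
        "LEFT JOIN dim_publisher d ON d.publisher_id = f.publisher_id "
        "WHERE d.publisher_id IS NULL",
}

results = {name: failing_rows(conn, sql) for name, sql in checks.items()}
for name, rows in results.items():
    print("PASS" if not rows else f"FAIL {rows}", "-", name)
```

Running such checks as a pipeline step after each load would surface orphaned fact rows like the one above before they reach the reporting layer.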
Questions??

Acme data engineering case study
