Big data processing with PubSub, Dataflow, and BigQuery
Thuyen Ho
Presented at Google Developer Group Vietnam 2018
Data & Analytics
1.
BIG DATA PROCESSING WITH PUB/SUB, DATAFLOW AND BIGQUERY
Thuyen Ho, Data Engineer @ KNOREX
© 2018 KNOREX
2.
ABOUT KNOREX
Established in 2010, Knorex provides Precision Performance Marketing products and solutions to leading trading desks, agencies and brands. Offices and direct business presence across US, UK, Australia, China, India and Southeast Asia (SEA).
8 offices, 110+ staff
3.
PROBLEM STATEMENT
Ingest a large volume of streaming user data, transform it based on ever-changing parameters, and store it in a database in real time. This data is used for two purposes:
1. Targeting users in real time for advertising campaigns
2. Aggregating data to estimate campaign reach
[Diagram: Third-party partner to KNOREX DMP]
Ingest stream events
• QPS: ~1500 to 2000 events
• Event size: 50 KB to 100 KB
• Data volume: ~1 TB a day
Historical data
• Reprocess: ~30 TB each day
• Aggregate: ~60 TB each day
4.
AGENDA
• Quick Introduction to Pub/Sub, Dataflow and BigQuery
• KNOREX Approach
• Q&A
5.
Quick Introduction to Pub/Sub, Dataflow and BigQuery
6.
SERVERLESS STREAM PROCESSING PIPELINE WITH GCP
[Diagram: Data events -> Pub/Sub (messaging queue) -> Dataflow (stream processing) -> Processed data -> BigQuery (analytics engine)]
7.
CLOUD PUB/SUB
Cloud Pub/Sub is an asynchronous messaging service designed to be highly reliable and scalable.
8.
CLOUD PUB/SUB - PULL SUBSCRIPTION
9.
CLOUD PUB/SUB - PUSH SUBSCRIPTION
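The difference between the two subscription types on slides 8 and 9 can be sketched with a small in-memory model. This is not the Cloud Pub/Sub client API: `Subscription`, `pull`, `ack` and `push_to` are hypothetical names used only for illustration. With pull delivery the subscriber asks for messages and acknowledges them itself; with push delivery the service calls the subscriber's endpoint and treats a success response as the acknowledgement.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class Subscription:
    """Toy subscription: a backlog of messages plus delivered-but-unacked ones."""

    def __init__(self) -> None:
        self._backlog: Deque[Tuple[int, str]] = deque()
        self._outstanding: Dict[int, str] = {}

    def enqueue(self, msg_id: int, data: str) -> None:
        self._backlog.append((msg_id, data))

    # Pull delivery: the subscriber asks for messages when it is ready.
    def pull(self, max_messages: int) -> List[Tuple[int, str]]:
        batch = []
        while self._backlog and len(batch) < max_messages:
            msg_id, data = self._backlog.popleft()
            self._outstanding[msg_id] = data
            batch.append((msg_id, data))
        return batch

    def ack(self, msg_id: int) -> None:
        # An acknowledged message is removed and not delivered again.
        self._outstanding.pop(msg_id, None)

    # Push delivery: the service calls the subscriber's endpoint.
    def push_to(self, endpoint: Callable[[str], bool]) -> int:
        delivered = 0
        retry: Deque[Tuple[int, str]] = deque()
        while self._backlog:
            msg_id, data = self._backlog.popleft()
            if endpoint(data):  # a success response counts as the ack
                delivered += 1
            else:
                retry.append((msg_id, data))  # kept for a later retry
        self._backlog = retry
        return delivered

    def pending(self) -> int:
        return len(self._backlog) + len(self._outstanding)
```

In this model a pull subscriber controls its own pace, while a push subscriber only needs to expose a handler.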
10.
LAMBDA ARCHITECTURE
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods. (source: wikipedia.org)
To balance:
• Latency
• Throughput
• Fault-tolerance
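A minimal sketch of how the batch and stream sides combine, assuming a hypothetical event shape of `(user_id, impressions)`: the batch layer recomputes totals from the full history, the speed layer tracks events the last batch run has not seen yet, and queries merge the two views. The function and class names are illustrative, not taken from the talk.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (user_id, impressions); a hypothetical event shape


def batch_view(history: Iterable[Event]) -> Counter:
    """Batch layer: recompute totals from the full history (high throughput, high latency)."""
    view: Counter = Counter()
    for user_id, n in history:
        view[user_id] += n
    return view


class SpeedView:
    """Speed layer: incrementally track events the last batch run has not seen (low latency)."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        user_id, n = event
        self.view[user_id] += n


def serve(batch: Counter, speed: SpeedView, user_id: str) -> int:
    """Serving layer: answer a query by merging both views."""
    return batch[user_id] + speed.view[user_id]
```

When the next batch run completes, the speed view for the covered period can be discarded, which is where the fault-tolerance comes from: errors in the incremental path are eventually overwritten by the recomputed batch result.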
11.
DATA PROCESSING - TRANSFORMS
[Diagram: Input Data -> Data Processing (Transform, Filter, Group, Aggregate) -> Output Data; Storage]
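The four processing stages named on this slide can be sketched in plain Python. The record fields `(user_id, device, bytes)` are hypothetical and chosen only to make the example concrete.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, int]  # (user_id, device, bytes); hypothetical fields


def run(records: Iterable[Record]) -> Dict[str, int]:
    # Filter: drop records with no payload.
    kept = (r for r in records if r[2] > 0)
    # Transform: normalise the device name and keep only the fields we need.
    shaped = ((device.lower(), size) for _, device, size in kept)
    # Group: collect sizes per device.
    groups: Dict[str, List[int]] = defaultdict(list)
    for device, size in shaped:
        groups[device].append(size)
    # Aggregate: one total per group.
    return {device: sum(sizes) for device, sizes in groups.items()}
```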
12.
CLOUD DATAFLOW
Cloud Dataflow is a fully managed, autoscaling execution environment for Apache Beam pipelines.
Apache Beam: implement batch and streaming data processing jobs that run on any supported execution engine. Beam supports the following language-specific SDKs: Java, Python and Go.
13.
BEAM ABSTRACTIONS
[Diagram: the data-processing flow from slide 11 labeled with Beam terms. The whole flow is a Pipeline; the input and output data are bounded or unbounded PCollections; each stage (Transform, Filter, Group, Aggregate) is a PTransform.]
14.
BEAM - FIXED TIME WINDOWS
[Figure: unbounded events on a processing-time axis (00:00:00 to 00:01:30) split into three consecutive 30s windows: window 0, window 1, window 2.]
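A minimal sketch of fixed-window assignment in plain Python, assuming timestamps in seconds: each event falls into exactly one window, identified by its start time. In the Beam Python SDK this strategy is provided by `FixedWindows`; the helpers below are illustrative stand-ins, not Beam code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fixed_window_start(timestamp: float, size: float) -> float:
    """Start of the single fixed window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def assign_fixed(events: Iterable[Tuple[float, int]], size: float) -> Dict[float, List[int]]:
    """Group (timestamp, value) events by the start of their fixed window."""
    windows: Dict[float, List[int]] = defaultdict(list)
    for ts, value in events:
        windows[fixed_window_start(ts, size)].append(value)
    return dict(windows)
```

With 30-second windows, an event at second 29 lands in the window starting at 0 and an event at second 30 lands in the window starting at 30.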
15.
BEAM - SLIDING TIME WINDOWS
[Figure: unbounded events on a processing-time axis (00:00:00 to 00:01:30) covered by 30s sliding windows: window 0, window 1, window 2.]
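Unlike fixed windows, sliding windows overlap, so one event can belong to several windows. The sketch below, in plain Python with timestamps in seconds, lists every window that contains a given timestamp. The `period` parameter (how often a new window starts) is an assumption, since the slide only states the 30s window size. In the Beam Python SDK the corresponding strategy is `SlidingWindows`.

```python
from typing import List, Tuple


def sliding_windows(timestamp: float, size: float, period: float) -> List[Tuple[float, float]]:
    """All [start, end) windows of length `size`, one starting every `period`, that contain `timestamp`."""
    # Latest window start that is not after the timestamp.
    start = timestamp - (timestamp % period)
    windows = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)
```

When `period` equals `size`, each timestamp belongs to exactly one window and the result matches fixed windowing.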
16.
BEAM - SESSION WINDOWS
[Figure: events on a processing-time axis (00:00:00 to 00:01:30) grouped into window 0, window 1 and window 2, with a gap duration separating consecutive windows.]
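Session windows have no fixed size: a window stays open while events keep arriving and closes once no event arrives within the gap duration. A minimal sketch for a single key, in plain Python with timestamps in seconds (Beam applies sessions per key and provides this strategy as `Sessions`; the function below is only an illustration):

```python
from typing import Iterable, List


def sessions(timestamps: Iterable[float], gap: float) -> List[List[float]]:
    """Group timestamps into sessions; a pause of at least `gap` starts a new session."""
    result: List[List[float]] = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] < gap:
            result[-1].append(ts)  # still within the gap: extend the session
        else:
            result.append([ts])  # gap exceeded (or first event): new session
    return result
```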
17.
CLOUD BIGQUERY
A fast, highly scalable, cost-effective, and fully managed enterprise data warehouse for analytics. Some of the features:
• Serverless
• Real-time Analytics
• Standard SQL
• Storage and Compute Separation
• Flexible Data Ingestion
• Petabyte Scale
18.
BIGQUERY STORAGE IS COLUMNAR
[Diagram: Column1, Column2, Column3 stored side by side]
Each column is stored separately. No indexes or keys are required.
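Why columnar storage suits analytics can be sketched with plain Python lists. The sample values are made up; only the column names come from the slide. A query that aggregates one column reads that column's values and never touches the others.

```python
from typing import Dict, List

# Row-oriented layout: every row carries every field.
rows = [
    {"Column1": "a", "Column2": 10, "Column3": "2018-12-01"},
    {"Column1": "b", "Column2": 20, "Column3": "2018-12-02"},
    {"Column1": "c", "Column2": 30, "Column3": "2018-12-03"},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Column-oriented layout: one list of values per column."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column(columns: Dict[str, list], name: str) -> int:
    # Only the requested column is read; the other columns are never touched.
    return sum(columns[name])
```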
19.
INGESTION-TIME PARTITIONED TABLE
Table columns: Column1, Column2, Column3, plus the pseudo-columns _PARTITIONTIME and _PARTITIONDATE.
_PARTITIONTIME         _PARTITIONDATE
2018-12-01 00:00:00    2018-12-01
2018-12-01 00:00:00    2018-12-01
2018-12-02 00:00:00    2018-12-02
2018-12-02 00:00:00    2018-12-02
2018-12-02 00:00:00    2018-12-02
2018-12-03 00:00:00    2018-12-03
2018-12-03 00:00:00    2018-12-03
SELECT Column1, Column2
FROM `database.table_name`
WHERE _PARTITIONDATE >= "2018-12-01" AND _PARTITIONDATE < "2018-12-03"
20.
INGESTION-TIME PARTITIONED TABLE (same table and query as slide 19)
21.
PARTITIONED TABLE
Partitioned based on data in a specified TIMESTAMP or DATE column.
Table columns: Column1, Column2, Column3. Column3 is the partitioning column, with values 2018-12-01 (2 rows), 2018-12-02 (3 rows) and 2018-12-03 (2 rows).
SELECT Column1, Column2
FROM `database.table_name`
WHERE Column3 >= "2018-12-01" AND Column3 < "2018-12-03"
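The benefit of filtering on the partitioning column can be sketched with a toy table in plain Python. The row counts per day mirror the slide; the row values and the `query` helper are hypothetical. A date-range filter lets the engine skip whole partitions instead of scanning every row.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical table: rows stored per daily partition, keyed by date.
partitions: Dict[date, List[Tuple[str, int]]] = {
    date(2018, 12, 1): [("a", 1), ("b", 2)],
    date(2018, 12, 2): [("c", 3), ("d", 4), ("e", 5)],
    date(2018, 12, 3): [("f", 6), ("g", 7)],
}


def query(start: date, end: date) -> Tuple[List[Tuple[str, int]], int]:
    """Return (rows, partitions_scanned) for start <= partition date < end."""
    # Partition pruning: pick the matching partitions before reading any rows.
    scanned = [d for d in sorted(partitions) if start <= d < end]
    rows = [row for d in scanned for row in partitions[d]]
    return rows, len(scanned)
```

For the date range used in the slide's query, only the 2018-12-01 and 2018-12-02 partitions are read; the 2018-12-03 partition is skipped.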
22.
KNOREX APPROACH
23.
ARCHITECTURE - STREAMING PIPELINE
[Architecture diagram, components by area]
Third-party partner
Event ingest: API gateway (Cloud Load Balancing); API (Compute Engine, autoscaling); Cookie (Cloud Pub/Sub, cookie topic); Device (Cloud Pub/Sub, device topic)
Processing and analytics: Stream proc (Cloud Dataflow, autoscaling); Data warehouse (BigQuery, sharding + clustering); Segmented users (Cloud Pub/Sub, segment topic)
CMS & RTB engine: Python script (Compute Engine, autoscaling); Audience (Cloud Bigtable, 3 regions); CMS
24.
ARCHITECTURE - EVENT INGEST
The ingest API runs on Compute Engine (GCE) with autoscaling instances. It receives 1500 events per second from our partner. The API endpoint puts events into two separate Pub/Sub topics: cookie and device.
[Diagram: partner (1500 events a sec) -> Cloud Load Balancing -> API (Compute Engine, autoscaling) -> Cookie (Cloud Pub/Sub, cookie topic) and Device (Cloud Pub/Sub, device topic)]
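The routing step of the ingest API can be sketched as follows. The talk does not say how cookie and device events are told apart, so the `id_type` field is an assumption, and the `topics` dictionary is an in-memory stand-in for the two Pub/Sub topics rather than the real client.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Stand-in for Pub/Sub: topic name -> list of serialized messages.
topics: Dict[str, List[bytes]] = defaultdict(list)


def topic_for(event: dict) -> str:
    # Assumption: each event carries an "id_type" of "cookie" or "device".
    id_type = event.get("id_type")
    if id_type not in ("cookie", "device"):
        raise ValueError(f"unknown id_type: {id_type!r}")
    return id_type


def ingest(event: dict) -> str:
    """Validate the event, serialize it, and append it to the matching topic."""
    topic = topic_for(event)
    topics[topic].append(json.dumps(event).encode("utf-8"))
    return topic
```

Keeping the endpoint this thin (validate, serialize, publish) is what lets the API tier scale horizontally: all heavier processing happens downstream of the topics.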
25.
ARCHITECTURE - PROCESSING AND ANALYTICS
Cloud Dataflow transforms and enriches raw events in real time. It inserts the processed data into BigQuery and also sends it to the RTB engine through Pub/Sub.
Each region has a subscription that pulls data from the segment topic and inserts it into Bigtable.
BigQuery is the warehouse for analytics. Tables are partitioned by ingestion time. It keeps data for 60 days.
[Diagram: Cookie (Cloud Pub/Sub, cookie topic) and Device (Cloud Pub/Sub, device topic) -> Stream proc (Cloud Dataflow, autoscaling) -> Data warehouse (BigQuery, partition + clustering) and Segmented users (Cloud Pub/Sub, segment topic) -> Asia region, JP region and US region (each: Compute Engine + Cloud Bigtable) -> CMS, KNX RTB Engine]
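The per-region subscriptions rely on a basic Pub/Sub property: every subscription on a topic receives its own copy of each message. A toy in-memory model of that fan-out, with a hypothetical `user_id:segment` message format and plain dictionaries standing in for the regional Bigtable instances:

```python
from typing import Dict, List


class Topic:
    """Toy topic that copies each published message to every subscription."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, List[str]] = {}

    def subscribe(self, name: str) -> None:
        self.subscriptions[name] = []

    def publish(self, message: str) -> None:
        # Every subscription receives its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(message)


def drain_into(table: Dict[str, str], queue: List[str]) -> None:
    """Regional worker: pull everything from its subscription and upsert into its store."""
    while queue:
        user_id, segment = queue.pop(0).split(":", 1)
        table[user_id] = segment
```

Because each region consumes independently, a slow or unavailable region does not hold back the others.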
26.
ARCHITECTURE - BATCH PIPELINE
Dataflow also reads the past 30 days of data from BigQuery and reprocesses it in a batch job.
[Diagram: Batch pipeline: BigQuery (analytics engine) -> Cloud Dataflow (batch processing) -> batch loads -> BigQuery (analytics engine); Pub/Sub]
27.
DATAFLOW - PIPELINE VISUALIZATION
28.
Q&A
29.
Building Resilient Streaming Systems Lab
30.
THANK YOU
KNOREX.COM