GCPUG meetup 201610 - Dataflow Introduction
Google BigData solution
Su plus Hu @ GCPUG.TW
Simon Su
var simon = {};
simon.aboutme = 'http://about.me/peihsinsu';
simon.nodejs = 'http://opennodes.arecord.us';
simon.googleshare = 'http://gappsnews.blogspot.tw';
simon.nodejsblog = 'http://nodejs-in-example.blogspot.tw';
simon.blog = 'http://peihsinsu.blogspot.com';
simon.slideshare = 'http://slideshare.net/peihsinsu/';
simon.email = 'simonsu.mail@gmail.com';
simon.say = console.log; // defined so the call below runs
simon.say('Good luck to everybody!');
Sunny Hu
var sunny = {};
sunny.aboutme = 'https://plus.google.com/u/0/+sunnyHU/posts';
sunny.email = 'sunnyhu@linkernetworks.com';
sunny.language = ['Java', '.NET', 'NodeJS', 'SQL'];
sunny.skill = ['Project management',
  'System Analysis',
  'System design',
  'Car ho lan'];
sunny.say = console.log; // defined so the call below runs
sunny.say('Writing code gets dreary, so keep your mood sunny');
GCP Qualified Developer
● We are the “Su” “Hu” duo (舒服, “comfortable”) ...
● This is Su Hu style ...
https://www.facebook.com/groups/GCPUG.TW/
https://plus.google.com/u/0/communities/116100913832589966421
Google Cloud Platform User Group Taiwan
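We are the Google Cloud Platform Taiwan User Group. Since Google's cloud services began
to make their mark in Taiwan, there have been many new services, new knowledge, and new
ideas. Everyone is welcome to share them and to learn about Google's cloud services
together...
GCPUG connects people who like Google Cloud over the Internet so that they can share and
exchange their experience of using GCP. If you are new to Google Cloud Platform, come and
hear how experienced users work with it; if you are a Google Cloud Platform expert, come
and share your valuable experience and trade notes with other experts; and if you have
not started using Google Cloud Platform yet, come right away and hear how we use Google
Cloud!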
Linker Want You...
● Data scientist
● Data engineer
● Frontend engineer
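● How many hours of video are uploaded to YouTube every minute? 72 hours
● How big is the Google search index? 100PB+ (over 100,000 TBs)
● How many active users does Google have? 425M+
● How long does Google take, on average, to answer a search query? 0.25 seconds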
Management
Mobile
Compute
Networking
Big Data
Storage
Developer Tools
Google innovation
[Figure: timeline from 2002 to 2013 showing GFS, MapReduce, Big Table, Dremel, Colossus, Flume, Spanner, and MillWheel]
Provided as managed services ...
Google Changed the Big Data Market
● Google MapReduce
● Google Bigtable
● Google Borg
● Google Dremel
Big Data on Google Cloud Platform
● Capture: Pub/Sub, BigQuery streaming
● Process: Dataflow (stream & batch), Hadoop / Spark (on GCE)
● Store: Cloud Storage (objects), BigQuery Storage (structured)
● Analyze: BigQuery, Hadoop / Spark (on GCE), larger Hadoop ecosystem
Bigdata Scenario
● 1M devices
● 16.6K events/sec
● 43B events/month
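The scenario numbers are consistent with each other if each device reports about once per minute. That reporting rate and the 30-day month are assumptions made here for illustration; the slide does not state them. A quick check of the arithmetic:

```python
# Assumptions (not stated on the slide): one event per device per minute,
# and a month counted as 30 days.
DEVICES = 1_000_000
EVENTS_PER_DEVICE_PER_MINUTE = 1

events_per_sec = DEVICES * EVENTS_PER_DEVICE_PER_MINUTE / 60
events_per_month = events_per_sec * 60 * 60 * 24 * 30

print(f"{events_per_sec / 1e3:.1f}K events/sec")
print(f"{events_per_month / 1e9:.1f}B events/month")
```

Under these assumptions the rate is roughly 16,667 events per second, which the slide shortens to 16.6K, and about 43.2 billion events per month.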
Cloud Pub/Sub
[Figure: Publishers A, B, and C send Messages 1, 2, and 3 to Topics A, B, and C. Subscriber X receives Messages 1 and 2 through Subscriptions XA and XB; Subscribers Y and Z each receive Message 3 through Subscriptions YC and ZC.]
● Globally redundant
● Low latency (sub sec.)
● N to N coupling
● Batched read/write
● Push & Pull
● Guaranteed Delivery
● Auto expiration
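The N-to-N coupling can be pictured with a small in-memory model. This is only a sketch of the topic and subscription idea, not the Cloud Pub/Sub client API; the class `ToyPubSub` and names such as `topic-a` and `sub-xa` are hypothetical, chosen to echo the labels on the slide.

```python
from collections import defaultdict, deque


class ToyPubSub:
    """In-memory stand-in for topics and subscriptions (not the Cloud Pub/Sub API)."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # topic -> subscription names
        self.queues = {}                        # subscription -> pending messages

    def subscribe(self, topic, subscription):
        self.subscriptions[topic].append(subscription)
        self.queues[subscription] = deque()

    def publish(self, topic, message):
        # Every subscription attached to the topic gets its own copy.
        for subscription in self.subscriptions[topic]:
            self.queues[subscription].append(message)

    def pull(self, subscription):
        # Pull delivery: drain whatever is pending for this subscription.
        queue = self.queues[subscription]
        messages = list(queue)
        queue.clear()
        return messages


bus = ToyPubSub()
bus.subscribe("topic-a", "sub-xa")
bus.subscribe("topic-b", "sub-xb")
bus.subscribe("topic-c", "sub-yc")
bus.subscribe("topic-c", "sub-zc")

bus.publish("topic-a", "message 1")
bus.publish("topic-b", "message 2")
bus.publish("topic-c", "message 3")

subscriber_x = bus.pull("sub-xa") + bus.pull("sub-xb")
subscriber_y = bus.pull("sub-yc")
subscriber_z = bus.pull("sub-zc")
```

Because topic C has two subscriptions, message 3 reaches both subscriber Y and subscriber Z, while subscriber X collects messages 1 and 2 from its two subscriptions.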
Cloud Dataflow = Managed Flume + MillWheel on GCE
Dataflow use case
● ETL: movement, filtering, enrichment, shaping
● Analysis: reduction, batch computation, continuous computation
● Orchestration: composition, external orchestration, simulation
Cloud Dataflow SDK - Logic model
Input -> Transform -> Output
Pipeline{
Who => Inputs
What => Transforms <- Aggregations, Filters, Joins, ...
Where => Windows
When => Watermarks + Triggers <- Completeness
To => Outputs
}
Life of Pipeline
[Figure: User Code & SDK deploy and schedule a job on the GCP Managed Service (Job Manager, Work Manager), which reports progress and logs to the Monitoring UI]
Cloud Dataflow SDK
❯ Unified programming model for both batch & stream processing
● Independent from the execution back-end aka “runner”
❯ Google driven & open sourced
● Java 7 or 8 @ github.com/GoogleCloudPlatform/DataflowJavaSDK
● Python
❯ Community sourced
● Scala @ github.com/darkjh/scalaflow
● Scala @ github.com/jhlch/scala-dataflow-dsl
Pipeline
● A Directed Acyclic Graph of data processing transformations
● Can be submitted to the Dataflow Service for optimization and execution, or executed on an alternate runner, e.g. Spark
● May include multiple inputs and multiple outputs
● May encompass many logical MapReduce operations
● PCollections flow through the pipeline
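To make the graph idea concrete, here is a toy pipeline with two inputs and two outputs. It is a plain-Python sketch, not the Dataflow SDK: `ToyPipeline` and the step names are hypothetical, and ordinary lists stand in for PCollections.

```python
class ToyPipeline:
    """Records named steps and their upstream dependencies, then runs them in order."""

    def __init__(self):
        self.steps = {}  # name -> (function, list of upstream step names)

    def add(self, name, fn, inputs=()):
        # A step may only depend on steps added earlier, which keeps the graph acyclic.
        for upstream in inputs:
            if upstream not in self.steps:
                raise ValueError(f"unknown upstream step: {upstream}")
        self.steps[name] = (fn, list(inputs))
        return name

    def run(self):
        results = {}
        # Insertion order is already a valid topological order.
        for name, (fn, inputs) in self.steps.items():
            results[name] = fn(*(results[i] for i in inputs))
        return results


p = ToyPipeline()
first = p.add("read_first", lambda: ["Seahawks", "NFC"])
second = p.add("read_second", lambda: ["Champions", "Seattle"])
merged = p.add("merge", lambda a, b: a + b, inputs=[first, second])
p.add("upper", lambda xs: [x.upper() for x in xs], inputs=[merged])
p.add("lengths", lambda xs: [len(x) for x in xs], inputs=[merged])

results = p.run()
```

Building the graph first and running it later is what gives a real service room to optimize the steps before execution; this sketch simply runs them in the order they were added.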
Inputs & Outputs
❯ Read from standard Google Cloud Platform data sources
• GCS, Pub/Sub, BigQuery, Datastore
❯ Write your own custom source by teaching Dataflow how to read it in parallel
• Currently for bounded sources only
❯ Write to GCS, BigQuery, Pub/Sub
• More coming…
❯ Can use a combination of text, JSON, XML, and Avro formatted data
Your Source/Sink Here
PCollection
❯ A collection of data of type T in a pipeline: PCollection<T>, e.g. PCollection<KV<K, V>>
❯ May be either bounded or unbounded in size
❯ Created by using a PTransform to:
• Build from a java.util.Collection
• Read from a backing data store
• Transform an existing PCollection
❯ Often contains key-value pairs using KV
{Seahawks, NFC, Champions, Seattle, ...}
{..., “NFC Champions #GreenBay”, “Green Bay #superbowl!”, ..., “#GoHawks”, ...}
Transforms
● A step, or processing operation, that transforms data
○ Convert format, group, filter data
● Types of transforms
○ ParDo
○ GroupByKey
○ Combine
○ Flatten
■ When multiple PCollection objects contain the same data type, you can merge them into a single logical PCollection using the Flatten transform
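Flatten and Combine are the two transforms in this list that the following slides do not illustrate, so here is a minimal sketch of both. It uses ordinary Python lists in place of PCollections; the helper names `flatten` and `combine` and the sample scores are hypothetical.

```python
from functools import reduce
from itertools import chain


def flatten(*collections):
    # Flatten: merge several collections of the same element type into one.
    return list(chain.from_iterable(collections))


def combine(collection, fn):
    # Combine: fold all elements into a single value with an associative function.
    return reduce(fn, collection)


morning_scores = [3, 7]
evening_scores = [10, 14]

all_scores = flatten(morning_scores, evening_scores)
total = combine(all_scores, lambda a, b: a + b)
```

The combining function should be associative so that partial results computed on different workers can be merged in any grouping.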
ParDo (Parallel Do)
❯ Processes each element of a PCollection independently using a user-provided DoFn
❯ Corresponds to both the Map and Reduce phases in Hadoop, i.e. ParDo->GBK->ParDo
❯ Useful for
○ Filtering a data set
○ Formatting or converting the type of each element in a data set
○ Extracting parts of each element in a data set
○ Performing computations on each element in a data set
{Seahawks, NFC, Champions, Seattle, ...}
KeyBySessionId
{KV<S, Seahawks>, KV<C, Champions>, KV<S, Seattle>, KV<N, NFC>, ...}
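The keying step in this example can be sketched in plain Python. This is not the Dataflow SDK: `par_do` is a hypothetical helper, tuples stand in for KV, and keying by first letter is one reading of the S, C, and N keys shown on the slide.

```python
def par_do(collection, do_fn):
    # Apply do_fn to every element independently; a DoFn may emit zero or more outputs.
    output = []
    for element in collection:
        output.extend(do_fn(element))
    return output


def key_by_first_letter(word):
    # Emit one (key, value) pair per element, keyed by the first letter.
    yield (word[0], word)


def drop_short(word):
    # Filtering: emit nothing for words shorter than four characters.
    if len(word) >= 4:
        yield word


words = ["Seahawks", "NFC", "Champions", "Seattle"]
keyed = par_do(words, key_by_first_letter)
long_words = par_do(words, drop_short)
```

Since each element is handled on its own, the loop inside `par_do` could be split across many workers without changing the result.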
Map -> ParDo
Shuffle -> GroupByKey
Reduce -> ParDo
Wait a minute…
How do you do a GroupByKey on an unbounded PCollection?
{KV<S, Seahawks>, KV<C, Champions>, KV<S, Seattle>, KV<N, NFC>, ...}
GroupByKey
• Takes a PCollection of key-value pairs and gathers up all values with the same key
• Corresponds to the shuffle phase in Hadoop
{KV<S, {Seahawks, Seattle, …}>, KV<N, {NFC, …}>, KV<C, {Champions, …}>}
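A minimal sketch of the grouping step on a bounded input, using plain Python tuples in place of KV. The helper `group_by_key` is hypothetical, and the per-key count at the end stands in for the ParDo that would follow the shuffle.

```python
from collections import defaultdict


def group_by_key(pairs):
    # Gather every value that shares a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


pairs = [("S", "Seahawks"), ("C", "Champions"), ("S", "Seattle"), ("N", "NFC")]
grouped = group_by_key(pairs)

# A follow-up per-key step plays the role of the reduce phase.
counts = {key: len(values) for key, values in grouped.items()}
```

This version needs the whole input before it can return, which is why an unbounded collection has to be cut into windows before it can be grouped.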
Dataflow in Advance
Windowing
Trigger
● Triggers control when results are emitted.
● Triggers are often relative to the watermark
http://cdn.oreillystatic.com/en/assets/1/event/155/Watermarks_%20Time%20and%20progress%20in%20streaming%20dataflow%20and%20beyond%20Presentation.pdf
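One way to picture windowing and a watermark-based trigger together is the toy counter below. It is a sketch under simplifying assumptions, not the Dataflow SDK: windows are fixed at a hypothetical 60 seconds, the watermark is advanced by hand, and late data is not handled.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed-window length


def window_start(event_time):
    # Assign an event to the fixed window that contains its timestamp.
    return event_time - (event_time % WINDOW_SIZE)


class WindowedCounter:
    """Counts events per window and key; emits a window once the watermark passes its end."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # window start -> key -> count
        self.emitted = []

    def add(self, key, event_time):
        self.counts[window_start(event_time)][key] += 1

    def advance_watermark(self, watermark):
        # Trigger: a window fires when the watermark reaches the end of the window.
        for start in sorted(self.counts):
            if start + WINDOW_SIZE <= watermark:
                self.emitted.append((start, dict(self.counts.pop(start))))


counter = WindowedCounter()
counter.add("S", 5)
counter.add("S", 42)
counter.add("N", 61)

counter.advance_watermark(59)   # window [0, 60) is not complete yet
emitted_early = list(counter.emitted)

counter.advance_watermark(60)   # watermark reaches the end of window [0, 60)
emitted_later = list(counter.emitted)
```

Grouping per window is what makes a GroupByKey on an unbounded collection possible: each window is finite, and the watermark tells the trigger when a window can be treated as complete.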
Composite Transform
● Code reuse
● Better monitoring experience
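The code-reuse point can be sketched by bundling several small steps into one named transform. This is plain Python rather than the Dataflow SDK; `compose` and the step functions are hypothetical, and the last sample tweet is made up so that one hashtag appears twice.

```python
def compose(*transforms):
    # Build one reusable transform out of several smaller ones, applied left to right.
    def composite(collection):
        for transform in transforms:
            collection = transform(collection)
        return collection
    return composite


def extract_hashtags(tweets):
    return [word for tweet in tweets for word in tweet.split() if word.startswith("#")]


def lowercase(tags):
    return [tag.lower() for tag in tags]


def count_each(tags):
    counts = {}
    for tag in tags:
        counts[tag] = counts.get(tag, 0) + 1
    return counts


count_hashtags = compose(extract_hashtags, lowercase, count_each)

tweets = ["NFC Champions #GreenBay", "Green Bay #superbowl!", "#GoHawks", "#gohawks again"]
hashtag_counts = count_hashtags(tweets)
```

Callers see a single step, `count_hashtags`, which is also the unit a monitoring view could report on instead of three separate low-level steps.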
Benefits of Cloud Dataflow
● Functional (transform based) programming model
● Unified programming model for batch & stream processing
● Reduced operational cost of “cluster” management
● Decreased job clock time via platform innovation
● Open source ecosystem of SDKs, extensions, runners, etc.
Optimizing Your Time
● Typical Data Processing: programming, resource provisioning, performance tuning, monitoring, reliability, deployment & configuration, handling growing scale, utilization improvements
● Data Processing with Cloud Dataflow: programming, more time to dig into your data
Cloud Dataflow Runners
Run the same code in multiple modes using different runners
❯ Direct Runner
• For local, in-memory execution
• Great for development and unit tests
❯ Cloud Dataflow Service Runner
• Runs on the fully managed Dataflow Service
• Your code runs distributed across GCE instances
❯ Community sourced
• Spark runner @ github.com/cloudera/spark-dataflow
• Flink runner from dataArtisans
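The idea of running the same code on different runners can be shown with a toy example. Neither function below is a real Dataflow runner: `direct_runner` and `parallel_runner` are hypothetical stand-ins, with a local thread pool taking the place of distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def direct_runner(step, elements):
    # Local, in-memory execution on the calling thread.
    return [step(e) for e in elements]


def parallel_runner(step, elements, workers=4):
    # Same step, spread across a pool of workers; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(step, elements))


data = list(range(10))
local_result = direct_runner(square, data)
pooled_result = parallel_runner(square, data)
```

The step itself does not change between the two calls, which is the property that lets a pipeline be developed and unit tested locally before it is submitted to a managed service.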
Build a mobile gaming analytics platform
Q&A

Similar to GCPUG meetup 201610 - Dataflow Introduction

Hadoop Conf 2014 - Hadoop BigQuery Connector
Hadoop Conf 2014 - Hadoop BigQuery ConnectorHadoop Conf 2014 - Hadoop BigQuery Connector
Hadoop Conf 2014 - Hadoop BigQuery ConnectorSimon Su
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fastHow Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fastAtlassian
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowDaniel Zivkovic
 
How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)
How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)
How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)Seven Peaks Speaks
 
How We Build NG-MY Websites: Performance, SEO, CI, CD
How We Build NG-MY Websites: Performance, SEO, CI, CDHow We Build NG-MY Websites: Performance, SEO, CI, CD
How We Build NG-MY Websites: Performance, SEO, CI, CDSeven Peaks Speaks
 
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...Databricks
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolatsliwowicz
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)wqchen
 
Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)Jonathan Felch
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Fastly
 
Pyramid Lighter/Faster/Better web apps
Pyramid Lighter/Faster/Better web appsPyramid Lighter/Faster/Better web apps
Pyramid Lighter/Faster/Better web appsDylan Jay
 
Step by Step Personal Drive to One Drive Migration using SPMT
Step by Step Personal Drive to One Drive Migration using SPMTStep by Step Personal Drive to One Drive Migration using SPMT
Step by Step Personal Drive to One Drive Migration using SPMTIT Industry
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneDataWorks Summit
 
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDaysConexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDaysCAPSiDE
 
Everything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the WebEverything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the WebJames Rakich
 

Similar to GCPUG meetup 201610 - Dataflow Introduction (20)

Hadoop Conf 2014 - Hadoop BigQuery Connector
Hadoop Conf 2014 - Hadoop BigQuery ConnectorHadoop Conf 2014 - Hadoop BigQuery Connector
Hadoop Conf 2014 - Hadoop BigQuery Connector
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fastHow Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
 
How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)
How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)
How We Build NG-MY Websites: Performance, SEO, CI, CD (Thai version)
 
How We Build NG-MY Websites: Performance, SEO, CI, CD
How We Build NG-MY Websites: Performance, SEO, CI, CDHow We Build NG-MY Websites: Performance, SEO, CI, CD
How We Build NG-MY Websites: Performance, SEO, CI, CD
 
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara...
 
Xephon K A Time series database with multiple backends
Xephon K A Time series database with multiple backendsXephon K A Time series database with multiple backends
Xephon K A Time series database with multiple backends
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboola
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge
 
Pyramid Lighter/Faster/Better web apps
Pyramid Lighter/Faster/Better web appsPyramid Lighter/Faster/Better web apps
Pyramid Lighter/Faster/Better web apps
 
Step by Step Personal Drive to One Drive Migration using SPMT
Step by Step Personal Drive to One Drive Migration using SPMTStep by Step Personal Drive to One Drive Migration using SPMT
Step by Step Personal Drive to One Drive Migration using SPMT
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDaysConexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
 
Everything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the WebEverything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the Web
 

GCPUG meetup 201610 - Dataflow Introduction

  • 1. Google BigData solution Su plus Hu @ GCPUG.TW
  • 2. Simon Su var simon = {}; simon.aboutme = 'http://about.me/peihsinsu'; simon.nodejs = 'http://opennodes.arecord.us'; simon.googleshare = 'http://gappsnews.blogspot.tw'; simon.nodejsblog = 'http://nodejs-in-example.blogspot.tw'; simon.blog = 'http://peihsinsu.blogspot.com'; simon.slideshare = 'http://slideshare.net/peihsinsu/'; simon.email = 'simonsu.mail@gmail.com'; simon.say('Good luck to everybody!');
  • 3. Sunny Hu var sunny = {}; sunny.aboutme = 'https://plus.google.com/u/0/+sunnyHU/posts'; sunny.email = 'sunnyhu@linkernetworks.com'; sunny.language = ['Java', '.NET', 'NodeJS', 'SQL']; sunny.skill = ['Project management', 'System Analysis', 'System design', 'Car ho lan']; sunny.say('Writing code is dreary, so keep your mood sunny'); GCP Qualified Developer
  • 4. ● We are the "Su" "Hu" duo (舒服, "comfortable") ... ● This is Su Hu style ...
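  • 5. https://www.facebook.com/groups/GCPUG.TW/ https://plus.google.com/u/0/communities/116100913832589966421 Google Cloud Platform User Group Taiwan. We are the Google Cloud Platform Taiwan User Group. Since Google's cloud services began to make their mark in Taiwan, many new services, new knowledge, and new ideas have appeared, and everyone is welcome to share them and learn about Google's cloud services together... GCPUG connects Google Cloud enthusiasts over the internet so they can share and exchange their experience with GCP. If you are new to Google Cloud Platform, come and hear how experienced users work with it; if you are a Google Cloud Platform expert, come and share your valuable experience and trade notes with other experts; if you have not started using Google Cloud Platform yet, come and hear how we use Google Cloud right away!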
  • 6. Linker Wants You... ● Data scientist ● Data engineer ● Frontend engineer
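  • 7. How much video is uploaded to YouTube every minute? 72 hours. How big is the Google Search index? 100PB+ (over 100,000 TBs). How many active users does Google have? 425M+. How long does Google take, on average, to answer a search query? 0.25 seconds.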
  • 9. Google innovation (timeline figure, axis 2002 to 2013): GFS, MapReduce, Big Table, Colossus, Dremel, Flume, Spanner, MillWheel
  • 10. Provided as managed services ... (timeline figure, axis 2002 to 2013): GFS, MapReduce, Big Table, Colossus, Dremel, Flume, Spanner, MillWheel
  • 11. Google Changed the Big Data Market: Google MapReduce, Google Bigtable, Google Borg, Google Borg, Google Dremel
  • 12. Big Data on Google Cloud Platform ● Capture: Pub/Sub, BigQuery streaming ● Store: Cloud Storage (objects), BigQuery Storage (structured) ● Process: Dataflow (stream & batch), Hadoop / Spark (on GCE) ● Analyze: BigQuery, larger Hadoop ecosystem, Hadoop / Spark (on GCE)
  • 14. Cloud Pub/Sub (diagram): Publishers A, B, C publish Messages 1, 2, 3 to Topics A, B, C; Subscriptions XA, XB, YC, ZC deliver them to Subscribers X, Y, Z ● Globally redundant ● Low latency (sub sec.) ● N to N coupling ● Batched read/write ● Push & Pull ● Guaranteed Delivery ● Auto expiration
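The topic and subscription fan-out on the Cloud Pub/Sub slide can be sketched with a toy in-memory broker. This is only an illustration of the N to N coupling idea, not the Cloud Pub/Sub client API: the `MiniPubSub` class and its `subscribe`, `publish`, and `pull` methods are hypothetical names, and the routing of each message to a topic is assumed from the subscription names in the diagram.

```python
from collections import defaultdict, deque


class MiniPubSub:
    """Toy in-memory broker (hypothetical, not the Cloud Pub/Sub API)."""

    def __init__(self):
        # topic name -> {subscription name: queue of pending messages}
        self.subscriptions = defaultdict(dict)

    def subscribe(self, topic, subscription):
        self.subscriptions[topic][subscription] = deque()

    def publish(self, topic, message):
        # Every subscription attached to the topic receives its own copy.
        for queue in self.subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, subscription):
        # Return and acknowledge everything pending on one subscription.
        queue = self.subscriptions[topic][subscription]
        messages = list(queue)
        queue.clear()
        return messages


broker = MiniPubSub()
broker.subscribe("topic-a", "XA")  # subscriber X on topic A
broker.subscribe("topic-b", "XB")  # subscriber X on topic B
broker.subscribe("topic-c", "YC")  # subscriber Y on topic C
broker.subscribe("topic-c", "ZC")  # subscriber Z on topic C

broker.publish("topic-a", "message 1")
broker.publish("topic-b", "message 2")
broker.publish("topic-c", "message 3")
```

Because topic C has two subscriptions, "message 3" is queued once for Y and once for Z, while publishers never need to know who is listening.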
  • 15. Cloud Dataflow = Managed Flume + MillWheel on GCE
  • 16. Dataflow use cases ● ETL: movement, filtering, enrichment, shaping ● Analysis: reduction, batch computation, continuous computation ● Orchestration: composition, external orchestration, simulation
  • 17. Cloud Dataflow SDK - Logic model: Input -> Transform -> Output. Pipeline { Who => Inputs; What => Transforms (aggregations, filters, joins, ...); Where => Windows; When => Watermarks + Triggers (completeness); To => Outputs }
  • 18. Life of a Pipeline (diagram labels): User Code & SDK, GCP Managed Service, Job Manager, Work Manager, Deploy & Schedule, Progress & Logs, Monitoring UI
  • 19. Cloud Dataflow SDK ❯ Unified programming model for both batch & stream processing ● Independent from the execution back-end aka “runner” ❯ Google driven & open sourced ● Java 7 or 8 @ github.com/GoogleCloudPlatform/DataflowJavaSDK ● Python ❯ Community sourced ● Scala @ github.com/darkjh/scalaflow ● Scala @ github.com/jhlch/scala-dataflow-dsl
  • 20. Pipeline ● A Directed Acyclic Graph of data processing transformations ● Can be submitted to the Dataflow Service for optimization and execution, or executed on an alternate runner, e.g. Spark ● May include multiple inputs and multiple outputs ● May encompass many logical MapReduce operations ● PCollections flow through the pipeline
  • 21. Inputs & Outputs ❯ Read from standard Google Cloud Platform data sources • GCS, Pub/Sub, BigQuery, Datastore ❯ Write your own custom source by teaching Dataflow how to read it in parallel • Currently for bounded sources only ❯ Write to GCS, BigQuery, Pub/Sub • More coming… ❯ Can use a combination of text, JSON, XML, Avro formatted data ❯ Your Source/Sink Here
  • 22. PCollection ❯ A collection of data of type T in a pipeline - PCollection<T> ❯ May be either bounded or unbounded in size ❯ Created by using a PTransform to: • Build from a java.util.Collection • Read from a backing data store • Transform an existing PCollection ❯ Often contains key-value pairs using KV ❯ Examples: {Seahawks, NFC, Champions, Seattle, ...} and {..., "NFC Champions #GreenBay", "Green Bay #superbowl!", ... "#GoHawks", ...}
  • 23. Transforms ● A step, or a processing operation, that transforms data ○ Convert format, group, filter data ● Types of transforms ○ ParDo ○ GroupByKey ○ Combine ○ Flatten ■ When multiple PCollection objects contain the same data type, you can merge them into a single logical PCollection using the Flatten transform
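As a rough picture of what Flatten and Combine do, here is a minimal sketch on plain Python lists and dicts. It does not use the Dataflow SDK; `flatten` and `combine_per_key` are hypothetical helper names, and ordinary in-memory collections stand in for PCollections.

```python
from itertools import chain


def flatten(*collections):
    """Merge several collections of the same element type into one."""
    return list(chain.from_iterable(collections))


def combine_per_key(grouped, combine_fn):
    """Reduce the list of values held under each key to a single value."""
    return {key: combine_fn(values) for key, values in grouped.items()}


nfc_teams = ["Seahawks", "Packers"]
afc_teams = ["Patriots"]
all_teams = flatten(nfc_teams, afc_teams)

scores = {"Seahawks": [7, 3, 14], "Patriots": [7, 21]}
totals = combine_per_key(scores, sum)
```

`flatten` only concatenates; it does not deduplicate or reorder, which matches the idea of a single logical collection built from several inputs.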
  • 24. ParDo (Parallel Do) ❯ Processes each element of a PCollection independently using a user-provided DoFn ❯ Corresponds to both the Map and Reduce phases in Hadoop, i.e. ParDo->GBK->ParDo ❯ Useful for ○ Filtering a data set ○ Formatting or converting the type of each element in a data set ○ Extracting parts of each element in a data set ○ Performing computations on each element in a data set ❯ Example: {Seahawks, NFC, Champions, Seattle, ...} -> KeyBySessionId -> {KV<S, Seahawks>, KV<C, Champions>, KV<S, Seattle>, KV<N, NFC>, …}
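The per-element behaviour of ParDo can be sketched in plain Python as a generator that calls a user function on each element and forwards whatever it yields. This is not the Dataflow SDK: `par_do`, `key_by_first_letter`, and `only_long_words` are hypothetical names, and keying by first letter is an assumption that simply mirrors the KV<S, Seahawks> example on the slide.

```python
def par_do(elements, do_fn):
    """Apply do_fn to every element independently.

    do_fn is a generator function, so it may emit zero, one, or many
    outputs per input element (filtering, mapping, or exploding).
    """
    for element in elements:
        yield from do_fn(element)


def key_by_first_letter(word):
    # One output per input: turn a word into a (key, value) pair.
    yield (word[0], word)


def only_long_words(word):
    # Zero or one output per input: acts as a filter.
    if len(word) > 3:
        yield word


words = ["Seahawks", "NFC", "Champions", "Seattle"]
keyed = list(par_do(words, key_by_first_letter))
filtered = list(par_do(words, only_long_words))
```

Because each call to `do_fn` sees only one element, the work can in principle be split across machines without coordination, which is the property the slide relies on.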
  • 26. Group by key: GroupByKey • Takes a PCollection of key-value pairs and gathers up all values with the same key • Corresponds to the shuffle phase in Hadoop • Example: {KV<S, Seahawks>, KV<C, Champions>, KV<S, Seattle>, KV<N, NFC>, ...} -> GroupByKey -> {KV<S, {Seahawks, Seattle, …}>, KV<N, {NFC, …}>, KV<C, {Champions, …}>} • Wait a minute… How do you do a GroupByKey on an unbounded PCollection?
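A minimal sketch of grouping by key, plus one common answer to the unbounded question: assign each element to a fixed time window and group within each window, so every group is finite. This is plain Python rather than the Dataflow SDK; the function names, the `(timestamp, key, value)` tuple layout, and the 60-second window size are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Gather all values that share a key: [(k, v), ...] -> {k: [v, ...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def group_by_key_per_window(timestamped_pairs, window_seconds):
    """Group (timestamp, key, value) tuples inside fixed time windows.

    Each element lands in the window starting at the largest multiple of
    window_seconds that is <= its timestamp.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for timestamp, key, value in timestamped_pairs:
        window_start = timestamp - timestamp % window_seconds
        windows[window_start][key].append(value)
    return {start: dict(groups) for start, groups in windows.items()}


pairs = [("S", "Seahawks"), ("C", "Champions"), ("S", "Seattle"), ("N", "NFC")]
grouped = group_by_key(pairs)

events = [(5, "S", "Seahawks"), (42, "S", "Seattle"),
          (61, "N", "NFC"), (70, "S", "#GoHawks")]
windowed = group_by_key_per_window(events, window_seconds=60)
```

In the windowed version the key "S" appears in two separate groups, one per window, instead of one group that would never be complete.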
  • 30. Trigger ● Triggers control when results are emitted. ● Triggers are often relative to the watermark. Source: http://cdn.oreillystatic.com/en/assets/1/event/155/Watermarks_%20Time%20and%20progress%20in%20streaming%20dataflow%20and%20beyond%20Presentation.pdf
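One way to picture a trigger that fires relative to the watermark: keep a running count per fixed window and emit a window's result only once the watermark has passed the end of that window. The sketch below is plain Python, not the Dataflow SDK; the `WatermarkTrigger` class, its method names, and the 60-second windows are hypothetical, and late data arriving after a window has fired is deliberately not modeled.

```python
from collections import defaultdict


class WatermarkTrigger:
    """Counts elements per fixed window and emits when the watermark passes."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.open_windows = defaultdict(int)  # window start -> element count
        self.emitted = []                     # (window start, count) results

    def on_element(self, event_time):
        start = event_time - event_time % self.window_seconds
        self.open_windows[start] += 1

    def on_watermark(self, watermark):
        # A window [start, start + size) is complete once the watermark,
        # the system's estimate of event-time progress, reaches its end.
        for start in sorted(self.open_windows):
            if start + self.window_seconds <= watermark:
                self.emitted.append((start, self.open_windows.pop(start)))


trigger = WatermarkTrigger(window_seconds=60)
for event_time in (5, 30, 65):
    trigger.on_element(event_time)

trigger.on_watermark(59)
before_window_end = list(trigger.emitted)
trigger.on_watermark(60)
after_first_window = list(trigger.emitted)
trigger.on_watermark(130)
after_second_window = list(trigger.emitted)
```

Nothing is emitted at watermark 59 because the first window is still open; the result appears only when the watermark reaches 60.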
  • 32. Composite Transform ● Code reuse ● Better monitoring experience
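The code-reuse point behind composite transforms can be illustrated by wrapping several small steps in one named function, so callers see a single step. This is a plain Python sketch with hypothetical names (`extract_words`, `count_per_element`, `count_words`), not the Dataflow SDK's composite transform mechanism.

```python
from collections import Counter


def extract_words(lines):
    """Inner step 1: split each line into lowercase words."""
    for line in lines:
        yield from line.lower().split()


def count_per_element(elements):
    """Inner step 2: count how often each distinct element occurs."""
    return dict(Counter(elements))


def count_words(lines):
    """Composite step: one name for callers, two steps inside."""
    return count_per_element(extract_words(lines))


word_counts = count_words(["Go Hawks", "go Seattle"])
```

Any pipeline that needs word counts can reuse `count_words` as a unit, and a monitoring view could report it as one step rather than two.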
  • 33. Benefits of Cloud Dataflow ● Functional (transform based) programming model ● Unified programming model for batch & stream processing ● Reduced operational cost of "cluster" management ● Decreased job clock time via platform innovation ● Open source ecosystem of SDKs, extensions, runners, etc.
  • 34. Optimizing Your Time ● Typical data processing: programming, resource provisioning, performance tuning, monitoring, reliability, deployment & configuration, handling growing scale, utilization improvements ● Data processing with Cloud Dataflow: programming, more time to dig into your data
  • 35. Cloud Dataflow Runners: run the same code in multiple modes using different runners ❯ Direct Runner • For local, in-memory execution • Great for development and unit tests ❯ Cloud Dataflow Service Runner • Runs on the fully managed Dataflow Service • Your code runs distributed across GCE instances ❯ Community sourced • Spark runner @ github.com/cloudera/spark-dataflow • Flink runner from dataArtisans
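The idea of running the same pipeline definition on different runners can be sketched by separating the list of steps from the code that executes them. The example is plain Python and only an analogy: `run_direct` and `run_threaded` are hypothetical stand-ins for a local runner and a parallel runner, and a thread pool is nowhere near a distributed service.

```python
from concurrent.futures import ThreadPoolExecutor


def run_direct(steps, data):
    """Run every step in-process, one element at a time."""
    for step in steps:
        data = [step(element) for element in data]
    return data


def run_threaded(steps, data, workers=4):
    """Run the same steps, spreading each step's elements over threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for step in steps:
            # pool.map returns results in input order.
            data = list(pool.map(step, data))
    return data


# The pipeline definition is independent of how it is executed.
pipeline_steps = [str.strip, str.upper]
raw = ["  seahawks ", "nfc", " seattle"]

direct_result = run_direct(pipeline_steps, raw)
threaded_result = run_threaded(pipeline_steps, raw)
```

Both calls receive the identical `pipeline_steps` list, which is the property that lets a pipeline be tested locally and then handed to a different execution back-end.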
  • 36. Build a mobile gaming analytics platform
  • 37. Q&A