SlideShare a Scribd company logo
Google Dataflow 小試
Simon Su @ LinkerNetworks
{Google Developer Expert}
https://goo.gl/xkANFT
var simon = {/** I am at GCPUG.TW **/};
simon.aboutme = 'http://about.me/peihsinsu';
simon.nodejs = ‘http://opennodes.arecord.us';
simon.googleshare = 'http://gappsnews.blogspot.tw'
simon.nodejsblog = ‘http://nodejs-in-example.blogspot.tw';
simon.blog = ‘http://peihsinsu.blogspot.com';
simon.slideshare = ‘http://slideshare.net/peihsinsu/';
simon.email = ‘simonsu.mail@gmail.com’;
simon.say(‘Good luck to everybody!');
I’m Simon Su...
What… I’m at Japan!!
https://www.facebook.com/groups/GCPUG.TW/
https://plus.google.com/u/0/communities/116100913832589966421
● Data scientist
● Data engineer
● Frontend engineer
Before Workshop...
Prepare Doc
http://www.slideshare.net/peihsinsu/jcconf2016-dataflow-workshop
LAB Doc
http://www.slideshare.net/peihsinsu/jcconf-2016-dataflow-workshop-labs
Or http://goo.gl/nfdBhV
Google Cloud in Big Data Solution
Google Focused Cloud
Virtualized
Data Centers
Standard virtual kit for
Rent. Still yours to
manage.
2nd
Wave
Colocation
1st
Wave
Your kit, someone
else’s building.
Yours to manage.
Assembly required True On Demand Cloud
Next
Storage Processing Memory Network
Clusters
Distributed Storage, Processing
& Machine Learning
Containers
3rd
Wave
An actual, global
elastic cloud
Invest your energy in
great apps.
Google Cloud Family
Foundation
Infrastructure & Operations
Data Services
Application
Runtime Services
Enabling No-Touch Operations
Breakthrough Insights,
Breakthrough Applications
The Gear that Powers Google
GCP tools for data processing and analysis
Dataflow
StoreCapture Analyze
BigQuery Larger
Hadoop
Ecosystem
Pub/Sub
Logs
App Engine
BigQuery streaming
Process
Cloud
Storage
Cloud
Datastore
(NoSQL)
Cloud SQL
(mySQL)
BigQuery
Storage
Dataproc Dataproc
Common big data processing flow
Devices
Physical or
virtual servers
as frontend
data receiver
MapReduce
servers for large
data
transformation in
batch way or
streaming
Strong queue
service for
handling large
scale data
injection
Large scale data
store for storing
data and serve
query workload
Smart devices,
IoT devices
and Sensors
1M
Devices
16.6K
Events/sec
16.6K
Events/sec
43B
Events/month
SpannerDremelMapReduce
Big Table Colossus
2012 20132002 2004 2006 2008 2010
GFS MillWheel
Flume
Google Big Data Evolution History
Look into Dataflow
GCP
Managed Service
User Code & SDK
Work Manager
Deploy & Schedule
Monitoring UI
Job Manager
Progress & Logs
Dataflow use case
• Movement
• Filtering
• Enrichment
• Shaping
• Reduction
• Batch computation
• Continuous
computation
• Composition
• External
orchestration
• Simulation
OrchestrationAnalysisETL
Getting Start - Installation
Get your GCP project
Install Eclipse Plugin for Dataflow
Verify your installation
Another Choice...
https://github.com/peihsinsu/gcp-dataflow-java
Getting start with GCS
Google Cloud Storage Features
Online cloud
import (Cloud
Storage Transfer
Service)
Object lifecycle
management
ACLs
Object change
notification
Offline import
(third party)
Regional buckets
Object
versioning
Create your bucket for Dataflow use
Run Dataflow in Local
Create dataflow project
Dataflow Sample
Execute Dataflow
Lab 1: Ready your dataflow
environment and create your
first dataflow project
● Create GCP project
● Install Eclipse and Dataflow plugin
● Create first Dataflow project
● Run your project
After lab - What thing in GCS bucket
After lab - The Run Configuration
Dataflow in Batch Mode
What we do in Big Data process...
Map
Shuffle
Reduce
ParDo
GroupByKey
ParDo
Pipeline
● A Direct Acyclic Graph of data processing
transformations
● Can be submitted to the Dataflow Service for
optimization and execution or executed on an
alternate runner e.g. Spark
● May include multiple inputs and multiple outputs
● May encompass many logical MapReduce
operations
● PCollections flow through the pipeline
Code Review
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(Create.of("Hello", "World"))
.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().toUpperCase());
}
}))
.apply(ParDo.of(new DoFn<String, Void>() {
@Override
public void processElement(ProcessContext c) {
LOG.info(c.element());
}
}));
p.run();
建立pipeline物件
執行pipeline
Inputs & Outputs
Your
Source/Sink
Here
❯ Read from standard Google Cloud Platform
data sources
• GCS, Pub/Sub, BigQuery, Datastore
❯ Write your own custom source by teaching
Dataflow how to read it in parallel
• Currently for bounded sources only
❯ Write to GCS, BigQuery, Pub/Sub
• More coming…
❯ Can use a combination of text, JSON, XML,
Avro formatted data
Code Review
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(Create.of("Hello", "World"))
.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().toUpperCase());
}
}))
.apply(ParDo.of(new DoFn<String, Void>() {
@Override
public void processElement(ProcessContext c) {
LOG.info(c.element());
}
}));
p.run();
建立Input
Parallel Do,Log輸出
PCollection
❯ A collection of data of type T in a pipeline
- PCollection<K,V>
❯ Maybe be either bounded or unbounded
in size
❯ Created by using a PTransform to:
• Build from a java.util.Collection
• Read from a backing data store
• Transform an existing PCollection
❯ Often contain the key-value pairs using
KV
{Seahawks, NFC, Champions, Seattle,
...}
{...,
“NFC Champions #GreenBay”,
“Green Bay #superbowl!”,
...
“#GoHawks”,
...}
Transforms
● A step, or a processing operation that transforms data
○ convert format , group , filter data
● Type of Transforms
○ ParDo
○ GroupByKey
○ Combine
○ Flatten
■ Multiple PCollection objects that contain the same data type, you can
merge them into a single logical PCollection using the Flatten transform
Code Review
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(Create.of("Hello", "World"))
.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().toUpperCase());
}
}))
.apply(ParDo.of(new DoFn<String, Void>() {
@Override
public void processElement(ProcessContext c) {
LOG.info(c.element());
}
}));
p.run();
Transform輸入,使用Parallel Do
方式,平行的將輸入物件轉大寫
Parallel Do,Log輸出
Pardo (Parallel do)
❯ Processes each element of a PCollection
independently using a user-provided DoFn
❯ Corresponds to both the Map and Reduce
phases in Hadoop i.e. ParDo->GBK->ParDo
❯ Useful for
Filtering a data set.
Formatting or converting the type of each element
in a data set.
Extracting parts of each element in a data set.
Performing computations on each element in a
data set.
{Seahawks, NFC,
Champions, Seattle, ...}
{
KV<S, Seahawks>,
KV<C,Champions>,
<KV<S, Seattle>,
KV<N, NFC>, …
}
KeyBySessionId
Group by key
Wait a minute…
How do you do a GroupByKey on an unbounded PCollection?
{KV<S, Seahawks>, KV<C,Champions>,
<KV<S, Seattle>, KV<N, NFC>, ...}
{KV<S, Seahawks>, KV<C,Champions>,
<KV<S, Seattle>, KV<N, NFC>, ...}
GroupByKey
• Takes a PCollection of key-value pairs
and gathers up all values with the same
key
• Corresponds to the shuffle phase in
Hadoop
{KV<S, {Seahawks, Seattle, …},
KV<N, {NFC, …}
KV<C, {Champion, …}}
Group by key sample
Lab 2: Deploy your first
project to Google Cloud
Platform
● Checking the Lab 1 project working well
● Deploy to cloud and watch the dataflow
task dashboard
● Implement the Input/Output/Transform
in your project
Dataflow Task Dashboard
Dataflow in Streaming Mode
Pub/Sub working model
Pub/Sub Operations - Topics
Pub/Sub Operations - Subscriber
Simple Guide for Pub/Sub
Ref: https://gcpug-tw.gitbooks.io/google-cloud-platform-in-practice/content/pubsub_getting_start.html
Dataflow with Cloud Pub/Sub Use Case
Publisher A Publisher B Publisher C
Message 1
Topic A Topic B Topic C
Subscription XA Subscription XB
Subscription
YC
Subscription
ZC
Cloud
Pub/Sub
Subscriber X Subscriber Y
Message 2 Message 3
Subscriber Z
Message 1
Message 2
Message 3
Message 3
Globally redundant
Low latency (sub sec.)
N to N coupling
Batched read/write
Push & Pull
Guaranteed Delivery
Auto expiration
Example of PubSub IO to BigQuery IO
Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
options.setStreaming(true);
Pipeline p = Pipeline.create(options);
PCollection<String> input =
p.apply(PubsubIO.Read.topic("projects/sunny-573/topics/jcconf2016"));
PCollection<String> windowedWords =
input.apply(Window.<String> into(
FixedWindows.of(Duration.standardMinutes(options.getWindowSize()))));
PCollection<KV<String, Long>> wordCounts =
windowedWords.apply(new TestMain.MyCountWords());
wordCounts.apply(ParDo.of(new FormatAsTableRowFn())).apply(
BigQueryIO.Write.to(getTableReference(options)).withSchema(getSchema()));
p.run();
Lab 3: Create a Streaming
Dataflow model
● Create PubSub topic
● Deploy Dataflow streaming sample
● Watch Dataflow task dashboard
Streaming Dataflow Executing Log
Don’t forget to shut down streaming process...
After Dataflow
Datalab
An easy tool for analysis and report
Analysis tools - BigQuery & Datalab
BigQuery
An interactive analysis service
Google Data Studio
https://datastudio.google.com/
Thinking your data orchestration...
● Google Codelab:
https://codelabs.developers.google.com/?cat=Cloud
● Dataflow Doc:
https://cloud.google.com/dataflow/docs/
More learning resources
--- THANKS ---

More Related Content

What's hot

GCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow IntroductionGCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow Introduction
Simon Su
 
Google Cloud Platform Special Training
Google Cloud Platform Special TrainingGoogle Cloud Platform Special Training
Google Cloud Platform Special Training
Simon Su
 
Docker in Action
Docker in ActionDocker in Action
Docker in Action
Simon Su
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
Arun Nagarajan
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composer
Bruce Kuo
 
Using Google Compute Engine
Using Google Compute EngineUsing Google Compute Engine
Using Google Compute Engine
Lynn Langit
 
Hands on Compute Engine
Hands on Compute EngineHands on Compute Engine
Hands on Compute Engine
Simon Su
 
Introduction to Google Compute Engine
Introduction to Google Compute EngineIntroduction to Google Compute Engine
Introduction to Google Compute Engine
Colin Su
 
Intro to the Google Cloud for Developers
Intro to the Google Cloud for DevelopersIntro to the Google Cloud for Developers
Intro to the Google Cloud for Developers
Lynn Langit
 
Streaming Auto-scaling in Google Cloud Dataflow
Streaming Auto-scaling in Google Cloud DataflowStreaming Auto-scaling in Google Cloud Dataflow
Streaming Auto-scaling in Google Cloud Dataflow
C4Media
 
Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016
Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016
Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016
ManageIQ
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Yohei Onishi
 
Hands on App Engine
Hands on App EngineHands on App Engine
Hands on App Engine
Simon Su
 
Getting started with Google Cloud Training Material - 2018
Getting started with Google Cloud Training Material - 2018Getting started with Google Cloud Training Material - 2018
Getting started with Google Cloud Training Material - 2018
JK Baseer
 
A Tour of Google Cloud Platform
A Tour of Google Cloud PlatformA Tour of Google Cloud Platform
A Tour of Google Cloud Platform
Colin Su
 
Google Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductGoogle Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your Product
Sergey Smetanin
 
Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3
Simon Su
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
DataWorks Summit
 
Google Dataflow Intro
Google Dataflow IntroGoogle Dataflow Intro
Google Dataflow Intro
Ivan Glushkov
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
Sub Szabolcs Feczak
 

What's hot (20)

GCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow IntroductionGCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow Introduction
 
Google Cloud Platform Special Training
Google Cloud Platform Special TrainingGoogle Cloud Platform Special Training
Google Cloud Platform Special Training
 
Docker in Action
Docker in ActionDocker in Action
Docker in Action
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composer
 
Using Google Compute Engine
Using Google Compute EngineUsing Google Compute Engine
Using Google Compute Engine
 
Hands on Compute Engine
Hands on Compute EngineHands on Compute Engine
Hands on Compute Engine
 
Introduction to Google Compute Engine
Introduction to Google Compute EngineIntroduction to Google Compute Engine
Introduction to Google Compute Engine
 
Intro to the Google Cloud for Developers
Intro to the Google Cloud for DevelopersIntro to the Google Cloud for Developers
Intro to the Google Cloud for Developers
 
Streaming Auto-scaling in Google Cloud Dataflow
Streaming Auto-scaling in Google Cloud DataflowStreaming Auto-scaling in Google Cloud Dataflow
Streaming Auto-scaling in Google Cloud Dataflow
 
Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016
Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016
Google Cloud Platform - Eric Johnson, Joe Selman - ManageIQ Design Summit 2016
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Hands on App Engine
Hands on App EngineHands on App Engine
Hands on App Engine
 
Getting started with Google Cloud Training Material - 2018
Getting started with Google Cloud Training Material - 2018Getting started with Google Cloud Training Material - 2018
Getting started with Google Cloud Training Material - 2018
 
A Tour of Google Cloud Platform
A Tour of Google Cloud PlatformA Tour of Google Cloud Platform
A Tour of Google Cloud Platform
 
Google Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your ProductGoogle Cloud Platform as a Backend Solution for your Product
Google Cloud Platform as a Backend Solution for your Product
 
Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
Google Dataflow Intro
Google Dataflow IntroGoogle Dataflow Intro
Google Dataflow Intro
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 

Viewers also liked

Brocade - Stingray Application Firewall
Brocade - Stingray Application FirewallBrocade - Stingray Application Firewall
Brocade - Stingray Application Firewall
Simon Su
 
GCPNext17' Extend 開始GCP了嗎?
GCPNext17' Extend   開始GCP了嗎?GCPNext17' Extend   開始GCP了嗎?
GCPNext17' Extend 開始GCP了嗎?
Simon Su
 
Google I/O Extended 2016 - 台北場活動回顧
Google I/O Extended 2016 - 台北場活動回顧Google I/O Extended 2016 - 台北場活動回顧
Google I/O Extended 2016 - 台北場活動回顧
Simon Su
 
Google I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateGoogle I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News Update
Simon Su
 
中原大學 Shift to cloud
中原大學   Shift to cloud中原大學   Shift to cloud
中原大學 Shift to cloud
Simon Su
 
GCPUG.TW - 2016活動討論
GCPUG.TW - 2016活動討論GCPUG.TW - 2016活動討論
GCPUG.TW - 2016活動討論
Simon Su
 
Google Cloud Platform 2014Q4
Google Cloud Platform 2014Q4Google Cloud Platform 2014Q4
Google Cloud Platform 2014Q4
Simon Su
 
技術單兵作戰及團隊開發流程差異
技術單兵作戰及團隊開發流程差異技術單兵作戰及團隊開發流程差異
技術單兵作戰及團隊開發流程差異
Caesar Chi
 
html5 & phonegap
html5 & phonegaphtml5 & phonegap
html5 & phonegap
Caesar Chi
 
中華電信 教育訓練
中華電信 教育訓練中華電信 教育訓練
中華電信 教育訓練
謝 宗穎
 
Developer team review of 2014
Developer team review of 2014Developer team review of 2014
Developer team review of 2014
Caesar Chi
 
為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測
為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測
為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測
謝 宗穎
 
GCPUG.TW - 2015活動回顧
GCPUG.TW - 2015活動回顧GCPUG.TW - 2015活動回顧
GCPUG.TW - 2015活動回顧
Simon Su
 
Web development, from git flow to github flow
Web development, from git flow to github flowWeb development, from git flow to github flow
Web development, from git flow to github flow
Caesar Chi
 
Docker with Cloud Service GCPUG
Docker with Cloud Service  GCPUGDocker with Cloud Service  GCPUG
Docker with Cloud Service GCPUG
Caesar Chi
 
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic TrainingGCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
Simon Su
 
Google IO - When Bigquery meeet Node.js
Google IO - When Bigquery meeet Node.jsGoogle IO - When Bigquery meeet Node.js
Google IO - When Bigquery meeet Node.js
Simon Su
 
遠端團隊專案建立與管理 remote team management 2016
遠端團隊專案建立與管理 remote team management 2016遠端團隊專案建立與管理 remote team management 2016
遠端團隊專案建立與管理 remote team management 2016
Caesar Chi
 
Try Cloud Spanner
Try Cloud SpannerTry Cloud Spanner
Try Cloud Spanner
Simon Su
 
從失敗中學習打造技術團隊
從失敗中學習打造技術團隊從失敗中學習打造技術團隊
從失敗中學習打造技術團隊
Caesar Chi
 

Viewers also liked (20)

Brocade - Stingray Application Firewall
Brocade - Stingray Application FirewallBrocade - Stingray Application Firewall
Brocade - Stingray Application Firewall
 
GCPNext17' Extend 開始GCP了嗎?
GCPNext17' Extend   開始GCP了嗎?GCPNext17' Extend   開始GCP了嗎?
GCPNext17' Extend 開始GCP了嗎?
 
Google I/O Extended 2016 - 台北場活動回顧
Google I/O Extended 2016 - 台北場活動回顧Google I/O Extended 2016 - 台北場活動回顧
Google I/O Extended 2016 - 台北場活動回顧
 
Google I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News UpdateGoogle I/O 2016 Recap - Google Cloud Platform News Update
Google I/O 2016 Recap - Google Cloud Platform News Update
 
中原大學 Shift to cloud
中原大學   Shift to cloud中原大學   Shift to cloud
中原大學 Shift to cloud
 
GCPUG.TW - 2016活動討論
GCPUG.TW - 2016活動討論GCPUG.TW - 2016活動討論
GCPUG.TW - 2016活動討論
 
Google Cloud Platform 2014Q4
Google Cloud Platform 2014Q4Google Cloud Platform 2014Q4
Google Cloud Platform 2014Q4
 
技術單兵作戰及團隊開發流程差異
技術單兵作戰及團隊開發流程差異技術單兵作戰及團隊開發流程差異
技術單兵作戰及團隊開發流程差異
 
html5 & phonegap
html5 & phonegaphtml5 & phonegap
html5 & phonegap
 
中華電信 教育訓練
中華電信 教育訓練中華電信 教育訓練
中華電信 教育訓練
 
Developer team review of 2014
Developer team review of 2014Developer team review of 2014
Developer team review of 2014
 
為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測
為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測
為 Node.js 專案打造專屬管家進行開發流程整合及健康檢測
 
GCPUG.TW - 2015活動回顧
GCPUG.TW - 2015活動回顧GCPUG.TW - 2015活動回顧
GCPUG.TW - 2015活動回顧
 
Web development, from git flow to github flow
Web development, from git flow to github flowWeb development, from git flow to github flow
Web development, from git flow to github flow
 
Docker with Cloud Service GCPUG
Docker with Cloud Service  GCPUGDocker with Cloud Service  GCPUG
Docker with Cloud Service GCPUG
 
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic TrainingGCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
GCP - GCE, Cloud SQL, Cloud Storage, BigQuery Basic Training
 
Google IO - When Bigquery meeet Node.js
Google IO - When Bigquery meeet Node.jsGoogle IO - When Bigquery meeet Node.js
Google IO - When Bigquery meeet Node.js
 
遠端團隊專案建立與管理 remote team management 2016
遠端團隊專案建立與管理 remote team management 2016遠端團隊專案建立與管理 remote team management 2016
遠端團隊專案建立與管理 remote team management 2016
 
Try Cloud Spanner
Try Cloud SpannerTry Cloud Spanner
Try Cloud Spanner
 
從失敗中學習打造技術團隊
從失敗中學習打造技術團隊從失敗中學習打造技術團隊
從失敗中學習打造技術團隊
 

Similar to JCConf 2016 - Google Dataflow 小試

Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
DSDT_MTL
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
Iván Fernández Perea
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
wesley chun
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
Derek Jacoby
 
Turku loves-storybook-styleguidist-styled-components
Turku loves-storybook-styleguidist-styled-componentsTurku loves-storybook-styleguidist-styled-components
Turku loves-storybook-styleguidist-styled-components
James Stone
 
Xephon K A Time series database with multiple backends
Xephon K A Time series database with multiple backendsXephon K A Time series database with multiple backends
Xephon K A Time series database with multiple backends
University of California, Santa Cruz
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
Daniel Zivkovic
 
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fastHow Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
Atlassian
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
TypeScript and SharePoint Framework
TypeScript and SharePoint FrameworkTypeScript and SharePoint Framework
TypeScript and SharePoint Framework
Bob German
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
Julian Hyde
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
Stamatis Zampetakis
 
Setting Up a TIG Stack for Your Testing
Setting Up a TIG Stack for Your TestingSetting Up a TIG Stack for Your Testing
Setting Up a TIG Stack for Your Testing
Jet Liu
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Daniel Zivkovic
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Chester Chen
 
Introduction to Django
Introduction to DjangoIntroduction to Django
Introduction to Django
James Casey
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
Amazon Web Services
 
Serverless Computing with Google Cloud
Serverless Computing with Google CloudServerless Computing with Google Cloud
Serverless Computing with Google Cloud
wesley chun
 
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
nagachika t
 

Similar to JCConf 2016 - Google Dataflow 小試 (20)

Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
Turku loves-storybook-styleguidist-styled-components
Turku loves-storybook-styleguidist-styled-componentsTurku loves-storybook-styleguidist-styled-components
Turku loves-storybook-styleguidist-styled-components
 
Xephon K A Time series database with multiple backends
Xephon K A Time series database with multiple backendsXephon K A Time series database with multiple backends
Xephon K A Time series database with multiple backends
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
 
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fastHow Bitbucket Pipelines Loads Connect UI Assets Super-fast
How Bitbucket Pipelines Loads Connect UI Assets Super-fast
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
TypeScript and SharePoint Framework
TypeScript and SharePoint FrameworkTypeScript and SharePoint Framework
TypeScript and SharePoint Framework
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Setting Up a TIG Stack for Your Testing
Setting Up a TIG Stack for Your TestingSetting Up a TIG Stack for Your Testing
Setting Up a TIG Stack for Your Testing
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
Introduction to Django
Introduction to DjangoIntroduction to Django
Introduction to Django
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Serverless Computing with Google Cloud
Serverless Computing with Google CloudServerless Computing with Google Cloud
Serverless Computing with Google Cloud
 
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
 

More from Simon Su

Kubernetes Basic Operation
Kubernetes Basic OperationKubernetes Basic Operation
Kubernetes Basic Operation
Simon Su
 
Google IoT Core 初體驗
Google IoT Core 初體驗Google IoT Core 初體驗
Google IoT Core 初體驗
Simon Su
 
JSDC 2017 - 使用google cloud 從雲到端,動手刻個IoT
JSDC 2017 - 使用google cloud 從雲到端,動手刻個IoTJSDC 2017 - 使用google cloud 從雲到端,動手刻個IoT
JSDC 2017 - 使用google cloud 從雲到端,動手刻個IoT
Simon Su
 
GCPUG.TW meetup #28 - GKE上運作您的k8s服務
GCPUG.TW meetup #28 - GKE上運作您的k8s服務GCPUG.TW meetup #28 - GKE上運作您的k8s服務
GCPUG.TW meetup #28 - GKE上運作您的k8s服務
Simon Su
 
GCE Windows Serial Console Usage Guide
GCE Windows Serial Console Usage GuideGCE Windows Serial Console Usage Guide
GCE Windows Serial Console Usage Guide
Simon Su
 
Google Cloud Monitoring
Google Cloud MonitoringGoogle Cloud Monitoring
Google Cloud Monitoring
Simon Su
 
JCConf2016 - Dataflow Workshop Setup
JCConf2016 - Dataflow Workshop SetupJCConf2016 - Dataflow Workshop Setup
JCConf2016 - Dataflow Workshop Setup
Simon Su
 
IThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOpsIThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOps
Simon Su
 
GCS - Access Control Lists (中文)
GCS - Access Control Lists (中文)GCS - Access Control Lists (中文)
GCS - Access Control Lists (中文)
Simon Su
 
Google Cloud Platform - for Mobile Solutions
Google Cloud Platform - for Mobile SolutionsGoogle Cloud Platform - for Mobile Solutions
Google Cloud Platform - for Mobile Solutions
Simon Su
 
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(下)
JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(下)JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(下)
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(下)
Simon Su
 
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(上)
JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(上)JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(上)
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(上)
Simon Su
 
CouchDB Getting Start
CouchDB Getting StartCouchDB Getting Start
CouchDB Getting Start
Simon Su
 
Google Cloud Platform專案建立說明
Google Cloud Platform專案建立說明Google Cloud Platform專案建立說明
Google Cloud Platform專案建立說明
Simon Su
 

More from Simon Su (14)

Kubernetes Basic Operation
Kubernetes Basic OperationKubernetes Basic Operation
Kubernetes Basic Operation
 
Google IoT Core 初體驗
Google IoT Core 初體驗Google IoT Core 初體驗
Google IoT Core 初體驗
 
JSDC 2017 - 使用google cloud 從雲到端,動手刻個IoT
JSDC 2017 - 使用google cloud 從雲到端,動手刻個IoTJSDC 2017 - 使用google cloud 從雲到端,動手刻個IoT
JSDC 2017 - 使用google cloud 從雲到端,動手刻個IoT
 
GCPUG.TW meetup #28 - GKE上運作您的k8s服務
GCPUG.TW meetup #28 - GKE上運作您的k8s服務GCPUG.TW meetup #28 - GKE上運作您的k8s服務
GCPUG.TW meetup #28 - GKE上運作您的k8s服務
 
GCE Windows Serial Console Usage Guide
GCE Windows Serial Console Usage GuideGCE Windows Serial Console Usage Guide
GCE Windows Serial Console Usage Guide
 
Google Cloud Monitoring
Google Cloud MonitoringGoogle Cloud Monitoring
Google Cloud Monitoring
 
JCConf2016 - Dataflow Workshop Setup
JCConf2016 - Dataflow Workshop SetupJCConf2016 - Dataflow Workshop Setup
JCConf2016 - Dataflow Workshop Setup
 
IThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOpsIThome DevOps Summit - IoT、docker與DevOps
IThome DevOps Summit - IoT、docker與DevOps
 
GCS - Access Control Lists (中文)
GCS - Access Control Lists (中文)GCS - Access Control Lists (中文)
GCS - Access Control Lists (中文)
 
Google Cloud Platform - for Mobile Solutions
Google Cloud Platform - for Mobile SolutionsGoogle Cloud Platform - for Mobile Solutions
Google Cloud Platform - for Mobile Solutions
 
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(下)
JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(下)JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(下)
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(下)
 
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(上)
JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(上)JCConf 2015  - 輕鬆學google的雲端開發 - Google App Engine入門(上)
JCConf 2015 - 輕鬆學google的雲端開發 - Google App Engine入門(上)
 
CouchDB Getting Start
CouchDB Getting StartCouchDB Getting Start
CouchDB Getting Start
 
Google Cloud Platform專案建立說明
Google Cloud Platform專案建立說明Google Cloud Platform專案建立說明
Google Cloud Platform專案建立說明
 

Recently uploaded

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 

Recently uploaded (20)

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

JCConf 2016 - Google Dataflow 小試

  • 1. Google Dataflow 小試 Simon Su @ LinkerNetworks {Google Developer Expert} https://goo.gl/xkANFT
  • 2. var simon = {/** I am at GCPUG.TW **/}; simon.aboutme = 'http://about.me/peihsinsu'; simon.nodejs = ‘http://opennodes.arecord.us'; simon.googleshare = 'http://gappsnews.blogspot.tw' simon.nodejsblog = ‘http://nodejs-in-example.blogspot.tw'; simon.blog = ‘http://peihsinsu.blogspot.com'; simon.slideshare = ‘http://slideshare.net/peihsinsu/'; simon.email = ‘simonsu.mail@gmail.com’; simon.say(‘Good luck to everybody!'); I’m Simon Su...
  • 5. ● Data scientist ● Data engineer ● Frontend engineer
  • 6. Before Workshop... Prepare Doc http://www.slideshare.net/peihsinsu/jcconf2016-dataflow-workshop LAB Doc http://www.slideshare.net/peihsinsu/jcconf-2016-dataflow-workshop-labs Or http://goo.gl/nfdBhV
  • 7. Google Cloud in Big Data Solution
  • 8. Google Focused Cloud Virtualized Data Centers Standard virtual kit for Rent. Still yours to manage. 2nd Wave Colocation 1st Wave Your kit, someone else’s building. Yours to manage. Assembly required True On Demand Cloud Next Storage Processing Memory Network Clusters Distributed Storage, Processing & Machine Learning Containers 3rd Wave An actual, global elastic cloud Invest your energy in great apps.
  • 9. Google Cloud Family Foundation Infrastructure & Operations Data Services Application Runtime Services Enabling No-Touch Operations Breakthrough Insights, Breakthrough Applications The Gear that Powers Google
  • 10. GCP tools for data processing and analysis Dataflow StoreCapture Analyze BigQuery Larger Hadoop Ecosystem Pub/Sub Logs App Engine BigQuery streaming Process Cloud Storage Cloud Datastore (NoSQL) Cloud SQL (mySQL) BigQuery Storage Dataproc Dataproc
  • 11. Common big data processing flow Devices Physical or virtual servers as frontend data receiver MapReduce servers for large data transformation in batch way or streaming Strong queue service for handling large scale data injection Large scale data store for storing data and serve query workload Smart devices, IoT devices and Sensors 1M Devices 16.6K Events/sec 16.6K Events/sec 43B Events/month
  • 12. SpannerDremelMapReduce Big Table Colossus 2012 20132002 2004 2006 2008 2010 GFS MillWheel Flume Google Big Data Evolution History
  • 13. Look into Dataflow GCP Managed Service User Code & SDK Work Manager Deploy & Schedule Monitoring UI Job Manager Progress & Logs
  • 14. Dataflow use case • Movement • Filtering • Enrichment • Shaping • Reduction • Batch computation • Continuous computation • Composition • External orchestration • Simulation OrchestrationAnalysisETL
  • 15. Getting Start - Installation
  • 16. Get your GCP project
  • 17. Install Eclipse Plugin for Dataflow
  • 21. Google Cloud Storage Features Online cloud import (Cloud Storage Transfer Service) Object lifecycle management ACLs Object change notification Offline import (third party) Regional buckets Object versioning
  • 22. Create your bucket for Dataflow use
  • 27. Lab 1: Ready your dataflow environment and create your first dataflow project ● Create GCP project ● Install Eclipse and Dataflow plugin ● Create first Dataflow project ● Run your project
  • 28. After lab - What thing in GCS bucket
  • 29. After lab - The Run Configuration
  • 31. What we do in Big Data process... Map Shuffle Reduce ParDo GroupByKey ParDo
  • 32. Pipeline ● A Direct Acyclic Graph of data processing transformations ● Can be submitted to the Dataflow Service for optimization and execution or executed on an alternate runner e.g. Spark ● May include multiple inputs and multiple outputs ● May encompass many logical MapReduce operations ● PCollections flow through the pipeline
  • 33. Code Review Pipeline p = Pipeline.create( PipelineOptionsFactory.fromArgs(args).withValidation().create()); p.apply(Create.of("Hello", "World")) .apply(ParDo.of(new DoFn<String, String>() { @Override public void processElement(ProcessContext c) { c.output(c.element().toUpperCase()); } })) .apply(ParDo.of(new DoFn<String, Void>() { @Override public void processElement(ProcessContext c) { LOG.info(c.element()); } })); p.run(); 建立pipeline物件 執行pipeline
  • 34. Inputs & Outputs Your Source/Sink Here ❯ Read from standard Google Cloud Platform data sources • GCS, Pub/Sub, BigQuery, Datastore ❯ Write your own custom source by teaching Dataflow how to read it in parallel • Currently for bounded sources only ❯ Write to GCS, BigQuery, Pub/Sub • More coming… ❯ Can use a combination of text, JSON, XML, Avro formatted data
  • 35. Code Review Pipeline p = Pipeline.create( PipelineOptionsFactory.fromArgs(args).withValidation().create()); p.apply(Create.of("Hello", "World")) .apply(ParDo.of(new DoFn<String, String>() { @Override public void processElement(ProcessContext c) { c.output(c.element().toUpperCase()); } })) .apply(ParDo.of(new DoFn<String, Void>() { @Override public void processElement(ProcessContext c) { LOG.info(c.element()); } })); p.run(); 建立Input Parallel Do,Log輸出
  • 36. PCollection ❯ A collection of data of type T in a pipeline - PCollection<K,V> ❯ Maybe be either bounded or unbounded in size ❯ Created by using a PTransform to: • Build from a java.util.Collection • Read from a backing data store • Transform an existing PCollection ❯ Often contain the key-value pairs using KV {Seahawks, NFC, Champions, Seattle, ...} {..., “NFC Champions #GreenBay”, “Green Bay #superbowl!”, ... “#GoHawks”, ...}
  • 37. Transforms ● A step, or a processing operation that transforms data ○ convert format , group , filter data ● Type of Transforms ○ ParDo ○ GroupByKey ○ Combine ○ Flatten ■ Multiple PCollection objects that contain the same data type, you can merge them into a single logical PCollection using the Flatten transform
  • 38. Code Review Pipeline p = Pipeline.create( PipelineOptionsFactory.fromArgs(args).withValidation().create()); p.apply(Create.of("Hello", "World")) .apply(ParDo.of(new DoFn<String, String>() { @Override public void processElement(ProcessContext c) { c.output(c.element().toUpperCase()); } })) .apply(ParDo.of(new DoFn<String, Void>() { @Override public void processElement(ProcessContext c) { LOG.info(c.element()); } })); p.run(); Transform輸入,使用Parallel Do 方式,平行的將輸入物件轉大寫 Parallel Do,Log輸出
  • 39. Pardo (Parallel do) ❯ Processes each element of a PCollection independently using a user-provided DoFn ❯ Corresponds to both the Map and Reduce phases in Hadoop i.e. ParDo->GBK->ParDo ❯ Useful for Filtering a data set. Formatting or converting the type of each element in a data set. Extracting parts of each element in a data set. Performing computations on each element in a data set. {Seahawks, NFC, Champions, Seattle, ...} { KV<S, Seahawks>, KV<C,Champions>, <KV<S, Seattle>, KV<N, NFC>, … } KeyBySessionId
  • 40. Group by key Wait a minute… How do you do a GroupByKey on an unbounded PCollection? {KV<S, Seahawks>, KV<C,Champions>, <KV<S, Seattle>, KV<N, NFC>, ...} {KV<S, Seahawks>, KV<C,Champions>, <KV<S, Seattle>, KV<N, NFC>, ...} GroupByKey • Takes a PCollection of key-value pairs and gathers up all values with the same key • Corresponds to the shuffle phase in Hadoop {KV<S, {Seahawks, Seattle, …}, KV<N, {NFC, …} KV<C, {Champion, …}}
  • 41. Group by key sample
  • 42. Lab 2: Deploy your first project to Google Cloud Platform ● Checking the Lab 1 project working well ● Deploy to cloud and watch the dataflow task dashboard ● Implement the Input/Output/Transform in your project
  • 47. Pub/Sub Operations - Subscriber
  • 48. Simple Guide for Pub/Sub Ref: https://gcpug-tw.gitbooks.io/google-cloud-platform-in-practice/content/pubsub_getting_start.html
  • 49. Dataflow with Cloud Pub/Sub Use Case Publisher A Publisher B Publisher C Message 1 Topic A Topic B Topic C Subscription XA Subscription XB Subscription YC Subscription ZC Cloud Pub/Sub Subscriber X Subscriber Y Message 2 Message 3 Subscriber Z Message 1 Message 2 Message 3 Message 3 Globally redundant Low latency (sub sec.) N to N coupling Batched read/write Push & Pull Guaranteed Delivery Auto expiration
  • 50. Example of PubSub IO to BigQuery IO Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class); options.setStreaming(true); Pipeline p = Pipeline.create(options); PCollection<String> input = p.apply(PubsubIO.Read.topic("projects/sunny-573/topics/jcconf2016")); PCollection<String> windowedWords = input.apply(Window.<String> into( FixedWindows.of(Duration.standardMinutes(options.getWindowSize())))); PCollection<KV<String, Long>> wordCounts = windowedWords.apply(new TestMain.MyCountWords()); wordCounts.apply(ParDo.of(new FormatAsTableRowFn())).apply( BigQueryIO.Write.to(getTableReference(options)).withSchema(getSchema())); p.run();
  • 51. Lab 3: Create a Streaming Dataflow model ● Create PubSub topic ● Deploy Dataflow streaming sample ● Watch Dataflow task dashboard
  • 53. Don’t forget to shut down streaming process...
  • 55.
  • 56. Datalab An easy tool for analysis and report Analysis tools - BigQuery & Datalab BigQuery An interactive analysis service
  • 58. Thinking your data orchestration...
  • 59. ● Google Codelab: https://codelabs.developers.google.com/?cat=Cloud ● Dataflow Doc: https://cloud.google.com/dataflow/docs/ More learning resources