JCConf Dataflow Workshop Labs
{Simon Su / 20161015}
Index
Lab 1: Prepare the Dataflow environment and build your first project
Create a GCP project and install the Eclipse development environment
Install the Google Cloud SDK
Enable the Dataflow API
Create your first Dataflow project
Run your project
Lab 2: Deploy your first project to Google Cloud Platform
Preparation
Run the deployment
Check the results
Implement Input/Output/Transform functionality
Lab 3: Build a streaming Dataflow pipeline
Create a Pub/Sub topic / subscription
Deploy the Dataflow streaming sample
Streaming example 1
Streaming example 2
Monitor the Dataflow streaming task from the Dashboard
After the lab
Lab 1: Prepare the Dataflow environment and build your first project
Create a GCP project and install the Eclipse development environment
Please refer to: JCConf 2016 - Dataflow Workshop pre-workshop setup guide
Install the Google Cloud SDK
● Install the Cloud SDK by following: https://cloud.google.com/sdk/?hl=en_US#download
● Authenticate the Cloud SDK:
> gcloud auth login
> gcloud auth application-default login
● Set the default project
> gcloud config set project <your-project-id>
● Verify the installation
> gcloud config list
Enable the Dataflow API
Open the API Manager for your project:
In the API Manager Dashboard, click Enable API:
Search for the Dataflow API:
Enable it:
Create your first Dataflow project
The Eclipse Dataflow wizard helps you create a Dataflow project. The steps are as follows:
Step 1: Choose New > Other...
Step 2: Choose Google Cloud Platform > Cloud Dataflow Java Project
Step 3: Enter your project information
Step 4: Enter your Google Cloud Platform project ID and Cloud Storage settings
Step 5: After the project is created, review its status
The sample code looks like this:
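The wizard-generated sample is not reproduced here; as a rough sketch, in the Dataflow 1.x SDK used in this workshop the generated starter pipeline typically looks something like the following (class name and strings are wizard defaults, shown for reference only — it upper-cases two hard-coded strings and logs them):

```java
package com.jcconf2016.demo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

@SuppressWarnings("serial")
public class StarterPipeline {
  private static final Logger LOG = LoggerFactory.getLogger(StarterPipeline.class);

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    // Upper-case two hard-coded strings, then log each element.
    p.apply(Create.of("Hello", "World"))
        .apply(ParDo.of(new DoFn<String, String>() {
          @Override
          public void processElement(ProcessContext c) {
            c.output(c.element().toUpperCase());
          }
        }))
        .apply(ParDo.of(new DoFn<String, Void>() {
          @Override
          public void processElement(ProcessContext c) {
            LOG.info(c.element());
          }
        }));

    p.run();
  }
}
```

Run locally (DirectPipelineRunner) this prints HELLO and WORLD to the log.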
Run your project
Click the button in the top-right corner to create a new Dataflow Run Configuration...
Set the Run Configuration name
Choose the Runner type:
Watch the deployment status in the log...
Lab 2: Deploy your first project to Google Cloud Platform
Preparation
Before starting Lab 2, make sure the project from Lab 1 runs correctly. You can then modify the project to suit your needs and experiment with the changes...
Run the deployment
Open the Run Configurations window via "Run As > Run Configurations..."
The window looks like this:
Click the "New Launch Configuration" button (marked in red below) to create a new configuration…
In this lab, the new configuration needs two settings:
1. Set the Main method
2. Set the Pipeline Arguments
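For running on the Dataflow service, the Pipeline Arguments typically look like the following (the bucket name is a placeholder; the runner name matches the 1.x SDK used in this workshop):

```
--project=<YOUR_PROJECT_ID>
--stagingLocation=gs://<YOUR_BUCKET>/staging
--runner=BlockingDataflowPipelineRunner
```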
Check the results
The Console in the run window shows the execution progress, roughly as follows:
While the job is running, follow the instructions printed in the IDE Console to open the Web Console and check the status of the Dataflow task:
The detail view of the running job looks like this:
Use the "LOGS" link to inspect the execution status...
Implement Input/Output/Transform functionality
Modify your project so that it reads a file from Google Cloud Storage...
@SuppressWarnings("serial")
public class TestMain {
private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);
public static void main(String[] args) {
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(TextIO.Read.named("sample-book").from("gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().toUpperCase());
}
}))
.apply(ParDo.of(new DoFn<String, Void>() {
@Override
public void processElement(ProcessContext c) {
LOG.info(c.element());
}
}));
p.run();
}
}
Modify the program further so that the output is written to Google Cloud Storage…
@SuppressWarnings("serial")
public class TestMain {
private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);
public static void main(String[] args) {
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(TextIO.Read.named("sample-book").from("gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().toUpperCase());
}
}))
.apply(TextIO.Write.named("output-book").to("gs://jcconf2016-dataflow-workshop/result/book-sample.txt"));
p.run();
}
}
Add a transform function that splits each line into words
@SuppressWarnings("serial")
public class TestMain {
private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);
public static void main(String[] args) {
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
p.apply(TextIO.Read.named("sample-book").from("gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
.apply(ParDo.of(new DoFn<String, String>() {
private final Aggregator<Long, Long> emptyLines =
createAggregator("emptyLines", new Sum.SumLongFn());
@Override
public void processElement(ProcessContext c) {
if (c.element().trim().isEmpty()) {
emptyLines.addValue(1L);
}
// Split the line into words.
String[] words = c.element().split("[^a-zA-Z']+");
// Output each word encountered into the output PCollection.
for (String word : words) {
if (!word.isEmpty()) {
c.output(word);
}
}
}
}))
.apply(TextIO.Write.named("output-book").to("gs://jcconf2016-dataflow-workshop/result/book-sample.txt"));
p.run();
}
}
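The splitting logic in the DoFn above can be checked locally with plain Java, without running a pipeline (a minimal sketch; the sample line is made up):

```java
public class SplitCheck {
    public static void main(String[] args) {
        // Same regex as in the pipeline: runs of characters that are not
        // letters or apostrophes delimit the words.
        String line = "Hello, Dataflow! It's 2016.";
        String[] words = line.split("[^a-zA-Z']+");
        for (String word : words) {
            if (!word.isEmpty()) {
                System.out.println(word);
            }
        }
    }
}
```

Note that digits and punctuation act as separators, so "2016." produces no output word.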
Word Count Sample - count how many times each word appears in the file
@SuppressWarnings("serial")
public class TestMain {
static class MyExtractWordsFn extends DoFn<String, String> {
private final Aggregator<Long, Long> emptyLines = createAggregator(
"emptyLines", new Sum.SumLongFn());
@Override
public void processElement(ProcessContext c) {
if (c.element().trim().isEmpty()) {
emptyLines.addValue(1L);
}
// Split the line into words.
String[] words = c.element().split("[^a-zA-Z']+");
// Output each word encountered into the output PCollection.
for (String word : words) {
if (!word.isEmpty()) {
c.output(word);
}
}
}
}
public static class MyCountWords extends
PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
@Override
public PCollection<KV<String, Long>> apply(PCollection<String> lines) {
// Convert lines of text into individual words.
PCollection<String> words = lines.apply(ParDo.of(new MyExtractWordsFn()));
// Count the number of times each word occurs.
PCollection<KV<String, Long>> wordCounts = words.apply(Count.<String> perElement());
return wordCounts;
}
}
public static class MyFormatAsTextFn extends DoFn<KV<String, Long>, String> {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().getKey() + ": " + c.element().getValue());
}
}
private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);
public static void main(String[] args) {
Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args)
.withValidation().create());
p.apply(TextIO.Read.named("sample-book").from(
"gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
.apply(new MyCountWords())
.apply(ParDo.of(new MyFormatAsTextFn()))
.apply(TextIO.Write.named("output-book")
.to("gs://jcconf2016-dataflow-workshop/result/book-sample.txt"));
p.run();
}
}
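Conceptually, Count.perElement() in MyCountWords is a per-element tally. A plain-Java sketch of the same computation (with made-up sample data) shows what the pipeline produces:

```java
import java.util.HashMap;
import java.util.Map;

public class CountCheck {
    public static void main(String[] args) {
        String[] words = {"the", "cat", "the", "the", "dog"};
        // Tally each word, as Count.perElement() does across the PCollection.
        Map<String, Long> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
        // Format each entry the way MyFormatAsTextFn does: "word: count".
        System.out.println("the: " + counts.get("the"));
        System.out.println("cat: " + counts.get("cat"));
        System.out.println("dog: " + counts.get("dog"));
    }
}
```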
Lab 3: Build a streaming Dataflow pipeline
Create a Pub/Sub topic / subscription
Create the topic
gcloud beta pubsub topics create jcconf2016
Create a subscription on the topic
gcloud beta pubsub subscriptions create --topic jcconf2016 jcconf2016-sub001
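To feed test data into the pipeline later, you can publish a message to the topic with gcloud (the exact syntax may differ between SDK versions; newer versions pass the message via --message):

```shell
gcloud beta pubsub topics publish jcconf2016 "hello dataflow"
```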
Deploy the Dataflow streaming sample
Streaming example 1
Listen on the subscription as the data input and write the data to the log...
public static void main(String[] args) {
Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
options.setStreaming(true);
Pipeline p = Pipeline.create(options);
p.apply(PubsubIO.Read.named("my-pubsub-input")
.subscription("projects/sunny-573/subscriptions/jcconf2016-sub001"))
.apply(ParDo.of(new DoFn<String, String>() {
@Override
public void processElement(ProcessContext c) {
c.output(c.element().toUpperCase());
}
}))
.apply(ParDo.of(new DoFn<String, Void>() {
@Override
public void processElement(ProcessContext c) {
LOG.info(c.element());
}
}));
p.run();
}
Streaming example 2
Integrate the Word Count example and write the results into a BigQuery dataset...
/*
* Copyright (C) 2015 Google Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/
package com.jcconf2016.demo;
import java.util.ArrayList;
import java.util.List;
import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.Default;
import com.google.cloud.dataflow.sdk.options.Description;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.options.StreamingOptions;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;
/**
 * A streaming word-count example for Google Cloud Dataflow.
 *
 * <p>
 * The pipeline reads messages from a Pub/Sub topic, applies a fixed-size
 * window, counts the words in each window, and writes the word counts to a
 * BigQuery table.
 *
 * <p>
 * To run this example using managed resources in Google Cloud Platform,
 * specify the following command-line options:
 * --project=<YOUR_PROJECT_ID>
 * --stagingLocation=<STAGING_LOCATION_IN_CLOUD_STORAGE>
 * --runner=BlockingDataflowPipelineRunner
 * In Eclipse, you can just modify the existing 'SERVICE' run configuration.
 */
@SuppressWarnings("serial")
public class StreamingPipeline {
static final int WINDOW_SIZE = 1; // Default window duration in minutes
public static interface Options extends StreamingOptions {
@Description("Fixed window duration, in minutes")
@Default.Integer(WINDOW_SIZE)
Integer getWindowSize();
void setWindowSize(Integer value);
@Description("Whether to run the pipeline with unbounded input")
boolean isUnbounded();
void setUnbounded(boolean value);
}
private static TableReference getTableReference(Options options) {
TableReference tableRef = new TableReference();
tableRef.setProjectId("sunny-573");
tableRef.setDatasetId("jcconf2016");
tableRef.setTableId("pubsub");
return tableRef;
}
private static TableSchema getSchema() {
List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("word").setType("STRING"));
fields.add(new TableFieldSchema().setName("count").setType("INTEGER"));
fields.add(new TableFieldSchema().setName("window_timestamp").setType(
"TIMESTAMP"));
TableSchema schema = new TableSchema().setFields(fields);
return schema;
}
static class FormatAsTableRowFn extends DoFn<KV<String, Long>, TableRow> {
@Override
public void processElement(ProcessContext c) {
TableRow row = new TableRow().set("word", c.element().getKey())
.set("count", c.element().getValue())
// include a field for the window timestamp
.set("window_timestamp", c.timestamp().toString());
c.output(row);
}
}
private static final Logger LOG = LoggerFactory
.getLogger(StreamingPipeline.class);
public static void main(String[] args) {
Options options = PipelineOptionsFactory.fromArgs(args)
.withValidation().as(Options.class);
options.setStreaming(true);
Pipeline p = Pipeline.create(options);
PCollection<String> input = p.apply(PubsubIO.Read.topic("projects/sunny-573/topics/jcconf2016"));
PCollection<String> windowedWords = input.apply(
    Window.<String>into(FixedWindows.of(
        Duration.standardMinutes(options.getWindowSize()))));
PCollection<KV<String, Long>> wordCounts =
    windowedWords.apply(new TestMain.MyCountWords());
wordCounts.apply(ParDo.of(new FormatAsTableRowFn()))
    .apply(BigQueryIO.Write.to(getTableReference(options))
        .withSchema(getSchema()));
p.run();
}
}
Monitor the Dataflow streaming task from the Dashboard
Open the GCP Web Console and use the Dataflow Dashboard to check the execution status of each step in the pipeline.
Use Cloud Logging to inspect the execution logs…
After the lab
When the lab is over, remember to cancel the Dataflow job (the job ID appears in the log output of your IDE); otherwise the streaming Dataflow job keeps running and its worker machines cannot be shut down...
gcloud alpha dataflow jobs --project=sunny-573 cancel 2016-10-14_08_38_48-17987270960467929246
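If you no longer have the job ID from the IDE log, you can list the jobs first and copy the ID of the running streaming job (a sketch; command group and flags may vary by SDK version):

```shell
gcloud alpha dataflow jobs list --project=sunny-573
```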
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

JCConf 2016 - Dataflow Workshop Labs

JCConf Dataflow Workshop Labs
{Simon Su / 20161015}

Index

Lab 1: Set up the Dataflow environment and build your first project
Create a GCP project and install the Eclipse development environment
Install the Google Cloud SDK
Enable the Dataflow API
Create your first Dataflow project
Run your project
Lab 2: Deploy your first project to Google Cloud Platform
Preparation
Deploy
Check the execution results
Implement Input/Output/Transform features
Lab 3: Build a streaming Dataflow pipeline
Create a Pub/Sub topic / subscription
Deploy the Dataflow streaming sample
Streaming example 1
Streaming example 2
Monitor the Dataflow streaming task from the Dashboard
After the labs

Lab 1: Set up the Dataflow environment and build your first project

Create a GCP project and install the Eclipse development environment

See: JCConf 2016 - Dataflow Workshop pre-event guide

Install the Google Cloud SDK

● Install the Cloud SDK from this URL: https://cloud.google.com/sdk/?hl=en_US#download
● Authenticate the Cloud SDK:
> gcloud auth login
> gcloud auth application-default login
● Set the default project:
> gcloud config set project <your-project-id>
● Verify the installation:
> gcloud config list

Enable the Dataflow API

Open the API Manager for your project:

In the API Manager Dashboard, click Enable API:

Search for the Dataflow item:

Enable it:
Create your first Dataflow project

The Eclipse Dataflow wizard helps you create a Dataflow project. The steps are as follows:

Step 1: Choose New > Other...

Step 2: Choose Google Cloud Platform > Cloud Dataflow Java Project
Step 3: Enter your project information

Step 4: Enter your Google Cloud Platform project ID and Cloud Storage information

Step 5: Once the project is created, review its status and the generated sample code

Run your project

Click the button at the top right to create a new Dataflow Run Configuration, give it a name, choose the runner type, and watch the deployment log...
Lab 2: Deploy your first project to Google Cloud Platform

Preparation

Before starting Lab 2, make sure the project from Lab 1 runs correctly. You can then modify it to suit your needs and observe how the behavior changes...

Deploy

Open the Run Configurations window via "Run As > Run Configurations...".

The settings window looks like this:

Click the "New Launch Configuration" button (marked in red below) to create a new configuration...
Two settings need to be configured in the new configuration for this lab:
1. The main method
2. The pipeline arguments

Check the execution results

While the job runs, the Console shows its progress.

Following the instructions printed in the IDE Console, you can open the Web Console to inspect the Dataflow task's status, open its detail page, and follow the "LOGS" link to watch the execution...
Implement Input/Output/Transform features

Modify your project so that it reads a file from Google Cloud Storage...

@SuppressWarnings("serial")
public class TestMain {
  private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());
    p.apply(TextIO.Read.named("sample-book").from("gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
     .apply(ParDo.of(new DoFn<String, String>() {
       @Override
       public void processElement(ProcessContext c) {
         c.output(c.element().toUpperCase());
       }
     }))
     .apply(ParDo.of(new DoFn<String, Void>() {
       @Override
       public void processElement(ProcessContext c) {
         LOG.info(c.element());
       }
     }));
    p.run();
  }
}

Modify the program further so that the results are written back to Google Cloud Storage...

@SuppressWarnings("serial")
public class TestMain {
  private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());
    p.apply(TextIO.Read.named("sample-book").from("gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
     .apply(ParDo.of(new DoFn<String, String>() {
       @Override
       public void processElement(ProcessContext c) {
         c.output(c.element().toUpperCase());
       }
     }))
     .apply(TextIO.Write.named("output-book").to("gs://jcconf2016-dataflow-workshop/result/book-sample.txt"));
    p.run();
  }
}

Add a transform function that splits each line into individual words
@SuppressWarnings("serial")
public class TestMain {
  private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());
    p.apply(TextIO.Read.named("sample-book").from("gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
     .apply(ParDo.of(new DoFn<String, String>() {
       private final Aggregator<Long, Long> emptyLines =
           createAggregator("emptyLines", new Sum.SumLongFn());

       @Override
       public void processElement(ProcessContext c) {
         if (c.element().trim().isEmpty()) {
           emptyLines.addValue(1L);
         }
         // Split the line into words.
         String[] words = c.element().split("[^a-zA-Z']+");
         // Output each word encountered into the output PCollection.
         for (String word : words) {
           if (!word.isEmpty()) {
             c.output(word);
           }
         }
       }
     }))
     .apply(TextIO.Write.named("output-book").to("gs://jcconf2016-dataflow-workshop/result/book-sample.txt"));
    p.run();
  }
}

Word Count Sample - count how many times each word appears in the document

@SuppressWarnings("serial")
public class TestMain {
  static class MyExtractWordsFn extends DoFn<String, String> {
    private final Aggregator<Long, Long> emptyLines =
        createAggregator("emptyLines", new Sum.SumLongFn());

    @Override
    public void processElement(ProcessContext c) {
      if (c.element().trim().isEmpty()) {
        emptyLines.addValue(1L);
      }
      // Split the line into words.
      String[] words = c.element().split("[^a-zA-Z']+");
      // Output each word encountered into the output PCollection.
      for (String word : words) {
        if (!word.isEmpty()) {
          c.output(word);
        }
      }
    }
  }

  public static class MyCountWords
      extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {
    @Override
    public PCollection<KV<String, Long>> apply(PCollection<String> lines) {
      // Convert lines of text into individual words.
      PCollection<String> words = lines.apply(ParDo.of(new MyExtractWordsFn()));
      // Count the number of times each word occurs.
      PCollection<KV<String, Long>> wordCounts =
          words.apply(Count.<String>perElement());
      return wordCounts;
    }
  }

  public static class MyFormatAsTextFn extends DoFn<KV<String, Long>, String> {
    @Override
    public void processElement(ProcessContext c) {
      c.output(c.element().getKey() + ": " + c.element().getValue());
    }
  }

  private static final Logger LOG = LoggerFactory.getLogger(TestMain.class);

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args)
        .withValidation().create());
    p.apply(TextIO.Read.named("sample-book").from(
        "gs://jcconf2016-dataflow-workshop/sample/book-sample.txt"))
     .apply(new MyCountWords())
     .apply(ParDo.of(new MyFormatAsTextFn()))
     .apply(TextIO.Write.named("output-book")
         .to("gs://jcconf2016-dataflow-workshop/result/book-sample.txt"));
    p.run();
  }
}
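The tokenize-and-count semantics of MyExtractWordsFn and MyCountWords can be checked outside the pipeline with plain Java. The sketch below is illustrative only: the class name and sample input are hypothetical, and, unlike Count.perElement() in the pipeline (whose output order is unspecified), a LinkedHashMap here fixes the iteration order for readability.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Mirrors the pipeline's word-count semantics in plain Java:
// tokenize each line with the same regex the DoFn uses, count
// per-word occurrences, then format as "word: count" (the layout
// MyFormatAsTextFn produces).
public class WordCountDemo {

    // Same split as MyExtractWordsFn: break on any run of characters
    // that is not a letter or apostrophe, then drop empty tokens
    // (split can yield a leading empty string).
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        for (String word : line.split("[^a-zA-Z']+")) {
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    static Map<String, Long> countWords(String[] lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(new String[] {"to be or", "not to be"});
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Note that apostrophes survive tokenization ("it's" stays one word), while digits and punctuation act as separators, exactly as in the DoFn above.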
Lab 3: Build a streaming Dataflow pipeline

Create a Pub/Sub topic / subscription

Create the topic:
> gcloud beta pubsub topics create jcconf2016

Create a subscription on that topic:
> gcloud beta pubsub subscriptions create --topic jcconf2016 jcconf2016-sub001

Deploy the Dataflow streaming sample

Streaming example 1

Listen on the subscription as the input source and write the data to the LOG...
public static void main(String[] args) {
  Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
  options.setStreaming(true);
  Pipeline p = Pipeline.create(options);
  p.apply(PubsubIO.Read.named("my-pubsub-input")
      .subscription("projects/sunny-573/subscriptions/jcconf2016-sub001"))
   .apply(ParDo.of(new DoFn<String, String>() {
     @Override
     public void processElement(ProcessContext c) {
       c.output(c.element().toUpperCase());
     }
   }))
   .apply(ParDo.of(new DoFn<String, Void>() {
     @Override
     public void processElement(ProcessContext c) {
       LOG.info(c.element());
     }
   }));
  p.run();
}

Streaming example 2

Integrate the Word Count example and write the results into a BigQuery dataset...

/*
 * Copyright (C) 2015 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License. You may obtain a copy of
 * the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations under
 * the License.
 */
package com.jcconf2016.demo;

import java.util.ArrayList;
import java.util.List;

import org.joda.time.Duration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.Default;
import com.google.cloud.dataflow.sdk.options.Description;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.options.StreamingOptions;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;

/**
 * A starter example for writing Google Cloud Dataflow programs.
 *
 * <p>
 * The example takes two strings, converts them to their upper-case
 * representation and logs them.
 *
 * <p>
 * To run this starter example locally using DirectPipelineRunner, just execute
 * it without any additional parameters from your favorite development
 * environment. In Eclipse, this corresponds to the existing 'LOCAL' run
 * configuration.
 *
 * <p>
 * To run this starter example using managed resource in Google Cloud Platform,
 * you should specify the following command-line options:
 * --project=<YOUR_PROJECT_ID>
 * --stagingLocation=<STAGING_LOCATION_IN_CLOUD_STORAGE>
 * --runner=BlockingDataflowPipelineRunner
 * In Eclipse, you can just modify the existing 'SERVICE' run configuration.
 */
@SuppressWarnings("serial")
public class StreamingPipeline {

  static final int WINDOW_SIZE = 1; // Default window duration in minutes

  public static interface Options extends StreamingOptions {
    @Description("Fixed window duration, in minutes")
    @Default.Integer(WINDOW_SIZE)
    Integer getWindowSize();
    void setWindowSize(Integer value);

    @Description("Whether to run the pipeline with unbounded input")
    boolean isUnbounded();
    void setUnbounded(boolean value);
  }

  private static TableReference getTableReference(Options options) {
    TableReference tableRef = new TableReference();
    tableRef.setProjectId("sunny-573");
    tableRef.setDatasetId("jcconf2016");
    tableRef.setTableId("pubsub");
    return tableRef;
  }

  private static TableSchema getSchema() {
    List<TableFieldSchema> fields = new ArrayList<>();
    fields.add(new TableFieldSchema().setName("word").setType("STRING"));
    fields.add(new TableFieldSchema().setName("count").setType("INTEGER"));
    fields.add(new TableFieldSchema().setName("window_timestamp").setType("TIMESTAMP"));
    TableSchema schema = new TableSchema().setFields(fields);
    return schema;
  }

  static class FormatAsTableRowFn extends DoFn<KV<String, Long>, TableRow> {
    @Override
    public void processElement(ProcessContext c) {
      TableRow row = new TableRow().set("word", c.element().getKey())
          .set("count", c.element().getValue())
          // include a field for the window timestamp
          .set("window_timestamp", c.timestamp().toString());
      c.output(row);
    }
  }

  private static final Logger LOG = LoggerFactory.getLogger(StreamingPipeline.class);

  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args)
        .withValidation().as(Options.class);
    options.setStreaming(true);
    Pipeline p = Pipeline.create(options);

    PCollection<String> input =
        p.apply(PubsubIO.Read.topic("projects/sunny-573/topics/jcconf2016"));
    PCollection<String> windowedWords = input.apply(Window.<String>into(
        FixedWindows.of(Duration.standardMinutes(options.getWindowSize()))));
    PCollection<KV<String, Long>> wordCounts =
        windowedWords.apply(new TestMain.MyCountWords());
    wordCounts.apply(ParDo.of(new FormatAsTableRowFn())).apply(
        BigQueryIO.Write.to(getTableReference(options)).withSchema(getSchema()));
    p.run();
  }
}

Monitor the Dataflow streaming task from the Dashboard

Open the GCP Web Console and use the Dataflow Dashboard to inspect the execution status of each stage of the pipeline, and use Cloud Logging to inspect the execution logs...
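Before inspecting the table in BigQuery, the row layout that FormatAsTableRowFn emits can be sanity-checked offline. This is a minimal plain-Java mirror of the three fields, not the real TableRow API: the class name and sample values are hypothetical, and a Map stands in for the BigQuery row object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Mirrors the three fields FormatAsTableRowFn writes into each row
// (word STRING, count INTEGER, window_timestamp TIMESTAMP) so they
// can be compared field by field against the schema in getSchema().
public class RowLayoutDemo {
    static Map<String, Object> toRow(String word, long count, String windowTimestamp) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("word", word);
        row.put("count", count);
        row.put("window_timestamp", windowTimestamp);
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical values: one windowed count for the word "dataflow".
        System.out.println(toRow("dataflow", 3L, "2016-10-15T00:00:00.000Z"));
    }
}
```

Each key here must match a field name declared in getSchema(), otherwise the BigQueryIO.Write step rejects the row.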