Our challenge for
Bulkload reliability improvement
Satoshi Akama
July 14, 2016, Treasure Data Tech Talk 201607 #tdtech
Satoshi Akama
Embulk plugins
embulk-input-gcs
embulk-input-azure_blob_storage
embulk-output-azure_blob_storage
embulk-output-dynamodb
embulk-output-sftp
Software Engineer (Java/Scala/Ruby)
github.com/sakama/
@oreradio
Treasure Data, Inc.
Topics
Embulk plugin development
Retry! Retry!! Retry!!!
Exception handling
Battle with external service’s specs
Write unit test
Java or JRuby ?
Use embulk at Treasure Data
Integration test
Implement new API endpoint
Infrastructure management
We're using Embulk as our bulk-load tool
Pluggable bulk-load tool
Released as OSS
We use the same version as the OSS release
A GUI is available
Currently output side only :)
Data Connector (Import) - CLI
guess/preview/import
$ td connector:guess seed.yml -o load.yml
$ td connector:preview load.yml
$ td connector:issue load.yml --database td_sample_db \
    --table td_sample_table
Scheduled execution
$ td connector:create \
    daily_import \
    "10 5 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    --time-column created_at
A GUI for import will come in the near future
Documents and magazines
Official website : http://www.embulk.org/
Qiita (Japanese only) : http://qiita.com/search?q=embulk
Twitter : #embulk
Plugin development
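Retry! Retry!! Retry!!!
Without retry, the call fails on any transient error:
…
Storage.Objects.Get getObject = client.objects().get(bucket, key);
InputStream stream = getObject.executeMediaAsInputStream();
Embulk (embulk-core) provides RetryExecutor.
Most official SDKs contain retry logic, but it is often not enough.
With RetryExecutor, a failed call is retried using exponential backoff:
try {
    return retryExecutor()
        .withRetryLimit(3)                // give up after 3 retries
        .withInitialRetryWait(500)        // first wait, in milliseconds
        .withMaxRetryWait(30 * 1000)      // upper bound for the wait, in milliseconds
        .runInterruptible(new Retryable<InputStream>() {
            @Override
            public InputStream call() throws InterruptedIOException, IOException
            {
                Storage.Objects.Get getObject = client.objects().get(bucket, key);
                return getObject.executeMediaAsInputStream();
            }
            // The other Retryable callbacks (isRetryableException, onRetry, onGiveup)
            // are omitted on the slide.
        });
} catch (RetryGiveupException ex) {
    …
} catch (InterruptedException ex) {
    // Do not swallow the interrupt silently.
    throw new InterruptedIOException();
}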
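The backoff schedule behind those settings can be sketched in plain Java without embulk-core. This is only an illustration under the assumption that the wait doubles after every failed attempt and is capped at the maximum; the class and method names (`BackoffSketch`, `waitMillis`, `callWithRetry`) are made up for this sketch and are not Embulk's API.

```java
import java.util.concurrent.Callable;

public class BackoffSketch {
    // Wait before the n-th retry (0-based): initial * 2^n, capped at maxWaitMillis.
    // Assumption of this sketch: the wait doubles on every retry.
    static long waitMillis(int retryIndex, long initialWaitMillis, long maxWaitMillis) {
        long wait = initialWaitMillis;
        for (int i = 0; i < retryIndex && wait < maxWaitMillis; i++) {
            wait = Math.min(wait * 2, maxWaitMillis);
        }
        return Math.min(wait, maxWaitMillis);
    }

    // Runs the task, retrying up to retryLimit times with the waits computed above.
    static <T> T callWithRetry(Callable<T> task, int retryLimit,
                               long initialWaitMillis, long maxWaitMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                throw e; // an interrupt is a request to stop, so it is never retried
            } catch (Exception e) {
                if (attempt >= retryLimit) {
                    throw e; // retry budget used up: give up with the last error
                }
                Thread.sleep(waitMillis(attempt, initialWaitMillis, maxWaitMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical flaky task: fails twice, then succeeds.
        final int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("temporary failure");
            }
            return "ok";
        }, 3, 10, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With the slide's values (limit 3, initial wait 500 ms, maximum 30 s) this schedule would wait 500 ms, 1000 ms and 2000 ms before giving up.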
Java or JRuby?
Embulk supports both Java-based and JRuby-based plugins.
Java-based plugin :
High performance
Filter / Parser / Formatter / Encoder / Decoder plugins need high performance
Some enterprise services and software provide a Java SDK
Write with Java 7 (the MapReduce Executor needs Java 7)
JRuby-based plugin :
Easy to write
Good enough when the network is the bottleneck (e.g. cloud services)
Exception handling to avoid infinite retry
ConfigException :
The transaction method should validate all config values.
It should throw ConfigException or its subclass when validation fails.
public ConfigDiff transaction(ConfigSource config, FileInputPlugin.Control control)
{
    …
    if (task.getFiles().isEmpty()) {
        throw new ConfigException("File is empty");
    }
}
DataException :
A plugin should throw DataException or its subclass when it finds an invalid record.
…
} catch (CsvTokenizer.InvalidFormatException | CsvTokenizer.InvalidValueException … e) {
    if (stopOnInvalidRecord) {
        throw new DataException("Invalid record"); // stopOnInvalidRecord is true: fail the job
    }
    log.warn("Invalid record"); // stopOnInvalidRecord is false: warn and continue
}
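One way to read this slide is that a caller can use the exception type to decide whether a retry makes sense. The sketch below illustrates that idea only; it assumes that config and data errors are permanent and that I/O errors are transient. The two nested exception classes are local stand-ins so the example compiles on its own, not Embulk's real classes, and `isRetryable` is a hypothetical helper.

```java
import java.io.IOException;

public class RetryPolicySketch {
    // Local stand-ins for Embulk's ConfigException and DataException (assumption:
    // only the names are borrowed, the real classes live in embulk-core).
    static class ConfigException extends RuntimeException {
        ConfigException(String message) { super(message); }
    }

    static class DataException extends RuntimeException {
        DataException(String message) { super(message); }
    }

    // Retrying cannot fix a bad config or a bad record, so those fail immediately.
    // Only I/O errors are treated as transient in this sketch.
    static boolean isRetryable(Exception e) {
        if (e instanceof ConfigException || e instanceof DataException) {
            return false;
        }
        return e instanceof IOException;
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(new ConfigException("File is empty")));  // false
        System.out.println(isRetryable(new DataException("Invalid record")));  // false
        System.out.println(isRetryable(new IOException("connection reset")));  // true
    }
}
```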
Battle with external service's specs
Example: building the next token for object storage.
Next token : the next start point when listing the files stored in a bucket or container.
Azure Blob Storage :
String path = "/path/to/file";
String str = String.format("%06d", path.length()) + "!" + path + "!"
        + "000028" + "!" + "9999-12-31T23:59:59.9999999Z" + "!";
// BaseEncoding.encode() takes bytes, not a String
String encodedString = BaseEncoding.base64().encode(str.getBytes(Charsets.UTF_8));
String nextToken = "2" + "!" + encodedString.length() + "!" + encodedString;
AWS S3 :
String path = "/path/to/file"; // use path string as next token
Google Cloud Storage :
String path = "/path/to/file";
byte[] encoding;
byte[] utf8 = path.getBytes(Charsets.UTF_8);
encoding = new byte[utf8.length + 2];
encoding[0] = 0x0a;
encoding[1] = (byte) utf8.length; // single length byte, so only short paths fit
System.arraycopy(utf8, 0, encoding, 2, utf8.length);
String nextToken = BaseEncoding.base64().encode(encoding);
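The byte-array variant of the token can be tried out without Guava. The following is a minimal sketch that uses `java.util.Base64` instead of `BaseEncoding` and adds a decoder to check the round trip. The class and method names are hypothetical, and the 127-byte guard is an assumption of this sketch, since a single signed length byte cannot describe longer paths.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class NextTokenSketch {
    // Builds the token as: 0x0a, one length byte, the UTF-8 path, all Base64-encoded.
    static String encodeToken(String path) {
        byte[] utf8 = path.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            // Assumption of this sketch: longer paths need a multi-byte length.
            throw new IllegalArgumentException("path too long for a one-byte length");
        }
        byte[] encoding = new byte[utf8.length + 2];
        encoding[0] = 0x0a;
        encoding[1] = (byte) utf8.length;
        System.arraycopy(utf8, 0, encoding, 2, utf8.length);
        return Base64.getEncoder().encodeToString(encoding);
    }

    // Reverses encodeToken: skips the two header bytes and reads the path back.
    static String decodeToken(String token) {
        byte[] decoded = Base64.getDecoder().decode(token);
        return new String(decoded, 2, decoded[1], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = encodeToken("/path/to/file");
        System.out.println(token);
        System.out.println(decodeToken(token));
    }
}
```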
Write unit tests
We need 80% coverage before a plugin can be used on our platform.
But it is difficult to write tests for Embulk plugins 😞
Unit tests without a remote connection (the cases I have seen) :
SFTP : create a Java-based virtual SFTP server on the local machine.
DynamoDB : AWS provides a downloadable version of DynamoDB.
Filter/Parser/Formatter/Encoder/Decoder plugins
Tests that connect to the remote service on every run :
80% coverage is difficult without connecting to the service.
Set confidential values in environment variables.
Use "Encryption keys" and "Encryption files" on Travis CI.
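When credentials come from environment variables, a test run can first check that they are present and skip the remote tests otherwise. The helper below is a small sketch of that check. The class name and the variable names `SFTP_TEST_HOST` and `SFTP_TEST_PASSWORD` are invented for illustration and do not come from any plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RemoteTestGate {
    // Returns the names of required variables that are unset or empty.
    static List<String> missingVariables(Map<String, String> env, String... required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical variable names; a CI service would inject the real ones.
        List<String> missing =
                missingVariables(System.getenv(), "SFTP_TEST_HOST", "SFTP_TEST_PASSWORD");
        if (missing.isEmpty()) {
            System.out.println("Credentials found, remote tests can run");
        } else {
            System.out.println("Skipping remote tests, missing: " + missing);
        }
    }
}
```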
Use embulk at Treasure Data
Architecture of Treasure Data
[Diagram. Components: Web Console, td commands, Load Balancer, TD API (API servers), Bulkload API (API servers), Perfect Queue, MySQL, TD worker (worker process). Arrow labels: Request, Response, guess/preview, enqueue, dequeue, Submit Job (retry if needed), Execute with MR / Local Executor.]
TD API / Bulkload API
guess/preview is processed on different API servers.
guess/preview needs a quick response.
[Diagram. Components: Load Balancer, TD API (API servers), Bulkload API (API servers), Perfect Queue. Flows: guess/preview and data import. Legend: Queuing (enqueue), HTTP Request/Response.]
Huge data arrives
Embulk configs with thousands of columns
Huge data
Need sufficient validation in the transaction method
Return clear error or warning messages from the plugin
Retry logic in the plugin is important
Retry if a retryable exception happens
Use the MapReduce Executor
Reduce the difference in resource usage across instances.
Write integration tests
Write integration tests for each connector (and result output) with RSpec
Does td connector:guess (embulk guess) work?
Does td connector:preview (embulk preview) work?
Does td connector:issue (embulk run) work as expected?
Does it work with the LocalExecutor?
Does it work with the MapReduce Executor?
Does it work with filter plugins?
Does scheduled execution work as expected?
Many test cases × each service
Want to improve…
Target service times out 😞
Target service returns 50x errors 😞
API limit exceeded 😞
CI failure
Long execution time
Many test cases × each service
Want to implement…
The existing API endpoints are not enough
Current endpoints, used by the GUI console and CLI : guess, preview, issue (run)
Are the username and password valid?
This is unclear until the user runs a job (or guess or preview) and the plugin returns a result or a ConfigException.
Want to implement…
New endpoint : the GUI console sends the input (host, port, username, password) and gets back whether it is valid
Validate before executing jobs
Improve user experience
Reduce jobs on our platform
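As a rough idea of what such a check could do before any job is enqueued, the sketch below validates the shape of the four inputs and returns a list of problems. It is an assumption-laden illustration: the class name, method name and messages are invented, and a real endpoint would presumably also try to connect to the service to confirm the username and password.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectionInputCheck {
    // Collects every problem instead of stopping at the first one, so a GUI
    // could show all of them at once. An empty list means the input looks usable.
    static List<String> validate(String host, int port, String username, String password) {
        List<String> errors = new ArrayList<>();
        if (host == null || host.trim().isEmpty()) {
            errors.add("host is required");
        }
        if (port < 1 || port > 65535) {
            errors.add("port must be between 1 and 65535");
        }
        if (username == null || username.isEmpty()) {
            errors.add("username is required");
        }
        if (password == null || password.isEmpty()) {
            errors.add("password is required");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Hypothetical input values for illustration only.
        System.out.println(validate("sftp.example.com", 22, "user", "secret"));
        System.out.println(validate("", 0, "user", ""));
    }
}
```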
Infrastructure management
Server configuration : Chef
Monitoring : Datadog
Incident resolution : PagerDuty
More reliability with the MapReduce Executor
Building a server with Chef takes considerable time.
Thank you!
