Embulk at Treasure Data
Satoshi Akama
Dec. 15, 2015
Embulk meetup #2
About me…
Satoshi Akama
Embulk plugins
 ・embulk-output-bigquery
 ・embulk-input-gcs
 ・embulk-input-azure_blob_storage
 ・embulk-output-azure_blob_storage
Treasure Data Inc.
Software Engineer (Java/Scala/Ruby)
github.com/sakama/
@oreradio
We provide hosted Embulk
Data Connector (Import)
Result Output (Export)
+
“Data loading” should not be the customer’s job
unless they are developing ETL tools.
Streaming Import
MySQL
PostgreSQL
Redshift
AWS S3
Google Cloud Storage
Salesforce
Marketo
…etc
MySQL
PostgreSQL
Redshift
BigQuery
…etc
Treasure Data as a Datahub
Schema-less
(Treasure Data)
Some data store
(schema-full)
You can create a data pipeline easily
Data in various formats
・log
・Sensor data(IoT)
・Visualize
・Digital Marketing
Data Connector(Import) - CUI
guess/preview/import
$ td connector:guess seed.yml -o load.yml
$ td connector:preview load.yml
$ td connector:issue load.yml --database td_sample_db \
    --table td_sample_table
Scheduled execution
$ td connector:create \
    daily_import \
    "10 5 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    --time-column created_at
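The schedule argument is a five-field cron expression. As a reading aid only, the short Python sketch below splits such an expression into named fields; the helper describe_cron is hypothetical and is not part of the td command. Read this way, "10 5 * * *" means minute 10 of hour 5, every day. The slides do not say which time zone the service applies.

```python
def describe_cron(expr: str) -> dict:
    """Split a five-field cron expression into named fields."""
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    fields = expr.split()
    if len(fields) != len(names):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return dict(zip(names, fields))


# The expression used in the scheduled-execution example.
schedule = describe_cron("10 5 * * *")
print(schedule)
```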
A GUI is coming in the near future
Result Output(Output) - GUI/CUI
We are using…
Unchanged OSS Embulk/Embulk plugins
We will use changes in our service after…
Sending a pull request to OSS Embulk
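“Among so-called open source software, I feel that the model of publishing the basic features for free and leaving them to the community, while selling software with added features for a fee, is not actually working all that well.”
- “It feels really good to have a mechanism in which Fluentd sets the business in motion.” | Think IT
https://thinkit.co.jp/story/2015/07/17/6232
“There are various development styles even among open source software, but in the case of fluentd, Treasure Data, where I work, backs it fully. At present, I think this development style, ‘a company stands behind it, but development is done in the open’, suits it best.”
- “I want to master databases, not operating systems or languages: fluentd's new features and the ambitions of Treasure Data's Furuhashi, as heard by a GREE engineer (2/3)” - @IT
http://www.atmarkit.co.jp/ait/articles/1310/07/news010_2.html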
Process to use Embulk plugins at TD
Add Features / Fix for Local Executor
Fix for MapReduce Executor
Write Unit test
Write Integration test
Send Pull-Request to OSS Embulk or Embulk Plugins
Sorry, this is closed source code
Release as “Data Connector” or “Result Output”
Process to use Embulk plugins at TD (1)
・Add some features
 e.g. add various authentication methods
・Add some fixes
 e.g. add retry logic, fix error handling
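To make "add retry logic" concrete, here is a minimal Python sketch of retrying with exponential backoff. It is an illustration under assumptions, not the code used in any Embulk plugin: the names with_retry and flaky_upload are hypothetical, and a real plugin would normally retry only on errors known to be transient.

```python
import time


def with_retry(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Usage: a hypothetical call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network error")
    return "uploaded"


# The sleep function is injected so the example runs without waiting.
result = with_retry(flaky_upload, sleep=lambda seconds: None)
print(result, calls["count"])
```

Passing the sleep function as a parameter keeps the helper easy to unit test, which matters given the coverage rule described later in the deck.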
Process to use Embulk plugins at TD (2)
Handling of file paths
The MR executor cannot read local file paths (such as a private key file)
Fix authorization logic if needed
The transaction() and open() methods run on different instances
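One way to cope with both points is to read the key file during the transaction phase and carry its content, not its path, in the serialized task. The Python sketch below only simulates that idea: the functions transaction and open_task, the config keys, and the bucket name are hypothetical stand-ins and are not the Embulk plugin API.

```python
import json
import os
import tempfile


def transaction(config: dict) -> str:
    """Runs once on the submitting instance: read the key file here and
    put its content (not its path) into the serialized task."""
    with open(config["private_key_path"], "r") as f:
        key_content = f.read()
    task = {"bucket": config["bucket"], "private_key": key_content}
    return json.dumps(task)


def open_task(serialized_task: str) -> dict:
    """Runs on a different instance: only the serialized task is
    available, so credentials are rebuilt from it."""
    task = json.loads(serialized_task)
    return {"bucket": task["bucket"], "authenticated_with": task["private_key"]}


# Simulate the two phases; the key file exists only during transaction().
with tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False) as f:
    f.write("dummy-key")
    key_path = f.name

serialized = transaction({"bucket": "example-bucket", "private_key_path": key_path})
os.remove(key_path)  # the "remote" side cannot see the local file
session = open_task(serialized)
print(session)
```

Because open_task depends only on the serialized task, it behaves the same whether it runs locally or on another machine.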
Process to use Embulk plugins at TD (3)
Need 80% coverage
By internal rules,
we can’t deploy without 80% unit test coverage.
Write Unit test
Writing unit tests for an Embulk plugin is difficult,
e.g. when it connects to a cloud service…
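A common way around the cloud-service problem is to replace the client with a stub in unit tests. The following Python sketch uses unittest.mock from the standard library; the class BlobUploader and its put_object call are hypothetical and do not correspond to any real plugin or SDK.

```python
from unittest import mock


class BlobUploader:
    """Hypothetical output step that depends on a cloud client."""

    def __init__(self, client):
        self.client = client

    def upload(self, name: str, rows: list) -> int:
        # Serialize rows as CSV lines and hand them to the client.
        payload = "\n".join(",".join(map(str, row)) for row in rows)
        self.client.put_object(name, payload)
        return len(rows)


# Replace the real client with a stub so the test needs no network.
fake_client = mock.Mock()
uploader = BlobUploader(fake_client)
count = uploader.upload("out.csv", [[1, "a"], [2, "b"]])

# Verify the interaction with the stubbed client.
fake_client.put_object.assert_called_once_with("out.csv", "1,a\n2,b")
print(count)
```

Stubs like this raise coverage of the plugin's own logic, but they do not prove that the real service accepts the request, which is why the integration tests in the next step are still needed.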
Process to use Embulk plugins at TD (4)
Write Integration Test for Treasure Data Service
(1) Import data into TD
(2) Send a query to Presto or Hive
(3) Check the result against a local file
e.g.
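As one possible illustration of step (3), the Python sketch below compares rows returned by a query with the content of an expected local file. The import and the query are replaced by stand-in values, and the helpers rows_from_csv and check_result are hypothetical; the slides do not show the actual test code.

```python
import csv
import io


def rows_from_csv(text: str) -> list:
    """Parse CSV text into a sorted list of rows for order-free comparison."""
    return sorted(tuple(row) for row in csv.reader(io.StringIO(text)))


def check_result(query_rows, expected_csv_text: str) -> bool:
    """Compare rows returned by a query with the expected file content."""
    actual = sorted(tuple(str(value) for value in row) for row in query_rows)
    return actual == rows_from_csv(expected_csv_text)


# Stand-ins: in a real test, step (1) would import the data and step (2)
# would fetch these rows from Presto or Hive.
query_rows = [(2, "bob"), (1, "alice")]
expected_file_content = "1,alice\n2,bob\n"

ok = check_result(query_rows, expected_file_content)
print(ok)
```

Sorting both sides before comparing avoids false failures when the query engine returns rows in a different order.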
Process to use Embulk plugins at TD (5)
Release as “Data Connector” or “Result Output”
We hope for a win-win relationship
Embulk Community
Use at TD
Core development
Plugin development
Use in your own environment
Contribute
Embulk Execution Platform at Treasure Data
[Diagram: the Web Console and td commands (td connector:issue, td guess config.yml…) send requests through the Load Balancer to TD API (API servers), which calls Bulkload API (API servers); responses return along the same path. Jobs are enqueued to Perfect Queue and dequeued by a TD worker (worker process), which submits the job (retrying if needed) and executes it with the MR / Local Executor. The diagram also labels a guess/preview path.]
TD API / Bulkload API
guess/preview is processed on different API servers.
guess/preview needs a quick response.
[Diagram: Load Balancer, TD API (API servers), Bulkload API (API servers), Perfect Queue. guess/preview: HTTP request/response. Data import: queuing (enqueue to Perfect Queue).]
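The split between the two paths can be sketched as follows: guess/preview is answered inside the request, while an import is only enqueued and executed later by a worker. This Python sketch is a simplified illustration under assumptions; handle_request and run_worker_once are hypothetical names, and the standard-library queue stands in for the real job queue.

```python
import queue

job_queue = queue.Queue()  # stand-in for the job queue


def handle_request(kind: str, config: dict) -> dict:
    """Answer guess/preview in the request; queue imports for a worker."""
    if kind in ("guess", "preview"):
        # Quick response: computed synchronously and returned over HTTP.
        return {"status": "done", "result": f"{kind} of {config['name']}"}
    if kind == "import":
        job_queue.put(config)  # enqueue; a worker dequeues it later
        return {"status": "queued"}
    raise ValueError(f"unknown request kind: {kind}")


def run_worker_once() -> str:
    """Dequeue one job and 'execute' it (placeholder for the executor)."""
    job = job_queue.get_nowait()
    return f"imported {job['name']}"


preview = handle_request("preview", {"name": "load.yml"})
queued = handle_request("import", {"name": "load.yml"})
finished = run_worker_once()
print(preview, queued, finished)
```

Keeping the interactive path off the queue means a backlog of long-running imports does not delay guess/preview responses.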
Problems
Stability of Integration Tests
・Many plugins × many test cases × frequent execution
 sometimes causes failures.
Execution time of Integration Tests
・Many plugins × many test cases causes long execution time :)
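A rough calculation shows why stability degrades as the test matrix grows. The Python sketch below assumes independent test cases with an identical, small chance of a transient failure; the numbers are hypothetical and are not taken from the talk.

```python
def chance_of_any_failure(plugins: int, cases_per_plugin: int, p_fail: float) -> float:
    """Probability that at least one test case fails in a full run,
    assuming independent cases with the same failure probability."""
    total_cases = plugins * cases_per_plugin
    return 1.0 - (1.0 - p_fail) ** total_cases


# Hypothetical numbers: 20 plugins, 50 cases each, 0.1% transient failures.
p = chance_of_any_failure(plugins=20, cases_per_plugin=50, p_fail=0.001)
print(f"{p:.2f}")
```

Under these assumptions, even a 0.1% per-case failure rate makes most full runs fail at least once, and running the suite more often multiplies the number of red builds.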
