SlideShare a Scribd company logo
1 of 33
Download to read offline
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rob Parrish, Director, Product Management, Treasure Data
Sadayuki Furuhashi, Founder & Software Architect, Treasure Data
November 29, 2016
DEV401
Automating Workflows
for Analytics Pipelines
Live Data Management Platform
Treasure Data Background
Key Technology Users and Global Enterprise Customers
Founded in 2011 - Headquartered in Silicon Valley (Mountain View, CA)
Global Team: USA, Japan, Korea, India
Innovator in the Data and Analytics OSS Community
Fluentd | Fluent-bit | Embulk | MessagePack | Hivemall | Presto
+
The Challenge: Managing Data Across Multiple Components
Processing steps managed via CRON
Data transfer require smart retrying
Some processing should only start once data is
available
Other processing flows involve 100s of steps
Collaboration is hard, because logic is kept in scripts
It’s particularly hard when data engineers & analysts
try to collaborate
Amazon
S3
Amazon
Redshift
Amazon
EMR
Amazon
Aurora
Workflows
Common Steps of Modern Data Processing Workflows
Ingest
Application logs
User attribute data
Ad impressions
3rd-party cookie data
Enrich
Removing bot
access
Geo location from
IP address
Parsing User-Agent
JOIN user
attributes to event
logs
Model
A/B Testing
Funnel analysis
Segmentation
analysis
Machine learning
Load
Creating indexes
Data partitioning
Data compression
Statistics
collection
Utilize
Recommendation API
Real-time ad bidding
Visualize using BI
applications
Ingest UtilizeEnrich Model Load
Operationalize eCommerce Product Recommendations
Ingest UtilizeEnrich Model Load
Amazon
S3
Amazon
Redshift
Amazon
EMR
Amazon
Aurora
Solution: Build a Workflow Tool Everyone can Leverage
Data Prep
Cohort Analysis
Attribution Analysis
Web &
Product Logs
Packlink is an online platform providing cost-effective package
delivery services in Europe & Internationally.
They use Digdag to manage their analytic workflows that power insights that allow Sales,
Marketing, and their Partners to operate more effectively – helping their business to grow.
“Using Digdag, I now feel confident in my ability to manage complex analytic
flows. From ETL processes for transferring data, to analytic steps for running
attribution or cohort analysis, to deploying those insights back into the cloud
systems my company uses to run our business.
It’s enabled us to get out refreshes of these insights more timely for our analytic
consumers - sales, marketing, and the executive suite. I now can feel confident
each night that our analysis will be completed as expected.”
Pierre Bèvillard
Head of Business Intelligence
Unified log collection infrastructure Plugin-based ETL tool
Sadayuki Furuhashi
A founder of Treasure Data.
An open-source hacker.
github: @frsyuki
It’s like JSON, but fast and small
Why Digdag?
Bringing Best Practices of Software Development
No change logs
Tight coupling to
server environment
Hard to maintain
- Lock-in the system
- No ways to verify the
results
- Hard to rollback
- No one understands
the scripts
- All flat custom scripts
- Messy dependencies
- No one understands
the scripts
Commit Histories
Deploy Anywhere
Pull-Requests
& Unit Tests
- Independent from someone’s
machine
- Easy to reproduce the
same results again
- Easy to know why results
are changing
- Everyone can track
the changes
- Collaboration on the code
& across the workflows
- Keep the results trusted
Encourage Use of Application Development Best Practices
Task GroupingParameterized
Modules
Rather than one giant unwieldy script,
break queries into manageable, well-
identified modules to aid in collaboration,
updates and maintenance.
- From bird’s eye to details
- redshift>
- emr>
- …
Enable query writers to specify
dependencies easily, without having to
slog through hundreds of lines of code to
make a change.
Automated
Validation
- Verify results between steps
Automate validation of intermediate data
to encourage testing of data results over
time. As data changes, we can ensure we
know. We can keep the results always
trusted.
Unite Engineering
& Analytic Teams
Powerful for Engineers
Our goal is to make it feasible for our most advanced
users to take advantage of engineering teams to
manage using their favorite tools (e.g. git).
Friendly for Analysts
While, also making the definition file straight forward
enough for a wider range of analysts to leverage &
use
_export:
td:
database: workflow_temp
+task1:
td>: queries/daily_open.sql
create_table: daily_open
+task2:
td>: queries/monthly_open.sql
create_table: monthly_open
Workflow Constructs
Operators
Standard libraries
redshift>: runs Amazon Redshift queries
emr>: create/shutdowns a cluster & runs steps
s3_wait>: waits until a file is put on S3
pg>: runs PostgreSQL queries
td>: runs Treasure Data queries
td_for_each>: repeats task for result rows
mail>: sends an email
Open-source libraries
You can release & use open-source operator libraries.
+wait_for_arrival:
s3_wait>: |
bucket/www_${date}.csv
+load_table:
redshift>: scripts/copy.sql
Scripting operators
Scripting operators
sh>: runs a Shell script
py>: runs a Python method
rb>: runs a Ruby method
Docker option
docker:
image: ubuntu:16.04
Digdag supports Docker natively.
Easy to use data analytics tools.
Reproducible anywhere.
+run_custom_script:
sh>: scripts/custom_work.sh
+run_python_in_docker:
py>: Analysis.item_recommends
docker:
image: ubuntu:16.04
Loops and parameters
Parameter
A task can propagate parameters to following
tasks
Loop
Generate subtasks dynamically so that Digdag
applies the same set of operators to different
data sets.
+send_email_to_active_users:
td_for_each>: list_active.sql
_do:
+send:
email>: tempalte.txt
to: ${td.for_each.addr}
(+send tasks are dynamically generated)
Parallel execution
+load_data:
_parallel: true
+load_users:
redshift>: copy/users.sql
+load_items:
redshift>: copy/items.sql
Parallel execution
Tasks under a same group run in parallel if
_parallel option is set to true.
Workflow Steps
Ingest UtilizeEnrich Model Load
+task
+task
+task
+task +task
+task +task
+task
+task
+task +task +task
Organizing tasks using groups
Ingest UtilizeEnrich Model Load
+ingest
+enrich
+task +task
+model
+basket_analysis
+task +task
+learn
+load
+task +task+tasks
+task
Bringing Best Practices of Software Development
No change
logs
Tight coupling
to server
environment
Hard
to maintain
- Lock-in the system
- No ways to verify the
results
- Hard to rollback
- No one understands
the scripts
- All flat custom scripts
- Messy dependencies
- No one understands
the scripts
Commit
Histories
Deploy
Anywhere
Pull-Requests
& Unit Tests
- Independent from someone’s
machine
- Easy to reproduce the
same results again
- Easy to know why results
are changing
- Everyone can track
the changes
- Collaboration on the code
& across the workflows
- Keep the results trusted
Demo
Operationalize eCommerce Product Recommendations
Ingest UtilizeEnrich Model Load
Amazon
S3
Amazon
Redshift
Amazon
EMR
Amazon
Aurora
to the demo
Conclusion
Scheduling Query Result Output
Loading Bulk Data
ETL Process Management
Presto Analytic Queries
AWS System Processing
Digdag Supports our Customers
Built to Handle Production Workloads
Maintaining 24/7 Uptime
With the complexities of the modern data stack, what we need is
Continuous Data Integration.
Managing Cloud Infrastructure
It’s not easy! The lessons we learn are always applied to our
OSS for the good of the community.
Handling over 100 Billion Queries
Ensuring robust operation with scale is a huge issue for us.
Thank you!
www.digdag.io
www.treasuredata.com
Visit our booth #1818 for more info,
and for VIP wristbands to our party at TAO tonight!
Remember to complete
your evaluations!

More Related Content

What's hot

AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAmazon Web Services Korea
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料Amazon Web Services
 
AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...
AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...
AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...Amazon Web Services
 
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...Amazon Web Services
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Amazon Web Services
 
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...Amazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesm vaishnavi
 
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWSAWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWSAmazon Web Services
 
AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...
AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...
AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...Amazon Web Services
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakAmazon Web Services
 
Data Processing without Servers | AWS Public Sector Summit 2016
Data Processing without Servers | AWS Public Sector Summit 2016Data Processing without Servers | AWS Public Sector Summit 2016
Data Processing without Servers | AWS Public Sector Summit 2016Amazon Web Services
 
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...Amazon Web Services
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
Session Sponsored by Tableau: Transforming Data Into Valuable Insights
Session Sponsored by Tableau: Transforming Data Into Valuable InsightsSession Sponsored by Tableau: Transforming Data Into Valuable Insights
Session Sponsored by Tableau: Transforming Data Into Valuable InsightsAmazon Web Services
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...Amazon Web Services
 
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017Amazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 

What's hot (20)

AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料
 
AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...
AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...
AWS re:Invent 2016: Innovation After Installation: Establishing a Digital Rel...
 
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles...
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
 
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
AWS re:Invent 2016: Accelerating the Transition to Broadcast and OTT Infrastr...
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics services
 
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWSAWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
 
AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...
AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...
AWS re:Invent 2016| HLC302 | AWS Infrastructure for a Global Population Healt...
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam Elmalak
 
Data Processing without Servers | AWS Public Sector Summit 2016
Data Processing without Servers | AWS Public Sector Summit 2016Data Processing without Servers | AWS Public Sector Summit 2016
Data Processing without Servers | AWS Public Sector Summit 2016
 
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a...
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
 
Session Sponsored by Tableau: Transforming Data Into Valuable Insights
Session Sponsored by Tableau: Transforming Data Into Valuable InsightsSession Sponsored by Tableau: Transforming Data Into Valuable Insights
Session Sponsored by Tableau: Transforming Data Into Valuable Insights
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
 
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 

Viewers also liked

スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~
スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~
スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~Tetsuo Yamabe
 
Jenkins 2.0 Pipeline & Blue Ocean
Jenkins 2.0 Pipeline & Blue OceanJenkins 2.0 Pipeline & Blue Ocean
Jenkins 2.0 Pipeline & Blue OceanAkihiko Horiuchi
 
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11Sadayuki Furuhashi
 
DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?Sadayuki Furuhashi
 
EmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤とEmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤とToru Takahashi
 
Solaris 11.2 What's New
Solaris 11.2 What's NewSolaris 11.2 What's New
Solaris 11.2 What's NewOrgad Kimchi
 
In-Database Analyticsの必要性と可能性
In-Database Analyticsの必要性と可能性In-Database Analyticsの必要性と可能性
In-Database Analyticsの必要性と可能性Satoshi Nagayasu
 
The role of Design Thinking
The role of Design ThinkingThe role of Design Thinking
The role of Design ThinkingPieter Baert
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
YAML is the new Eval
YAML is the new EvalYAML is the new Eval
YAML is the new Evalarnebrasseur
 
My ambariexperience
My ambariexperienceMy ambariexperience
My ambariexperiencewyukawa
 
OSS監視ツールSensuの紹介
OSS監視ツールSensuの紹介OSS監視ツールSensuの紹介
OSS監視ツールSensuの紹介Akihiko Horiuchi
 
Building AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and TableauBuilding AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and TableauLynn Langit
 
Prometheus london
Prometheus londonPrometheus london
Prometheus londonwyukawa
 
Presto in my_use_case2
Presto in my_use_case2Presto in my_use_case2
Presto in my_use_case2wyukawa
 
Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24wyukawa
 

Viewers also liked (20)

Azkaban
AzkabanAzkaban
Azkaban
 
スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~
スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~
スタディサプリを支えるデータ分析基盤 ~設計の勘所と利活用事例~
 
Jenkins 2.0 Pipeline & Blue Ocean
Jenkins 2.0 Pipeline & Blue OceanJenkins 2.0 Pipeline & Blue Ocean
Jenkins 2.0 Pipeline & Blue Ocean
 
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
 
DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?
 
EmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤とEmbulkとDigdagとデータ分析基盤と
EmbulkとDigdagとデータ分析基盤と
 
Solaris 11.2 What's New
Solaris 11.2 What's NewSolaris 11.2 What's New
Solaris 11.2 What's New
 
In-Database Analyticsの必要性と可能性
In-Database Analyticsの必要性と可能性In-Database Analyticsの必要性と可能性
In-Database Analyticsの必要性と可能性
 
The role of Design Thinking
The role of Design ThinkingThe role of Design Thinking
The role of Design Thinking
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Yaml
YamlYaml
Yaml
 
YAML is the new Eval
YAML is the new EvalYAML is the new Eval
YAML is the new Eval
 
B.COM.PDF
B.COM.PDFB.COM.PDF
B.COM.PDF
 
DDoS対策の自動化
DDoS対策の自動化DDoS対策の自動化
DDoS対策の自動化
 
My ambariexperience
My ambariexperienceMy ambariexperience
My ambariexperience
 
OSS監視ツールSensuの紹介
OSS監視ツールSensuの紹介OSS監視ツールSensuの紹介
OSS監視ツールSensuの紹介
 
Building AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and TableauBuilding AWS Redshift Data Warehouse with Matillion and Tableau
Building AWS Redshift Data Warehouse with Matillion and Tableau
 
Prometheus london
Prometheus londonPrometheus london
Prometheus london
 
Presto in my_use_case2
Presto in my_use_case2Presto in my_use_case2
Presto in my_use_case2
 
Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24Upgrading from-hdp-21-to-hdp-24
Upgrading from-hdp-21-to-hdp-24
 

Similar to AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)

Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesSadayuki Furuhashi
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...AboutYouGmbH
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...DataWorks Summit
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)
Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)
Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)Neotys_Partner
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAmazon Web Services
 
Informatica 5+years of experince
Informatica 5+years of experinceInformatica 5+years of experince
Informatica 5+years of experinceDharma Rao
 
Informatica_5+years of experince
Informatica_5+years of experinceInformatica_5+years of experince
Informatica_5+years of experinceDharma Rao
 
Informatica 5+years of experince
Informatica 5+years of experinceInformatica 5+years of experince
Informatica 5+years of experinceDharma Rao
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopRTTS
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionAmazon Web Services
 
hari_duche_updated
hari_duche_updatedhari_duche_updated
hari_duche_updatedHari Duche
 

Similar to AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401) (20)

Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Path to continuous delivery
Path to continuous deliveryPath to continuous delivery
Path to continuous delivery
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
North east user group tour
North east user group tourNorth east user group tour
North east user group tour
 
Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)
Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)
Jonathon Wright - Intelligent Performance Cognitive Learning (AIOps)
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
 
Informatica 5+years of experince
Informatica 5+years of experinceInformatica 5+years of experince
Informatica 5+years of experince
 
Informatica_5+years of experince
Informatica_5+years of experinceInformatica_5+years of experince
Informatica_5+years of experince
 
Informatica 5+years of experince
Informatica 5+years of experinceInformatica 5+years of experince
Informatica 5+years of experince
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of Hadoop
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
hari_duche_updated
hari_duche_updatedhari_duche_updated
hari_duche_updated
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rob Parrish, Director, Product Management, Treasure Data Sadayuki Furuhashi, Founder & Software Architect, Treasure Data November 29, 2016 DEV401 Automating Workflows for Analytics Pipelines
  • 2.
  • 4. Treasure Data Background Key Technology Users and Global Enterprise Customers Founded in 2011 - Headquartered in Silicon Valley (Mountain View, CA) Global Team: USA, Japan, Korea, India Innovator in the Data and Analytics OSS Community Fluentd | Fluent-bit | Embulk | MessagePack | Hivemall | Presto
  • 5. +
  • 6. The Challenge: Managing Data Across Multiple Components Processing steps managed via CRON Data transfer require smart retrying Some processing should only start once data is available Other processing flows involve 100s of steps Collaboration is hard, because logic is kept in scripts It’s particularly hard when data engineers & analysts try to collaborate Amazon S3 Amazon Redshift Amazon EMR Amazon Aurora
  • 8. Common Steps of Modern Data Processing Workflows Ingest Application logs User attribute data Ad impressions 3rd-party cookie data Enrich Removing bot access Geo location from IP address Parsing User-Agent JOIN user attributes to event logs Model A/B Testing Funnel analysis Segmentation analysis Machine learning Load Creating indexes Data partitioning Data compression Statistics collection Utilize Recommendation API Real-time ad bidding Visualize using BI applications Ingest UtilizeEnrich Model Load
  • 9. Operationalize eCommerce Product Recommendations Ingest UtilizeEnrich Model Load Amazon S3 Amazon Redshift Amazon EMR Amazon Aurora
  • 10. Solution: Build a Workflow Tool Everyone can Leverage
  • 11. Data Prep Cohort Analysis Attribution Analysis Web & Product Logs Packlink is an online platform providing cost-effective package delivery services in Europe & Internationally. They use Digdag to manage their analytic workflows that power insights that allow Sales, Marketing, and their Partners to operate more effectively – helping their business to grow.
  • 12. “Using Digdag, I now feel confident in my ability to manage complex analytic flows. From ETL processes for transferring data, to analytic steps for running attribution or cohort analysis, to deploying those insights back into the cloud systems my company uses to run our business. It’s enabled us to get out refreshes of these insights more timely for our analytic consumers - sales, marketing, and the executive suite. I now can feel confident each night that our analysis will be completed as expected.” Pierre Bèvillard Head of Business Intelligence
  • 13. Unified log collection infrastructure Plugin-based ETL tool Sadayuki Furuhashi A founder of Treasure Data. An open-source hacker. github: @frsyuki It’s like JSON, but fast and small
  • 15. Bringing Best Practices of Software Development No change logs Tight coupling to server environment Hard to maintain - Lock-in the system - No ways to verify the results - Hard to rollback - No one understands the scripts - All flat custom scripts - Messy dependencies - No one understands the scripts Commit Histories Deploy Anywhere Pull-Requests & Unit Tests - Independent from someone’s machine - Easy to reproduce the same results again - Easy to know why results are changing - Everyone can track the changes - Collaboration on the code & across the workflows - Keep the results trusted
  • 16. Encourage Use of Application Development Best Practices Task GroupingParameterized Modules Rather than one giant unwieldy script, break queries into manageable, well- identified modules to aid in collaboration, updates and maintenance. - From bird’s eye to details - redshift> - emr> - … Enable query writers to specify dependencies easily, without having to slog through hundreds of lines of code to make a change. Automated Validation - Verify results between steps Automate validation of intermediate data to encourage testing of data results over time. As data changes, we can ensure we know. We can keep the results always trusted.
  • 17. Unite Engineering & Analytic Teams Powerful for Engineers Our goal is to make it feasible for our most advanced users to take advantage of engineering teams to manage using their favorite tools (e.g. git). Friendly for Analysts While, also making the definition file straight forward enough for a wider range of analysts to leverage & use _export: td: database: workflow_temp +task1: td>: queries/daily_open.sql create_table: daily_open +task2: td>: queries/monthly_open.sql create_table: monthly_open
  • 19. Operators Standard libraries redshift>: runs Amazon Redshift queries emr>: create/shutdowns a cluster & runs steps s3_wait>: waits until a file is put on S3 pg>: runs PostgreSQL queries td>: runs Treasure Data queries td_for_each>: repeats task for result rows mail>: sends an email Open-source libraries You can release & use open-source operator libraries. +wait_for_arrival: s3_wait>: | bucket/www_${date}.csv +load_table: redshift>: scripts/copy.sql
  • 20. Scripting operators Scripting operators sh>: runs a Shell script py>: runs a Python method rb>: runs a Ruby method Docker option docker: image: ubuntu:16.04 Digdag supports Docker natively. Easy to use data analytics tools. Reproducible anywhere. +run_custom_script: sh>: scripts/custom_work.sh +run_python_in_docker: py>: Analysis.item_recommends docker: image: ubuntu:16.04
  • 21. Loops and parameters Parameter A task can propagate parameters to following tasks Loop Generate subtasks dynamically so that Digdag applies the same set of operators to different data sets. +send_email_to_active_users: td_for_each>: list_active.sql _do: +send: email>: tempalte.txt to: ${td.for_each.addr} (+send tasks are dynamically generated)
  • 22. Parallel execution +load_data: _parallel: true +load_users: redshift>: copy/users.sql +load_items: redshift>: copy/items.sql Parallel execution Tasks under a same group run in parallel if _parallel option is set to true.
  • 23. Workflow Steps Ingest UtilizeEnrich Model Load +task +task +task +task +task +task +task +task +task +task +task +task
  • 24. Organizing tasks using groups Ingest UtilizeEnrich Model Load +ingest +enrich +task +task +model +basket_analysis +task +task +learn +load +task +task+tasks +task
  • 25. Bringing Best Practices of Software Development No change logs Tight coupling to server environment Hard to maintain - Lock-in the system - No ways to verify the results - Hard to rollback - No one understands the scripts - All flat custom scripts - Messy dependencies - No one understands the scripts Commit Histories Deploy Anywhere Pull-Requests & Unit Tests - Independent from someone’s machine - Easy to reproduce the same results again - Easy to know why results are changing - Everyone can track the changes - Collaboration on the code & across the workflows - Keep the results trusted
  • 26. Demo
  • 27. Operationalize eCommerce Product Recommendations Ingest UtilizeEnrich Model Load Amazon S3 Amazon Redshift Amazon EMR Amazon Aurora
  • 30. Scheduling Query Result Output Loading Bulk Data ETL Process Management Presto Analytic Queries AWS System Processing Digdag Supports our Customers
  • 31. Built to Handle Production Workloads Maintaining 24/7 Uptime With the complexities of the modern data stack, what we need is Continuous Data Integration. Managing Cloud Infrastructure It’s not easy! The lessons we learn are always applied to our OSS for the good of the community. Handling over 100 Billion Queries Ensuring robust operation with scale is a huge issue for us.
  • 32. Thank you! www.digdag.io www.treasuredata.com Visit our booth #1818 for more info, and for VIP wristbands to our party at TAO tonight!