Technologies,
Data Analytics Service
and Enterprise Businesses
SENDAI IT COMMUNE #2
2018-01-09
Satoshi Tagomori (@tagomoris)
Treasure Data, Inc.
Satoshi Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, Woothee, ...
Treasure Data, Inc.
Retry-able Failures or Not
Idempotent Operations (Japanese: 冪等な操作, with 冪等 read "bekitō")
Technologies
Data Analytics Service
Enterprise Business
Technologies
↓
Data Analytics Service
↓
Enterprise Business
Enterprise Business ?
• Many different definitions and discussions about "Enterprise"... :(
• MY DEFINITION IN THIS TALK:
"Businesses NOT about IT"
• Thus, most businesses are "Enterprise", everywhere, not only in Tokyo
Data Analytics Service ?
• Provides ways to know:
• How many people are reaching our products?
• How many times are they seeing our advertisements?
• And how many times do they buy our products?
• When do they use our products?
• When did they buy our products?
• Where did they buy our products?
• ...
• Something that helps our business by using data
Data Analytics Service

for Enterprise Business ?
• Something that helps a "Business not about IT" by using data (IT)
• Staff (using the data analytics service) don't know about IT
• and also don't care about IT
• but "need" the results of analytics
• Everyone checks the report on yesterday's data at 10:00 AM
• We need results before 10:00 AM
• 10:10 AM is too late, but 2:00 AM is too early...
Deadline and Retries
[Figures: timelines with marks at 00:00 AM, 01:00 AM, 05:30 AM and 10:00 AM]
• Big Job, Power 1: Crash! → Delay...
• Big Job, Power 2: Crash! → OK!
• Small Jobs, Power 1: Crash! → OK!
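The three cases above can be sketched with a little arithmetic. This is only an illustration: the 6 hours of work, the crash times, and the helper name finish_after_one_crash are assumptions, not numbers from the slides. It assumes a crashed job is rerun from the beginning and that doubling the power halves the run time.

```python
from datetime import datetime, timedelta

def finish_after_one_crash(start, hours_at_power_1, power, crash_after_hours):
    """Finish time when a job crashes once and is rerun from the beginning.

    Hypothetical model: run time scales inversely with power, and all work
    done before the crash is lost.
    """
    run = timedelta(hours=hours_at_power_1 / power)
    return start + timedelta(hours=crash_after_hours) + run

start = datetime(2018, 1, 9, 1, 0)      # jobs start at 01:00 AM
deadline = datetime(2018, 1, 9, 10, 0)  # the report is read at 10:00 AM

# One big job (6 hours at power 1) that crashes 4.5 hours in.
big_power_1 = finish_after_one_crash(start, 6, power=1, crash_after_hours=4.5)
# The same job at power 2 (3 hours) that crashes 2 hours in.
big_power_2 = finish_after_one_crash(start, 6, power=2, crash_after_hours=2.0)
# Six small 1-hour jobs at power 1: only the crashed job is rerun,
# so a crash halfway through one job costs 0.5 extra hours.
small_power_1 = start + timedelta(hours=6) + timedelta(hours=0.5)

for name, finish in [("big, power 1", big_power_1),
                     ("big, power 2", big_power_2),
                     ("small, power 1", small_power_1)]:
    status = "OK" if finish <= deadline else "Delay"
    print(name, finish.strftime("%H:%M"), status)
```

Under these assumed numbers only the big job at power 1 misses the deadline. More power or smaller jobs both leave room for one retry.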
Missions of Data Analytics Service
for Enterprise Business
Fast "enough"
Cheap "enough"
Stable
Easy to use "enough"
Technologies for Data Analytics Service
• Data Management System
• Distributed Processing System
• Queue and Scheduler
• Connecting Systems and Services
• Controlling Jobs, Tasks and Workflows
• Managing Retries
Data Management Systems
• Data Collecting Systems
• Fluentd, Embulk, ...
• Distributed Database and Storage
• Storing data in an efficient format (MPC1, MessagePack columnar format)
• Managing index
• Managing schema
• Providing transactional operations
Distributed Processing System
• Running Analytics Queries
• MapReduce engines: Hadoop + Hive
• MPP (Massively Parallel Processing) systems: Presto
• Running Data Management Jobs
• Converting data formats, re-indexing, detecting schema, ...
• Computing Resource Management
• Customer queries (and internal use) must be separated!
Queue and Scheduler
• Queuing Queries
• Allow queries to be enqueued, then run them one after another
[Figure labels: Power 1, Customer, Request]
• Scheduling Queries
• Run queries when it's OK to run them
[Figure labels: Data for Queries, 01:00 AM, 03:00 AM]
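A minimal sketch of the two ideas above: a FIFO queue, plus a "not before" time for scheduling. The class name QueryQueue, its methods, and the times (hours after midnight) are hypothetical and only for illustration. This is not how any particular service implements it.

```python
from collections import deque

class QueryQueue:
    """Hypothetical FIFO queue where each query may carry an earliest start time."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, query, not_before=0.0):
        # not_before: hour of day at which the query's input data is expected.
        self._pending.append((not_before, query))

    def run_ready(self, now, execute):
        """Run, in FIFO order, every queued query whose start time has passed."""
        waiting = deque()
        ran = []
        while self._pending:
            not_before, query = self._pending.popleft()
            if not_before <= now:
                execute(query)
                ran.append(query)
            else:
                waiting.append((not_before, query))
        self._pending = waiting
        return ran

queue = QueryQueue()
queue.enqueue("SELECT 1")
queue.enqueue("daily report", not_before=3.0)  # data arrives by 03:00 AM
queue.enqueue("SELECT 2")

executed = []
first = queue.run_ready(now=1.0, execute=executed.append)
second = queue.run_ready(now=3.0, execute=executed.append)
print(first, second)
```

The scheduled query waits in the queue until its data is expected, while the other queries run in the order they were enqueued.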
Connecting Systems and Services
• Non-"connected" Data Analytics Service
[Figure: "Ultra Super Great Analytics Service" (Database, Query, Result): Not "easy enough"]
Connecting Systems and Services
• Data Analytics Service MUST be "connected"
[Figure: Treasure Data (Database, Query, Result)]
Control Jobs/Tasks
• A Job needs results of other jobs
[Figure: "Risky" time based schedule, A,B,C -> D,E -> F]
Timeline: 01:00 AM → 03:10 AM ? → 03:30 AM → 06:30 AM ? → 07:00 AM → 10:00 AM
• "Risk" for failures
[Figure: "Risky" time based schedule, A,B,C -> D,E -> F]
Timeline: 01:00 AM Crash! → 03:30 AM Oops, No Data... → 07:00 AM Oops, No Data... → 08:15 AM ? → 10:00 AM
Control Jobs/Tasks
• A Job needs results of other jobs
• "Time based schedule" needs
• Wide space for retries
• Big resources for fast results (not cheap!)
[Figure: time based schedule, A,B,C -> D,E -> F, with "Space for Retries" between stages]
Timeline: 01:00 AM → 03:10 AM ? → 06:00 AM → 08:30 AM ? → 11:00 AM ???
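The risk of a time based schedule can be sketched as follows. The stage names come from the slides; the function run_time_based, the ready_at mapping, and the finish times are assumptions made up for this example. Each stage starts at its fixed clock time, whether or not the previous stage has produced its data.

```python
# Fixed start times in hours after midnight: 01:00, 03:30, 07:00.
SCHEDULE = [("A,B,C", 1.0), ("D,E", 3.5), ("F", 7.0)]

def run_time_based(ready_at):
    """Simulate a purely time based schedule.

    ready_at maps a stage to the hour its output is ready,
    or None if the stage crashed and produced nothing.
    """
    results = {}
    upstream_ready = 0.0  # assume raw input data exists from midnight
    for stage, start in SCHEDULE:
        if upstream_ready is None or upstream_ready > start:
            # The clock says "go", but the input is not there.
            results[stage] = "no data"
            upstream_ready = None
        else:
            results[stage] = "started"
            upstream_ready = ready_at.get(stage)
    return results

normal = run_time_based({"A,B,C": 3.2, "D,E": 6.5, "F": 9.0})
crashed = run_time_based({"A,B,C": None, "D,E": 6.5, "F": 9.0})
print(normal)
print(crashed)
```

With a crash in the first stage, every later stage still fires on time and finds no data. Avoiding that with fixed times means leaving wide gaps between stages.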
Control Jobs/Tasks
• Workflow pattern
[Figure: workflow execution, A,B,C -> D,E -> F, with workflow control barriers]
Timeline: 01:00 AM → 07:15 AM ? → 10:00 AM
• Workflow pattern with retries
[Figure: workflow execution, A,B,C -> D,E -> F, with workflow control barriers]
Timeline: 01:00 AM → Crash! → Retries! → 10:00 AM
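A minimal sketch of the workflow pattern with retries, assuming the A,B,C -> D,E -> F stages from the slides. The names run_workflow, TransientError and flaky_task are hypothetical. Tasks inside a stage run one by one here for simplicity; a real workflow engine would usually run them in parallel.

```python
STAGES = [["A", "B", "C"], ["D", "E"], ["F"]]

class TransientError(Exception):
    """A failure worth retrying, for example a crashed compute node."""

def run_workflow(run_task, max_attempts=3):
    """Run stages in order; the end of each stage acts as a barrier.

    A stage starts only after every task of the previous stage succeeded,
    so no task ever starts without its input data.
    """
    log = []
    for stage in STAGES:
        for task in stage:
            for attempt in range(1, max_attempts + 1):
                try:
                    run_task(task)
                    log.append((task, attempt))
                    break
                except TransientError:
                    if attempt == max_attempts:
                        raise  # give up: the whole workflow fails
    return log

remaining_failures = {"B": 1}  # task B crashes once, then succeeds

def flaky_task(task):
    if remaining_failures.get(task, 0) > 0:
        remaining_failures[task] -= 1
        raise TransientError(task)

log = run_workflow(flaky_task)
print(log)
```

The retry of B delays D and E, but they never run without data, and the spare time before the deadline is shared by the whole workflow instead of being split into fixed gaps.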
Retry-able Failures or Not
• "Retry-able Failures"
• Crash of compute nodes
• Communication errors
• Outages of "connected" services
• ...
• Non-"Retry-able Failures"
• SQL syntax error
• Missing data sources / Missing tables
• Wrong API key for "connected" services
• ...
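The split above can be sketched as two exception families, where only one of them is retried. All class and function names here (RetryableError, FatalError, run_with_retries, and so on) are hypothetical and chosen for this example.

```python
class RetryableError(Exception):
    """Failures that may succeed on another attempt."""

class CommunicationError(RetryableError):
    pass

class FatalError(Exception):
    """Failures that repeat no matter how often the operation is retried."""

class SqlSyntaxError(FatalError):
    pass

def run_with_retries(operation, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise
        # FatalError is not caught, so it propagates on the first attempt.

flaky_calls = []

def flaky_query():
    flaky_calls.append("try")
    if len(flaky_calls) < 3:
        raise CommunicationError("connection reset")
    return "result"

bad_calls = []

def bad_query():
    bad_calls.append("try")
    raise SqlSyntaxError("unexpected token")

flaky_result = run_with_retries(flaky_query)
try:
    run_with_retries(bad_query)
except SqlSyntaxError:
    pass

print(flaky_result, len(flaky_calls), len(bad_calls))
```

The communication error is retried until it succeeds, while the syntax error is reported after a single attempt, because retrying it would only waste time before the deadline.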
Retry-able Operations ?
• For example:
• Run Query A
• Append result of A into B
• Count rows of B
• With a failure:
• Run Query A
• Append result of A into B ... (Failed!)
• Retry Query A
• Retry appending result of A into B
• Count rows of B
[Figure: numbered rows of Table B across the append, the failure, and the retry]
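A small sketch of why retrying an append is dangerous. It assumes the failure is reported after the rows were already written (for example, a lost acknowledgement). The helper names and the rows [1, 2] are made up for the example.

```python
def query_a():
    # Hypothetical result of Query A.
    return [1, 2]

def append_result(table, rows, fail_after_write=False):
    table.extend(rows)
    if fail_after_write:
        # The rows were written, but the caller only sees an error.
        raise ConnectionError("lost the acknowledgement after writing")

table_b = []
try:
    append_result(table_b, query_a(), fail_after_write=True)
except ConnectionError:
    # Retry the query and the append, as in the steps above.
    append_result(table_b, query_a())

print(table_b)       # [1, 2, 1, 2]
print(len(table_b))  # 4 rows, although Query A returned only 2
```

The retry itself succeeds, yet the row count is wrong, because appending twice is not the same as appending once.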
Idempotent Operations
• "Idempotent" (冪等, read "bekitō" in Japanese) operation
• gives the "same" result when it's executed twice or more
[Figure: Table B with rows 1, 2, 3, 4]
• Idempotent Operation:
• Run Query A
• "Replace" table B with result of A
• Count rows of B
[Figure: Table B with rows 1, 2]
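The same failure scenario with "replace" instead of "append", as a minimal sketch. The helper names, the in-memory dict standing in for the database, and the row values are assumptions for illustration only.

```python
def query_a():
    # Hypothetical result of Query A.
    return [1, 2]

def replace_table(tables, name, rows, fail_after_write=False):
    # Overwrite the whole table instead of adding to it.
    tables[name] = list(rows)
    if fail_after_write:
        raise ConnectionError("lost the acknowledgement after writing")

tables = {"B": [1, 2, 3, 4]}
try:
    replace_table(tables, "B", query_a(), fail_after_write=True)
except ConnectionError:
    # Retrying is safe: the second replace gives the same table.
    replace_table(tables, "B", query_a())

print(tables["B"])       # [1, 2]
print(len(tables["B"]))  # 2
```

However many times the replace runs, Table B ends up with exactly the result of Query A, so the retry cannot corrupt the count.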
Replay-able Data Analytics Workflow
• Need to do a lot of "trial and error"
• w/ updated queries
• w/ updated data...
• Idempotent operations make workflows "Replay-able"
• Fast trial-and-error (PDCA!) cycles
• → Fast business growth!
Enterprise Business
❤
Technologies
Thank you!
@tagomoris

More Related Content

What's hot

Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_casewyukawa
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableShu Ting Tseng
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceSATOSHI TAGOMORI
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CAkbajda
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_enOgibayashi
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloudQubole
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Treasure Data, Inc.
 

What's hot (20)

Presto
PrestoPresto
Presto
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloud
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -
 

Similar to Technologies, Data Analytics Service and Enterprise Business

Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...uptime software
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - PublicJulian Payne
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesSATOSHI TAGOMORI
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructureSimon Belak
 
Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Jim Adcock
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 Minutes7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 MinutesDATABASICS
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Bernardo Najlis
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016Jim Adcock
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017BrandonWilhelm4
 
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...Chris McNulty
 
SKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISequelGate
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talkKrishna Gade
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven ProductsDataWorks Summit
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015Bipin Singh
 

Similar to Technologies, Data Analytics Service and Enterprise Business (20)

Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - Public
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 Minutes7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 Minutes
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017
 
Scalable web architecture
Scalable web architectureScalable web architecture
Scalable web architecture
 
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
 
SKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BI
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talk
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
 

More from SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speedSATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsSATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of RubySATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script ConfusingSATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubySATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsSATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldSATOSHI TAGOMORI
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"SATOSHI TAGOMORI
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsSATOSHI TAGOMORI
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDSATOSHI TAGOMORI
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading RoleSATOSHI TAGOMORI
 

More from SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading Role
 

Recently uploaded

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Technologies, Data Analytics Service and Enterprise Business

  • 1. Technologies, Data Analytics Service and Enterprise Businesses SENDAI IT COMMUNE #2 2018-01-09 Satoshi Tagomori (@tagomoris) Treasure Data, Inc.
  • 2. Satoshi Tagomori (@tagomoris) Fluentd, MessagePack-Ruby, Norikra, Woothee, ... Treasure Data, Inc.
  • 3.
  • 4. Retry-able Failures or Not; Idempotent Operations (冪等な操作, read "bekitō")
  • 7. Enterprise Business?
      • Many different definitions and discussions about "Enterprise"... :(
      • My definition in this talk: "Businesses NOT about IT"
      • Thus, most businesses are "Enterprise", everywhere, not only in Tokyo
  • 8. Data Analytics Service?
      • Provides ways to know:
          • How many people are reaching our products?
          • How many times are they seeing our advertisements?
          • And how many times do they buy our products?
          • When do they use our products?
          • When did they buy our products?
          • Where did they buy our products?
          • ...
      • Something that helps our business using data
  • 9. Data Analytics Service for Enterprise Business?
      • Something that helps "Business not about IT", using data (IT)
      • Staff (using the data analytics service) don't know about IT
          • and also don't care about IT
          • but "need" the results of analytics
      • Everyone checks the report about yesterday at 10:00 AM
          • We need results before 10:00 AM
          • 10:10 AM is too late, but 2:00 AM is too early...
  • 10. Deadline and Retries
      [Timeline figure; time marks: 00:00AM, 01:00AM, 05:30AM, 10:00AM]
      • Big Job, Power 1: Crash! Delay...
      • Big Job, Power 2: Crash! OK!
      • Small Jobs, Power 1: Crash! OK!
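The three cases on the slide above can be sketched with some simple arithmetic. This is only an illustration: the 6-hour workload, the 01:00 AM start, the crash point (75% of the way through the run), and the split into 6 small jobs are all hypothetical numbers chosen for the example, not values from the talk.

```python
DEADLINE = 10.0  # 10:00 AM, in hours after midnight
START = 1.0      # hypothetical start time: 01:00 AM


def finish_time(total_hours, power, pieces, crash_fraction):
    """Finish time (hours after midnight) of a run that crashes once.

    total_hours: run time of the whole workload at power 1 (hypothetical).
    power: resource multiplier; doubling it halves the run time.
    pieces: number of equal jobs the workload is split into, run back to back.
    crash_fraction: how far through the run the crash happens (0..1).
    A crash loses only the progress of the piece that was running.
    """
    run_time = total_hours / power
    piece_time = run_time / pieces
    crash_after = run_time * crash_fraction
    lost = crash_after % piece_time  # progress inside the crashed piece
    return START + run_time + lost


big_power1 = finish_time(6, power=1, pieces=1, crash_fraction=0.75)
big_power2 = finish_time(6, power=2, pieces=1, crash_fraction=0.75)
small_power1 = finish_time(6, power=1, pieces=6, crash_fraction=0.75)

for label, t in [("Big job, power 1", big_power1),
                 ("Big job, power 2", big_power2),
                 ("Small jobs, power 1", small_power1)]:
    print(label, t, "OK" if t <= DEADLINE else "Delay")
```

With these made-up numbers, the big job at power 1 misses the deadline after one crash, while either doubling the power or splitting the work into small jobs leaves room for the retry.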
  • 11. Missions of Data Analytics Service for Enterprise Business
      • Fast "enough"
      • Cheap "enough"
      • Stable
      • Easy to use "enough"
  • 12. Technologies for Data Analytics Service
      • Data Management System
      • Distributed Processing System
      • Queue and Scheduler
      • Connecting Systems and Services
      • Controlling Jobs, Tasks and Workflows
      • Managing Retries
  • 13. Data Management Systems
      • Data Collecting Systems
          • Fluentd, Embulk, ...
      • Distributed Database and Storage
          • Storing data in an efficient format (MPC1, MessagePack columnar format)
          • Managing indexes
          • Managing schemas
          • Providing transactional operations
  • 14. Distributed Processing System
      • Running Analytics Queries
          • MapReduce engines: Hadoop + Hive
          • MPP (Massively Parallel Processing) systems: Presto
      • Running Data Management Jobs
          • Converting data formats, re-indexing, detecting schemas, ...
      • Computing Resource Management
          • Customer queries (and internal use) must be separated!
  • 15. Queue and Scheduler
      • Queuing Queries
          • Allow customers to enqueue queries, and run them one after another
          [Figure labels: Customer Request, Power 1]
      • Scheduling Queries
          • Run queries when it's OK to run them
          [Figure labels: Data for Queries, 01:00AM, 03:00AM]
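A minimal Python sketch of the two ideas on this slide: queries are queued and run in order, and a query runs only once the data it needs has arrived. `QueryQueue`, `enqueue`, `run_ready`, and the dataset names are hypothetical, made up for illustration; they are not part of any Treasure Data API.

```python
from collections import deque


class QueryQueue:
    """FIFO queue that runs queries one after another (hypothetical sketch)."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, name, needs_data=None):
        # needs_data: name of the dataset the query waits for (None = run anytime).
        self._pending.append((name, needs_data))

    def run_ready(self, available_data):
        """Run, in queue order, every query whose input data is available.

        Queries whose data has not arrived stay queued for a later call.
        Returns the names of the queries that ran.
        """
        ran, waiting = [], deque()
        while self._pending:
            name, needs_data = self._pending.popleft()
            if needs_data is None or needs_data in available_data:
                ran.append(name)  # a real system would execute the query here
            else:
                waiting.append((name, needs_data))
        self._pending = waiting
        return ran


queue = QueryQueue()
queue.enqueue("daily_report", needs_data="access_log_yesterday")
queue.enqueue("adhoc_count")

first = queue.run_ready(available_data=set())
second = queue.run_ready(available_data={"access_log_yesterday"})
print(first, second)
```

The first call runs only the query with no data dependency; the report runs on the second call, after its input data is marked available.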
  • 16. Connecting Systems and Services
      • Non-"connected" Data Analytics Service
      [Figure labels: Database, Ultra Super Great Analytics Service, Query Result]
      • Not "easy enough"
  • 17. Connecting Systems and Services
      • Data Analytics Service MUST be "connected"
      [Figure labels: Database, Treasure Data, Query Result]
  • 18. Control Jobs/Tasks
      • A job needs the results of other jobs
      • "Risky" time-based schedule: A,B,C -> D,E -> F
          [Figure time marks: 01:00AM, 03:10AM ?, 03:30AM, 06:30AM ?, 07:00AM, 10:00AM]
      • "Risk" of failures
          [Figure time marks: 01:00AM Crash!, 03:30AM Oops, No Data..., 07:00AM Oops, No Data..., 08:15AM ?, 10:00AM]
  • 19. Control Jobs/Tasks
      • A job needs the results of other jobs
      • Time-based schedule: A,B,C -> D,E -> F
          [Figure time marks: 01:00AM, 03:10AM ?, 06:00AM, 08:30AM ?, 11:00AM ???; gaps labeled "Space for Retries"]
      • A "time-based schedule" needs
          • Wide space for retries
          • Big resources for fast results (not cheap!)
  • 20. Control Jobs/Tasks
      • Workflow pattern
          [Figure: workflow execution A,B,C -> D,E -> F with workflow control barriers; time marks: 01:00AM, 07:15AM ?, 10:00AM]
      • Workflow pattern with retries
          [Figure: the same workflow execution and barriers, with a "Crash!"; time marks: 01:00AM, 10:00AM]
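The workflow pattern with retries might look like the following Python sketch. Each stage acts as a barrier: D and E start only after A, B and C have all succeeded, whatever the clock says. `run_workflow`, `flaky_run`, and `max_retries` are hypothetical names for this example, and tasks within a stage run sequentially here for simplicity, whereas a real workflow engine would likely run them in parallel.

```python
def run_workflow(stages, run_task, max_retries=2):
    """Run stages in order; each stage is a barrier.

    stages: list of lists of task names, e.g. [["A", "B", "C"], ["D", "E"], ["F"]].
    run_task: callable(name) that raises RuntimeError on failure.
    A failed task is retried up to max_retries times before the workflow stops.
    Returns a list of (task, attempts) in completion order.
    """
    completed = []
    for stage in stages:
        for task in stage:
            attempts = 0
            while True:
                attempts += 1
                try:
                    run_task(task)
                    break
                except RuntimeError:
                    if attempts > max_retries:
                        raise
            completed.append((task, attempts))
        # Barrier: the next stage starts only after every task above succeeded.
    return completed


# Hypothetical task runner: task "B" crashes once, then succeeds.
crashes_left = {"B": 1}


def flaky_run(task):
    if crashes_left.get(task, 0) > 0:
        crashes_left[task] -= 1
        raise RuntimeError(f"{task} crashed")


result = run_workflow([["A", "B", "C"], ["D", "E"], ["F"]], flaky_run)
print(result)
```

Because downstream stages wait for upstream results instead of a fixed start time, a crash in B only delays the stages that depend on it; nothing runs against missing data.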
  • 22. Retry-able Failures or Not
      • "Retry-able Failures"
          • Crash of compute nodes
          • Communication errors
          • Outage of "connected" services
          • ...
      • Non-"Retry-able Failures"
          • SQL syntax error
          • Missing data sources / Missing tables
          • Wrong API key for "connected" services
          • ...
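One way to act on this distinction is to give the two kinds of failure different exception types and retry only the transient kind. The sketch below assumes that mapping; `RetryableError`, `NonRetryableError`, and `run_with_retries` are hypothetical names, not from the talk or from any library.

```python
class RetryableError(Exception):
    """Transient failure: node crash, communication error, connected service down."""


class NonRetryableError(Exception):
    """Permanent failure: SQL syntax error, missing table, wrong API key."""


def run_with_retries(operation, max_attempts=3):
    """Call operation(); retry only on RetryableError.

    Returns (result, attempts). NonRetryableError propagates immediately,
    because running the same job again cannot fix it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except RetryableError:
            if attempt == max_attempts:
                raise


calls = {"n": 0}


def flaky_query():
    # Hypothetical query that hits a communication error twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableError("communication error")
    return "42 rows"


def broken_query():
    raise NonRetryableError("SQL syntax error")


outcome = run_with_retries(flaky_query)

try:
    run_with_retries(broken_query)
    gave_up_immediately = False
except NonRetryableError:
    gave_up_immediately = True

print(outcome, gave_up_immediately)
```

Retrying a syntax error or a wrong API key only burns time before the deadline, so those failures are surfaced at once.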
  • 23. Retry-able Operations?
      • For example:
          • Run Query A
          • Append result of A into B
          • Count rows of B
          [Figure: Table B with rows 1 2 3 4]
      • Failures?
          • Run Query A
          • Append result of A into B ... (Failed!)
          • Retry Query A
          • Retry appending result of A into B
          • Count rows of B
          [Figure: Table B with rows 1 2, then 1 2 3 4]
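The failure case on this slide can be reproduced in a few lines of Python. This is a sketch with an in-memory list standing in for Table B; `run_query_a`, `append_rows`, and the `fail_after` parameter are hypothetical helpers invented for the example.

```python
def run_query_a():
    # Hypothetical query result: four rows.
    return [1, 2, 3, 4]


def append_rows(table, rows, fail_after=None):
    """Append rows one by one; optionally crash after fail_after rows."""
    for i, row in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise ConnectionError("append failed midway")
        table.append(row)


table_b = []
try:
    # First attempt writes rows 1 and 2, then fails.
    append_rows(table_b, run_query_a(), fail_after=2)
except ConnectionError:
    # Blind retry appends all four rows on top of the partial write.
    append_rows(table_b, run_query_a())

print(table_b)       # [1, 2, 1, 2, 3, 4]
print(len(table_b))  # 6, not the expected 4
```

The retry itself succeeds, yet the row count is wrong: an append is not safe to repeat after a partial failure.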
  • 24. Idempotent Operations
      • "Idempotent" (冪等である, read "bekitō") operation
          • gives the "same" result when it's executed twice or more
      • Idempotent operation:
          • Run Query A
          • "Replace" table B with result of A
          • Count rows of B
      [Figure: Table B with rows 1 2 3 4; Table B with rows 1 2]
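A small Python sketch of the "replace" variant, again using an in-memory stand-in for Table B. `run_query_a` and `replace_table` are hypothetical helpers; a real system would need the swap to be atomic at the storage layer, which this sketch only imitates with a single assignment.

```python
def run_query_a():
    # Hypothetical query result: four rows.
    return [1, 2, 3, 4]


def replace_table(tables, name, rows):
    """Build the new contents first, then swap them in as one step."""
    new_contents = list(rows)
    tables[name] = new_contents


tables = {"B": [1, 2]}  # leftovers from an earlier, partially failed run

replace_table(tables, "B", run_query_a())
after_first = list(tables["B"])

replace_table(tables, "B", run_query_a())  # retry or replay: same result
after_second = list(tables["B"])

print(after_first, after_second, len(tables["B"]))
```

However many times the step is executed, Table B ends up with the same four rows, so the step can be retried or replayed without checking what the previous attempt left behind.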
  • 25. Replay-able Data Analytics Workflow
      • Need to do a lot of "trial and error"
          • w/ updated queries
          • w/ updated data...
      • Idempotent operations make workflows "Replay-able"
          • Fast trial-and-error (PDCA!) cycles
          • → Fast business growth!