
Treasure Data: Living in the cloud


This presentation is about how Treasure Data works in the cloud


  1. Treasure Data: Living in the cloud (Mitsunori Komatsu)
  2. About me
     • Mitsunori Komatsu, software engineer @ Treasure Data
     • Presto, Hive, PlazmaDB, td-android-sdk, td-ios-sdk, Mobile SDK backend, embedded-sdk
     • GitHub: komamitsu, msgpack-java committer, Presto contributor, etc.
  3. Today’s talk
     • What’s Treasure Data?
     • System architecture
     • Provisioning & Deployment
  4. What’s Treasure Data?
  5. What’s Treasure Data?
     • Collect, store and analyze data
     • With multi-tenancy
     • With a schema-less structure
     • Runs in the cloud, but mitigates cloud outages
     Founded in 2011 in the U.S. Over 85 employees. And more…
  10. Time to Value: Collect! Store! Analyze!
     • Acquire (Connectivity): web logs, app logs, sensors, CRM, ERP, RDBMS and POS data, via Treasure Agent (server), SDKs (JS, Android, iOS, Unity), the streaming collector, the bulk uploader (Embulk, TD Toolbelt) and the REST API
     • Store (Economy & Flexibility): Plazma DB, a flexible, scalable, columnar storage, @AWS or @IDCF
     • Analyze (Simple & Supported): SQL-based queries (SQL, Pig), batch/reliability and ad-hoc/low latency, ODBC/JDBC
     • Result Push: send query results to KPI dashboards (Metric Insights), BI tools (Tableau, Motion Board, etc.) and other products (RDBMS, Google Docs, AWS S3, FTP servers, etc.)
  16. Integrations?
  17. Treasure Data in figures
  18. Some of our customers
  19. System architecture
  20. Architecture in Treasure Data
     • api server (Ruby on Rails)
     • worker queue (MySQL)
     • td worker process (Ruby + Java)
     • plazmadb (PostgreSQL + S3/RiakCS)
     • xxxx bucket (S3/RiakCS)
     • Query engines: Presto, Hive, Pig
     • Streaming Import: td-agent (fluentd), plus the mobile SDK backend with android-sdk, ios-sdk and unity-sdk
     • Bulk Import: bulk-import tool
     • Query Processing: td command
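The worker queue decouples the api server from the td worker processes. Below is a minimal sketch of that hand-off, assuming a simple jobs table. sqlite3 stands in for MySQL. The table layout and the `enqueue`/`claim`/`finish` helpers are hypothetical, not Treasure Data's actual code. The sketch shows the atomic claim: a worker owns a job only if its conditional `UPDATE` changed exactly one row, so two workers polling the same queue do not both take the same job.

```python
import sqlite3


# sqlite3 stands in for the MySQL-backed worker queue; the table layout is hypothetical.
def make_queue():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " worker TEXT)"
    )
    return conn


def enqueue(conn, job_type, payload):
    """api server side: record a new job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (type, payload) VALUES (?, ?)", (job_type, payload)
    )
    conn.commit()
    return cur.lastrowid


def claim(conn, worker_name):
    """td worker side: take the oldest queued job, or return None if there is none."""
    while True:
        row = conn.execute(
            "SELECT id, type, payload FROM jobs"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check makes this UPDATE a no-op if another worker claimed
        # the job between our SELECT and this UPDATE.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ?"
            " WHERE id = ? AND status = 'queued'",
            (worker_name, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
        # Lost the race: loop and look for the next queued job.


def finish(conn, job_id):
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()


if __name__ == "__main__":
    q = make_queue()
    enqueue(q, "query", "select count(1) from access_logs")
    enqueue(q, "bulk_import", "accesses_001.msgpack.gz")
    print(claim(q, "worker-1"))  # the query job
    print(claim(q, "worker-2"))  # the bulk import job
```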
  29. Streaming import
     Components: api server (Ruby on Rails), worker queue (MySQL), td import worker process (Ruby + Java), plazmadb (PostgreSQL + S3/RiakCS), import bucket (S3/RiakCS)
     • Clients: td-agent (fluentd), and the mobile SDK backend with android-sdk, ios-sdk and unity-sdk
     • Records such as {name: "komamitsu", age: 25, languages: ["Java", "OCaml"]}, {…}, … are sent as msgpack.gz files to the import bucket, exactly once
  34. Streaming import (cont.): Convert & Upload. The td import worker process converts the msgpack.gz files from the import bucket and uploads them to plazmadb.
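The slides mark streaming import as exactly once but do not say how. One common approach combines at-least-once delivery with an idempotent import. It is sketched here purely as an assumption. Each chunk gets a unique id on the client, and a retried upload overwrites the same object. The import worker remembers which chunk ids it has already converted. JSON lines replace msgpack so the sketch needs only the Python standard library. `make_chunk` and `ImportBucket` are hypothetical names.

```python
import gzip
import json
import uuid


def make_chunk(records):
    """Client side: serialize buffered records into one gzip-compressed chunk
    with a unique id. JSON lines replace msgpack so only the stdlib is needed."""
    body = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return str(uuid.uuid4()), gzip.compress(body)


class ImportBucket:
    """Hypothetical stand-in for the import bucket plus the import worker's bookkeeping."""

    def __init__(self):
        self.chunks = {}       # chunk id -> compressed bytes
        self.imported = set()  # chunk ids already converted into the table
        self.table = []        # stand-in for the destination table in plazmadb

    def upload(self, chunk_id, data):
        # A retried upload of the same chunk id overwrites the stored object
        # instead of adding a second copy.
        self.chunks[chunk_id] = data

    def import_pending(self):
        """Import worker: convert every chunk not imported yet; return the record count."""
        count = 0
        for chunk_id, data in self.chunks.items():
            if chunk_id in self.imported:
                continue
            for line in gzip.decompress(data).decode("utf-8").splitlines():
                self.table.append(json.loads(line))
                count += 1
            # In a real system, appending the records and marking the chunk as
            # imported would have to happen in one transaction.
            self.imported.add(chunk_id)
        return count
```

A client that times out and re-sends the same `(chunk_id, data)` pair therefore cannot cause duplicate rows. The records still reach the table once.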
  36. Bulk import
     Components: api server (Ruby on Rails), worker queue (MySQL), td worker process (Ruby + Java), plazmadb (PostgreSQL + S3/RiakCS), bulk import bucket (S3/RiakCS)
     • Client: the bulk-import tool uploads accesses_001.msgpack.gz, accesses_002.msgpack.gz, accesses_003.msgpack.gz, … to the bulk import bucket
     • perform (MR): create a temp table and import all records into it
     • commit: move the records from the temp table to the destination table
     • delete: remove the temp table
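The perform / commit / delete steps amount to staging the data in a temporary table and then publishing it in one transaction. Below is a minimal sketch in which sqlite3 stands in for plazmadb. A plain `INSERT` stands in for the MapReduce job that does the real `perform`. The `bulk_import` function and the `bulk_<session>` table naming are hypothetical.

```python
import sqlite3


def bulk_import(conn, dest, session, rows):
    """perform / commit / delete, with sqlite3 standing in for plazmadb.
    `session` is a hypothetical bulk-import session name; `rows` are
    (time, code, method) tuples taken from the uploaded files."""
    for name in (dest, session):
        if not name.isidentifier():  # names are spliced into SQL below
            raise ValueError(f"bad identifier: {name!r}")
    temp = f"bulk_{session}"

    # perform: create a temp table and import all records into it
    conn.execute(f"CREATE TABLE {temp} (time TEXT, code INTEGER, method TEXT)")
    conn.executemany(f"INSERT INTO {temp} VALUES (?, ?, ?)", rows)
    conn.commit()

    # commit: move the records into the destination table in one transaction,
    # so readers of `dest` see either none or all of the imported rows
    with conn:
        conn.execute(f"INSERT INTO {dest} SELECT * FROM {temp}")

    # delete: remove the temp table
    conn.execute(f"DROP TABLE {temp}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_logs (time TEXT, code INTEGER, method TEXT)")
    bulk_import(conn, "access_logs", "session_001", [
        ("2015-06-01 10:07:11", 200, "GET"),
        ("2015-06-01 10:11:30", 200, "POST"),
    ])
    print(conn.execute("SELECT count(*) FROM access_logs").fetchone()[0])  # 2
```

Splitting the work this way means a failed `perform` only leaves a temp table behind, never a half-imported destination table.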
  41. Query
     Components: api server (Ruby on Rails), worker queue (MySQL), td worker process (Ruby + Java), plazmadb (PostgreSQL + S3/RiakCS), query engines (Presto, Hive, Pig), result bucket (S3/RiakCS)
     • Client: td command, submitting a query such as
         select c.nationkey, count(1)
         from orders o join customer c on o.custkey = c.custkey
         where o.orderpriority = '1-URGENT'
         group by c.nationkey
  44. Query (cont.): the priority of each query is decided based on the user’s price plan and the user’s resource usage.
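The slides only say that query priority depends on the user's price plan and resource usage. Below is a minimal sketch of one way to combine the two. It assumes made-up plan names and weights (`PLAN_WEIGHT`) and a fixed penalty for each query the user already has running. Python's `heapq` provides the priority queue. This is an illustration, not Treasure Data's actual scheduling policy.

```python
import heapq
import itertools

# Hypothetical plan weights: a higher number means a higher base priority.
PLAN_WEIGHT = {"free": 0, "standard": 10, "premium": 20}
RUNNING_PENALTY = 5  # hypothetical cost of each query the user already runs


class QueryScheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities
        self.running = {}              # user -> number of queries currently running

    def priority(self, user, plan):
        # A better plan raises the priority; every query the user already has
        # running lowers it, so one heavy user cannot starve the others.
        return PLAN_WEIGHT[plan] - RUNNING_PENALTY * self.running.get(user, 0)

    def submit(self, user, plan, sql):
        prio = self.priority(user, plan)
        # heapq is a min-heap, so push the negated priority.
        heapq.heappush(self._heap, (-prio, next(self._seq), user, sql))

    def next_query(self):
        """Pop the highest-priority query and count it as running for its user."""
        if not self._heap:
            return None
        _, _, user, sql = heapq.heappop(self._heap)
        self.running[user] = self.running.get(user, 0) + 1
        return user, sql

    def done(self, user):
        self.running[user] -= 1
```

The priority is computed at submission time. As a result, a user's backlog is not re-ranked as that user's earlier queries finish. A production scheduler would probably recompute priorities when choosing the next query.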
  46. Query (cont.): Scan. The query engine scans the table data stored in plazmadb.
  51. Time to Value (revisited), highlighting: Query with Result output
  52. Query with Result output
     The same query, submitted with td command, returns a result such as
         nationkey  count
         24         123456
         12         98765
         :          :
     which is stored in the result bucket (S3/RiakCS).
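To make the query's semantics concrete, here is a sketch that runs it unchanged against tiny made-up `orders` and `customer` tables in sqlite3. It then serializes the result as CSV, as a stand-in for pushing it to an external destination. The sample rows and the `make_sample_db`/`run_with_result_output` helpers are hypothetical.

```python
import csv
import io
import sqlite3

# The query from the slide, unchanged.
QUERY = """
select c.nationkey, count(1)
from orders o join customer c on o.custkey = c.custkey
where o.orderpriority = '1-URGENT'
group by c.nationkey
"""


def make_sample_db():
    """Tiny made-up orders/customer tables, just enough to run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (custkey INTEGER, nationkey INTEGER)")
    conn.execute(
        "CREATE TABLE orders (orderkey INTEGER, custkey INTEGER, orderpriority TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 24), (2, 12), (3, 24)])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, "1-URGENT"), (11, 1, "1-URGENT"), (12, 2, "1-URGENT"), (13, 3, "2-HIGH")],
    )
    return conn


def run_with_result_output(conn):
    """Run the query and serialize the result as CSV text, standing in for
    pushing the result to an external destination."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["nationkey", "count"])
    writer.writerows(conn.execute(QUERY).fetchall())
    return out.getvalue()


if __name__ == "__main__":
    print(run_with_result_output(make_sample_db()))
```

Order 13 is excluded by the `'1-URGENT'` filter. Nation 24 is therefore counted twice (orders 10 and 11) and nation 12 once. The query has no `order by`, so the row order is not guaranteed.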
  53. Schema on read
     access_logs table:
       time                 code   method  user_id
       2015-06-01 10:07:11  200    GET
       2015-06-01 10:10:12  "200"  GET
       2015-06-01 10:10:20  200    GET
       2015-06-01 10:11:30  200    POST
       2015-06-01 10:20:45  200    GET
       2015-06-01 10:33:50  400    GET     206
       2015-06-01 10:40:11  200    GET     852
       2015-06-01 10:51:32  200    PUT     1223
       2015-06-01 10:58:02  200    GET     5118
       2015-06-01 11:02:11  404    GET     12
       2015-06-01 11:14:27  200    GET     3447
     • The user added a new column, user_id, to the imported data.
     • The user can select this column just by adding it to the schema, without rebuilding the table.
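Schema on read means imported records are stored as they arrive, and the schema is applied only when the table is queried. Below is a minimal sketch under two assumptions. First, records are kept as schema-less dicts. Second, a value that cannot be cast to the column type, or a column a record does not have, is read as NULL (`None`). `read_with_schema` is a hypothetical helper.

```python
def read_with_schema(records, schema):
    """Apply `schema` (column name -> Python type) to stored records at read time.
    Columns a record lacks, and values that cannot be cast, are read as None."""
    rows = []
    for record in records:
        row = {}
        for column, cast in schema.items():
            value = record.get(column)
            try:
                row[column] = None if value is None else cast(value)
            except (TypeError, ValueError):
                row[column] = None
        rows.append(row)
    return rows


# Stored as imported: the second record carries the "200" string and the
# third carries the newly added user_id field.
access_logs = [
    {"time": "2015-06-01 10:07:11", "code": 200, "method": "GET"},
    {"time": "2015-06-01 10:10:12", "code": "200", "method": "GET"},
    {"time": "2015-06-01 10:33:50", "code": 400, "method": "GET", "user_id": 206},
]

if __name__ == "__main__":
    old_schema = {"time": str, "code": int, "method": str}
    new_schema = dict(old_schema, user_id=int)  # only the schema changes, not the data
    print(read_with_schema(access_logs, old_schema))
    print(read_with_schema(access_logs, new_schema))
```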
  55. Columnar file format
     The access_logs table from slide 53 is stored column by column (time, code, method, user_id), so this query accesses only the code column:
       select code, count(1) from tbl group by code
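A toy store that keeps one list per column can show why a columnar layout helps this query. The store also records which columns a query reads. `to_columns`, `ColumnStore` and `count_by` are hypothetical names. Real columnar formats add encoding and compression on top of this idea.

```python
from collections import Counter


def to_columns(rows, columns):
    """Pivot row dicts into one list per column; missing values become None."""
    return {col: [row.get(col) for row in rows] for col in columns}


class ColumnStore:
    """Toy columnar table that remembers which columns were read."""

    def __init__(self, columns):
        self._columns = columns
        self.columns_read = set()

    def column(self, name):
        self.columns_read.add(name)
        return self._columns[name]


def count_by(store, col):
    """select <col>, count(1) from tbl group by <col>, touching only <col>."""
    return dict(Counter(store.column(col)))
```

In a row-oriented layout, answering the same query would mean reading every field of every row. Here the time, method and user_id lists are never touched.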
  56. Provisioning & Deployment
  57. How do we deploy our application?
     • Chef server (chef-io)
       • Using the deploy_revision provider
     • GitHub
       • Some repos are private
     • Blue-Green deployment (Presto)
  58. Provisioning sequence (from Dev to a server on the cloud)
     create server → attach volume → attach IP address → bootstrap as chef node → chef-client (install packages, install recipes, …) → deploy application
  59. Deployment sequence
     • Dev: git push → knife cookbook upload → knife data bag from file → knife environment from file
     • Server on cloud (chef-client run by cron): chef-client (fetch files & execute recipes) → git pull (triggered by the deploy_revision provider) → restart application
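The deploy_revision step can be pictured with a release layout in the spirit of Chef's deploy resource: one directory per revision under `releases/` and a `current` symlink pointing at the live one. The sketch below is an assumption about that layout, not the provider's code. `deploy_revision` and the `checkout` callback, which stands in for `git clone`/`git checkout`, are hypothetical. The function is idempotent because chef-client is re-run by cron: deploying the revision that is already live does nothing. It assumes a POSIX filesystem.

```python
import os
import tempfile


def deploy_revision(base_dir, revision, checkout):
    """Hypothetical revision-based deploy:
    base_dir/releases/<revision>/ holds one checkout per revision and
    base_dir/current is a symlink to the live one.
    `checkout(path)` stands in for fetching the code at `revision`.
    Returns True if the live revision changed (the app needs a restart)."""
    release = os.path.join(base_dir, "releases", revision)
    current = os.path.join(base_dir, "current")
    if os.path.islink(current) and os.readlink(current) == release:
        return False  # chef-client ran again via cron: nothing new to deploy
    if not os.path.isdir(release):
        os.makedirs(release)
        checkout(release)
    # Switch atomically: create the new symlink under a temporary name, then
    # rename it over `current`, so `current` never disappears.
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)
    os.replace(tmp, current)
    return True


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    noop = lambda path: None  # a real deploy would check out the code here
    print(deploy_revision(base, "abc123", noop))  # True: first deploy
    print(deploy_revision(base, "abc123", noop))  # False: cron re-run, no change
```

The return value maps onto the "restart application" step: the caller restarts only when the live revision actually changed.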
  60. Blue-Green Deployment
     Components: worker queue (MySQL), api server, td worker process, plazmadb (PostgreSQL + S3/RiakCS), result bucket (S3/RiakCS)
     • Two Presto clusters, each with a coordinator and several workers: "production", which serves queries such as select user_id, count(1) from …, and "rc" (release candidate)
  61. Blue-Green Deployment (cont.): the rc cluster is tested. Test Test Test!
  62. Blue-Green Deployment (cont.): the rc cluster becomes production! Release stable cluster, no downtime.
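The production/rc switch reduces to a pointer swap in front of two clusters. Below is a minimal sketch that models each cluster as a callable taking SQL and returning a result. `PrestoRouter` and its methods are hypothetical. The `(sql, expected)` test cases stand in for whatever checks the rc cluster must pass.

```python
class PrestoRouter:
    """Hypothetical router in front of two clusters (blue-green).
    A cluster here is any callable that takes SQL and returns a result."""

    def __init__(self, production, rc=None):
        self.production = production
        self.rc = rc

    def submit(self, sql):
        # Every query goes to whichever cluster is production at call time.
        return self.production(sql)

    def test_rc(self, cases):
        """Run (sql, expected) pairs against the rc cluster; True only if all match."""
        if self.rc is None:
            return False
        return all(self.rc(sql) == expected for sql, expected in cases)

    def promote(self, cases):
        """Swap rc and production if the rc passes its tests. The old production
        cluster becomes the standby, kept for rollback or released later."""
        if not self.test_rc(cases):
            return False
        self.production, self.rc = self.rc, self.production
        return True
```

In this model, new queries go to the new production cluster as soon as the swap happens, and the old cluster stays available. That is where "no downtime" comes from. A failed test leaves production untouched.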
  63. Analytics Infrastructure, Simplified in the Cloud
     We’re hiring! https://jobs.lever.co/treasure-data
