
Technologies, Data Analytics Service and Enterprise Business

Talk at Sendai IT Commune #2

  1. Technologies, Data Analytics Service and Enterprise Businesses
     SENDAI IT COMMUNE #2, 2018-01-09
     Satoshi Tagomori (@tagomoris), Treasure Data, Inc.
  2. Satoshi Tagomori (@tagomoris)
     • Fluentd, MessagePack-Ruby, Norikra, Woothee, ...
     • Treasure Data, Inc.
  3. Retry-able Failures or Not
     Idempotent Operations (Japanese: 冪等な操作; 冪等 is read べきとう, "bekitō")
  4. Technologies / Data Analytics Service / Enterprise Business
  5. Technologies
     ↓
     Data Analytics Service
     ↓
     Enterprise Business
  6. Enterprise Business?
     • Many different definitions and discussions about "Enterprise"... :(
     • MY DEFINITION IN THIS TALK: "Businesses NOT about IT"
     • Thus, most businesses are "Enterprise", everywhere, not only in Tokyo
  7. Data Analytics Service?
     • Provides ways to know:
       • How many people are reaching our products?
       • How many times are they seeing our advertisements?
       • And how many times do they buy our products?
       • When do they use our products?
       • When did they buy our products?
       • Where did they buy our products?
       • ...
     • Anything that helps our business by using data
  8. Data Analytics Service for Enterprise Business?
     • Something that helps "Business not about IT" by using data (IT)
     • Staff using the data analytics service don't know about IT
       • and don't care about IT either
       • but they "need" the results of analytics
     • Everyone checks the report about yesterday at 10:00 AM
       • We need the results before 10:00 AM
       • 10:10 AM is too late, but 2:00 AM is too early...
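  9. Deadline and Retries
     (Diagram: a timeline with marks at 00:00 AM, 01:00 AM, 05:30 AM and the 10:00 AM deadline)
     • Big job on power 1: crashes, and the retried big job (power 1) is delayed...
     • Big job on power 2: crashes, and the retried big job (power 2) finishes in time: OK!
     • Small jobs on power 1: one crashes, and its retry finishes in time: OK!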
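The trade-off in this diagram can be checked with back-of-the-envelope arithmetic. The sketch below is only a toy model built on assumptions that are not from the talk: runtime is work divided by "power" (perfectly linear speedup), a crash wastes the time already spent, and a crashed job is retried once from scratch. `Job`, `finish_time` and all the hours are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    work_hours: float  # hypothetical: hours the job needs at power 1
    power: int         # hypothetical: relative computing power ("Power 1", "Power 2")

    def runtime(self) -> float:
        # Assumption: runtime shrinks linearly with power.
        return self.work_hours / self.power


def finish_time(start: float, jobs: list[Job], crashes: dict[int, float]) -> float:
    """Hour of day when a sequence of jobs finishes.

    `crashes` maps a job index to the hours that job ran before crashing;
    a crashed job is assumed to be retried once, from scratch, right away.
    """
    t = start
    for i, job in enumerate(jobs):
        t += crashes.get(i, 0.0)  # time wasted by the failed attempt, if any
        t += job.runtime()        # the successful (re)run
    return t


DEADLINE = 10.0  # 10:00 AM

# Hypothetical numbers: 6 hours of work at power 1, starting at 01:00 AM.
scenarios = {
    "big job, power 1": finish_time(1.0, [Job(6, 1)], {0: 4.0}),
    "big job, power 2": finish_time(1.0, [Job(6, 2)], {0: 2.0}),
    "small jobs, power 1": finish_time(1.0, [Job(2, 1)] * 3, {1: 1.5}),
}
for name, end in scenarios.items():
    print(f"{name}: done at {end:.1f}h, {'OK' if end <= DEADLINE else 'late'}")
```

With these made-up numbers the big job on power 1 finishes at 11.0 (late), while the big job on power 2 (6.0) and the split small jobs (8.5) both finish before 10:00 AM: more power buys room for a retry, and smaller jobs shrink how much work a single crash throws away.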
  10. Missions of Data Analytics Service for Enterprise Business
     • Fast "enough"
     • Cheap "enough"
     • Stable
     • Easy to use "enough"
  11. Technologies for Data Analytics Service
     • Data Management System
     • Distributed Processing System
     • Queue and Scheduler
     • Connecting Systems and Services
     • Controlling Jobs, Tasks and Workflows
     • Managing Retries
  12. Data Management Systems
     • Data Collecting Systems
       • Fluentd, Embulk, ...
     • Distributed Database and Storage
       • Storing data in an efficient format (MPC1, MessagePack columnar format)
       • Managing indexes
       • Managing schemas
       • Providing transactional operations
  13. Distributed Processing System
     • Running Analytics Queries
       • MapReduce engines: Hadoop + Hive
       • MPP (Massively Parallel Processing) systems: Presto
     • Running Data Management Jobs
       • Converting data formats, re-indexing, detecting schemas, ...
     • Computing Resource Management
       • Customer queries (and internal use) must be separated!
  14. Queue and Scheduler
     • Queuing Queries
       • Allow customers to enqueue queries, and run them one after another (diagram: customer requests queued for power 1)
     • Scheduling Queries
       • Run queries when it's OK to run them (diagram: data for queries, 01:00 AM, 03:00 AM)
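A toy model of the two ideas on this slide, assuming a single worker (the slide's "Power 1") that runs one query per tick. The class `QueryRunner`, its methods and the hours are hypothetical illustrations, not Treasure Data's actual scheduler: customer queries go into a FIFO queue, and scheduled queries wait in a min-heap until their input data is ready.

```python
from __future__ import annotations

import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledQuery:
    ready_at: float                   # hour when the query's input data is ready
    name: str = field(compare=False)  # ordering uses ready_at only


class QueryRunner:
    """Toy model of one worker ("Power 1") that runs one query per tick."""

    def __init__(self) -> None:
        self.queue: deque[str] = deque()         # FIFO of runnable queries
        self.waiting: list[ScheduledQuery] = []  # min-heap ordered by ready_at

    def enqueue(self, name: str) -> None:
        """Queue a customer request to run as soon as the worker is free."""
        self.queue.append(name)

    def schedule(self, name: str, ready_at: float) -> None:
        """Hold a query until its data is ready."""
        heapq.heappush(self.waiting, ScheduledQuery(ready_at, name))

    def tick(self, now: float) -> str | None:
        """Release queries whose data is ready, then run the next one, if any."""
        while self.waiting and self.waiting[0].ready_at <= now:
            self.queue.append(heapq.heappop(self.waiting).name)
        return self.queue.popleft() if self.queue else None


runner = QueryRunner()
runner.enqueue("customer-1")
runner.schedule("daily-report", ready_at=3.0)  # hypothetical data-ready time
runner.enqueue("customer-2")
for hour in (1.0, 2.0, 2.5, 3.0):
    print(hour, runner.tick(hour))
```

Here `customer-1` runs at 1.0 and `customer-2` at 2.0, nothing runs at 2.5, and `daily-report` runs at 3.0, once its assumed data-ready time has passed. Keeping queued and scheduled queries in separate structures lets customer requests flow in order while time-dependent queries simply wait for their data.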
  15. Connecting Systems and Services
     • Non-"connected" Data Analytics Service (diagram: Ultra Super Great Analytics Service, Database, Query Result)
       • Not "easy enough"
  16. Connecting Systems and Services
     • A Data Analytics Service MUST be "connected" (diagram: Treasure Data, Database, Query Result)
  17. Control Jobs/Tasks
     • A job needs the results of other jobs
     • "Risky" time-based schedule, A,B,C -> D,E -> F (diagram): A, B, C from 01:00 AM to 03:10 AM (?); D, E from 03:30 AM to 06:30 AM (?); F from 07:00 AM, with the 10:00 AM deadline
     • "Risk" for failures (diagram): A, B, C start at 01:00 AM and crash; D, E at 03:30 AM: "Oops, No Data..."; F at 07:00 AM: "Oops, No Data..."; 08:15 AM ?
  18. Control Jobs/Tasks
     • A job needs the results of other jobs
     • Time-based schedule with space for retries, A,B,C -> D,E -> F (diagram): A, B, C from 01:00 AM to 03:10 AM (?); D, E from 06:00 AM to 08:30 AM (?); F at 11:00 AM (???), after the 10:00 AM deadline
     • A "time-based schedule" needs:
       • Wide space for retries
       • Big resources for fast results (not cheap!)
  19. Control Jobs/Tasks
     • Workflow pattern (diagram): workflow execution A,B,C -> D,E -> F with workflow control barriers between the stages; starts at 01:00 AM and finishes around 07:15 AM (?), before 10:00 AM
     • Workflow pattern with retries (diagram): the same workflow starting at 01:00 AM; a task crashes and is retried, and the workflow still finishes before 10:00 AM
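The workflow pattern can be sketched as stages separated by barriers: a stage starts only when every task in the previous one has succeeded, and a crashed task is retried in place instead of leaving downstream jobs to find "No Data". This is a minimal sketch under assumptions: tasks are plain Python callables, tasks within a stage run one after another rather than in parallel, and `run_workflow`, `flaky` and `max_retries` are hypothetical names, not the API of any real workflow engine.

```python
from typing import Callable, Dict, List

Task = Callable[[], None]


def run_workflow(stages: List[Dict[str, Task]], max_retries: int = 2) -> List[str]:
    """Run stages in order; the end of each stage acts as a barrier.

    A stage starts only after every task in the previous stage has
    succeeded, so downstream tasks never run against missing data.
    Returns the names of completed tasks in completion order.
    """
    done: List[str] = []
    for stage in stages:
        for name, task in stage.items():  # sequential here for simplicity
            for attempt in range(max_retries + 1):
                try:
                    task()
                except Exception as exc:
                    if attempt == max_retries:
                        raise RuntimeError(
                            f"task {name} failed after {attempt + 1} attempts"
                        ) from exc
                else:
                    done.append(name)
                    break
        # Barrier: reaching this point means the whole stage succeeded.
    return done


def flaky(failures: int) -> Task:
    """Hypothetical task that crashes `failures` times, then succeeds."""
    remaining = failures

    def task() -> None:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise ConnectionError("simulated crash")

    return task


stages = [
    {"A": flaky(0), "B": flaky(1), "C": flaky(0)},  # B crashes once, then succeeds
    {"D": flaky(0), "E": flaky(0)},
    {"F": flaky(0)},
]
print(run_workflow(stages))
```

Because D and E cannot start until A, B and C are done, B's crash only delays the workflow by one retry; nobody needs to reserve "wide space for retries" in a fixed timetable. If a task exhausts its retries, the workflow stops with an error instead of running F against missing data.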
  20. Retries !!!!!!!!!!!!!!!!!!!!!!!!
  21. Retry-able Failures or Not
     • "Retry-able Failures"
       • Crashes of compute nodes
       • Communication errors
       • Outages of "connected" services
       • ...
     • Non-"Retry-able Failures"
       • SQL syntax errors
       • Missing data sources / Missing tables
       • Wrong API keys for "connected" services
       • ...
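One way to encode this distinction is to classify failures by exception type and retry only the retry-able ones. In this sketch `RetryableError`, `NonRetryableError` and `with_retries` are hypothetical names; the exponential backoff and the injectable `sleep` (so tests need not actually wait) are design choices of the sketch, not something the slide prescribes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker: node crash, communication error, connected service down."""


class NonRetryableError(Exception):
    """Hypothetical marker: SQL syntax error, missing table, wrong API key."""


def with_retries(
    op: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying only retry-able failures with exponential backoff.

    NonRetryableError (and any other exception) propagates immediately,
    because running the same broken query again cannot succeed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # With the default base_delay this waits 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying a syntax error or a wrong API key only burns time before the deadline, so failing fast on those leaves the retry budget for node crashes and network errors, which may well succeed on the next attempt.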
  22. Retry-able Operations?
     • For example:
       • Run Query A
       • Append the result of A into B
       • Count rows of B
     • Failures?:
       • Run Query A
       • Append the result of A into B ... (Failed!)
       • Retry Query A
       • Retry appending the result of A into B
       • Count rows of B
     (Diagram: Table B ends up with rows 1 2 after the normal run, but with rows 1 2 3 4 after the failed append and its retry)
  23. Idempotent Operations
     • An "idempotent" (Japanese: 冪等である, "is idempotent"; 冪等 is read べきとう, "bekitō") operation
       • gets the "same" result when it is executed twice or more
     • Idempotent operation:
       • Run Query A
       • "Replace" table B with the result of A
       • Count rows of B
     (Diagram: Table B holding rows 1 2 3 4 is replaced with the result of A, leaving rows 1 2)
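The difference between the append on slide 22 and the replace on this slide can be reproduced with an in-memory "table". The sketch assumes one specific failure mode for illustration: the write lands, but the caller sees an error (`AckLost`, a hypothetical exception) and retries. `append_into`, `replace_table` and `write_with_retry` are hypothetical helpers.

```python
from typing import Callable, Dict, List

Tables = Dict[str, List[int]]


class AckLost(Exception):
    """Hypothetical failure: the write landed, but the caller never heard back."""


def append_into(tables: Tables, name: str, rows: List[int], fail: bool = False) -> None:
    tables.setdefault(name, []).extend(rows)  # NOT idempotent: adds rows every time
    if fail:
        raise AckLost()


def replace_table(tables: Tables, name: str, rows: List[int], fail: bool = False) -> None:
    tables[name] = list(rows)  # idempotent: same end state however often it runs
    if fail:
        raise AckLost()


Write = Callable[..., None]


def write_with_retry(write: Write, tables: Tables, rows: List[int]) -> int:
    """Write the result of Query A into table B, retry once, then count rows of B."""
    try:
        write(tables, "B", rows, fail=True)  # first attempt "fails" after writing
    except AckLost:
        write(tables, "B", rows)             # the retry repeats the same operation
    return len(tables["B"])


result_of_a = [1, 2]
print(write_with_retry(append_into, {}, result_of_a))                     # 4: duplicated rows
print(write_with_retry(replace_table, {"B": [1, 2, 3, 4]}, result_of_a))  # 2
```

Appending leaves four rows after the retry, so "Count rows of B" is wrong; replacing leaves two rows whether it runs once or twice, which is what makes the operation safe to retry and the whole workflow safe to replay.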
  24. Replay-able Data Analytics Workflow
     • We need to do a lot of "trial and error"
       • with updated queries
       • with updated data...
     • Idempotent operations make the workflow "replay-able"
       • Fast trial-and-error (PDCA!) cycles
       • → Fast business growth!
  25. Enterprise Business ❤ Technologies
     Thank you! @tagomoris
