Fluentd meetup #3
Published in: Technology
  1. Collecting app metrics in decentralized systems
      Decision making based on facts
      Sadayuki Furuhashi
      Treasure Data, Inc.
      Founder & Software Architect
      Fluentd meetup #3
  2. Self-introduction
      > Sadayuki Furuhashi
      > Treasure Data, Inc. Founder & Software Architect
      > Open source projects
        MessagePack - efficient serializer (original author)
        Fluentd - event collector (original author)
  3. My Talk
      What’s our service?
      What problems did we face?
      How did we solve them?
      What did we learn?
      We open sourced the system
  4. What’s Treasure Data?
      Treasure Data provides a cloud-based data warehouse as a service.
  5. Treasure Data Service Architecture
      [Diagram. Labels: data sources (Apache, App, RDBMS, other data sources); td-agent (open sourced); td-command; Treasure Data columnar data warehouse; MapReduce jobs (Hive; Pig to be supported); Query Processing Cluster; Query API (JDBC, REST); User; BI apps.]
  6. Example Use Case – MySQL to TD
      [Diagram. Labels: hundreds of app servers, each with a Rails app that writes logs to text files; nightly INSERT into MySQL; daily/hourly batch to Google Spreadsheet; KPI visualization; feedback rankings.]
      - Limited scalability
      - Fixed schema
      - Not realtime
      - Unexpected INSERT latency
  7. Example Use Case – MySQL to TD
      [Diagram. Labels: hundreds of app servers, each with a Rails app whose td-agent sends event logs to Treasure Data; logs are available after several minutes; daily/hourly batch to Google Spreadsheet and MySQL; KPI visualization; feedback rankings.]
      - Unlimited scalability
      - Flexible schema
      - Realtime
      - Less performance impact
  8. What’s Treasure Data?
      Key differentiators:
      > TD delivers BigData analytics
      > in days, not months
      > without specialists or IT resources
      > for 1/10th the cost of the alternatives
      Why? Because it’s a multi-tenant service.
  9. Problem 1: investigating problems took time
      Customers need support...
      > “I uploaded data but it doesn’t show up in queries”
      > “Downloading query results takes time”
      > “Our queries have been taking longer recently”
  10. Problem 1: investigating problems took time
      Investigating these problems took time because:
        doubts.count.times {
          servers.count.times {
            # ssh to a server
            # grep logs
          }
        }
  11. The actual facts
      > Actually, the data was not uploaded (the client had a problem: disk full).
        We should have monitored uploads so that we would know immediately when we stop receiving data from a user.
      > Our servers were getting slower because of increasing load.
        We should have noticed it and added servers before it became a problem.
      > There was a bug that occurred under a specific condition.
        We should have collected unexpected errors and fixed them as soon as possible, saving time for both us and our users.
  12. Problem 2: many tasks to do but hard to prioritize
      We want to...
      > fix bugs
      > improve performance
      > increase the number of sign-ups
      > increase the number of queries by customers
      > increase the number of periodic queries
      What’s the “bottleneck” that should be solved first?
  13. Problem 2: many tasks to do but hard to prioritize
      We need data to make decisions.
      data: Performance is getting worse.
      decision: Let’s add servers.
      data: Many customers upload data but few customers issue queries.
      decision: Let’s improve the documentation.
      data: A customer stopped uploading data.
      decision: They might have a problem on the client side.
  14. How did we solve them?
      We collected application metrics.
  15. Treasure Data’s backend architecture
      [Diagram. Labels: Frontend, Worker, Job Queue, Hadoop, Hadoop.]
  16. Solution v1:
      [Diagram. Labels: Frontend, Worker, Job Queue, Hadoop, Hadoop.]
      Fluentd (in_exec plugin) pulls metrics every minute and forwards them to:
      > Treasure Data, for historical analysis
      > Librato Metrics, for realtime analysis
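As a rough sketch of the pull model in v1: Fluentd's in_exec plugin periodically runs a command and parses its standard output as events. A script along the following lines could serve as that command. The `collect_metrics` helper, the metric names, and the sample values are hypothetical; the slides do not show the script that was actually used.

```ruby
require 'json'

# Hypothetical collector. A real one would query the job queue, workers, etc.
# The metric names below are placeholders, not Treasure Data's actual metrics.
def collect_metrics(queue_length:, running_jobs:)
  {
    'queue.length' => queue_length,
    'jobs.running' => running_jobs
  }
end

# Serialize one record as a single JSON line, a form in_exec can parse
# when it is configured to read JSON.
def metrics_line(metrics)
  JSON.generate(metrics)
end

if __FILE__ == $PROGRAM_NAME
  # Sample values stand in for real measurements.
  puts metrics_line(collect_metrics(queue_length: 12, running_jobs: 3))
end
```

Every new metric in this model means touching both the script and the Fluentd configuration, which is the maintenance cost the next slides describe.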
  17. What’s solved
      We can monitor the overall behavior of servers.
      We can notice performance degradation.
      We can get alerts when a problem occurs.
  18. What’s not solved
      We can’t get detailed information.
      > how much data is “this user” uploading?
      The configuration file is complicated.
      > we need to add lines to declare new metrics
      The monitoring server is a SPOF.
  19. Solution v2:
      [Diagram. Labels: Frontend, Worker, Job Queue, Hadoop, Hadoop.]
      Applications push metrics to Fluentd (via local Fluentd)
      Fluentd sums up data every minute (partial aggregation) and forwards it to:
      > Treasure Data, for historical analysis
      > Librato Metrics, for realtime analysis
  20. What’s solved by v2
      We can get detailed information directly from applications
      > graphs for each customer
      DRY - we can keep configuration files simple
      > Just add one line to apps
      > No need to update fluentd.conf
      Decentralized streaming aggregation
      > partial aggregation on Fluentd, total aggregation on Librato Metrics
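The partial aggregation step in v2 can be illustrated with a small sketch: each aggregating node sums incoming values per minute, metric, and segment, then flushes the sums downstream, where they are totaled again. The `PartialAggregator` class and its method names are hypothetical and are not taken from fluent-plugin-metricsense.

```ruby
# Hypothetical partial aggregator: sums values per (minute, metric, segment).
class PartialAggregator
  def initialize
    @sums = Hash.new(0)
  end

  # time is a Time or Unix timestamp in seconds; it is truncated to the minute.
  def add(time, metric, segment, value)
    seconds = time.to_i
    minute = seconds - (seconds % 60)
    @sums[[minute, metric, segment]] += value
  end

  # Return the accumulated sums and start a fresh interval.
  def flush
    sums = @sums
    @sums = Hash.new(0)
    sums
  end
end

agg = PartialAggregator.new
agg.add(120, 'import.size', 'a001', 25)
agg.add(150, 'import.size', 'a001', 31) # same minute, same segment: summed
agg.add(130, 'import.size', 'a002', 63) # same minute, different segment
partial = agg.flush
```

Because sums are associative, the downstream service can add partial results from many Fluentd nodes without any node seeing all the data, so no single aggregator has to handle every event.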
  21. API
      MetricSense.value({:size=>32})
      MetricSense.segment({:account=>1})
      MetricSense.fact({:path=>'/path1'})
      MetricSense.measure!
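The slide lists the four calls without explaining their semantics. One plausible reading, and it is only an assumption, is that `value`, `segment`, and `fact` accumulate fields for a pending event and `measure!` emits it. The stand-in below illustrates that reading; `MetricSketch` is a hypothetical module, not the metricsense gem, and the real library may behave differently.

```ruby
# Hypothetical stand-in, NOT the metricsense gem.
# Assumption: value/segment/fact accumulate fields, measure! emits the event.
module MetricSketch
  @emitted = []

  class << self
    # Events emitted so far (kept in memory for illustration only).
    attr_reader :emitted

    def value(pairs)
      pending[:value].merge!(pairs)
    end

    def segment(pairs)
      pending[:segment].merge!(pairs)
    end

    def fact(pairs)
      pending[:fact].merge!(pairs)
    end

    # Emit the accumulated event and start a new one.
    def measure!
      @emitted << pending
      @pending = nil
    end

    private

    def pending
      @pending ||= { value: {}, segment: {}, fact: {} }
    end
  end
end

MetricSketch.value(size: 32)
MetricSketch.segment(account: 1)
MetricSketch.fact(path: '/path1')
MetricSketch.measure!
```

Under this reading, the application only states what happened, and the per-account breakdown comes from the segment rather than from a separately declared metric.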
  22. What did we learn?
      > We always have lots of tasks
        > we need data to prioritize them.
      > Problems are usually complicated
        > we need data to save time.
      > Adding metrics should be DRY
        > otherwise you get bored and stop adding metrics.
      > Realtime analysis is useful, but we still need batch analysis.
        > “who is not issuing queries despite having stored data last month?”
        > “which pages did users look at before signing up?”
        > “which pages did users not look at before running into trouble?”
  23. We open sourced MetricSense
      https://github.com/treasure-data/metricsense
  24. Components of MetricSense
      metricsense.gem
      > client library for Ruby to send metrics
      fluent-plugin-metricsense
      > plugin for Fluentd to collect metrics
      > pluggable backends:
        > Librato Metrics backend
        > RDBMS backend
  25. RDB backend for MetricSense
      Aggregate metrics on an RDBMS in a form optimized for time-series data.
      > Borrowed concepts from OpenTSDB and OLAP cube.
      metric_tags:
        metric_id  metric_name    segment_name
        1          “import.size”  NULL
        2          “import.size”  “account”
      segment_values:
        segment_id  name
        5           “a001”
        6           “a002”
      data:
        base_time  metric_id  segment_id  m0  m1  m2  ...  m59
        19:00      1          5           25  31  19  ...  21
        21:00      2          5           75  94  68  ...  72
        21:00      2          6           63  82  55  ...  63
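The `data` table stores one row per hour with one column per minute (m0 to m59). A minimal in-memory sketch of that layout follows. The `HourlyRows` class is hypothetical, the date is made up, and the use of UTC and of summing into a slot are assumptions; the slide does not say how the real backend writes rows.

```ruby
# Hypothetical in-memory model of the hourly row layout.
# Rows are keyed by [base_time, metric_id, segment_id]; each row holds
# 60 slots that correspond to the columns m0..m59.
class HourlyRows
  attr_reader :rows

  def initialize
    @rows = Hash.new { |hash, key| hash[key] = Array.new(60, 0) }
  end

  # Add a value to the minute slot of the hour that contains `time`.
  # Assumption: times are handled in UTC and values in a slot are summed.
  def add(time, metric_id, segment_id, value)
    utc = time.getutc
    base_time = Time.utc(utc.year, utc.month, utc.day, utc.hour)
    @rows[[base_time, metric_id, segment_id]][utc.min] += value
  end
end

store = HourlyRows.new
store.add(Time.utc(2012, 1, 1, 21, 0, 10), 2, 5, 75) # row 21:00, slot m0
store.add(Time.utc(2012, 1, 1, 21, 1, 5), 2, 5, 94)  # row 21:00, slot m1
store.add(Time.utc(2012, 1, 1, 21, 1, 40), 2, 6, 82) # other segment, slot m1
```

Packing an hour into one row means a graph covering a day reads 24 rows per metric and segment instead of 1,440, which is the kind of saving the OpenTSDB-style layout is after.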
  26. Solution v3 (future work):
      Alerting using historical data
      > simple machine learning to adjust threshold values
      [Graph. Labels: historical average; “Alert!”.]
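The slide only mentions "simple machine learning" and a historical average, so the rule below is a swapped-in stand-in rather than the planned method: it alerts when the current value exceeds the historical mean by more than `k` standard deviations. The function names and the default `k` are assumptions.

```ruby
# Threshold derived from history: mean + k * population standard deviation.
# This rule is an illustrative stand-in, not the method planned for v3.
def alert_threshold(history, k: 3.0)
  raise ArgumentError, 'history must not be empty' if history.empty?

  mean = history.sum.to_f / history.size
  variance = history.sum { |v| (v - mean)**2 } / history.size
  mean + k * Math.sqrt(variance)
end

# True when the current value is above the history-based threshold.
def alert?(history, current, k: 3.0)
  current > alert_threshold(history, k: k)
end

history = [25, 31, 19, 21]   # sample per-minute values
normal = alert?(history, 24) # close to the historical average
spike = alert?(history, 60)  # far above it
```

The threshold moves with the data, so a metric that grows steadily does not need its alert limit edited by hand.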
  27. We’re Hiring!
  28. Sales Engineer
        Evangelize TD/Fluentd. Get everyone excited!
        Help customers deploy and maintain TD successfully.
        Preferred experience: OS, DB, BI, statistics and data science
      Devops engineer
        Development, operation and monitoring of our large-scale, multi-tenant system
        Preferred experience: large-scale system development and management
  29. Competitive salary + equity package
      Who we want
        STRONG business and customer support DNA
          Everyone is equally responsible for customer support
          Customer success = our success
        Self-disciplined and responsible
          Be your own manager
        Team player with excellent communication skills
          Distributed team and global customer base
      Contact me: sf@treasure-data.com
  30. contact: sales@treasure-data.com
