Firehose Engineering


  1. Firehose Engineering: designing high-volume data collection systems. Josh Berkus, FOSS4G-NA, Washington DC, 2012
  2. Firehose Database Applications (FDA): (1) very high volume of data input from many automated producers; (2) continuous processing and aggregation of incoming data
  3. Mozilla Socorro
  4. Upwind
  5. GPS Applications
  6. Firehose Challenges
  7. 1. Volume ● 100s to 1000s of facts/second ● GB/hour
  8. 1. Volume ● spikes in volume ● multiple uncoordinated sources
  9. 1. Volume: volume always grows over time
  10. 2. Constant flow: since data arrives 24/7 … while the user interface can be down, data collection can never be down
  11. 2. Constant flow ● can't stop receiving to process ETL ● data can arrive out of order
  12. 3. Database size ● terabytes to petabytes ● lots of hardware ● single-node DBMSes aren't enough ● difficult backups, redundancy, migration ● analytics are resource-intensive
  13. 3. Database size ● database growth ● size grows quickly ● need to expand storage ● estimate target data size ● create data ageing policies
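A data ageing policy of the kind slide 13 calls for is often implemented by dropping old time-based partitions on a schedule. Below is a minimal sketch of that idea in Python with psycopg2, assuming PostgreSQL with monthly partitions named like facts_YYYYMM; the partition naming, DSN, and retention period are illustrative assumptions, not part of the talk.

```python
# Minimal sketch of a data ageing policy: drop monthly partitions that fall
# outside a retention window. Partition naming (facts_YYYYMM), the DSN and
# the retention period are illustrative assumptions, not Socorro/Upwind code.
from datetime import date

import psycopg2

RETENTION_MONTHS = 24  # keep two years of raw facts (assumed policy)

def expired_partitions(today: date, months: int, horizon: int = 60):
    """Yield partition names older than the retention window."""
    year, month = today.year, today.month
    for i in range(months + horizon):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        if i >= months:                      # only partitions past the window
            yield f"facts_{year:04d}{month:02d}"

def apply_retention(dsn: str) -> None:
    """Run as a nightly scheduled job; safe to re-run."""
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            for part in expired_partitions(date.today(), RETENTION_MONTHS):
                cur.execute(f'DROP TABLE IF EXISTS "{part}"')
        conn.commit()
    finally:
        conn.close()
```

Estimating target data size up front (facts/second × bytes/fact × retention window) tells you how much storage a policy like this actually buys back.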
  14. 3. Database size: “We will decide on a data retention policy when we run out of disk space.” – every business user everywhere
  15. 4.
  16. many components = many failures
  17. 4. Component failure ● all components fail ● or need scheduled downtime ● including the network ● collection must continue ● collection & processing must recover
  18. solving firehose problems
  19. socorro project
  20. http://crash-stats.mozilla.com
  21. Mozilla Socorro (architecture diagram): collectors, processors, webservers, reports
  22. Socorro data volume ● 3,000 crashes/minute ● avg. size 150K ● 40TB accumulated raw data ● 500GB accumulated metadata / reports
  23. dealing with volume (diagram): load balancers → collectors
  24. dealing with volume (diagram): monitor → processors
  25. dealing with size (diagram): data, 40TB, expandable; metadata & views, 500GB, fixed size
  26. dealing with component failure: lots of hardware … ● 30 HBase nodes ● 3 ES servers ● 6 collectors ● 2 PostgreSQL servers ● 12 processors ● 8 middleware & web servers ● 6 load balancers … lots of failures
  27. load balancing & redundancy (diagram): load balancers → collectors
  28. elastic connections ● components queue their data ● retain it if other nodes are down ● components resume work automatically ● when other nodes come back up
  29. elastic connections (diagram): collector: receiver → local file queue → crash mover
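The local file queue in slide 29 is what makes the connection "elastic": the receiver always succeeds by spooling to local disk, and a mover drains the spool whenever downstream comes back. A minimal sketch of that pattern follows; the spool path, file layout, and send_downstream() stub are assumptions for illustration, not Socorro's actual collector code.

```python
# Sketch of an elastic connection: receive() always succeeds by writing to a
# local spool directory; mover_loop() drains it and retries when downstream
# is unreachable. Paths and the send_downstream() stub are assumptions.
import json
import os
import time
import uuid

SPOOL_DIR = "/var/spool/collector"   # hypothetical local file queue

def receive(event: dict) -> None:
    """Accept one incoming fact/crash and persist it locally, atomically."""
    os.makedirs(SPOOL_DIR, exist_ok=True)
    name = uuid.uuid4().hex
    tmp = os.path.join(SPOOL_DIR, name + ".tmp")
    with open(tmp, "w") as f:
        json.dump(event, f)
    # the rename makes the file visible to the mover only once it is complete
    os.rename(tmp, os.path.join(SPOOL_DIR, name + ".json"))

def send_downstream(event: dict) -> None:
    """Stub: hand the event to processors/storage; raises if they are down."""
    raise NotImplementedError

def mover_loop(poll_seconds: float = 5.0) -> None:
    """Drain the spool; on failure, keep the backlog and retry later."""
    while True:
        for fname in sorted(os.listdir(SPOOL_DIR)):
            if not fname.endswith(".json"):
                continue
            path = os.path.join(SPOOL_DIR, fname)
            with open(path) as f:
                event = json.load(f)
            try:
                send_downstream(event)
            except Exception:
                break                 # downstream down: resume automatically later
            os.remove(path)           # delivered: safe to forget locally
        time.sleep(poll_seconds)
```

Because both sides only ever talk to the spool, collection keeps running through processor outages, which is exactly the "collection can never be down" constraint from slide 10.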
  30. server management ● Puppet ● controls configuration of all servers ● makes sure servers recover ● allows rapid deployment of replacement nodes
  31. Upwind
  32. Upwind ● speed ● wind speed ● heat ● vibration ● noise ● direction
  33. Upwind: 1. maximize power generation 2. make sure the turbine isn't damaged
  34. dealing with volume: each turbine: 90 to 700 facts/second; windmills per farm: up to 100; number of farms: 40+; est. total: 300,000 facts/second (will grow)
  35. dealing with volume (diagram): per farm, local historian storage → analytic database → reports
  36. dealing with volume (diagram): each farm's local historian storage → analytic database, all feeding a master database
  37. multi-tenant partitioning ● partition the whole application ● each customer gets their own toolchain ● allows scaling with the number of customers ● lowers efficiency ● more efficient with virtualization
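One way to realize "each customer gets their own toolchain" is a thin routing layer that maps a tenant to its own storage, analytics database, and ingest queue. A minimal sketch under that assumption; the tenant names, DSNs, and queue names are invented for illustration.

```python
# Sketch: route each customer (wind farm operator) to its own toolchain.
# Tenant names, DSNs and queue names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TenantToolchain:
    historian_dsn: str      # per-tenant local historian storage
    analytics_dsn: str      # per-tenant analytic database
    ingest_queue: str       # per-tenant queue for incoming facts

TENANTS = {
    "farm_alpha": TenantToolchain("dbname=alpha_hist", "dbname=alpha_olap", "alpha.ingest"),
    "farm_beta":  TenantToolchain("dbname=beta_hist",  "dbname=beta_olap",  "beta.ingest"),
}

def toolchain_for(tenant_id: str) -> TenantToolchain:
    """Every incoming fact carries a tenant id; everything downstream is scoped by it."""
    return TENANTS[tenant_id]
```

Keeping tenants fully separate trades some efficiency for straightforward scaling: adding a customer means cloning a toolchain, not resizing a shared one, which is where virtualization helps.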
  38. dealing with constant flow and size (diagram): the historian buffer rolls up into minute, hours, days, months, and years tables; smaller deployments keep fewer rollup levels
  39. time-based rollups ● continuously accumulate levels of rollup ● each is based on the level below it ● data is always appended, never updated ● small windows == small resources
  40. time-based rollups ● allows: very rapid summary reports for different windows ● retaining different summaries for different levels of time ● batch/out-of-order processing ● summarization in parallel
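The append-only rollup in slides 39-40 can be driven by a small job per time window: each level is an aggregate of the level below, inserted once the window has closed. A minimal sketch with psycopg2, assuming hypothetical facts_minute and facts_hour tables; the schema and column names are illustrative, not from the talk.

```python
# Sketch of a time-based rollup: summarize one closed hour of minute-level
# facts into the hourly table. Table/column names (facts_minute, facts_hour,
# ts, value) are assumptions for illustration.
import psycopg2

ROLLUP_SQL = """
INSERT INTO facts_hour (hour, n_facts, avg_value)
SELECT date_trunc('hour', ts) AS hour,
       count(*)               AS n_facts,
       avg(value)             AS avg_value
FROM   facts_minute
WHERE  ts >= %(start)s AND ts < %(end)s   -- one small window: small resources
GROUP  BY 1
"""

def rollup_hour(dsn: str, start, end) -> None:
    """Append one hour's summary; rows are only ever inserted, never updated."""
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(ROLLUP_SQL, {"start": start, "end": end})
        conn.commit()
    finally:
        conn.close()
```

Because each window is independent, hours can be summarized in parallel and late-arriving windows re-run out of order, which is what makes batch and out-of-order processing cheap.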
  41. firehose tips
  42. data collection must be: ● continuous ● parallel ● fault-tolerant
  43. data processing must be: ● continuous ● parallel ● fault-tolerant
  44. every component must be able to fail ● including the network ● without too much data loss ● other components must continue
  45. 5 tools to use: 1. queueing software 2. buffering techniques 3. materialized views 4. configuration management 5. comprehensive monitoring
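For item 5, comprehensive monitoring, even a crude liveness sweep over every component catches most of the "lots of failures" from slide 26. A minimal sketch; the component list, hostnames, ports, and alert() hook are hypothetical placeholders.

```python
# Sketch of a liveness sweep over every firehose component. Hostnames, ports
# and the alert() hook are hypothetical placeholders.
import socket

COMPONENTS = {
    "collector-1": ("collector1.example.com", 8080),
    "processor-1": ("processor1.example.com", 5000),
    "postgres-1":  ("pg1.example.com", 5432),
}

def alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Cheapest possible check: can we open a TCP connection at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def alert(component: str) -> None:
    """Placeholder: wire this into your real paging/alerting system."""
    print(f"ALERT: {component} is not responding")

def check_all() -> None:
    for name, (host, port) in COMPONENTS.items():
        if not alive(host, port):
            alert(name)
```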
  46. 4 don'ts: 1. use cutting-edge technology 2. use untested hardware 3. run components to capacity 4. do hot patching
  47. master your own
  48. Contact ● Josh Berkus: josh@pgexperts.com ● blog: blogs.ittoolbox.com/database/soup ● PostgreSQL: www.postgresql.org ● pgexperts: www.pgexperts.com ● Postgres BOF 3:30 PM today ● pgCon, Ottawa, May 2012 ● Postgres Open, Chicago, Sept. 2012. The text and diagrams in this talk are copyright 2011 Josh Berkus and licensed under the Creative Commons Attribution license. The title slide image and other fireman images are licensed from iStockPhoto and may not be reproduced or redistributed separately from this presentation. Socorro images are copyright 2011 Mozilla Inc.
