Large-scale OLAP with Kobayashi

Published in: Technology, Business

Transcript

  • 1. Large-scale OLAP with Kobayashi. Boundary Tech Talks, Fri, May 18, 2012. Dietrich Featherston, Boundary, @d2fn. Friday, May 18, 12
  • 2. Monitoring is an analytics problem
  • 3. Historical perspective
  • 4. 1 minute collection intervals; arbitrary OLAP
  • 5. Cassandra; bitset indexes per dimension; query-time sampling
  • 7. riak_core + fastbit
  • 8. apply intelligence to the problem
  • 9. Arbitrary OLAP requires 2^n data cubes, where n is the dimensionality
  • 10. dimensions (11): epoch seconds, epoch minutes, epoch hours, meter id, source ip, source port, dest ip, dest port, interface, country, network
       measurements (4): egress packets, egress octets, ingress packets, ingress octets
  • 11. Total volume, and volume by host, port/protocol, country, network (+ meter), for each aggregation period
  • 13. 15 < 2^11
  • 14. 24 hours / 2 months / ~10 years: 86,400 observations (per monitored host per query), i.e. 86,400 seconds, minutes, or hours
  • 15. 86,400 × 15 ≈ 1.3M observations (per monitored host)
  • 16. Total observations (for half a million meters)
  • 17. Riak key layout: each key holds 10 seconds × 100 meters, < 80KB
  • 18. Total observations
  • 24. Bitcask would have been nice; LevelDB backend; use the LevelDB cache to bound memory
  • 25. Compute your keys; use secondary indexes sparingly
  • 27. How do I query the database?
  • 28. Find 45 minutes of total traffic seen on meters 1, 2, 226, & 301 starting 18 hours ago, broken down by traffic type
  • 29. Atomic unit of storage: 10 seconds × 100 meters, < 80KB
  • 30. Step 1: fetch appropriate blocks (riak) [Diagram: time axis t0 to t19 with a 45 min window; meter id axis split into blocks (0,99), (100,199), (200,299), (300,399), (400,499); meters 1, 2, 226, 301 marked]
  • 31. Step 2: filter [Diagram: same grid as step 1; 45 min window, meters 1, 2, 226, 301]
  • 32. Step 3: aggregate and perform top-k [Diagram: 45 min of data for meters 1 + 2, 226, 301 fed into topk( , 10)]
       { epochMillis: 1337230140000, portProtocol: "4740:6", ingressPackets: 370482, ingressOctets: 3113782199, egressPackets: 343780, egressOctets: 37126033 },
       { epochMillis: 1337230140000, portProtocol: "9092:6", ingressPackets: 440915, ingressOctets: 1816615857, egressPackets: 481237, egressOctets: 1312198133 },
       ...
  • 33. In URL form: http://computers-r-terrible/volume_1m_meter_port_protocol/data?from=-18h&duration=45m&parts=1,2,226,301&aggregations=observationDomainId
  • 34. Arbitrary aggregations: http://computers-r-terrible/volume_1m_meter_port_protocol/data?from=-18h&duration=45m&parts=1,2,226,301&aggregations=observationDomainId,epochMillis
  • 35. “unfortunately the project has been blocked for weeks choosing a name”
  • 36-38. V, V′: V ≃ V′
  • 39. Future -->
  • 40. Send expired data to cold storage; output in arbitrary time resolution
  • 41. Open source the data cubing and predicate matching code; query grammar for kobayashi
  • 42. questions?
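
The counting argument on slides 9 to 15 can be checked with a few lines of arithmetic. The sketch below assumes that the 15 precomputed cubes come from the five groupings on slide 11 (total, host, port/protocol, country, network) crossed with three aggregation periods (seconds, minutes, hours). The slides state the number 15 but not this breakdown, so treat that factorisation as an assumption.

```python
# Arithmetic behind slides 9-15.
# ASSUMPTION: 15 cubes = 5 groupings x 3 aggregation periods; the slides
# only give the totals, not this breakdown.
N_DIMENSIONS = 11

# Arbitrary OLAP: one cube per subset of dimensions, i.e. 2^n.
all_cubes = 2 ** N_DIMENSIONS

groupings = ["total", "host", "port/protocol", "country", "network"]
periods = ["seconds", "minutes", "hours"]
chosen_cubes = len(groupings) * len(periods)

# 86,400 observations per query cover different spans at each resolution.
OBS_PER_QUERY = 86_400
SECONDS_PER = {"seconds": 1, "minutes": 60, "hours": 3600}
span_days = {p: OBS_PER_QUERY * SECONDS_PER[p] / 86_400 for p in periods}

# Upper bound on observations per monitored host across the chosen cubes.
obs_per_host = OBS_PER_QUERY * chosen_cubes

print(all_cubes, chosen_cubes, obs_per_host)  # 2048 15 1296000
print(span_days)  # 1 day, 60 days (~2 months), 3600 days (~10 years)
```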
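
The block layout on slides 17 and 29 and the three query steps on slides 30 to 32 can be sketched as follows. This is a minimal illustration, not Kobayashi's code: a plain dict stands in for Riak, and the key format, the row fields `t` and `meter`, and the function names `block_key`, `keys_for_query`, and `query` are all hypothetical. Only the block size (10 seconds × 100 meters) and the fetch / filter / aggregate + top-k sequence come from the slides.

```python
from collections import defaultdict

BLOCK_SECONDS = 10   # slide 17: a block covers 10 seconds...
BLOCK_METERS = 100   # ...and 100 meters

def block_key(epoch_seconds, meter_id):
    """Hypothetical computed key: (block start time, first meter id in block)."""
    return (epoch_seconds - epoch_seconds % BLOCK_SECONDS,
            meter_id - meter_id % BLOCK_METERS)

def keys_for_query(start, duration, meter_ids):
    """Compute every block key a query touches, without any index lookup."""
    first = start - start % BLOCK_SECONDS
    times = range(first, start + duration, BLOCK_SECONDS)
    meter_blocks = sorted({m - m % BLOCK_METERS for m in meter_ids})
    return [(t, b) for t in times for b in meter_blocks]

def query(store, start, duration, meter_ids, k=10):
    """Fetch blocks, filter rows, sum ingress octets per port/protocol, top-k."""
    wanted = set(meter_ids)
    totals = defaultdict(int)
    for key in keys_for_query(start, duration, meter_ids):
        for row in store.get(key, []):                      # step 1: fetch
            in_window = start <= row["t"] < start + duration
            if row["meter"] in wanted and in_window:        # step 2: filter
                totals[row["portProtocol"]] += row["ingressOctets"]  # step 3
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Tiny in-memory stand-in for the Riak bucket, with made-up rows.
store = defaultdict(list)
for t, meter, pp, octets in [
    (1000, 1, "4740:6", 50),
    (1005, 2, "9092:6", 30),
    (1012, 226, "4740:6", 20),
    (1003, 7, "4740:6", 999),   # same block as meter 1, dropped by the filter
    (1020, 301, "9092:6", 5),   # outside the time window, block never fetched
]:
    store[block_key(t, meter)].append(
        {"t": t, "meter": meter, "portProtocol": pp, "ingressOctets": octets})

result = query(store, start=1000, duration=20, meter_ids=[1, 2, 226, 301])
print(result)  # [('4740:6', 70), ('9092:6', 30)]
```

Because keys are computed from the time range and meter ids, the fetch step needs no secondary index, which matches the advice on slide 25. The cost is that each block may contain meters the query did not ask for, hence the separate filter step.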
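
The URL form on slides 33 and 34 can be assembled programmatically. The sketch below uses Python's standard `urllib.parse.urlencode`; the helper name `build_query_url` and its argument names are hypothetical, while the host, cube name, and query parameters (`from`, `duration`, `parts`, `aggregations`) are copied from the slides.

```python
from urllib.parse import urlencode

def build_query_url(base, cube, start, duration, parts, aggregations):
    """Hypothetical helper: build a query URL shaped like slides 33-34."""
    params = {
        "from": start,
        "duration": duration,
        "parts": ",".join(str(p) for p in parts),
        "aggregations": ",".join(aggregations),
    }
    # safe="," leaves the comma-separated lists unescaped, as on the slides.
    return f"{base}/{cube}/data?{urlencode(params, safe=',')}"

url = build_query_url(
    "http://computers-r-terrible",
    "volume_1m_meter_port_protocol",
    start="-18h",
    duration="45m",
    parts=[1, 2, 226, 301],
    aggregations=["observationDomainId", "epochMillis"],
)
print(url)
```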