Large-scale OLAP with Kobayashi

Boundary Tech Talks
Fri, May 18, 2012

Dietrich Featherston, Boundary
@d2fn

Friday, May 18, 12
Monitoring is an analytics problem


Historical Perspective



1 minute collection intervals

Arbitrary OLAP

Cassandra

bitset indexes per dimension

query-time sampling

riak_core + fastbit




apply intelligence to the problem



Arbitrary OLAP requires 2^n data cubes, where n is dimensionality
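To make the 2^n (two to the power n) count concrete, here is a small Python sketch that enumerates every subset of a list of dimensions; each subset is one possible group-by, which is to say one data cube. The three-dimension list and the function name are illustrative assumptions, not the schema or code from the talk.

```python
from itertools import combinations

def all_cuboids(dimensions):
    """Return every subset of `dimensions`; each subset is one possible group-by."""
    subsets = []
    for size in range(len(dimensions) + 1):
        subsets.extend(combinations(dimensions, size))
    return subsets

# Hypothetical three-dimension example, chosen only to keep the output small.
dims = ["meter id", "country", "network"]
cuboids = all_cuboids(dims)
print(len(cuboids))  # 8, i.e. 2**3
```

With 11 dimensions the same enumeration yields 2048 subsets, which is why precomputing every cube quickly becomes impractical.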




dimensions (11)
     epoch seconds
     epoch minutes
     epoch hours
     meter id
     source ip
     source port
     dest ip
     dest port
     interface
     country
     network

measurements (4)
     egress packets
     egress octets
     ingress packets
     ingress octets


Total Volume
     by Host
     by Port/Protocol
     by Country
     by Network
     + meter
For each aggregation period

15 < 2^11
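One possible way to arrive at 15, sketched in Python: five breakdowns (the plain total plus host, port/protocol, country, and network), each kept per meter at three aggregation periods. This reading is an assumption, not something the slides state outright, and the cube names below are hypothetical, modeled on the `volume_1m_meter_port_protocol` name that appears in the query URL later in the deck.

```python
from itertools import product

# Assumed reading of the slides: five breakdowns (the plain total plus four
# groupings), each kept per meter at three aggregation periods.
BREAKDOWNS = ["total", "host", "port_protocol", "country", "network"]
PERIODS = ["1s", "1m", "1h"]

# Hypothetical naming scheme, for illustration only.
cubes = [
    f"volume_{period}_meter_{breakdown}"
    for period, breakdown in product(PERIODS, BREAKDOWNS)
]
print(len(cubes))            # 15
print(len(cubes) < 2 ** 11)  # True: 15 precomputed cubes versus 2048 possible
```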



24 hours     2 months     ~10 years

86,400 observations (per monitored host per query)




86,400 * 15 ≈ 1.3M observations (per monitored host)




Total Observations (for half a million meters)
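The figures on the last few slides can be checked with a few lines of Python. The sketch assumes that the three time spans correspond to the epoch seconds, epoch minutes, and epoch hours dimensions, and that 15 is the number of precomputed cubes. The total for half a million meters is simply the product of the numbers on the slides; it is not a figure quoted in the deck.

```python
from datetime import timedelta

OBS_PER_WINDOW = 86_400

# Time span covered by 86,400 observations at each resolution.
windows = {
    "seconds": timedelta(seconds=OBS_PER_WINDOW),
    "minutes": timedelta(minutes=OBS_PER_WINDOW),
    "hours": timedelta(hours=OBS_PER_WINDOW),
}
for resolution, span in windows.items():
    print(resolution, span.days, "days")  # 1, 60, and 3600 days

CUBES = 15                       # assumed: number of precomputed cubes
per_host = OBS_PER_WINDOW * CUBES
print(per_host)                  # 1296000, roughly 1.3M

METERS = 500_000                 # "half a million meters"
print(per_host * METERS)         # 648000000000
```

So 86,400 points span one day at one-second resolution, 60 days at one-minute resolution, and 3,600 days (a little under ten years) at one-hour resolution.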



Riak Key Layout
     10 seconds
     100 meters
     < 80KB




Total Observations

Bitcask would have been nice

LevelDB backend

Use leveldb cache to bound memory

Compute your keys

Use secondary indexes sparingly
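One way to read "compute your keys": if every stored block covers a fixed meter range and a fixed time window, the key of the block holding any observation can be derived arithmetically, with no index lookup. The sketch below assumes the 100 meter by 10 second block from the key layout slide; the key format and the function name are hypothetical, not Kobayashi's actual scheme.

```python
METERS_PER_BLOCK = 100   # from the slides: one block spans 100 meters
SECONDS_PER_BLOCK = 10   # from the slides: one block spans 10 seconds

def block_key(meter_id: int, epoch_seconds: int) -> str:
    """Derive the key of the block containing (meter_id, epoch_seconds).

    The "<first meter>:<block start>" format is a made-up illustration.
    """
    first_meter = (meter_id // METERS_PER_BLOCK) * METERS_PER_BLOCK
    block_start = (epoch_seconds // SECONDS_PER_BLOCK) * SECONDS_PER_BLOCK
    return f"{first_meter}:{block_start}"

print(block_key(226, 1337230147))  # 200:1337230140
```

Because the key is a pure function of the meter id and the timestamp, both writers and readers can compute it independently.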

How do I query the database?



Find 45 minutes of total traffic seen on meters 1, 2, 226, and 301, starting 18 hours ago, broken down by traffic type
Atomic Unit of Storage
     10 seconds
     100 meters
     < 80KB




Step 1: fetch appropriate blocks (riak)

[Figure: grid of stored blocks. Horizontal axis: Time, slots t0 to t19, with a 45 min window marked. Vertical axis: Meter Id, in ranges (0,99), (100,199), (200,299), (300,399), (400,499). Meters 1, 2, 226, and 301 are marked in their ranges.]
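Step 1 can be sketched as pure key arithmetic: map each requested meter to its 100-meter range, walk the time window in block-sized steps, and collect the block keys to fetch. The tuple key shape, the function name, and the 10 second step are assumptions for illustration, and the Riak fetch itself is left out.

```python
METERS_PER_BLOCK = 100
SECONDS_PER_BLOCK = 10

def blocks_to_fetch(meter_ids, start, duration):
    """Return the sorted (first_meter, block_start) pairs that cover the query.

    `start` and `duration` are in seconds. The tuple keys are hypothetical.
    """
    buckets = {(m // METERS_PER_BLOCK) * METERS_PER_BLOCK for m in meter_ids}
    first = (start // SECONDS_PER_BLOCK) * SECONDS_PER_BLOCK
    end = start + duration
    return sorted(
        (bucket, t)
        for bucket in buckets
        for t in range(first, end, SECONDS_PER_BLOCK)
    )

keys = blocks_to_fetch([1, 2, 226, 301], start=1337230140, duration=45 * 60)
print(len(keys))  # 3 meter ranges * 270 time blocks = 810
```

Meters 1 and 2 share the (0,99) range, so four meters touch only three rows of blocks.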

Step 2: filter

[Figure: the same grid of Time (slots t0 to t19, 45 min window marked) against Meter Id ranges (0,99) through (400,499), with meters 1, 2, 226, and 301 marked.]

Step 3: aggregate and perform top-k (45 min)

     topk(1 + 2 + 226 + 301, 10)

     {
          epochMillis: 1337230140000
          portProtocol: "4740:6"
          ingressPackets: 370482
          ingressOctets: 3113782199
          egressPackets: 343780
          egressOctets: 37126033
     },
     {
          epochMillis: 1337230140000
          portProtocol: "9092:6"
          ingressPackets: 440915
          ingressOctets: 1816615857
          egressPackets: 481237
          egressOctets: 1312198133
     },
     ...
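Steps 2 and 3 can be sketched in a few lines of Python: keep only the rows for the requested meters, sum the measurements per portProtocol, and take the k largest groups. The row layout (a meterId field on each row), the sample values, and ranking by ingressOctets are assumptions for illustration; the slides do not say which measurement top-k orders by.

```python
import heapq
from collections import defaultdict

MEASURES = ("ingressPackets", "ingressOctets", "egressPackets", "egressOctets")

def filter_aggregate_topk(rows, meter_ids, k, rank_by="ingressOctets"):
    """Filter rows by meter, sum measures per portProtocol, return the top k."""
    wanted = set(meter_ids)
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0))
    for row in rows:
        if row["meterId"] not in wanted:      # step 2: filter
            continue
        group = totals[row["portProtocol"]]   # step 3: aggregate
        for measure in MEASURES:
            group[measure] += row[measure]
    ranked = heapq.nlargest(k, totals.items(), key=lambda item: item[1][rank_by])
    return [{"portProtocol": port, **sums} for port, sums in ranked]

# Made-up rows; meter 7 is not part of the query and is filtered out.
rows = [
    {"meterId": 1, "portProtocol": "4740:6", "ingressPackets": 3,
     "ingressOctets": 300, "egressPackets": 2, "egressOctets": 20},
    {"meterId": 226, "portProtocol": "4740:6", "ingressPackets": 1,
     "ingressOctets": 100, "egressPackets": 1, "egressOctets": 10},
    {"meterId": 2, "portProtocol": "9092:6", "ingressPackets": 4,
     "ingressOctets": 250, "egressPackets": 5, "egressOctets": 50},
    {"meterId": 7, "portProtocol": "9092:6", "ingressPackets": 9,
     "ingressOctets": 900, "egressPackets": 9, "egressOctets": 90},
]
top = filter_aggregate_topk(rows, [1, 2, 226, 301], k=10)
print([r["portProtocol"] for r in top])  # ['4740:6', '9092:6']
```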



In URL Form
     http://computers-r-terrible/
       volume_1m_meter_port_protocol/
       data?
       from=-18h&
       duration=45m&
       parts=1,2,226,301&
       aggregations=observationDomainId



Arbitrary Aggregations
     http://computers-r-terrible/
       volume_1m_meter_port_protocol/
       data?
       from=-18h&
       duration=45m&
       parts=1,2,226,301&
       aggregations=
         observationDomainId,
         epochMillis
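A small sketch of assembling such a query URL with Python's standard library. The host, cube name, and parameter names are copied from the slide; the helper function and its arguments are hypothetical, and commas are left unescaped to match the form shown on the slide.

```python
from urllib.parse import urlencode

def query_url(base, cube, start, duration, parts, aggregations):
    """Build a query URL in the shape shown on the slides."""
    params = {
        "from": start,
        "duration": duration,
        "parts": ",".join(str(p) for p in parts),
        "aggregations": ",".join(aggregations),
    }
    # safe="," keeps list separators readable instead of encoding them as %2C
    return f"{base}/{cube}/data?{urlencode(params, safe=',')}"

url = query_url(
    "http://computers-r-terrible",
    "volume_1m_meter_port_protocol",
    start="-18h",
    duration="45m",
    parts=[1, 2, 226, 301],
    aggregations=["observationDomainId", "epochMillis"],
)
print(url)
```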




“unfortunately the project has been blocked for weeks choosing a name”


V ≃ V′
Future -->



Send expired data to cold storage

output in arbitrary time resolution

Open source the data cubing and predicate matching code

Query grammar for kobayashi
questions?


