1. Large-scale
OLAP with
Kobayashi
Boundary Tech Talks
Fri, May 18, 2012
Dietrich Featherston, Boundary
@d2fn
Friday, May 18, 12
5. Cassandra
bitset indexes per
dimension
query-time sampling
9. Arbitrary OLAP
requires 2^n data
cubes
where n is dimensionality
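A minimal sketch of where the 2^n comes from: every subset of the dimensions is one possible group-by, and each group-by needs its own cube. The function name `cuboids` and the three sample dimensions are chosen for illustration only.

```python
from itertools import combinations


def cuboids(dimensions):
    """Yield every subset of the dimensions; each subset is one possible group-by."""
    for size in range(len(dimensions) + 1):
        for subset in combinations(dimensions, size):
            yield subset


# Three illustrative dimensions give 2^3 = 8 cubes, including the empty
# subset, which corresponds to the grand total.
dims = ["meter id", "source ip", "dest port"]
all_cuboids = list(cuboids(dims))
print(len(all_cuboids))
```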
10. dimensions (11)
epoch seconds
epoch minutes
epoch hours
meter id
source ip
source port
dest ip
dest port
interface
country
network
measurements (4)
egress packets
egress octets
ingress packets
ingress octets
11. Total Volume
by
Host
Port/Protocol
Country
Network
(+ meter)
For each aggregation period
13. 15 < 2^11
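One way to read the 15, offered as an assumption rather than something the slides state: five groupings (total, host, port/protocol, country, network) times three aggregation periods (seconds, minutes, hours). The sketch below compares that count with the full 2^11 lattice.

```python
# Assumed reading of the slides: five groupings, three aggregation periods.
groupings = ["total", "host", "port/protocol", "country", "network"]
periods = ["seconds", "minutes", "hours"]

# One materialized cube per (grouping, period) pair.
materialized = [(g, p) for g in groupings for p in periods]

# Every possible subset of the 11 dimensions.
full_lattice = 2 ** 11

print(len(materialized), "<", full_lattice)
```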
14. 24 hours | 2 months | ~10 years
86,400 observations
(per monitored host per query)
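The three time spans line up with 86,400 observations if they are taken at one-second, one-minute, and one-hour resolution respectively. That pairing is an assumption based on the epoch seconds/minutes/hours dimensions, and the arithmetic can be checked with a short sketch:

```python
# Assumed pairing: 24 hours at 1 s, 2 months at 1 min, ~10 years at 1 h.
SECONDS_PER = {"second": 1, "minute": 60, "hour": 3600}
MAX_OBSERVATIONS = 86_400
SECONDS_PER_DAY = 86_400


def span_days(resolution):
    """Days covered by MAX_OBSERVATIONS points at the given resolution."""
    return MAX_OBSERVATIONS * SECONDS_PER[resolution] / SECONDS_PER_DAY


for resolution in SECONDS_PER:
    days = span_days(resolution)
    print(f"{resolution}: {days:g} days (~{days / 365:.2f} years)")
```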
17. Riak Key Layout
10 seconds x 100 meters < 80KB
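A minimal sketch of how an observation could be mapped to its block key under this layout. The key format, the function name `block_key`, and the table name `volume_1s` are hypothetical; the slides give only the block dimensions.

```python
BLOCK_SECONDS = 10      # time covered by one block (from the slide)
METERS_PER_BLOCK = 100  # meters covered by one block (from the slide)


def block_key(table, meter_id, epoch_seconds):
    """Return a hypothetical key for the block holding this observation."""
    # Round both coordinates down to the start of their bucket.
    meter_bucket = meter_id // METERS_PER_BLOCK * METERS_PER_BLOCK
    time_bucket = epoch_seconds // BLOCK_SECONDS * BLOCK_SECONDS
    return f"{table}:{meter_bucket}:{time_bucket}"


print(block_key("volume_1s", 226, 1337230147))
```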
24. Bitcask would have
been nice
LevelDB backend
Use leveldb cache to
bound memory
27. How do I query the
database?
28. Find 45 minutes of
total traffic seen on
meters 1, 2, 226, &
301 starting 18 hours
ago broken down by
traffic type
29. Atomic Unit of Storage
10 seconds x 100 meters < 80KB
30. Step 1: fetch appropriate blocks (riak)
[Diagram: grid of storage blocks over a 45 min window. Horizontal axis: Time (t0 to t19). Vertical axis: Meter Id, in rows (0,99), (100,199), (200,299), (300,399), (400,499). Meters 1, 2, 226, and 301 are marked.]
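A sketch of the block selection in Step 1, assuming blocks are addressed by (meter bucket, time bucket) pairs. The function name, its parameters, and the default block size are illustrative and not Kobayashi's actual API.

```python
def blocks_for_query(meter_ids, start, duration,
                     block_seconds=10, meters_per_block=100):
    """List the (meter_bucket, time_bucket) blocks a query needs to fetch."""
    # Only meter ranges that contain a requested meter are needed.
    meter_buckets = sorted(
        {m // meters_per_block * meters_per_block for m in meter_ids}
    )
    # Align the start of the window to a block boundary.
    first = start // block_seconds * block_seconds
    time_buckets = range(first, start + duration, block_seconds)
    return [(mb, tb) for mb in meter_buckets for tb in time_buckets]


# 45 minutes for meters 1, 2, 226, and 301.
blocks = blocks_for_query([1, 2, 226, 301], start=1337230140, duration=45 * 60)
print(len(blocks), sorted({mb for mb, _ in blocks}))
```

Under these assumptions meters 1 and 2 share the (0,99) row, so only three of the five meter rows are fetched.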
31. Step 2: filter
[Diagram: the same grid over the 45 min window. Horizontal axis: Time (t0 to t19). Vertical axis: Meter Id, in rows (0,99), (100,199), (200,299), (300,399), (400,499). Meters 1, 2, 226, and 301 are marked.]
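Each fetched block covers up to 100 meters, so the filter step drops the rows for meters the query did not ask for. A minimal sketch, assuming each block decodes to a list of records with a hypothetical `meterId` field:

```python
def filter_block(records, wanted_meters):
    """Keep only the records that belong to the requested meters."""
    wanted = set(wanted_meters)
    return [r for r in records if r["meterId"] in wanted]


# Toy block from the (200,299) row; the values are made up for illustration.
block = [
    {"meterId": 200, "ingressOctets": 10},
    {"meterId": 226, "ingressOctets": 20},
    {"meterId": 250, "ingressOctets": 30},
]
kept = filter_block(block, [1, 2, 226, 301])
print(kept)
```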
32. Step 3: aggregate and perform top-k (45 min)
topk(1 + 2 + 226 + 301, 10)
{
  epochMillis: 1337230140000
  portProtocol: "4740:6"
  ingressPackets: 370482
  ingressOctets: 3113782199
  egressPackets: 343780
  egressOctets: 37126033
},
{
  epochMillis: 1337230140000
  portProtocol: "9092:6"
  ingressPackets: 440915
  ingressOctets: 1816615857
  egressPackets: 481237
  egressOctets: 1312198133
},
...
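A sketch of the aggregate and top-k step: sum the four measurements across the selected meters per group key, then keep the k largest groups. Ranking by total octets is an assumption, since the slides do not say which measure top-k ranks by, and the input values below are toy data.

```python
import heapq
from collections import defaultdict

MEASURES = ("ingressPackets", "ingressOctets", "egressPackets", "egressOctets")


def aggregate(records, key_fields):
    """Sum every measure over records that share the same key fields."""
    totals = defaultdict(lambda: dict.fromkeys(MEASURES, 0))
    for r in records:
        key = tuple(r[f] for f in key_fields)
        for m in MEASURES:
            totals[key][m] += r[m]
    return [dict(zip(key_fields, key), **sums) for key, sums in totals.items()]


def topk(rows, k, rank_by=lambda r: r["ingressOctets"] + r["egressOctets"]):
    """Return the k rows with the largest rank (assumed here: total octets)."""
    return heapq.nlargest(k, rows, key=rank_by)


# Toy records from two meters; the numbers are illustrative only.
records = [
    {"portProtocol": "80:6", "ingressPackets": 1, "ingressOctets": 100,
     "egressPackets": 1, "egressOctets": 50},
    {"portProtocol": "80:6", "ingressPackets": 2, "ingressOctets": 200,
     "egressPackets": 2, "egressOctets": 50},
    {"portProtocol": "53:17", "ingressPackets": 1, "ingressOctets": 10,
     "egressPackets": 1, "egressOctets": 10},
]
rows = aggregate(records, ["portProtocol"])
top = topk(rows, 1)
print(top)
```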
33. In URL Form
http://computers-r-terrible/
volume_1m_meter_port_protocol/
data?
from=-18h&
duration=45m&
parts=1,2,226,301&
aggregations=observationDomainId
34. Arbitrary Aggregations
http://computers-r-terrible/
volume_1m_meter_port_protocol/
data?
from=-18h&
duration=45m&
parts=1,2,226,301&
aggregations=
observationDomainId,
epochMillis
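A small sketch of assembling such a query URL from its parts. The helper `query_url` is hypothetical; the host, table name, and parameter names are the ones shown on the slides.

```python
from urllib.parse import urlencode


def query_url(base, table, start, duration, parts, aggregations):
    """Build a query URL in the shape shown on the slides."""
    params = {
        "from": start,
        "duration": duration,
        "parts": ",".join(str(p) for p in parts),
        "aggregations": ",".join(aggregations),
    }
    # Keep commas literal so list-valued parameters stay readable.
    return f"{base}/{table}/data?{urlencode(params, safe=',')}"


url = query_url(
    "http://computers-r-terrible",
    "volume_1m_meter_port_protocol",
    start="-18h",
    duration="45m",
    parts=[1, 2, 226, 301],
    aggregations=["observationDomainId", "epochMillis"],
)
print(url)
```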
35. “unfortunately the
project has been
blocked for weeks
choosing a name”
36. V
V′
V ≃ V′
37. V
V′
V ≃ V′
38. V
V′
V ≃ V′
40. Send expired data to
cold storage
output in arbitrary
time resolution
41. Open source the data
cubing and predicate
matching code
Query grammar for
Kobayashi