Live analytical dashboards at scale - SQL style
Shashwat Agarwal

fifth elephant - 2014: Live analytical dashboards at scale


https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style

Published in: Data & Analytics

Transcript of "fifth elephant - 2014: Live analytical dashboards at scale"

  1. Live analytical dashboards at scale - SQL style, Shashwat Agarwal
  2. Live Analytical
  3. Live Analytical
  4. What we have: Services (a lot of them), Events (millions of updates), Information
  5. Challenges
     • Metric Definition
     • Scale
     • Reliability
  6. Metric Definition
     • Not just count of events; but
     • func of
     • fields from one or more related events/entities
     • on each event or a batch of events (for statistical analysis)
     • for a set of dimensions
  7. Scale Challenges
     • Dimensional Lookup
     • High throughput (write)
     • Low latency (query)
     • Multidimensional store
  8. Reliability Challenges
     • Accuracy
     • Consistency
     • Fault tolerance
  9. Solution? Real time + Scale == Stream Processing: Kafka, Storm
  10. Storage
     • Multidimensional support
     • Optimized for time series queries
     • Low query response times
     • High write throughput
     • Scalable
     TSD*
     * OpenTSDB does not support Kerberos
  11. Metric Definition
     • Not scalable to write Storm topologies for each metric
     • Requires a DSL for non-tech folks
     Introducing... Esper
  12. Storm Topology - 1: Kafka Spouts -> Enricher Bolts (Dim Lookup, Dim Store) -> Kafka Bolts
     Event: { id: a123-234, time: 1234, entityId: OD12 … }
     Enriched Event: { id: a123-234, time: 1234, entityId: OD12 … }
  13. Storm Topology - 3’: Kafka Spouts -> Esper Bolts -> TSD Bolts -> TSD
     Enriched Event: { id: a123-234, time: 1234, entityId: OD12 … }
     Tuple: ( metric name, [dim name-value-pairs]*, value, ts )
  14. Time Batching
     • Event time
     • Enables
     • calculate statistics
     • windowed join
     • out of order events
  15. Reliability: Faults, Upgrades, Metrics Def changes; Last good Checkpoint, Reset Checkpoint, Replay; Transactional Storm
  16. Storm Topology - 2: Kafka Spouts -> Time Batch Bolt -> HBase Bolt
     Enriched Event: { id: a123-234, time: 1234, entityId: OD12 … }
  17. HBase Time Batch Schema, Table 1 - Event Queue
     • Key: <event_ns>_slot_<batchId>
       batchId is constructed from event timestamp
     • Value: (each column - Event JSON)
  18. HBase Time Batch Schema, Table 2 - Event Queue Update Log
     • Key: <event_ns>_log_<batchId>_<version>
       batchId is constructed from event timestamp
       version is timestamp at which batch was updated
     • Value: Version
  19. Storm Topology - 3: Time Batch Spout -> Esper Bolts -> TSD Bolts -> TSD
     Tuple: ( metric name, [dim name-value-pairs]*, value, ts )
  20. Learnings
     • Replayability
     • Event and Entity Schema
     • Checkpointing
     • Bootstrapping
     • Sidelining
     • Fault Tolerance
  21. Questions? sb.lk/hasgeek
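
The time-batch row keys described in slides 17 and 18 can be sketched roughly as follows. This is an illustration only, not the speaker's implementation: the slides say that batchId is "constructed from event timestamp" but not how, so the 60-second batch width, the floor-to-batch-start rule, the assumption that `time` is in seconds, the use of the event `id` as the column name, and the `order` namespace are all hypothetical. It also uses plain dictionaries in place of HBase tables.

```python
from collections import defaultdict

# Assumed batch width in seconds; the slides do not state one.
BATCH_SECONDS = 60


def batch_id(event_ts, width=BATCH_SECONDS):
    """Start of the time batch containing event_ts (event time, assumed seconds)."""
    return event_ts - (event_ts % width)


def slot_key(event_ns, event_ts):
    """Row key for Table 1 (Event Queue): <event_ns>_slot_<batchId>."""
    return f"{event_ns}_slot_{batch_id(event_ts)}"


def log_key(event_ns, event_ts, version):
    """Row key for Table 2 (Update Log): <event_ns>_log_<batchId>_<version>.

    version is the time at which the batch was updated.
    """
    return f"{event_ns}_log_{batch_id(event_ts)}_{version}"


def group_by_slot(event_ns, events):
    """Group enriched events into batch rows, one column per event.

    Batching is by event time, so an event that arrives late still lands
    in the row of the batch it belongs to. Using the event id as the
    column name is an assumption.
    """
    rows = defaultdict(dict)
    for event in events:
        rows[slot_key(event_ns, event["time"])][event["id"]] = event
    return dict(rows)
```

For example, events with `time` values 1234, 1190 and 1250 arriving in that order would be grouped into two rows under these assumptions: `order_slot_1200` (two events) and `order_slot_1140` (one event), regardless of arrival order. Because the key is derived only from the event's own timestamp, replaying the same events after a failure would write to the same rows again.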