fifth elephant - 2014: Live analytical dashboards at scale
https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style


fifth elephant - 2014: Live analytical dashboards at scale Presentation Transcript

  • 1. Live analytical dashboards at scale - SQL style Shashwat Agarwal
  • 2. Live Analytical
  • 3. Live Analytical
  • 4. What we have Services (a lot of them) Events (millions of updates) Information
  • 5. Challenges • Metric Definition • Scale • Reliability
  • 6. Metric Definition • Not just count of events; but • func of • fields from one or more related events/entities • on each event or a batch of events (for statistical analysis) • for a set of dimensions
  • 7. Scale Challenges • Dimensional Lookup • High throughput (write) • Low Latency (query) • MultiDimensional Store
  • 8. Reliability Challenges • Accuracy • Consistency • Fault tolerance
  • 9. Solution? Real time + Scale == Stream Processing Kafka Storm
  • 10. Storage • MultiDimensional support • Optimized for Time series query • Low query response times • High write throughput • Scalable TSD* * OpenTSDB does not support Kerberos
  • 11. Metric Definition • Not scalable to write Storm topologies for each metric • Requires a DSL for non-tech folks Introducing... Esper
  • 12. Storm Topology - 1 Dim Lookup Dim Lookup Kafka Spouts Enricher Bolts Kafka Bolts { id: a123-234, time: 1234, entityId: OD12 … } Event { id: a123-234, time: 1234, entityId: OD12 … } Enriched Event Dim Store
  • 13. Storm Topology - 3’ TSD Kafka Spouts Esper Bolts TSD Bolts { id: a123-234, time: 1234, entityId: OD12 … } Enriched Event ( metric name, [dim name-value-pairs]*, value, ts )
  • 14. Time Batching • Event time • Enables • calculate statistics • windowed join • out of order events
  • 15. Reliability Faults Upgrades Metrics Def changes Last good Checkpoint Reset Checkpoint Replay Transactional Storm
  • 16. Storm Topology - 2 Kafka Spouts Time Batch Bolt HBase Bolt { id: a123-234, time: 1234, entityId: OD12 … } Enriched Event
  • 17. HBase Time Batch Schema Table 1 - Event Queue • Key <event_ns>_slot_<batchId> batchId is constructed from event timestamp • Value (each column - Event JSON)
  • 18. HBase Time Batch Schema Table 2 - Event Queue Update Log • Key <event_ns>_log_<batchId>_<version> batchId is constructed from event timestamp version is timestamp at which batch was updated • Value Version
  • 19. Storm Topology - 3 TSD Time Batch Spout Esper Bolts TSD Bolts ( metric name, [dim name-value-pairs]*, value, ts )
  • 20. Learnings • Replayability • Event and Entity Schema • Checkpointing • Bootstrapping • Sidelining • Fault Tolerance
  • 21. Questions ?? sb.lk/hasgeek
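
The HBase time-batch schema on slides 17 and 18 can be illustrated with a minimal sketch. It builds the two row-key formats from the slides and shows how a late, out-of-order event lands in the same batch as an earlier one because the batch id comes from event time. Several details are assumptions, not from the talk: the 60-second batch width, integer division as the batch id function, the event id as the column name, the "orders" namespace, and the in-memory dicts that stand in for the HBase tables.

```python
import json
from collections import defaultdict

BATCH_WIDTH_SECONDS = 60  # assumption: the slides do not state a batch width


def batch_id(event_ts: int, width: int = BATCH_WIDTH_SECONDS) -> int:
    """Derive the batch id from the event's own timestamp (event time)."""
    return event_ts // width


def slot_key(event_ns: str, event_ts: int) -> str:
    """Row key for Table 1 (event queue): <event_ns>_slot_<batchId>."""
    return f"{event_ns}_slot_{batch_id(event_ts)}"


def log_key(event_ns: str, event_ts: int, version: int) -> str:
    """Row key for Table 2 (update log): <event_ns>_log_<batchId>_<version>."""
    return f"{event_ns}_log_{batch_id(event_ts)}_{version}"


class InMemoryTimeBatchStore:
    """Dict-backed stand-in for the two HBase tables; not an HBase client."""

    def __init__(self):
        self.event_queue = defaultdict(dict)  # row key -> {column: event JSON}
        self.update_log = {}                  # row key -> version

    def put(self, event_ns: str, event: dict, now: int) -> None:
        row = slot_key(event_ns, event["time"])
        # Assumption: one column per event id, so a replayed event
        # overwrites its earlier copy instead of being counted twice.
        self.event_queue[row][event["id"]] = json.dumps(event, sort_keys=True)
        # The version is the time at which the batch was updated.
        self.update_log[log_key(event_ns, event["time"], now)] = now


store = InMemoryTimeBatchStore()
store.put("orders", {"id": "a123-234", "time": 1234, "entityId": "OD12"}, now=2000)
# Arrives later but carries an earlier event time: same batch.
store.put("orders", {"id": "b456-789", "time": 1210, "entityId": "OD13"}, now=2005)
# Replay of the first event after a checkpoint reset.
store.put("orders", {"id": "a123-234", "time": 1234, "entityId": "OD12"}, now=2010)

print(sorted(store.event_queue))
print(sorted(store.update_log))
```

In this sketch a downstream reader (the Time Batch Spout of slide 19) could scan the update log to find which batches changed since its last checkpoint and re-read only those rows. That reading of the update log's purpose is an interpretation of the slides, not something they state.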