fifth elephant - 2014: Live analytical dashboards at scale

https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style


Published in Data & Analytics
Transcript

  • 1. Live analytical dashboards at scale - SQL style Shashwat Agarwal
  • 2. Live Analytical
  • 3. Live Analytical
  • 4. What we have • Services (a lot of them) • Events (millions of updates) • Information
  • 5. Challenges • Metric Definition • Scale • Reliability
  • 6. Metric Definition • Not just a count of events, but • a function of • fields from one or more related events/entities • on each event or a batch of events (for statistical analysis) • for a set of dimensions
  • 7. Scale Challenges • Dimensional Lookup • High throughput (write) • Low latency (query) • Multidimensional store
  • 8. Reliability Challenges • Accuracy • Consistency • Fault tolerance
  • 9. Solution? • Real time + Scale == Stream Processing • Kafka • Storm
  • 10. Storage • Multidimensional support • Optimized for time series queries • Low query response times • High write throughput • Scalable • TSD* (* OpenTSDB does not support Kerberos)
  • 11. Metric Definition • Writing a Storm topology for each metric does not scale • Requires a DSL for non-tech folks • Introducing... Esper
  • 12. Storm Topology - 1 • Kafka Spouts → Enricher Bolts (Dim Lookup against the Dim Store) → Kafka Bolts • Event: { id: a123-234, time: 1234, entityId: OD12 … } • Enriched Event: { id: a123-234, time: 1234, entityId: OD12 … }
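The enrichment step in Topology 1 can be sketched in a few lines. This is a minimal illustration, not the speaker's bolt: the in-memory `DIM_STORE`, the `dims` field, and the attribute names `category` and `city` are all hypothetical, since the slide does not say what the Dim Store is or which attributes it holds.

```python
from typing import Any, Dict

# Hypothetical dimension store keyed by entityId (assumption: the real
# Dim Store and its attributes are not described on the slide).
DIM_STORE: Dict[str, Dict[str, Any]] = {
    "OD12": {"category": "books", "city": "Bangalore"},
}


def enrich(event: Dict[str, Any],
           dim_store: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the event with its dimension attributes attached."""
    dims = dim_store.get(event["entityId"], {})  # unknown entity -> no dims
    enriched = dict(event)                       # leave the input untouched
    enriched["dims"] = dict(dims)
    return enriched


event = {"id": "a123-234", "time": 1234, "entityId": "OD12"}
enriched = enrich(event, DIM_STORE)
```

In a Storm topology this function would sit inside the Enricher Bolt, between the Kafka Spout that reads raw events and the Kafka Bolt that writes enriched ones.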
  • 13. Storm Topology - 3’ • Kafka Spouts → Esper Bolts → TSD Bolts → TSD • Enriched Event: { id: a123-234, time: 1234, entityId: OD12 … } • ( metric name, [dim name-value-pairs]*, value, ts )
  • 14. Time Batching • Batches are keyed on event time • Enables • calculating statistics • windowed joins • handling out-of-order events
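Batching on event time can be sketched as follows. It is a rough illustration under assumptions: the 60-second batch width, the `value` field, and the helper names are hypothetical and not taken from the talk. The point is that a late event still lands in the batch its own timestamp selects, so per-batch statistics stay correct.

```python
from collections import defaultdict
from typing import Any, Dict, List

BATCH_SECONDS = 60  # assumed batch width; the talk does not state one


def batch_id(event_time: int, width: int = BATCH_SECONDS) -> int:
    """Map an event timestamp to the start of its fixed-width batch."""
    return event_time - (event_time % width)


def group_by_batch(events: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Group events by event-time batch, regardless of arrival order."""
    batches: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for e in events:
        batches[batch_id(e["time"])].append(e)
    return dict(batches)


events = [
    {"id": "e1", "time": 1234, "value": 10.0},
    {"id": "e3", "time": 1290, "value": 30.0},
    {"id": "e2", "time": 1210, "value": 20.0},  # arrives late, belongs with e1
]
batches = group_by_batch(events)

# A per-batch statistic (here the mean) computed over complete batches.
means = {b: sum(e["value"] for e in evs) / len(evs) for b, evs in batches.items()}
```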
  • 15. Reliability • Faults • Upgrades • Metrics Def changes • Last good Checkpoint • Reset Checkpoint • Replay • Transactional Storm
  • 16. Storm Topology - 2 • Kafka Spouts → Time Batch Bolt → HBase Bolt • Enriched Event: { id: a123-234, time: 1234, entityId: OD12 … }
  • 17. HBase Time Batch Schema • Table 1 - Event Queue • Key: <event_ns>_slot_<batchId> (batchId is constructed from the event timestamp) • Value: each column holds one Event JSON
  • 18. HBase Time Batch Schema • Table 2 - Event Queue Update Log • Key: <event_ns>_log_<batchId>_<version> (batchId is constructed from the event timestamp; version is the timestamp at which the batch was updated) • Value: version
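The two row-key layouts above can be sketched with plain dictionaries standing in for the HBase tables. This is only an illustration under assumptions: no HBase client is used, the namespace `orders`, the column naming by event id, and the `batches_updated_since` reader are hypothetical, and the slides do not say how the update log is consumed.

```python
from typing import Dict, Set

# Stand-ins for the two HBase tables (assumption: plain dicts, no HBase client).
event_queue: Dict[str, Dict[str, str]] = {}  # row key -> {column: event JSON}
update_log: Dict[str, int] = {}              # row key -> version


def slot_key(event_ns: str, batch_id: int) -> str:
    """Table 1 key: <event_ns>_slot_<batchId>."""
    return f"{event_ns}_slot_{batch_id}"


def log_key(event_ns: str, batch_id: int, version: int) -> str:
    """Table 2 key: <event_ns>_log_<batchId>_<version>."""
    return f"{event_ns}_log_{batch_id}_{version}"


def write_event(event_ns: str, batch_id: int, event_id: str,
                event_json: str, version: int) -> None:
    """Store the event as a column in its batch row and log the update."""
    event_queue.setdefault(slot_key(event_ns, batch_id), {})[event_id] = event_json
    update_log[log_key(event_ns, batch_id, version)] = version


def batches_updated_since(event_ns: str, checkpoint: int) -> Set[int]:
    """Hypothetical reader: batch ids touched after the given checkpoint version."""
    prefix = f"{event_ns}_log_"
    updated: Set[int] = set()
    for key, version in update_log.items():
        if key.startswith(prefix) and version > checkpoint:
            batch = key[len(prefix):].rsplit("_", 1)[0]
            updated.add(int(batch))
    return updated


write_event("orders", 1200, "e1", '{"id": "e1"}', version=5000)
write_event("orders", 1260, "e3", '{"id": "e3"}', version=5001)
write_event("orders", 1200, "e2", '{"id": "e2"}', version=5002)  # late event
```

With a checkpoint at version 5001, only batch 1200 is reported as changed, which is the kind of lookup a replay from the last good checkpoint would need.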
  • 19. Storm Topology - 3 • Time Batch Spout → Esper Bolts → TSD Bolts → TSD • ( metric name, [dim name-value-pairs]*, value, ts )
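If the TSD here is OpenTSDB (an assumption based on the footnote on the Storage slide), the tuple ( metric name, [dim name-value-pairs]*, value, ts ) maps naturally onto OpenTSDB's telnet-style `put` line, with dimensions becoming tags. The sketch below only formats the line. The metric and dimension names are hypothetical, and it does not show how the talk's TSD Bolt actually writes.

```python
from typing import Dict, Union


def to_put_line(metric: str, dims: Dict[str, str],
                value: Union[int, float], ts: int) -> str:
    """Format one metric tuple as: put <metric> <timestamp> <value> <tagk=tagv ...>."""
    if not dims:
        # OpenTSDB expects at least one tag on every data point.
        raise ValueError("at least one dimension (tag) is required")
    tags = " ".join(f"{k}={v}" for k, v in sorted(dims.items()))
    return f"put {metric} {ts} {value} {tags}"


line = to_put_line("orders.count", {"city": "blr", "category": "books"}, 42, 1200)
```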
  • 20. Learnings • Replayability • Event and Entity Schema • Checkpointing • Bootstrapping • Sidelining • Fault Tolerance
  • 21. Questions ?? sb.lk/hasgeek