Storm overview & integration
Published in: Technology, Design
Transcript

  • 1. STORM: Buckle up, Dorothy!
  • 2. ABOUT: Distributed real-time computation. By Nathan Marz. Backtype => Twitter => Apache
  • 3. WHAT IS IT GOOD FOR? Real-time analytics, online machine learning, continuous computation, distributed RPC, ETL (Extract, Transform, Load), …
  • 4. PROMISES: No data loss, fault-tolerant, scalable, robust
  • 5. VIEW FROM ABOVE: The Storm cluster (topology) pulls from a stream source (Kafka, *MQ, …) and reads from / writes to storage
  • 6. PRIMITIVES: A tuple is a list of named fields (Field 1 / Value 1 … Field 5 / Value 5); a stream is a sequence of tuples
  • 7. PRIMITIVES: A topology is a graph of spouts and bolts (diagram: 2 spouts, 4 bolts)
  • 8. PRIMITIVES: Tuples, spouts, bolts. ABSTRACTION: Filters, transformation, functions, joins, chaining streams, small components. EFFECTS: Incremental, distributed, scalable
  • 9. CLUSTER: Nimbus; Zookeeper cluster; 3 worker nodes, each with a Supervisor and 3 Executors
  • 10. CLUSTER: NIMBUS / NODES: Small, no state, robust, kill / restart easy. ZOOKEEPER: Communication, state
  • 11. AS PROMISED? No data loss, fault-tolerant, scalable, robust
  • 12. GUARANTEES: A message transforms into a tuple tree. Storm tracks the tuple tree. The message is fully processed when the tree is exhausted.
  • 13. FAILURES: Task died: failed tuples are replayed. Acker task died: related tuples time out and are replayed. Spout task died: the source replays, e.g. pending messages are placed back on the queue.
  • 14. WHAT DO I HAVE TO DO? Inform Storm about new links in the tree. Inform it when finished with a tuple. Every tuple must be acked or failed.
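The bookkeeping behind slides 12 to 14 can be illustrated with a small stand-alone model. Storm's acker is known to keep one XOR value per spout tuple rather than the whole tree; the sketch below imitates that idea in plain Java. The class `AckerSketch`, its method names, and the tuple ids are hypothetical and are not Storm's API; real Storm uses random 64-bit ids and batches these updates.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of XOR-based tuple-tree tracking; not Storm's actual classes. */
final class AckerSketch {
    // One running XOR value per root (spout) tuple id.
    private final Map<Long, Long> pending = new HashMap<>();

    /** The spout emitted a root tuple: start tracking its tree. */
    void emitRoot(long rootId) {
        pending.put(rootId, rootId);
    }

    /** A bolt emitted a child anchored to the root: a new link in the tree. */
    void anchor(long rootId, long childId) {
        pending.computeIfPresent(rootId, (k, v) -> v ^ childId);
    }

    /** A bolt finished a tuple. Returns true once every tuple in the tree was acked. */
    boolean ack(long rootId, long tupleId) {
        Long v = pending.get(rootId);
        if (v == null) {
            return false; // unknown or already failed tree
        }
        long next = v ^ tupleId;
        if (next == 0L) {
            pending.remove(rootId); // tree exhausted: fully processed
            return true;
        }
        pending.put(rootId, next);
        return false;
    }

    /** A bolt failed a tuple: drop the tree so the source can replay the message. */
    void fail(long rootId) {
        pending.remove(rootId);
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        acker.emitRoot(0xA1L);
        acker.anchor(0xA1L, 0xB2L);
        acker.anchor(0xA1L, 0xC3L);
        System.out.println(acker.ack(0xA1L, 0xA1L));
        System.out.println(acker.ack(0xA1L, 0xB2L));
        System.out.println(acker.ack(0xA1L, 0xC3L));
    }
}
```

Each id is XOR-ed in once when the tuple is created and once when it is acked, so the value returns to zero only when every tuple has been acked. A tuple that is never acked or failed leaves the value non-zero, which is why slide 14 insists that every tuple be acked or failed.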
  • 15. ANYTHING SIMPLER? TRIDENT: High-level abstraction, stateful persistence primitives, exactly-once semantics
  • 16. AS PROMISED? YES
  • 17. USER DASHBOARD. PROBLEM: Bad performance, uses core storage. IDEA: Pre-compute, customize, fast, isolate, quarterly agg. (quarter-hour buckets)
  • 18. ARCHITECTURE: Core -> push -> Events queue: Kafka (4 partitions, 2 replicas) -> pull -> Storm (4 workers) -> write -> MS SQL (4 staging) -> read -> Dashboard. State in source.
  • 19. KAFKA: Topic as a log of messages 1 … 9, from old to new. Keywords: partitioned, replicated, stacked, flushed, client offset, fast
  • 20. TRANSFORMATION. ORIGINAL: { id: df45er87c78df, sender: "Info", destination: "39345123456", parts: 2, price: 100, client: "Demo", time: "2014-06-02 14:47:58", country: "IT", network: "Wind", type: "SMS", … } COMPUTED: { client: "Demo", type: "SMS", country: "IT", network: "Wind", bucket: "2014-06-02 14:45:00", traffic: 2, expenses: 200 }
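The per-event transformation on slide 20 can be sketched in plain Java. The slide only shows one input and one output, so the rules here are assumptions read off that example: the bucket is the event time rounded down to the quarter hour, traffic equals `parts`, and expenses equals `parts * price` (2 × 100 = 200). The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Sketch of the slide-20 transformation; rules inferred from the single example. */
final class QuarterBucketSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** Rounds a timestamp down to the start of its quarter hour. */
    static String bucket(String time) {
        LocalDateTime t = LocalDateTime.parse(time, FMT);
        int minute = (t.getMinute() / 15) * 15; // 0, 15, 30 or 45
        return t.withMinute(minute).withSecond(0).withNano(0).format(FMT);
    }

    /** Assumption: traffic counts message parts. */
    static long traffic(long parts) {
        return parts;
    }

    /** Assumption: expenses are parts times the per-part price. */
    static long expenses(long parts, long price) {
        return parts * price;
    }

    public static void main(String[] args) {
        System.out.println(bucket("2014-06-02 14:47:58")); // prints 2014-06-02 14:45:00
        System.out.println(traffic(2));                    // prints 2
        System.out.println(expenses(2, 100));              // prints 200
    }
}
```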
  • 21. CODE
      TridentState tridentState = topology
          .newStream("CoreEvents", buildKafkaSpout())
          .parallelismHint(4)
          // Parse the raw Kafka message bytes into named fields.
          .each(new Fields("bytes"), new CoreEventMessageParser(),
                new Fields("time", "client", "network", "country", "type", "parts", "price"))
          // Derive the quarter-hour bucket from the event time.
          .each(new Fields("time"), new QuarterTimeBucket(), new Fields("bucket"))
          // Note: "traffic" and "expenses" are projected here, but no step shown on the slide emits them.
          .project(new Fields("bucket", "client", "network", "country", "type", "traffic", "expenses"))
          .groupBy(new Fields("bucket", "client", "network", "country", "type"))
          // Sum traffic and expenses per group into persistent state.
          .persistentAggregate(getStateFactory(), new Fields("traffic", "expenses"), new Sum(), new Fields("trafficExpenses"))
          .parallelismHint(8);
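What the `groupBy` plus `persistentAggregate` steps on slide 21 compute can be shown without a cluster. The sketch below is a plain-Java, in-memory stand-in, not Trident: it keeps one pair of running sums per (bucket, client, network, country, type) key. The class `GroupSumSketch` is hypothetical, and it leaves out what Trident adds on top, such as batching, transaction ids for exactly-once updates, and the external state store.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for group-by-key summing; not Trident's API. */
final class GroupSumSketch {
    // Key: the five grouping fields. Value: {traffic sum, expenses sum}.
    private final Map<List<String>, long[]> state = new HashMap<>();

    void add(String bucket, String client, String network, String country,
             String type, long traffic, long expenses) {
        List<String> key = Arrays.asList(bucket, client, network, country, type);
        long[] sums = state.computeIfAbsent(key, k -> new long[2]);
        sums[0] += traffic;
        sums[1] += expenses;
    }

    /** Returns a copy of {traffic, expenses} for the key, or zeros if unseen. */
    long[] get(String bucket, String client, String network, String country, String type) {
        List<String> key = Arrays.asList(bucket, client, network, country, type);
        long[] sums = state.get(key);
        return sums == null ? new long[2] : sums.clone();
    }

    public static void main(String[] args) {
        GroupSumSketch agg = new GroupSumSketch();
        // Two events that fall into the same quarter-hour bucket and group.
        agg.add("2014-06-02 14:45:00", "Demo", "Wind", "IT", "SMS", 2, 200);
        agg.add("2014-06-02 14:45:00", "Demo", "Wind", "IT", "SMS", 1, 100);
        long[] sums = agg.get("2014-06-02 14:45:00", "Demo", "Wind", "IT", "SMS");
        System.out.println(sums[0] + " " + sums[1]); // prints 3 300
    }
}
```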
  • 22. PERFORMANCE 1.500 PEAK REGULAR KAFKA 60.000 4.500 160.000 STORAGE 2.000 10.000 DASHBOARD 1 1
  • 23. TUNING STORAGE: 1st Issue - Storage. Random access: 1.500 w/s limit. Staged approach: 30.000 w/s limit. No locks: isolated. Scalable: each worker has its own stage. Main table indexing nicely. Doesn't affect reading.
  • 24. STAGED WRITES: Worker 1 -> write -> Stage Table 1 -> merge -> Main Table. Worker 2 -> write -> Stage Table 2 -> merge -> Main Table.
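The staged-write idea from slides 23 and 24 can be modelled in a few lines. This is a single-threaded, in-memory illustration only: the deck uses MS SQL stage tables, and the slides do not say how the merge is implemented, so summing stage rows into the main table by key is an assumption. The class `StagedWritesSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of per-worker stage tables merged into a main table. */
final class StagedWritesSketch {
    private final Map<String, Long> mainTable = new HashMap<>();
    // One private stage per worker, so writers never touch each other's rows.
    private final Map<Integer, Map<String, Long>> stageTables = new HashMap<>();

    /** A worker writes only to its own stage table. */
    void write(int worker, String key, long delta) {
        stageTables.computeIfAbsent(worker, w -> new HashMap<>())
                   .merge(key, delta, Long::sum);
    }

    /** Folds one worker's stage into the main table and empties the stage. */
    void mergeStage(int worker) {
        Map<String, Long> stage = stageTables.remove(worker);
        if (stage == null) {
            return;
        }
        stage.forEach((key, delta) -> mainTable.merge(key, delta, Long::sum));
    }

    /** Readers see the main table only. */
    long read(String key) {
        return mainTable.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        StagedWritesSketch db = new StagedWritesSketch();
        db.write(1, "Demo/SMS", 2);
        db.write(2, "Demo/SMS", 3);
        System.out.println(db.read("Demo/SMS")); // prints 0: nothing merged yet
        db.mergeStage(1);
        db.mergeStage(2);
        System.out.println(db.read("Demo/SMS")); // prints 5
    }
}
```

Because each worker appends to its own stage, the expensive random-access update of the main table happens once per merge rather than once per event, which is one plausible reading of the jump from 1.500 to 30.000 w/s on slide 23.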
  • 25. TUNING TOPOLOGY: 2nd Issue - Serialization. [Chart: Raw/s, Expanded/s and Writes/s on a 0 to 160,000 axis, for 200 KB, 1 MB, 4 MB, 8 MB, 16 MB and 24 MB; plateauing]
  • 26. SERIALIZATION [Chart, 0 to 1,200 axis: S [s], S [byte], S [% CPU], D [s], D [% CPU] for CSV (Plain), CSV (Deflate), CSV (GZip), Jackson (Plain), Jackson (GZip), Jackson Smile, Java Object, Kryo]
  • 27. MEASURE. AXIS: Max spout pending, SQL workers, DB batch size, Kafka fetch size, serialization, … METRICS: Kafka fetch speed, DB write speed, Kafka / DB ratio, capacity, latency
  • 28. MONITOR STORM UI TOPOLOGY
  • 29. METRICS GRAPHITE
  • 30. GOTCHAS: Version 0.9.1, partially in flux, Kafka integration, message & topology versioning, performance tuning
  • 31. NEXT? Lambda Architecture: New data feeds both the batch layer (master dataset) and the speed layer (real-time views). The serving layer holds the batch views. Queries read from the batch views and the real-time views.
  • 32. RESOURCES: http://storm.incubator.apache.org http://lambda-architecture.net http://kafka.apache.org
  • 33. PRESENTATION TOOLS: http://www.gimp.org http://www.pictaculous.com http://www.colourlovers.com http://www.easycalculation.com http://paletton.com
  • 34. QUESTIONS?
