Scalable Distributed Stream Processing


This is my week 3 presentation for CSIS 638, to be presented on 7/20/2011.

Published in: Technology

  1. Scalable Distributed Stream Processing
     Matt Garren
  2. Aurora
     ● Tuple processor
     ● Uses "Boxes and Arrows" paradigm for dataflow
     ● Queries are continuous
  3. Aurora: Operators
     ● Filter
     ● Union
     ● WSort
     ● Tumble
     ● Map
     ● XSection
     ● Slide
     ● Join
     ● Resample
  4. Aurora: Runtime
     ● Scheduler
     ● Storage Manager
     ● Load Shedder
     ● QoS
  5. Distributed Stream Processing
     ● Benefits of a Node Federation
       ○ Sharing/Multiplexing
       ○ Resiliency
       ○ Programmatically the same
  6. Aurora*: Distributed Aurora
     ● Federation of nodes running Aurora
     ● Single administrative domain
     ● Queries allowed on all nodes
     ● Boxes partitioned to nodes
     ● Overloaded nodes offload tasks
     ● Decentralized reconfiguration
  7. Medusa: Distributed Aurora
     ● Multiple administrative domains
     ● "Participants"
     ● Agoric system
     ● "Contracts" with senders/receivers
     ● Receiver "pays" sender
     ● Goal: stable "Economy"
  8. Infrastructure Requirements
     ● Naming and Discovery
       ○ Global namespace
       ○ Catalogs (Inter/Intra participant)
         ■ Operators
         ■ Schemas
         ■ Queries
         ■ Streams
         ■ Contacts
         ■ Event locations
  9. Infrastructure Requirements
     ● Routing
       ○ Before producing: register schema
       ○ Label events w/ stream name
       ○ Intermediary nodes forward (Intra)
       ○ Explicit connections to cross boundaries
  10. Infrastructure Requirements
     ● Message Transport
       ○ Individual TCP connections: bad idea
       ○ Multiplex messages
         ■ Single TCP connection
         ■ Message scheduler
     ● Remote Definition
       ○ Users only see what they care about
  11. Load Sharing
     ● Load Share Daemon per node
     ● Box Sliding
     ● Box Splitting
  12. Load Sharing
     ● Considerations
       ○ How often to load share?
       ○ When is it not ok to offload?
       ○ What should be split?
  13. Availability
     ● Primary/Backup servers
     ● Heartbeats to neighbors
     ● Additional work at failover
  14. QoS
     ● Intermediate/Output
     ● Precision
  15. Questions?
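
The Load Sharing slide lists Box Splitting without showing what a split looks like. A minimal sketch, assuming a split Filter box is handled by routing each tuple to one of two copies of the box and then merging the outputs with a Union; the slides do not spell out this mechanism, and the names `filter_box`, `split_filter_box`, `is_hot`, and `is_even_sensor` are hypothetical. Tuples are modelled as plain dicts and both copies run in one process here, whereas in a federation each copy could sit on a different node.

```python
def filter_box(pred, tuples):
    """Aurora-style Filter: keep the tuples that satisfy pred."""
    return [t for t in tuples if pred(t)]


def split_filter_box(pred, route, tuples):
    """Hypothetical split of one Filter box into two copies.

    `route` decides which copy sees each tuple; the final
    concatenation plays the role of a Union box, so the relative
    order of tuples from different copies is not preserved.
    """
    left = [t for t in tuples if route(t)]       # input of copy 1
    right = [t for t in tuples if not route(t)]  # input of copy 2
    return filter_box(pred, left) + filter_box(pred, right)


# Illustrative data: six sensor readings (made up for this sketch).
readings = [{"sensor": i, "temp": 15 + 3 * i} for i in range(6)]
is_hot = lambda t: t["temp"] > 20
is_even_sensor = lambda t: t["sensor"] % 2 == 0

unsplit = filter_box(is_hot, readings)
split = split_filter_box(is_hot, is_even_sensor, readings)
```

Both versions produce the same set of tuples, only in a different order. That is acceptable for Filter, which is stateless; windowed operators such as Tumble would presumably need a more careful merge step than a plain Union.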

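
The Routing and Message Transport slides describe labelling events with their stream name and multiplexing them over a single TCP connection through a message scheduler. A minimal sketch of that idea, with the socket left out so it stays self-contained; the `Multiplexer` class and the round-robin policy are assumptions, since the slides do not say which scheduling policy is used.

```python
from collections import deque


class Multiplexer:
    """Hypothetical per-connection scheduler over several streams."""

    def __init__(self):
        self.queues = {}  # stream name -> deque of pending events

    def send(self, stream, event):
        """Queue an event under its stream name."""
        self.queues.setdefault(stream, deque()).append(event)

    def drain(self):
        """Yield (stream, event) pairs round-robin across streams.

        Each pair carries the stream name as its label, so a receiver
        on the other end of the shared connection can demultiplex.
        """
        while any(self.queues.values()):
            for name, queue in self.queues.items():
                if queue:
                    yield name, queue.popleft()


mux = Multiplexer()
mux.send("quotes", 1)
mux.send("quotes", 2)
mux.send("trades", "a")
wire_order = list(mux.drain())
# wire_order == [("quotes", 1), ("trades", "a"), ("quotes", 2)]
```

Round-robin keeps a busy stream from starving a quiet one on the shared connection; a scheduler that weights streams by priority or QoS would slot into `drain` in the same way.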