Realtime Analytics with Storm and Hadoop
13. Goals of a data system
• Low latency reads
• Low latency writes
• Fault-tolerant
• Scalable
14. What is a data system?
Query = Function(All data)
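To make the formula concrete, here is a minimal Python sketch that treats a query literally as a function over the entire dataset. The pageview records and the `pageviews` function are hypothetical examples, not from the talk. Note that every call rescans all of the data, which is why evaluating the function directly per query does not give low-latency reads as the dataset grows.

```python
# Hypothetical master dataset: every pageview event ever recorded, as (url, timestamp) pairs.
ALL_DATA = [
    ("/home", 100),
    ("/about", 105),
    ("/home", 230),
    ("/home", 310),
]

def pageviews(all_data, url):
    """A query computed directly as a function of all the data: scan every record."""
    return sum(1 for u, _ in all_data if u == url)

print(pageviews(ALL_DATA, "/home"))  # 3
```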
15. Is there a general-purpose way to compute arbitrary functions in realtime?
24. Precomputation
[Diagram: All data → Function → Precomputed view → Function → Query]
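The diagram splits the query into two functions: one that precomputes a view from all the data, and one that answers queries from that view. A minimal Python sketch, assuming a hypothetical pageview dataset and a view that counts pageviews per URL:

```python
from collections import Counter

# Hypothetical master dataset of (url, timestamp) pageview events.
ALL_DATA = [("/home", 100), ("/about", 105), ("/home", 230), ("/home", 310)]

def build_view(all_data):
    """First function: precompute a view (pageview count per URL) from all the data."""
    return Counter(url for url, _ in all_data)

def query(view, url):
    """Second function: answer the query from the precomputed view with a cheap lookup."""
    return view[url]  # a Counter returns 0 for URLs it has never seen

view = build_view(ALL_DATA)
print(query(view, "/home"))  # 3
```

The expensive scan happens once per view build rather than once per query; the query itself becomes a lookup.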
27. Hadoop precomputation
[Diagram: All data → MapReduce workflow → Batch view #1; All data → MapReduce workflow → Batch view #2]
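As a rough illustration of one dataset feeding several MapReduce workflows, here is a toy single-process MapReduce in Python (map, group by key, reduce). It is not Hadoop's API; the dataset, the two views, and the assumption that timestamps are in seconds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical master dataset of (url, timestamp_in_seconds) pageview events.
ALL_DATA = [("/home", 100), ("/about", 3700), ("/home", 3800), ("/home", 7300)]

def map_reduce(records, mapper, reducer):
    """Toy MapReduce: map each record to (key, value) pairs, group by key (shuffle), reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Batch view #1: pageviews per URL.
views_by_url = map_reduce(ALL_DATA, lambda r: [(r[0], 1)], lambda k, vs: sum(vs))

# Batch view #2: pageviews per (URL, hour) bucket.
views_by_hour = map_reduce(ALL_DATA, lambda r: [((r[0], r[1] // 3600), 1)], lambda k, vs: sum(vs))

print(views_by_url)  # {'/home': 3, '/about': 1}
```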
35. Not quite...
• A batch workflow is too slow
• Views are out of date
[Timeline up to Now: older data is absorbed into batch views; the most recent data is not absorbed. Callout on the unabsorbed span: "Just a few hours of data!"]
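The timeline can be reproduced with a toy example: a batch view only reflects data that existed when the last batch run started, so a query against it misses the most recent events. In this Python sketch the events, the hour-based timestamps, and the 3-hour batch latency are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical (url, timestamp_in_hours) events; "now" is hour 10.
ALL_DATA = [("/home", 1), ("/home", 4), ("/home", 8), ("/home", 9)]
NOW = 10
BATCH_LATENCY = 3  # assumption: the last batch run only absorbed data older than 3 hours

def build_batch_view(all_data, cutoff):
    """Batch view computed only from events that arrived before the cutoff."""
    return Counter(url for url, t in all_data if t < cutoff)

batch_view = build_batch_view(ALL_DATA, NOW - BATCH_LATENCY)
not_absorbed = [e for e in ALL_DATA if e[1] >= NOW - BATCH_LATENCY]
print(batch_view["/home"], len(not_absorbed))  # 2 2
```

Half of the "/home" pageviews are invisible to the batch view even though they are only a few hours old.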
43. Precomputation
[Diagram: All data → Hadoop → Precomputed batch view; New data stream → Storm → Precomputed realtime view; the Query is answered from both the batch view and the realtime view]
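A minimal Python sketch of the query side of this diagram, assuming the views hold pageview counts: the batch view covers data up to the last batch run, the realtime view covers only what has arrived since, and the query combines the two. The view contents are hypothetical.

```python
from collections import Counter

def query(batch_view, realtime_view, url):
    """Combine the batch view (built by Hadoop) with the realtime view (built by Storm).

    Adding works here because the two views count disjoint time ranges.
    """
    return batch_view[url] + realtime_view[url]

# Hypothetical views.
batch_view = Counter({"/home": 2, "/about": 5})
realtime_view = Counter({"/home": 2})

print(query(batch_view, realtime_view, "/home"))   # 4
print(query(batch_view, realtime_view, "/about"))  # 5
```

Plain addition is only valid for additive functions such as counts; a function like a distinct count would need a different merge step.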
44. Storm
[Diagram: New data stream → Storm → Realtime view #1 and Realtime view #2]
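Unlike the batch workflow, Storm updates realtime views incrementally as each new piece of data arrives. The Python sketch below only mimics that behavior; `RealtimeViews` and its two pageview views are hypothetical and not part of Storm.

```python
from collections import Counter

class RealtimeViews:
    """Toy stand-in for a topology that keeps two realtime views up to date.

    Each incoming event is applied incrementally instead of recomputing from all data.
    """

    def __init__(self):
        self.by_url = Counter()       # realtime view #1: pageviews per URL
        self.by_url_hour = Counter()  # realtime view #2: pageviews per (URL, hour)

    def process(self, url, timestamp):
        self.by_url[url] += 1
        self.by_url_hour[(url, timestamp // 3600)] += 1  # assumes timestamps in seconds

views = RealtimeViews()
for url, ts in [("/home", 100), ("/home", 3700), ("/about", 3800)]:
    views.process(url, ts)

print(views.by_url["/home"])  # 2
```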
46. Storm
[Diagram: two source streams flowing into Storm]
54. Streams
[Diagram: Tuple → Tuple → Tuple → Tuple → Tuple → Tuple → Tuple]
Unbounded sequence of tuples
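An unbounded sequence maps naturally onto a Python generator that never terminates; consumers can only ever look at a prefix. The (url, timestamp) tuples here are hypothetical.

```python
import itertools
from typing import Iterator, Tuple

def pageview_stream() -> Iterator[Tuple[str, int]]:
    """A stream modeled as an unbounded generator of (url, timestamp) tuples."""
    urls = ["/home", "/about", "/pricing"]
    for i in itertools.count():  # never terminates on its own
        yield (urls[i % len(urls)], i)

# Consumers only ever see a finite prefix of an unbounded stream.
first_five = list(itertools.islice(pageview_stream(), 5))
print(first_five)  # [('/home', 0), ('/about', 1), ('/pricing', 2), ('/home', 3), ('/about', 4)]
```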
55. Spouts
Source of streams
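In Storm the framework repeatedly asks a spout for its next tuple, and the spout emits tuples read from some external source. The Python class below is a hypothetical sketch of that pull style only; `QueueSpout`, `next_tuple`, and the `emit` callback are illustrative names, not Storm's actual API.

```python
from collections import deque
from typing import Callable, Deque, Tuple

class QueueSpout:
    """Hypothetical spout: each call to next_tuple() emits at most one tuple from the source."""

    def __init__(self, source: Deque[Tuple[str, int]], emit: Callable[[Tuple[str, int]], None]):
        self.source = source  # stand-in for an external queue or message broker
        self.emit = emit      # stand-in for the collector that sends tuples downstream

    def next_tuple(self) -> None:
        if self.source:       # emit nothing if the source is currently empty
            self.emit(self.source.popleft())

emitted = []
spout = QueueSpout(deque([("/home", 1), ("/about", 2)]), emitted.append)
for _ in range(3):            # the third call finds the source empty
    spout.next_tuple()

print(emitted)  # [('/home', 1), ('/about', 2)]
```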
61. Stream grouping
When a tuple is emitted, which task does it go to?
62. Stream grouping
• Shuffle grouping: pick a random task
• Fields grouping: mod hashing on a subset of tuple fields
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
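The four groupings can be sketched as functions that, given the ids of the receiving bolt's tasks, return which task(s) get an emitted tuple. This Python sketch is illustrative, not Storm's implementation: the hash used for fields grouping (CRC32 of the joined field values) is an arbitrary choice, picked because Python's built-in `hash()` is randomized per process for strings.

```python
import random
import zlib
from typing import List, Sequence

def shuffle_grouping(task_ids: Sequence[int], rng: random.Random) -> List[int]:
    """Pick a random task."""
    return [rng.choice(task_ids)]

def fields_grouping(task_ids: Sequence[int], tup: dict, fields: Sequence[str]) -> List[int]:
    """Mod-hash the chosen fields, so tuples with equal values for them reach the same task."""
    key = "|".join(str(tup[f]) for f in fields).encode()
    return [sorted(task_ids)[zlib.crc32(key) % len(task_ids)]]

def all_grouping(task_ids: Sequence[int]) -> List[int]:
    """Send the tuple to every task."""
    return list(task_ids)

def global_grouping(task_ids: Sequence[int]) -> List[int]:
    """Send every tuple to the task with the lowest id."""
    return [min(task_ids)]
```

Fields grouping is what makes per-key state possible: every tuple for a given URL, say, lands on the same task, which can then keep that URL's count locally.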
70. Precomputation
[Diagram: All data → Hadoop + Storm → Precomputed views → Storm → Query]
74. Reach
Reach: the number of distinct people exposed to a URL, i.e. the distinct followers of everyone who tweeted it
[Diagram: URL → Tweeters → each Tweeter's Followers → Distinct followers]
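Reach can be sketched in Python two ways: directly, as the size of the union of all tweeters' followers, and partitioned, the way a parallel computation could do it by routing each follower to a partition by follower id (the fields grouping idea above), deduplicating within each partition, and summing the partial counts. The social graph and both function names are hypothetical, and this is not presented as the talk's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical social graph data.
URL = "http://example.com/a"
TWEETERS: Dict[str, List[str]] = {URL: ["alice", "bob"]}
FOLLOWERS: Dict[str, List[str]] = {"alice": ["carol", "dave"], "bob": ["dave", "erin"]}

def reach(url: str) -> int:
    """Number of distinct followers of everyone who tweeted the URL."""
    distinct: Set[str] = set()
    for tweeter in TWEETERS.get(url, []):
        distinct.update(FOLLOWERS.get(tweeter, []))
    return len(distinct)

def partitioned_reach(url: str, num_partitions: int = 4) -> int:
    """Same result computed in partitions: a follower always maps to the same partition,
    so deduplicating per partition and summing never double-counts anyone."""
    partitions: Dict[int, Set[str]] = defaultdict(set)
    for tweeter in TWEETERS.get(url, []):
        for follower in FOLLOWERS.get(tweeter, []):
            partitions[sum(follower.encode()) % num_partitions].add(follower)
    return sum(len(p) for p in partitions.values())

print(reach(URL), partitioned_reach(URL))  # 3 3
```

"dave" follows both tweeters but is counted once, which is why the distinct step is needed.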
77. Storm + HDFS
[Diagram: New data flows into Storm, which connects to HDFS and to Distributed RPC]
Use an HBase-like strategy to reliably store state within Storm bolts
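The slide names the strategy without detail. As an illustration only, here is a minimal Python sketch of the general pattern HBase is known for: keep state in memory, append each update to a write-ahead log before applying it, and periodically compact the log into a snapshot. A local directory stands in for HDFS (an assumption), and `DurableCounterState` is hypothetical, not the storm-state library's API.

```python
import json
import os
import tempfile

class DurableCounterState:
    """In-memory counters + write-ahead log + snapshot; a local dir stands in for HDFS."""

    def __init__(self, directory: str):
        self.log_path = os.path.join(directory, "wal.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.state = {}
        self.seq = 0  # sequence number of the last applied update
        self._recover()

    def _recover(self) -> None:
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                snap = json.load(f)
            self.state, self.seq = snap["state"], snap["seq"]
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    try:
                        seq, key, delta = json.loads(line)
                    except json.JSONDecodeError:
                        continue  # skip a torn write left by a crash
                    if seq > self.seq:  # skip updates already captured by the snapshot
                        self.state[key] = self.state.get(key, 0) + delta
                        self.seq = seq
        self.compact()  # fold the replayed log into a fresh snapshot and clear the log

    def increment(self, key: str, delta: int = 1) -> None:
        seq = self.seq + 1
        with open(self.log_path, "a") as f:  # log the update durably before applying it
            f.write(json.dumps([seq, key, delta]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.state[key] = self.state.get(key, 0) + delta
        self.seq = seq

    def compact(self) -> None:
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"seq": self.seq, "state": self.state}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)  # atomically install the new snapshot
        open(self.log_path, "w").close()     # the log is now redundant, so truncate it

# Usage: state survives a simulated crash (discarding the object and reopening the directory).
directory = tempfile.mkdtemp()
state = DurableCounterState(directory)
state.increment("/home")
state.increment("/home")
state.compact()
state.increment("/about")
recovered = DurableCounterState(directory)
print(recovered.state)  # {'/home': 2, '/about': 1}
```

The sequence numbers make replay safe if a crash happens after a snapshot is written but before the log is truncated: entries already in the snapshot are skipped rather than applied twice.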
78. Storm + HDFS
storm-state library: https://github.com/nathanmarz/storm-contrib/tree/master/storm-state