The document discusses analyzing streaming data using streaming algorithms. It explains how to calculate the mean of a dataset incrementally as new data points are added by keeping a running sum and incrementing the count of data points. It also asks how to calculate the median incrementally as new data points arrive. The document then discusses real-time tradeoffs for high-volume data. Finally, it concludes that big data is also about analyzing small things quickly and making the data accessible.
18. Under the hood
where time 21:00 - 23:00
count(*)
Under the hood:
21:00  all = 1345   :00 = 45     :01 = 62   ...
22:00  all = 3221   :00 = 22     :01 = 19   ...
...
UK     all = 228    user01 = 1   user14 = 12  ...
US     all = 354    user01 = 15  user14 = 0   ...
MY     all = 28     user01 = 0   user02 = 0   ...
...
#bigdataMY
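The slide above implies the range query is answered from pre-aggregated counters rather than by scanning raw events. A minimal sketch of that idea (names and structure are assumptions, not from the deck): each incoming event bumps a per-hour total plus a per-minute sub-count, and a per-country total plus a per-user sub-count, so `count(*)` over a time range becomes a sum of a few precomputed totals.

```python
from collections import defaultdict

# Hypothetical rollup counters, mirroring the slide's layout:
# hour -> {"all": total, minute: sub-count}, country -> {"all": total, user: sub-count}.
hour_counts = defaultdict(lambda: defaultdict(int))
country_counts = defaultdict(lambda: defaultdict(int))

def record(hour, minute, country, user):
    """Update every rollup as each event streams in."""
    hour_counts[hour]["all"] += 1
    hour_counts[hour][minute] += 1
    country_counts[country]["all"] += 1
    country_counts[country][user] += 1

def count_between(start_hour, end_hour):
    """count(*) where time start_hour - end_hour: sum the hourly totals
    instead of scanning raw events."""
    return sum(hour_counts[h]["all"] for h in range(start_hour, end_hour))

record(21, 0, "MY", "user01")
record(21, 1, "UK", "user14")
record(22, 0, "US", "user01")
print(count_between(21, 23))  # 3
```

The tradeoff is classic pre-aggregation: writes do a little more work so that reads touch only a handful of counters.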
19. Streaming algorithms
A = [a1, a2, a3, a4, a5]
mean(A) = sum it up / number of things
#bigdataMY
20. Streaming algorithms
A = [a1, a2, a3, a4, a5]
mean(A) = sum it up / number of things
now add another item a6...???
#bigdataMY
21. Streaming algorithms
A = [a1, a2, a3, a4, a5]
mean(A) = sum it up / number of things
now add another item a6...???
sum = sum + a6
inc(number of things)
#bigdataMY
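The two-line update on this slide is the whole algorithm: keep only the running sum and the count, and adding a new item is O(1) in time and space. A minimal sketch:

```python
# Streaming mean: store sum and count only, never the full list A.
class StreamingMean:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        self.total += x   # sum = sum + a6
        self.count += 1   # inc(number of things)

    def mean(self):
        return self.total / self.count

m = StreamingMean()
for a in [1, 2, 3, 4, 5]:
    m.add(a)
print(m.mean())  # 3.0
m.add(9)          # now add another item a6
print(m.mean())  # 4.0
```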
22. Streaming algorithms
A = [a1, a2, a3, a4, a5]
mean(A) = sum it up / number of things
now add another item a6...???
sum = sum + a6
inc(number of things)
try this with median?
#bigdataMY
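The slide leaves the median as an open question, and that is the point: unlike the mean, the exact median has no constant-space incremental update. A standard answer (my addition, not from the deck) is the two-heap technique: a max-heap holds the lower half of the values, a min-heap the upper half, rebalanced so their sizes differ by at most one. Each insert costs O(log n) and the median is read in O(1), but the full stream must be retained.

```python
import heapq

class StreamingMedian:
    def __init__(self):
        self.lo = []  # max-heap of the lower half (values negated)
        self.hi = []  # min-heap of the upper half

    def add(self, x):
        # Route x to the correct half.
        if self.lo and x > -self.lo[0]:
            heapq.heappush(self.hi, x)
        else:
            heapq.heappush(self.lo, -x)
        # Rebalance so len(lo) == len(hi) or len(lo) == len(hi) + 1.
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2

m = StreamingMedian()
for a in [5, 1, 3, 2, 4]:
    m.add(a)
print(m.median())  # 3
```

For truly unbounded streams, approximate sketches (e.g. t-digest or quantile sketches) trade exactness for bounded memory.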
23. Realtime tradeoffs
[Diagram: tradeoff triangle between High-velocity, High-volume, and Ad-hoc]
#bigdataMY
24. Conclusion
Big Data is also about the Little Things, done fast.
The devil is in the details.
Make it accessible.
#bigdataMY