4. Desiderata
Faster Answers
Fewer Computations per Datum
From Analytics to Active Monitoring
From Batch Cycles to Sense and Respond
5. Faster Turnaround
9/18/2013
Act on the 80% of Data That Arrives Quickly
Then Correct as Late-Landing Data Arrive
Pull for Initial Result; Push for Updates?
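One way to picture the pull-then-push idea above is a running aggregate that answers queries from whatever has arrived so far and notifies subscribers when late-landing data change the answer. The sketch below is illustrative only: the class `RunningTotals`, its methods, the `late` flag, and the sample numbers are all hypothetical and do not come from the slides.

```python
from collections import defaultdict


class RunningTotals:
    """Per-key totals that can be read early and corrected as late data land."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.subscribers = []  # callbacks invoked on each late correction (push)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def ingest(self, key, value, late=False):
        # Every record updates the total; only late records trigger a push,
        # because earlier readers may already have acted on the old value.
        self.totals[key] += value
        if late:
            for callback in self.subscribers:
                callback(key, self.totals[key])

    def query(self, key):
        # Pull: answer immediately from the data that have arrived so far.
        return self.totals[key]


totals = RunningTotals()
updates = []
totals.subscribe(lambda key, total: updates.append((key, total)))

# Most of the data arrive promptly (hypothetical values).
for value in (10.0, 20.0, 30.0, 20.0):
    totals.ingest("clicks", value)
initial = totals.query("clicks")  # early answer, pulled by the client

# A late-landing record arrives; the corrected total is pushed.
totals.ingest("clicks", 20.0, late=True)
```

Whether corrections should be pushed or re-pulled is the open question on the slide; this sketch simply shows both paths side by side.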
6. Online Updates to Models
Each day produces Big Data.
Whole history: HUMONGOUS DATA.
Update models based on new data only.
And perhaps exceptions / borderline cases from history.
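As a minimal sketch of updating a model from new data only, the example below keeps a running mean and variance with Welford's online algorithm, so no pass over the full history is needed. It also sets aside values that look like exceptions at the time they arrive. The class name `OnlineGaussian`, the z-score threshold of 3, and the sample data are assumptions made for illustration; the slides do not name a specific model.

```python
import math


class OnlineGaussian:
    """Running mean/variance (Welford), plus a small store of exceptions."""

    def __init__(self, z_threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.z_threshold = z_threshold  # hypothetical cutoff for "exception"
        self.exceptions = []

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until two points have been seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def update(self, x):
        # Flag the point against the model as it stood before this update.
        std = math.sqrt(self.variance)
        if self.n >= 2 and std > 0 and abs(x - self.mean) / std > self.z_threshold:
            self.exceptions.append(x)
        # Welford update: constant work per datum, no history required.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


model = OnlineGaussian()
for x in (4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0):
    model.update(x)
mean_before = model.mean
variance_before = model.variance

model.update(50.0)  # a new day's outlier is kept as an exception
```

Retaining only the flagged values is one possible reading of "exceptions / borderline cases from history"; other retention rules would fit the slide equally well.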
7. “Embedded” Computation
Move Computation Closer to Where Data are Generated
Monitor for Anomalies Where They Occur
(Sometimes) Compress into Sketches before Transmitting Data
Hadoop as Part of Serving vs Isolated Clusters?
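To make "compress into sketches before transmitting" concrete, here is a small Count-Min sketch, one well-known sketch structure; the slides do not say which sketch is intended, so this choice is an assumption. Each ingest point keeps a fixed-size table of counters, ships only that table, and the receiver merges tables by element-wise addition. The width, depth, and item names are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary; estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Deterministic hash per row so sketches built on different
        # machines agree on where each item lands.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the
        # tightest available upper bound on the true count.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

    def merge(self, other):
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two ingest points summarize locally, then only the sketches are sent.
edge_a, edge_b = CountMinSketch(), CountMinSketch()
for _ in range(5):
    edge_a.add("error")
edge_a.add("ok", 40)
for _ in range(2):
    edge_b.add("error")
edge_b.add("timeout", 3)

central = CountMinSketch()
central.merge(edge_a)
central.merge(edge_b)
```

The table size is fixed regardless of how many events each ingest point sees, which is what makes the sketch cheaper to transmit than the raw data; the price is that estimates may overcount.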
8. Propagate Data Among Logical Neighbors Quickly
Multi-Resolution Approach at Different Time Scales
Challenge: Clustering into Logical Neighborhoods to Fit Problem
Localized / Contextual Computation
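A multi-resolution approach at different time scales could look like the following sketch, in which each event is rolled up into buckets of several widths at once, so that fine-grained detail and coarse aggregates are both available without rescanning raw data. The class `MultiResolutionCounter`, the resolutions (1 s, 60 s, 3600 s), and the timestamps are hypothetical choices for illustration.

```python
from collections import defaultdict


class MultiResolutionCounter:
    """Aggregate events into time buckets at several resolutions at once."""

    def __init__(self, resolutions=(1, 60, 3600)):
        self.resolutions = resolutions  # bucket widths in seconds
        self.buckets = {r: defaultdict(float) for r in resolutions}

    def add(self, timestamp, value=1.0):
        for r in self.resolutions:
            # Align the timestamp to the start of its bucket at this width.
            start = int(timestamp // r) * r
            self.buckets[r][start] += value

    def series(self, resolution):
        # Bucket start time -> aggregated value, in time order.
        return dict(sorted(self.buckets[resolution].items()))


counter = MultiResolutionCounter()
for t in (0, 30, 61, 3599, 3600):
    counter.add(t)

per_minute = counter.series(60)
per_hour = counter.series(3600)
```

In the spirit of the slide, the finest series might be shared only among logical neighbors while the coarser ones travel further, though that routing is not shown here.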
12. Hadoop in Five Years
Will Hadoop grow by adding features / options?
Will it branch: faster, lighter, approximate, embedded versions?
Truly huge version? With approximation / sampling / multi-resolution?
13. The Right Fit
Multi-resolution sense and respond.
Details to neighbors, sketches and aggregates globally.
Migrate processes and storage to ingest points or logical neighbors.
Tune system-wide performance through human-machine dialog.