1. Clustering
• The sensor signal is not labeled. For classification, we need to
label it first, e.g. by clustering events
Truck? Car? Noise?
3. Clustering
[Diagram: "random clusters" feeds five parallel pipelines, each running Windowing, then Convolve, then Calculate distance, then Average per cluster; their results feed "update clusters"]
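The steps named in the diagram resemble a k-means style loop over windowed subsequences. The following is a minimal single-process sketch under that assumption, not the deck's actual implementation: the helper names (`windows`, `smooth`, `kmeans`), the smoothing kernel, and the synthetic signal are all hypothetical, and the per-partition averaging shown in the diagram is collapsed into one loop.

```python
import random

def windows(signal, size, step):
    """Windowing: cut the signal into fixed-length subsequences."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def smooth(window, kernel):
    """Convolve: 'same'-length convolution with zero padding at the edges."""
    half = len(kernel) // 2
    out = []
    for i in range(len(window)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(window):
                acc += k * window[idx]
        out.append(acc)
    return out

def sq_dist(a, b):
    """Calculate distance: squared Euclidean distance between two windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))

def kmeans(points, k, iterations=10, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # "random clusters"
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for c, members in enumerate(groups):
            if members:                      # an empty cluster keeps its old center
                # "Average per cluster", then "update clusters"
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(p, centers) for p in points]

# Synthetic signal (assumption): alternating tall and short bumps.
signal = []
for n in range(20):
    height = 10.0 if n % 2 == 0 else 1.0
    signal += [0.0, 0.0, height, height, 0.0, 0.0]

features = [smooth(w, [0.25, 0.5, 0.25]) for w in windows(signal, 6, 6)]
centers, labels = kmeans(features, k=2)
```

In this toy run the tall and short bumps end up in different clusters, which is the kind of unsupervised labeling ("Truck? Car?") the slides describe.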
4. Clustering in Hadoop
[Diagram: MapReduce data flow from raw data, via "data massage", to subsequences (with lead-in/out)]
Raw data: records <ts, {s1, s2, ...}>, divided into split 1, split 2, ..., split n
Map (one per split): for a sample s1 at time ts[i], emits <ts[i-1], {1, s1}>, <ts[i], {0, s1}>, <ts[i+1], {-1, s1}>
Reduce (per ts[i]): outputs one subsequence per timestamp (ts1 to ts6 shown)
Each subsequence: lead-in, ts, lead-out
lead-in/out needed to center bumps (snapping)
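The map and reduce steps in the diagram can be imitated in plain Python. This is a rough sketch rather than Hadoop code, and it makes several assumptions: timestamps are evenly spaced integers, each record carries a single sensor value, and a dictionary stands in for Hadoop's shuffle. The function names are hypothetical, and the snapping step that centers bumps is not covered.

```python
from collections import defaultdict

def map_sample(ts, value, lead=1):
    """Emit the sample to its own timestamp and to each neighbour within `lead`.

    The emitted offset says where the sample sits relative to the receiving
    key, matching <ts[i-1], {1, s}>, <ts[i], {0, s}>, <ts[i+1], {-1, s}>.
    """
    for offset in range(-lead, lead + 1):
        yield ts - offset, (offset, value)

def reduce_window(ts, tagged, lead=1):
    """Order the tagged samples by offset; skip incomplete windows at the edges."""
    ordered = sorted(tagged)
    if len(ordered) != 2 * lead + 1:
        return None
    return ts, [value for _, value in ordered]

def run(samples, lead=1):
    shuffled = defaultdict(list)             # stands in for shuffle/sort by key
    for ts, value in samples:
        for key, tagged in map_sample(ts, value, lead):
            shuffled[key].append(tagged)
    results = []
    for ts in sorted(shuffled):
        window = reduce_window(ts, shuffled[ts], lead)
        if window is not None:
            results.append(window)
    return results

subsequences = run([(0, 5.0), (1, 7.0), (2, 9.0), (3, 4.0)], lead=1)
# → [(1, [5.0, 7.0, 9.0]), (2, [7.0, 9.0, 4.0])]
```

Because every sample is replicated to its neighbouring keys, each reducer sees a complete window (lead-in, ts, lead-out) even when the neighbouring samples came from a different input split.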
5. Clustering in Hadoop
[Same diagram as the previous slide, with one added annotation: "Some clever partitioning of keys"]
18. Performance
• MapReduce: both techniques scale linearly with the amount of data (6-node cluster)
• Noticeable overhead on small amounts of data
[Chart: runtime in hours (axis from 0 to 40) against amount of sensor data (3 days, 10 days, 1 month, 3 months), one series each for Convolution and Clustering]
19. Performance
[Same chart as the previous slide, with an added annotation: "66 node cluster"]
21. Multi-scale analysis
• The sensor signal is a composite of events that happen at
different time scales
• Passing truck (short scale), traffic jam (medium scale), seasonal
change (long scale)
• Try to decompose signals into their ‘natural’ time scales
• Basic idea:
• Convolve data at different scales (scale space)
• Subtract key convolutions (band-pass filters)
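The basic idea above can be sketched as smoothing the signal at several scales and subtracting neighbouring levels. The sketch assumes Gaussian kernels, which are a common choice for a scale space but are not named in the slides. The function names, the sigma values, and the test signal are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights, truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_same(signal, kernel):
    """'Same'-length convolution; edge values are repeated past the borders."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + half - j, 0), n - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out

def band_pass_bands(signal, sigmas):
    """Smooth at increasing scales, then subtract each level from the finer one.

    Returns the band-pass components (fine to coarse) and the coarsest
    smoothed level as the residual.
    """
    levels = [list(signal)]
    levels += [convolve_same(signal, gaussian_kernel(s)) for s in sigmas]
    bands = [[a - b for a, b in zip(fine, coarse)]
             for fine, coarse in zip(levels, levels[1:])]
    return bands, levels[-1]

# Toy signal (assumption): a fast oscillation riding on a slow one.
signal = [math.sin(2 * math.pi * t / 8) + math.sin(2 * math.pi * t / 128)
          for t in range(256)]
bands, residual = band_pass_bands(signal, sigmas=[2.0, 16.0])

# The bands plus the residual add back up to the original signal.
rebuilt = [sum(values) for values in zip(residual, *bands)]
```

Each band holds the structure that lives between two neighbouring scales, so short events (a passing truck) and long trends (seasonal change) land in different components.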