Listening to the pulse of our cities fusing Social Media Streams and Call Data Records

Prof Emanuele Della Valle - DEIB Politecnico di Milano

Published in: Data & Analytics

  1. Semantic Approach to Big Data and Event Processing. Listening to the pulse of our cities fusing Social Media Streams and Call Data Records. Emanuele Della Valle, DEIB - Politecnico di Milano, @manudellavalle, emanuele.dellavalle@polimi.it, http://emanueledellavalle.org. 8/10/2015 @manudellavalle - http://emanueledellavalle.org
  2. Agenda: Context, Problem, Experimental setting, Solution, Evaluation, Conclusions
  3. The digital reflection of our cities is sharpening [photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]
  4. The digital reflection of our cities is sharpening because the urban environment is captured in open datasets
  5. The digital reflection of our cities is sharpening, and streams of information flow through our cities thanks to
  6. The digital reflection of our cities is sharpening, and streams of information flow through our cities thanks to the pervasive deployment of sensors
  7. The digital reflection of our cities is sharpening, and streams of information flow through our cities thanks to the wide adoption of smartphones
  8. The digital reflection of our cities is sharpening, and streams of information flow through our cities thanks to the use of (location-based) social networks
  9. And it is tracking changes with a decreasing delay
  10. And it is tracking changes with a decreasing delay:
      Data source       | Available               | Frequency     | Delay
      Census data       | for hundreds of years   | years         | months
      Newspapers        | for hundreds of years   | days          | 1 day
      Weather sensors   | for tens of years       | hours/minutes | hours/minutes
      TV news           | for tens of years       | hours         | minutes
      Traffic sensors   | for years               | 15 minutes    | minutes
      Call Data Records | for years               | 15 minutes    | hours
      Social media      | for years               | seconds       | seconds
      IoT               | recently                | milliseconds  | milliseconds
  11. Data pile up without making decisions any easier. The mayor: "I have to decide: A or B? Why not C? What if D?"
  12. But smarter Big Data can advance our ability to feel the pulse of our cities by fusing all those data sources and making sense of the fused information, to improve decision making and deliver innovative services. The mayor: "Definitely E!"
  13. Let's focus on a concrete research question: can we collect, analyse and repurpose social media and Call Data Records to allow perceiving emerging patterns and observing their dynamics? [photo: https://www.flickr.com/photos/debord/4932655275]
  14. More precisely, the research question is: can we collect, analyse and repurpose social media captured at places and events, and privacy-preserving aggregates of Call Data Records, to allow visually perceiving emerging patterns and observing their dynamics? [photo: https://www.flickr.com/photos/debord/4932655275]
  15. How to set up an experiment? [photo: https://www.flickr.com/photos/myfuturedotcom/6053042920]
      Question               | Answer
      Which city?            | Milan
      Comparing what?        | Milan Design Week vs. Milan in general
      Experimental subjects? | Event managers and casual audience
  16. What's Milan Design Week? [map: http://www.fuorisalone.it] The Milan Design Week (MDW) is a city-scale event held yearly in Milan, featuring around 1,200 events in 500+ places spread across the city and attracting about half a million people from all over the world.
  17. Ingredients of the proposed solution:
      • Big Data technologies: address the "velocity" of data streams in memory; address the "volume" of data that do not fit in memory
      • Semantic technologies: address "variety" using Ontology Based Data Access; Named Entity Recognition and Linking
      • Data science: statistical modelling; detecting anomalies
      • Visual analytics: allow non-expert access to data; tell stories out of data
  18. CitySensing - a solution for event managers (2013). F. Antonelli, M. Azzi, M. Balduini, P. Ciuccarelli, E. Della Valle, R. Larcher: City sensing: visualising mobile and social data about a city scale event. AVI 2014: 337-338. http://jol.telecomitalia.com/jolskil/citysensing/
  19. CitySensing - a solution for a casual audience (2014). M. Balduini, E. Della Valle, M. Azzi, R. Larcher, F. Antonelli, and P. Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia (to appear). http://jol.telecomitalia.com/jolskil/citysensing/ http://citysensing.fuorisalone.it/
  20. How CitySensing works – step 0: set up a conceptual model (FraPPE) to master the variety in the data sources. M. Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. ISWC 2015
  21. How CitySensing works – step 0. FraPPE's goal: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics. FraPPE offers the visual analytics interface a homogeneous view built on heterogeneous data.
  22. How CitySensing works – step 1: for every pixel, compute the volume of Call Data Records (using privacy-preserving aggregation). Real data recorded on 13 April 2013 between 13:00 and 00:00.
  23. How CitySensing works – step 2: find the anomalous pixels by comparing the current volumes with a model of the volumes in this time period. Real data recorded on 13 April 2013 between 13:00 and 00:00.
  24. How CitySensing works – step 3: map anomalies to the districts of Milano Design Week. [map labels: Brera, Tortona, "What's this?"] Real data recorded on 13 April 2013 between 13:00 and 00:00.
  25. How CitySensing works – step 4: for every anomalous pixel, capture the hashtags and semantic entities named in the social media streams. [map labels: Brera, Tortona, "What's this?"] Real data recorded on 13 April 2013 between 13:00 and 00:00.
  26. How CitySensing works – step 5: take away the hashtags and semantic entities that are systematically used. [map labels: Brera, Tortona] Real data recorded on 13 April 2013 between 13:00 and 00:00.
  27. Logical architecture of CitySensing – setup time. [diagram: Capture Static Data, Capture Data Stream, Analyse Data Stream, Build Models; MDW]
  28. Logical architecture of CitySensing – run time. [diagram: Capture Static Data, Capture Data Stream, Analyse Data Stream, Build Models, Detect Anomalies, Store Analysis, Visualize Analysis; MDW]
  29. Capturing static data via FraPPE. The frame duration was fixed to 15 minutes. The Milano area was covered with 1 grid (100x100) of 10,000 cells, each 250x250 meters (the size of the mobile network cells in the centre of Milan). During the Milano Design Week a total of 5.76 Mln pixels were captured. More than 1,000 events in more than 600 places were collected from the crowd-sourced databases of fuorisalone.it, breradesigndistrict.it and tortonaroundesign.com, thanks to a partnership with studiolabo. [map: cells in which there are places hosting Milan Design Week 2013 events]
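The pixel model (a cell of the 100x100 grid observed in a 15-minute frame) can be sketched as follows. This is a minimal illustration, not the FraPPE implementation. The grid origin, the metric x/y coordinates and the helper names `Pixel` and `to_pixel` are hypothetical. The six-day observation window is inferred from 5.76 Mln / (10,000 × 96).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

GRID_SIZE = 100                 # 100 x 100 cells (from the slide)
CELL_METERS = 250               # 250 m x 250 m cells (from the slide)
FRAME = timedelta(minutes=15)   # frame duration (from the slide)


@dataclass(frozen=True)
class Pixel:
    """Hypothetical pixel key: a grid cell observed in one 15-minute frame."""
    cell: int   # row-major index in 0 .. GRID_SIZE**2 - 1
    frame: int  # number of whole frames elapsed since the observation start


def to_pixel(x_m: float, y_m: float, t: datetime, start: datetime) -> Pixel:
    """Map a point (metres east/north of an assumed grid origin) and a time to a pixel."""
    col, row = int(x_m // CELL_METERS), int(y_m // CELL_METERS)
    if not (0 <= col < GRID_SIZE and 0 <= row < GRID_SIZE):
        raise ValueError("point outside the grid")
    if t < start:
        raise ValueError("time before the observation start")
    return Pixel(cell=row * GRID_SIZE + col, frame=(t - start) // FRAME)


FRAMES_PER_DAY = timedelta(days=1) // FRAME                 # 96 frames per day
TOTAL_PIXELS = GRID_SIZE * GRID_SIZE * FRAMES_PER_DAY * 6   # six days -> 5,760,000
```

The row-major cell index keeps a pixel as two small integers, which makes it a cheap key for the per-pixel models and counts used in the later steps.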
  30. Processing Telecom Italia Call Data Records. 1.92 Mln Gaussian models were built: one for each pixel (i.e., for each frame and cell), grouping the frames by working and weekend days, using two months of Call Data Records, and verifying with an Anderson-Darling test at a 0.05 significance level that the volume of CDR has a Gaussian distribution. The models were built on Pig, R and Cascalog. The processing on 7 m1.large EC2 machines took 24 hours. [figure: histogram and Q-Q plot for a good case and for a bad case]
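The model-building step can be sketched in plain Python. This is an illustration under assumptions, not the authors' Pig/R/Cascalog code. `history` is a hypothetical dict keyed by (cell, frame of day, day type), and the minimum sample size of 8 is arbitrary. The normality check uses the Anderson-Darling statistic with the small-sample correction A*² = A²(1 + 0.75/n + 2.25/n²). Its tabulated 5% critical value is 0.752 for a normal distribution whose mean and variance are estimated from the sample.

```python
import math
from statistics import NormalDist, mean, stdev

# 5% critical value of the modified Anderson-Darling statistic for normality
# when mean and variance are estimated from the sample.
AD_CRITICAL_5PCT = 0.752

# One model per cell, 15-minute frame of the day and day type (working/weekend).
EXPECTED_MODELS = 100 * 100 * 96 * 2  # 1,920,000, i.e. the 1.92 Mln on the slide


def anderson_darling_normal(sample):
    """Modified A*^2 statistic testing whether `sample` is Gaussian."""
    xs = sorted(sample)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    eps = 1e-12  # keeps log() finite for extreme observations
    cdf = [min(max(dist.cdf(x), eps), 1.0 - eps) for x in xs]
    s = sum((2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def fit_gaussian_models(history, min_samples=8):
    """Fit a Gaussian per key, keeping only keys whose volumes pass the test at 0.05."""
    models = {}
    for key, volumes in history.items():
        if len(volumes) < min_samples or stdev(volumes) == 0:
            continue  # too few or constant observations: no model
        if anderson_darling_normal(volumes) <= AD_CRITICAL_5PCT:
            models[key] = NormalDist(mean(volumes), stdev(volumes))
    return models
```

Keys that fail the test simply get no model. This mirrors the "bad case" on the slide, where the histogram and Q-Q plot show a clearly non-Gaussian volume distribution.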
  31. Processing Telecom Italia Call Data Records. Volume of CDR captured in Milan during the Design Week. Calls, SMS and Internet accesses were aggregated (with privacy-preserving methods) and an anomaly index was computed for each of the 5.76 Mln pixels. The processing of 1 day on 7 m1.large EC2 machines took 20 minutes.
      What                   | 2013        | 2014
      Calls                  | 16,743,875  | 19,719,629
      SMSs                   | 19,454,497  | 20,240,485
      Internet data accesses | 137,381,761 | 197,767,245
      [image: https://cerijayne.files.wordpress.com/2011/10/outliersss.png]
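The slide does not define the anomaly index. A natural reading, consistent with the ±2σ threshold used for CDR-anomalous pixels, is a z-score of the aggregated CDR volume against the pixel's Gaussian model. The sketch below assumes exactly that. `index_frame` and its dict inputs are hypothetical.

```python
from statistics import NormalDist


def anomaly_index(volume: float, model: NormalDist) -> float:
    """Assumed anomaly index: how many standard deviations the observed volume
    lies above (positive) or below (negative) the expected volume."""
    return (volume - model.mean) / model.stdev


def index_frame(volumes, models):
    """volumes: {cell: aggregated CDR volume observed in one frame};
    models: {cell: NormalDist} for the matching frame of day and day type.
    Cells without a model (e.g. failed normality test) are skipped."""
    return {cell: anomaly_index(v, models[cell])
            for cell, v in volumes.items() if cell in models}
```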
  32. Do CDR-anomalous pixels relate to events? CDR-anomalous pixels are pixels in which the anomaly index is extreme (> +2σ or < -2σ). To test whether the anomalous pixels were related to the events of the Milan Design Week, we used three ground truths: the pixels of Milan, of the Brera district and of the Tortona district where there was at least one event of Milan Design Week 2013. We computed the precision and recall of the anomalous pixels in finding the pixels of those three ground truths.
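The evaluation described above can be sketched in two small functions. First, flag the cells whose anomaly index falls outside ±2σ. Then score the flagged set against a ground-truth set of pixels with events. The function names are hypothetical. Returning 0.0 when a set is empty is a convention chosen here, because precision and recall are undefined in that case.

```python
def cdr_anomalous(indices, k=2.0):
    """Cells whose anomaly index lies above +k sigma or below -k sigma."""
    return {cell for cell, z in indices.items() if z > k or z < -k}


def precision_recall(predicted, ground_truth):
    """Precision and recall of a predicted set of pixels against a ground-truth set.
    Returns 0.0 for an undefined ratio (empty set), a convention chosen here."""
    hits = len(predicted & ground_truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Repeating this frame by frame, once per ground truth (Milan, Brera, Tortona), yields one precision and one recall value per frame, which is the kind of time series plotted in the study.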
  33. Do CDR-anomalous pixels relate to events? [chart: precision of the CDR-anomalous pixels for the Milan, Brera and Tortona ground truths; y axis 0 to 1; x axis day and hour from 09 04:00 to 15 01:00 in 3-hour steps, labelled Tuesday to Sunday]
  34. Do CDR-anomalous pixels relate to events? [chart: recall of the CDR-anomalous pixels for the Milan, Brera and Tortona ground truths; same axes]
  35. Do CDR-anomalous pixels relate to events? [chart: precision and recall of the CDR-anomalous pixels for the Milan, Brera and Tortona ground truths; same axes]
  36. Processing Social Streams. The machinery: the Streaming Linked Data framework. M. Balduini, E. Della Valle, D. Dell'Aglio, M. Tsytsarau, T. Palpanas, and C. Confalonieri: Social Listening of City Scale Events Using the Streaming Linked Data Framework. International Semantic Web Conference (2) 2013: 1-16. [diagram: Data Source, Adapter, Stream Bus, Decorator, Analyser, Publisher and Stream inside the Streaming Linked Data Server, connected over HTTP to the Data Source and to a Visualizer in an HTML5 Browser]
  37. Processing Social Streams. Decoration at work: the micro-post "Happily into a bottle of Heineken bear #heinekendesignweek @ the Heineken Magazzini" is linked to City-Scale Event: Milano Design Week; Event: Heineken Design Week; Location: The Magazzini (relations: hosts, takesPlaceIn). M. Balduini, A. Bozzon, E. Della Valle, Y. Huang, G-J Houben: Recommending Venues Using Continuous Predictive Social Media Analytics. IEEE Internet Computing 18(5): 28-35 (2014)
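The decoration step can be illustrated with a toy adapter and decorator chained as Python generators. Everything here is an assumption for illustration. The background knowledge is a hard-coded dictionary. The identifiers (`ev:HeinekenDesignWeek`, `loc:TheMagazzini`) and property names (`mentions`, `partOf`, `locatedAt`) are hypothetical stand-ins, not the actual vocabulary or the Streaming Linked Data framework API.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical background knowledge, modelled on the slide's example.
EVENT_BY_HASHTAG: Dict[str, str] = {"#heinekendesignweek": "ev:HeinekenDesignWeek"}
LOCATION_OF_EVENT: Dict[str, str] = {"ev:HeinekenDesignWeek": "loc:TheMagazzini"}
CITY_SCALE_EVENT = "ev:MilanoDesignWeek"

HASHTAG = re.compile(r"#\w+")


def adapter(raw: Iterable[Tuple[str, str]]) -> Iterator[Dict[str, str]]:
    """Turn (post id, text) pairs into post records."""
    for post_id, text in raw:
        yield {"id": post_id, "text": text}


def decorator(posts: Iterable[Dict[str, str]]) -> Iterator[List[Triple]]:
    """Emit, for each post, triples linking it to known events and their locations."""
    for post in posts:
        triples: List[Triple] = []
        for tag in HASHTAG.findall(post["text"].lower()):
            event = EVENT_BY_HASHTAG.get(tag)
            if event is None:
                continue  # unknown hashtag: nothing to link
            triples.append((post["id"], "mentions", event))
            triples.append((event, "partOf", CITY_SCALE_EVENT))
            if event in LOCATION_OF_EVENT:
                triples.append((event, "locatedAt", LOCATION_OF_EVENT[event]))
        yield triples
```

Because both stages are generators, posts flow through one at a time, which is the streaming shape of the Adapter → Decorator chain, even though a real deployment would read from a live source and use a proper knowledge base.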
  38. Processing Social Streams. Predictive models were built for the hashtags and semantic entities that are systematically present, using the Holt-Winters method and grouping the frames by working and weekend days and by early morning, morning, afternoon, evening and late night. We analysed 300,000 geo-located micro-posts collected over 6 months in the Milano area (from November 2013 to April 2014). It takes a few seconds per hashtag/semantic entity on a 60€/month VM in an IaaS. [chart legend: data, fitted, forecast, lower 2.5%, upper 97.5%]
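The additive Holt-Winters method keeps a level, a trend and a seasonal component and updates them with smoothing weights alpha, beta and gamma. The sketch below is a textbook version with a simple initialisation from the first two seasons. The additive variant, the initialisation, the smoothing values in the test and the normal-approximation interval are assumptions, not the settings used in CitySensing. With the grouping on the slide, one series could hold one value per daily time band, giving a season of length 5.

```python
from statistics import NormalDist, stdev
from typing import List, Sequence, Tuple


def holt_winters_additive(y: Sequence[float], period: int, alpha: float,
                          beta: float, gamma: float,
                          horizon: int) -> Tuple[List[float], List[float]]:
    """Return (one-step-ahead fitted values, forecasts for the next `horizon` steps)."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons")
    # Initialise from the first two seasons.
    level = sum(y[:period]) / period
    trend = (sum(y[period:2 * period]) - sum(y[:period])) / period ** 2
    season = [y[i] - level for i in range(period)]
    fitted = []
    for t, value in enumerate(y):
        s = season[t % period]
        fitted.append(level + trend + s)  # forecast made before seeing y[t]
        prev_level = level
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (value - level) + (1 - gamma) * s
    n = len(y)
    forecasts = [level + (h + 1) * trend + season[(n + h) % period]
                 for h in range(horizon)]
    return fitted, forecasts


def interval_95(y, fitted, forecasts):
    """Approximate 2.5%/97.5% bounds, assuming Gaussian one-step residuals
    (same width at every horizon, a simplification)."""
    z = NormalDist().inv_cdf(0.975)
    sigma = stdev([a - b for a, b in zip(y, fitted)])
    return [(f - z * sigma, f + z * sigma) for f in forecasts]
```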
  39. Processing Social Streams. Usage of #milan in the weeks around Milan Design Week, and the same usage after subtracting the predicted usage of #milan. [charts: x axis in daily time bands 2:00-7:00, 7:00-11:00, 11:00-14:00, 14:00-19:00 and 19:00-2:00 over alternating working days (WD) and weekends (WE), with the Milan Design Week highlighted]
  40. Processing Social Streams. The difference between the observed and the predicted usage of #milan perfectly fits the usage of #mdw (the official hashtag of Milan Design Week). [chart: anomalous usage of #milan vs. usage of #mdw over the same time bands, working days (WD) and weekends (WE), with the Milan Design Week highlighted]
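The comparison on this slide can be made quantitative in a few lines. Subtract the predicted usage from the observed usage of #milan, band by band, and correlate the result with the usage of #mdw. Pearson correlation is our choice of similarity measure here, not necessarily the one behind the slide's claim, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def anomalous_usage(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Per-band difference between observed and predicted usage of a hashtag."""
    return [o - p for o, p in zip(observed, predicted)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value close to 1 means the excess usage of #milan rises and falls together with #mdw.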
  41. Processing Social Streams. Geo-referenced micro-posts were captured, semantically annotated, cleansed using the predictive models and analysed in the Milan area. For each pixel with at least 1 micro-post we computed the volume related to Milano Design Week, the top-10 hashtags and the top-3 locations/events. Real-time processing was possible with our in-memory C-SPARQL engine and the Streaming Linked Data framework on a 20€/month VM in an IaaS.
      What                                | 2013   | 2014
      Geo-located micro-posts             | 57,154 | 21,782
      Linked to Milano Design Week        | 3,569  | 3,499
      Linked to a specific location/event | 761    | 547
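The per-pixel summary can be sketched with `collections.Counter`. This is a batch illustration over an already decorated list of posts, whereas the slide describes continuous processing with C-SPARQL. The post fields (`pixel`, `mdw`, `hashtags`, `place`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def summarise_pixels(posts: Iterable[dict], top_hashtags: int = 10,
                     top_places: int = 3) -> Dict[object, dict]:
    """For every pixel with at least one micro-post, compute the MDW-related
    volume, the top hashtags and the top places/events."""
    volume = Counter()
    tags = defaultdict(Counter)
    places = defaultdict(Counter)
    for post in posts:
        px = post["pixel"]
        if post["mdw"]:
            volume[px] += 1
        tags[px].update(post["hashtags"])  # also registers pixels without hashtags
        if post.get("place"):
            places[px][post["place"]] += 1
    return {px: {"mdw_volume": volume[px],
                 "top_hashtags": [t for t, _ in tags[px].most_common(top_hashtags)],
                 "top_places": [p for p, _ in places[px].most_common(top_places)]}
            for px in tags}
```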
  42. Do socially active pixels relate to events? Socially active pixels are pixels in which we captured social media that talk about Milan Design Week. We computed the precision and recall of the socially active pixels in finding the pixels of the three ground truths about Milan, the Brera district and the Tortona district.
  43. Do socially active pixels relate to events? [chart: precision of the socially active pixels for the Milan, Brera and Tortona ground truths; y axis 0 to 1; x axis day and hour from 09 04:00 to 15 01:00 in 3-hour steps, labelled Tuesday to Sunday]
  44. Do socially active pixels relate to events? [chart: recall of the socially active pixels for the Milan, Brera and Tortona ground truths; same axes]
  45. Do socially active pixels relate to events? [chart: precision and recall of the socially active pixels for the Milan, Brera and Tortona ground truths; same axes]
  46. Do socially active pixels relate to events? [chart: precision and recall of the socially active pixels for the Milan, Brera and Tortona ground truths; same axes]
  47. Are CDR-anomalous and socially active pixels similar? Which of the following four scenarios? [diagram: anomalous, socially active, intersection, similar?]
  48. Are CDR-anomalous and socially active pixels similar? More formally, we use the Jaccard index J(A,B) = |A ∩ B| / |A ∪ B|. E.g., [diagram: two pairs of sets A and B, one with J(A,B) = 8/11 and one with J(A,B) = 3/11]
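The Jaccard index is a one-liner over sets of pixels. The sketch below rebuilds two pairs of sets with the sizes from the slide's examples (8 shared pixels out of 11, and 3 out of 11), using arbitrary integer pixel ids. Returning 1.0 for two empty sets is a convention chosen for this sketch, since the index is undefined there.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| for two sets of pixels."""
    union = a | b
    if not union:
        return 1.0  # both sets empty: convention chosen for this sketch
    return len(a & b) / len(union)


# Sets sized like the slide's examples (pixel ids are arbitrary integers).
high = jaccard(set(range(0, 10)), set(range(2, 11)))  # 8 shared, 11 in total
low = jaccard(set(range(0, 7)), set(range(4, 11)))    # 3 shared, 11 in total
```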
  49. Are CDR-anomalous and socially active pixels similar? [chart: hourly recall of CDR-anomalous pixels, recall of socially active pixels, and Jaccard index (0 to 1), Tuesday to Sunday (ticks 0904:00 to 1501:00); panels: Brera, Tortona]
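The chart combines three hourly quantities: the recall of CDR-anomalous pixels, the recall of socially active pixels, and the Jaccard index between the two sets. The sketch below assembles such a series. It assumes a hypothetical data layout in which each input maps an hour label to a set of pixel ids, with labels in what appears to be the slides' DDHH:MM format (e.g. "0904:00"). It also assumes that recall is measured against the pixels hosting events in that hour. All names are illustrative.

```python
from typing import AbstractSet, Dict, Hashable, List, Tuple

Pixels = AbstractSet[Hashable]

def recall(found: Pixels, truth: Pixels) -> float:
    """Share of ground-truth pixels that were found (0.0 if there is no ground truth)."""
    return len(found & truth) / len(truth) if truth else 0.0

def jaccard(a: Pixels, b: Pixels) -> float:
    """|A ∩ B| / |A ∪ B|, with 0.0 for two empty sets (a convention of this sketch)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hourly_series(
    anomalous: Dict[str, Pixels],
    social: Dict[str, Pixels],
    events: Dict[str, Pixels],
) -> List[Tuple[str, float, float, float]]:
    """For each hour: (hour, recall CDR-anomalous, recall socially active, Jaccard)."""
    rows = []
    # Hypothetical "DDHH:MM" labels sort chronologically within a single month.
    for hour in sorted(set(anomalous) | set(social) | set(events)):
        a = anomalous.get(hour, set())
        s = social.get(hour, set())
        e = events.get(hour, set())
        rows.append((hour, recall(a, e), recall(s, e), jaccard(a, s)))
    return rows

# Hypothetical toy data for two hours.
events = {"0904:00": {"c1", "c2"}, "0907:00": {"c3"}}
anomalous = {"0904:00": {"c1", "c2", "c9"}, "0907:00": set()}
social = {"0904:00": {"c1"}, "0907:00": {"c3"}}
for row in hourly_series(anomalous, social, events):
    print(row)
```

Keeping the two recalls next to the Jaccard index shows whether a low similarity comes from one signal missing events or from both signals firing on different pixels.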
  50. Visualizing for a casual audience
  51. See it in action! http://youtu.be/MOBie09NHxM
  52. Evaluation methodology for the casual audience • Guessability study: can you guess what I mean without any explanation? E.g., dinosaur extinction; "The Shining" by Stephen King
  53. Evaluation of interface guessability
  54. The patterns you should have recognized: the CDR anomaly and the social activity are • correlated • partially correlated • not correlated
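The three patterns can be made concrete by correlating, per district, the hourly volume of social media posts with the hourly value of the mobile (CDR) anomaly signal. The sketch below uses Pearson's r with cut-offs of 0.7 and 0.3 for "correlated" and "partially correlated". Both the choice of Pearson's r and the cut-offs are assumptions for illustration, not criteria stated in the slides. The series are hypothetical.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal (sketch convention)
    return cov / (sx * sy)

def pattern(r: float, strong: float = 0.7, partial: float = 0.3) -> str:
    """Map r to the slide's three labels; thresholds are hypothetical.

    Negative r falls into "not correlated" in this sketch.
    """
    if r >= strong:
        return "correlated"
    if r >= partial:
        return "partially correlated"
    return "not correlated"

# Hypothetical normalized hourly series for one district.
social_volume = [0.1, 0.4, 0.8, 0.9, 0.3]
cdr_anomaly = [0.0, 0.5, 0.7, 1.0, 0.2]
r = pearson(social_volume, cdr_anomaly)
print(round(r, 2), pattern(r))
```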
  55. Evaluation of interface guessability. Q: In the Brera district, the volume of the social media signal is partially correlated with the value of the mobile anomaly signal. A: [chart of answers, scale 0 to 1]
  56. Evaluation of interface guessability. Q: In Porta Romana, the volume of the social media signal is strongly correlated with the value of the mobile anomaly signal. A: [chart of answers, scale 0 to 1]
  57. Evaluation of interface guessability. Q: In the Tortona district, the volume of the social media signal is strongly correlated with the value of the mobile anomaly signal. A: [chart of answers, scale 0 to 1]
  58. Back to the research question [photo: https://www.flickr.com/photos/debord/4932655275] Can we collect, analyse and repurpose • social media captured at places and events and • privacy-preserving aggregates of Call Data Records to allow • visually perceiving emerging patterns and • observing their dynamics? Yes! At least in Milano Design Week 2013 and 2014. [photo: https://flic.kr/p/beuDaX]
  59. Take home message … guess it :-)
  60. Take home message … guess it :-) Emanuele Della Valle emanuele.dellavalle@polimi.it http://emanueledellavalle.org
  61. Acknowledgements
  • Politecnico di Milano
    • DEIB
      What: scientific direction, semantic technologies, stream processing, data science
      Who: Emanuele Della Valle, Marco Balduini
    • Density Design Lab
      What: visual analytics
      Who: Paolo Ciuccarelli, Matteo Azzi
  • Telecom Italia
    • SKIL Lab
      What: Big Data technology, data science
      Who: Fabrizio Antonelli, Roberto Larker
  • Funding agency
  62. Semantic Approach to Big Data and Event Processing: Listening to the pulse of our cities fusing Social Media Streams and Call Data Records. Emanuele Della Valle, DEIB - Politecnico di Milano, @manudellavalle, emanuele.dellavalle@polimi.it, http://emanueledellavalle.org
