Big data seminar Trento :

  • The Weather Channel app is the fifth most downloaded app of all time.
  • An associative map can be built from the significance level of the linear association between a generic time vector (DNKT) and a time-coupled gridded data stack (consecutive weather layers in time). Significantly impacted areas emerge geographically where the one-dimensional social signal is coherent and linearly correlated with the corresponding time vector of pixel/box values inside the grid. Three significance levels are generally considered, as a function of the p-value of the null-hypothesis significance test (none: p > 0.1; weak: 0.05 < p ≤ 0.1; strong: p < 0.05).
  • The method's strength rests on the assumption that a linear association exists between the weather process and the "verbosity" of words semantically linked to the weather sensations people feel. This is the basis for the geographical matching between social streams and weather data. The method also overcomes the bias due to "false tweets" and to low population density.
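The pixel-by-pixel association described in these notes can be sketched as below. This is a minimal illustration, not the authors' code: it is written in Python rather than R, all function names and the demo values are hypothetical, and because the notes do not say which significance test was used, an exact two-sided permutation test on the Pearson correlation is swapped in for it. The three classes follow the p-value thresholds given above.

```python
from itertools import permutations
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y):
    """Exact two-sided permutation p-value (assumed test; feasible for short series)."""
    observed = abs(pearson_r(x, y))
    hits = total = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance so the unshuffled series always counts as a hit.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total


def significance_class(p):
    """Map a p-value to the three classes named in the notes."""
    if p < 0.05:
        return "strong"
    if p <= 0.1:
        return "weak"
    return "none"


def associative_map(dnkt, stack):
    """stack[t][i][j] is the grid value at time step t, row i, column j."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [
            significance_class(
                permutation_p_value(dnkt, [layer[i][j] for layer in stack])
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Hypothetical demo: six daily tweet counts and a 1 x 3 temperature grid.
dnkt = [120, 340, 610, 580, 300, 150]
stack = [[[20 + v / 100, 25.0, 30 - v / 100]] for v in dnkt]
demo_map = associative_map(dnkt, stack)
print(demo_map)
```

In the demo, the first pixel follows the tweet counts exactly, the second is constant, and the third moves in the opposite direction. With a two-sided test, a strongly negative association is classed the same way as a positive one; a one-sided variant would be needed to keep only the pixels that warm as the social signal grows.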


  • 1. Seminars BigDataTrento, 26/03/2013, Via Sommarive, 18, EIT ICT Labs room. Alfonso Crisci (a.crisci@ibimet.cnr.it), Valentina Grasso (grasso@lamma.rete.toscana.it). Image: weather events and social media streams: a big data approach for impact mapping
  • 2. Social media and SEO are the information rivers available on the web. Are they useful or not? That is the question (W. Shakespeare).
  • 3. Social Media are Data: the 5C • contents (UGC) • conversation • connection • collaboration • community. A big lens on crowd behaviour, essentially through COMMUNITY membership and CONVERSATION.
  • 4. Why Social Media & Big Data? User-generated content is currently the world's largest mine of data for every purpose. Performing data mining on this kind of data involves many tasks and computational services for parsing and information extraction, concerning: georeferencing; social network analytics; semantic processing; information rendering and visualisation; co-inference with other information sources; retrieval and storage of SM streams.
  • 5. Now there are platforms that make the data meet-up possible: MapReduce, parallel computation, RHadoop.
  • 6. Working code today… RHadoop. Revolution Analytics is only an example!
  • 7. Social Media versus Weather: Intrinsically Big Data
  • 8. SM & weather are connected!! Plenty of weather content on SM • Weather is a common conversation topic • Services push the personalization of weather forecasts • Perceived weather has a local dimension • Weather could become an "emergency" issue
  • 9. Considering severe weather events: where, who, when. They happen in space and in time and, through media and SM, build and trigger an informative frame on the web-sphere. A deep analogy with web processes exists!
  • 10. Weather as an emergency issue: main features • FREQUENT: compared with other emergencies • FAMILIAR: people deal with weather daily • PREDICTABLE: important for warnings • LOCATED: specific spatial and temporal dimension. #fires #earthquake #chemical #nuclear #disaster #health #terrorism
  • 11. Weather as an operational context where a community may increase its "resilience" attitude. In an emergency, "behaviours" modulate "impacts" on society. If I'm aware and prepared, I act responsibly. US tornado warnings: people got used to "weather warnings" and learnt to be proactive in protection. The aim: enhance the resilience of communities.
  • 12. Changing climate, changing awareness. In Italy and Europe, over the last 10 years climate change has made us more exposed to extreme weather events: "preparedness". Tornado hits: US and Italy, 1999-2009. The geographical spread and magnitude of events are important for awareness.
  • 13. Lovely (or less so) meteo SM fakes are everywhere… Information verification becomes a must! Welcome, Big Data!
  • 14. Verification is a question of time, event shape and coherence: start, peak, decline. Weather phenomena and social/communication streams behave as "analogue", time-delayed information waves. Axis: time.
  • 15. …and geography as well
  • 16. Dynamic information warping means exploring the time coherence between a real physical process [or its mathematical representation!!!!] & information flows…
  • 17. In a multidimensional space, or better in every time-varying system (such as the atmosphere or the "WEB information seas"), some structures can always be detected. Lagrangian coherent structures (LCS) are well known in ecology and fluid dynamics. Ref: Uncovering the Lagrangian Skeleton of Turbulence, Mathur et al., Phys Rev Lett. 2007 Apr 6;98(14):144502. Epub 2007 Apr 4. When two or more time-varying systems are connected, a supercoherence could be detected if the processes are linked.
  • 18. The link structure between SM and weather could hypothetically be described by an appropriate hierarchy model (theory of middle-number systems, Weinberg 1975). Social media and weather relationships are surely an organized complexity: too many parts to be deterministically predicted, too few to be statistically forecast. Ref: Agent-Based Modeling of Complex Spatial Systems, Yuan, University of Oklahoma.
  • 19. SMERST 2013: Social Media and Semantic Technologies in …, 15-16 April 2013, University of Warwick, Coventry, UK. Disaster 2.0 project.
  • 20. Weather event: early heat wave on 5-7 April 2011. Working case on the Italian Twitter-sphere. Research objectives: • investigate the time/space coherence between the event's extent and its social footprint on Twitter • semantic analysis of the Twitter stream on peak and off-peak days
  • 21. Heat wave as a good case. Emergency as a consequence of "behaviour". Communication is key: "how to act".
  • 22. Heat wave: definition. It's a period with persistent T° above the seasonal mean. The local definition depends on the regional climatic context. Severe weather refers to any dangerous meteorological phenomena with the potential to cause damage, serious social disruption, or loss of human life [WMO]. Types of severe weather phenomena vary, depending on the latitude, altitude, topography, and atmospheric conditions. Ref:
  • 23. To overcome every SM & weather complexity, a 5-point road map:
    • Identify a 1-dimensional time flux of information from the SM world.
    • Detect every local statistical linear association of this flux in a parametric (physical) space-time representation (a time-spatial grid of data).
    • Map the significance in previously determined classes.
    • Verify the patterns against observations.
    • Confirm with semantics and text mining.
    • Run a community analysis of SM streams to detect user filters.
  • 24. Target and Products. Stakeholders: • forecasters • institutional stakeholders • EM communities • media agents. Products: • DNKT, a semantic-based SM stream metric • The significant areas where the SM time vector (DNKT) is associated with the coupled time-gridded data stack of weather parameters = spatial associative map • A semantic analysis of the Twitter stream: clustering, word clouds, SNA improvements. Target: detect areas where it's worth focusing attention, also for communication purposes.
  • 25. Data used. Heat wave period considered (7-13 April 2011). Social: using the Twitter API, 6069 key-tagged tweets (CALDO-AFA-SETE, i.e. heat, mugginess, thirst) collected through the geosearch service for the Italian area; retweets and replies included (full volume stream). Climate & Weather (7-10 April 2011): urban daily maximum T°; daily gridded data (lon 5-20 E, lat 35-50), WRF-ARW model daily T°max data (9 km box).
  • 26. Semantized Twitter stream metrics. DNKT = "daily number of key-tagged tweets". DNKT shows time coherence with the daily profiles of areal averaged temperature. Critical days are identified as the numerical neighbours of the peak (7-8-9 April): social "heat days".
  • 27. The associative map as a tool. Inputs: a semantic-based social stream in 1D time space (DNKT) and weather information layers in 2D space * time. Method: statistically based linear association, verified pixel by pixel. Output: geographic associative map (2D space).
  • 28. Impacted areas in evidence. It's an X-ray of a weather map: the Twitter stream is used as a "contrast medium" to visualize impacted areas. This is not a Twitter map.
  • 29. The associative map fits well. Urban maximum T° over 28 °C on 9 April: where & when.
  • 30. Semantic analytics. Corpus creation: DNKT classification by heat-wave peak days: heat days (7-8-9 April), no-heat days (6-10-11 April). Term word clouds (min word frequency > 30): heat days vs no-heat days; clustering of associated terms; term frequency ranking comparison. Hashtag word clouds: heat days vs no-heat days. R Stat 15.2; packages used: tm (Feinerer and Hornik, 2012) & wordcloud (Fellows, 2012).
  • 31. Word clouds of terms (excluding the key-tags caldo-afa-sete): heat days, no-heat days
  • 32. Term association clustering. Heat days: "heat" is THE conversation topic. No-heat days: "heat" is marginal to the conversation topic.
  • 33. heat days
  • 34. Term frequency ranking (oggi = today, sole = sun, troppo = too much)
    Rank   no-heat days (N=2608)   heat days (N=3461)
    1st    oggi 6.0%               oggi 8.3%
    2nd    sole 5.5%               troppo 7.7%
    3rd    troppo 4.1%             sole 5.9%
  • 35. Hashtag word clouds: heat days, no-heat days
  • 36. Semantic results. On peak days: - widening of the lexical base during "heat critical days", with heat as a conversation topic - the ranking of terms (e.g. adjectives such as "troppo"!) is useful to detect changes in communication during climatic stress - geographic names appear in the term and hashtag word sets ("#milano"!). This fits with recent research on "social media contribution to situational awareness during emergencies".
  • 37. Snow events. SNA of key-tagged social media streams (#firenzeneve). Begin: 10 Feb 2013. End: 11 Feb 2013. The graph metrics of SM streams are dynamic. Graph centrality analysis of media and institutions may provide very useful parameters for weather event follow-up.
  • 38. Conclusions - Methodology for a social "X-ray" of a weather event: a semantized SM stream can act as a "contrast medium" to understand the social impact of severe weather events - The methodology of social geosensing mining is able to map severe weather impacts and to overcome the weaknesses in social media data. Weather is a key emergency context where it's worth working on community resilience, also with the help of insightful social content.
  • 39. Reproducible R code: socialsensing. Code & Data Recipes in …
  • 40. #thanks Contacts: Alfonso Crisci & Valentina Grasso. Mail: a.crisci@ibimet.cnr.it, grasso@lamma.rete.toscana.it. Twitter: @alf_crisci
  • 41. #nowquestions (slowly please, if possible)
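The DNKT metric of slide 26 and the term frequency ranking of slide 34 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the deck reports using R with the tm and wordcloud packages, while this sketch uses plain Python; the tweets, the stopword list and all function names are hypothetical; and the percentages are assumed to be the share of key-tagged tweets containing a term, which the slides do not state.

```python
from collections import Counter

KEY_TAGS = {"caldo", "afa", "sete"}   # key-tags named on slide 25
STOPWORDS = {"a", "e", "che"}         # hypothetical minimal stopword list

# Hypothetical (date, text) pairs standing in for the collected stream.
tweets = [
    ("2011-04-06", "oggi sole e caldo"),
    ("2011-04-06", "pioggia in arrivo"),
    ("2011-04-07", "oggi troppo caldo a #milano"),
    ("2011-04-07", "che afa oggi"),
    ("2011-04-08", "troppo caldo oggi"),
    ("2011-04-08", "sole e sete"),
    ("2011-04-08", "troppo sole, troppa afa"),
    ("2011-04-10", "meno caldo oggi"),
]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation and '#' stripped."""
    tokens = (word.strip("#.,!?").lower() for word in text.split())
    return [token for token in tokens if token]


def key_tagged(stream):
    """Keep only tweets containing at least one key-tag."""
    return [(day, text) for day, text in stream if KEY_TAGS & set(tokenize(text))]


def daily_counts(stream):
    """DNKT: number of key-tagged tweets per day, ordered by date."""
    counts = Counter(day for day, _ in key_tagged(stream))
    return dict(sorted(counts.items()))


def top_terms(stream, days, top=3):
    """Top terms on the given days as (term, % of key-tagged tweets containing it)."""
    texts = [text for day, text in key_tagged(stream) if day in days]
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)) - KEY_TAGS - STOPWORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [(term, round(100 * count / len(texts), 1)) for term, count in ranked[:top]]


dnkt = daily_counts(tweets)
heat_top = top_terms(tweets, {"2011-04-07", "2011-04-08"})
no_heat_top = top_terms(tweets, {"2011-04-06", "2011-04-10"})
print(dnkt)
print(heat_top)
print(no_heat_top)
```

Comparing the two rankings side by side is what surfaces shifts such as "troppo" climbing on heat days. On a real stream, a proper Italian stopword list and stemming would be needed before the shares become meaningful.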