Big Data seminar, Trento

Published in: Education, Technology, Business

  • The Weather Channel app is the fifth most downloaded app of all time.
  • An associative map can be built from the significance level of the linear association between a generic time vector (DNKT) and a time-coupled gridded data stack (consecutive weather layers in time). The significantly impacted areas emerge geographically where the one-dimensional social signal is coherent and linearly correlated with the corresponding time vector of pixel/box values inside the grid. Three significance levels are generally considered, according to the p-value of the null-hypothesis significance test (none: p > 0.1; weak: 0.05 < p ≤ 0.1; strong: p < 0.05).
  • The method's strength rests on the validity of the assumption that a linear association exists between the weather process and the "verbosity" of words semantically linked to the weather sensations people feel. This is the basis of the geographical matching between social streams and weather data. A further gain of the method is that it overcomes the bias due to "false tweets" and to low population density.
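
As a rough illustration of the three significance classes described in the notes above, the sketch below maps a p-value to a class. It is written in Python for illustration and is not the authors' R code. The function name and the treatment of the boundary value p = 0.05 (assigned here to "weak") are assumptions, since the notes do not specify them.

```python
def significance_class(p_value):
    """Map a p-value to one of the three classes named in the notes.

    Assumption: p == 0.05 is treated as "weak" and p == 0.1 as "weak";
    the notes leave both boundary values unspecified.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p_value < 0.05:
        return "strong"
    if p_value <= 0.1:
        return "weak"
    return "none"


# Invented p-values, one per grid cell, purely for illustration.
example_p_values = [0.01, 0.07, 0.5]
example_classes = [significance_class(p) for p in example_p_values]
```

Applied cell by cell to a grid of p-values, a function of this kind would yield the class map that the notes call the associative map.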

    1. 1. Seminars BigDataTrento, 26/03/2013. Via Sommarive, 18 – EIT ICT Labs room. Alfonso Crisci - a.crisci@ibimet.cnr.it; Valentina Grasso - grasso@lamma.rete.toscana.it. Image: weather events and social media streams: a big data approach for impact mapping
    2. 2. Social media and SEO are the information rivers available on the web. Are they useful or not? That is the question (W. Shakespeare).
    3. 3. Social Media are Data (5C): • contents (UGC) • conversation • connection • collaboration • community. A big lens on crowd behaviour, essentially through COMMUNITY membership and CONVERSATION.
    4. 4. Why Social Media & Big Data? User-generated content is currently the world's largest mine of data for every purpose. Performing data mining on this kind of data involves many tasks and computational services for parsing and information extraction, concerning: georeferencing; social network analytics; semantic processing; information rendering and visualisation; co-inference with other information sources; retrieval and storage of SM streams.
    5. 5. Now there are platforms to realize the data meet-up: MapReduce, parallel computation, RHadoop.
    6. 6. Working code today… RHadoop (Revolution Analytics) is only one example!
    7. 7. Social Media versus Weather: intrinsically Big Data
    8. 8. SM & weather are connected!! Plenty of weather content on SM: • Weather is a common conversation topic • Services push the personalization of weather forecasts • Perceived weather has a local dimension • Weather can become an "emergency" issue
    9. 9. Considering severe weather events: where, who, when. They happen in space and in time and, through media and SM, build and trigger an informative frame on the Web-sphere. A deep analogy with WEB processes exists!
    10. 10. Weather as an emergency issue, main features: • FREQUENT: compared with other emergencies • FAMILIAR: people deal with weather daily • PREDICTABLE: important for warnings • LOCATED: specific spatial and temporal dimension. #fires #earthquake #chemical #nuclear #disaster #health #terrorism
    11. 11. Weather as an operational context where a community may increase its "resilience" attitude. In an emergency, "behaviours" modulate "impacts" on society. If I'm aware and prepared, I act responsibly. US tornado warnings: people got used to "weather warnings" and learnt to be proactive in protecting themselves. The aim: enhance the resilience of communities.
    12. 12. Changing climate - changing awareness. In Italy and Europe, over the last 10 years climate change has made us more exposed to extreme weather events - "preparedness". Tornado hits: US - Italy 1999-2009. The geographical spread and magnitude of events are important for awareness.
    13. 13. Lovely (or less so) weather fakes on SM... are everywhere… Information verification becomes a must! Welcome, Big Data!
    14. 14. Verification is a question of time, event shape and coherency: start, peak, decline. Weather phenomena and social/communication streams act as "analogue" time-delayed information waves. (Plot axis: time)
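
One simple way to look at the "time-delayed waves" idea on the slide above is a lagged correlation: shift the social series against the weather series and keep the delay with the highest Pearson correlation. This is only a sketch of one possible check, not the authors' method; the function names and the toy series are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant series: no linear association can be measured
    return sxy / sqrt(sxx * syy)


def best_lag(weather, social, max_lag):
    """Return (lag, r) for the delay at which social best follows weather."""
    best = (0, -2.0)
    for lag in range(max_lag + 1):
        w = weather[: len(weather) - lag]  # weather values at time t
        s = social[lag:]                   # social values at time t + lag
        if len(w) < 3:
            break  # too few overlapping points to correlate
        r = pearson_r(w, s)
        if r > best[1]:
            best = (lag, r)
    return best


# Invented toy series: the social wave repeats the weather wave one step later.
weather = [0, 1, 3, 6, 3, 1, 0, 0]
social = [0, 0, 1, 3, 6, 3, 1, 0]
lag, r = best_lag(weather, social, max_lag=3)
```

With real data the overlap shrinks as the lag grows, so short series such as a one-week event support only very small lags.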
    15. 15. … and geography as well
    16. 16. Dynamic information warping means exploring the time coherence between a real physical process [or its mathematical representation!] and information flows…
    17. 17. In a multidimensional space or, better, in any time-varying system (such as the atmosphere or the "WEB information seas") some structures can always be detected. Lagrangian coherent structures (LCS) are well known in ecology and fluid dynamics. When two or more time-varying systems are connected, a supercoherence could be detected if the processes are linked. Ref: Uncovering the Lagrangian Skeleton of Turbulence, Mathur et al., Phys Rev Lett. 2007 Apr 6;98(14):144502. Epub 2007 Apr 4.
    18. 18. The link structure between SM and weather could hypothetically be built with an appropriate hierarchy model (theory of middle-number systems, Weinberg 1975). Social media and weather relationships are surely an Organized Complexity: too many parts to be deterministically predicted, too few to be statistically forecasted. Ref: Agent-Based Modeling of Complex Spatial Systems, Yuan, University of Oklahoma
    19. 19. SMERST 2013: Social Media and Semantic Technologies in …, 15-16 April 2013, University of Warwick, Coventry, UK. Disaster 2.0 project
    20. 20. Working case on the Italian Twitter-sphere. Weather event: early heat wave on 5-7 April 2011. Research objectives: • investigate the time/space coherence between the extent of the event and its social footprint on Twitter • semantic analysis of the Twitter stream on peak and off-peak days
    21. 21. Heat wave as a good case. Emergency as a consequence of "behaviour". Communication is key: "how to act".
    22. 22. Heat wave: definition. It is a period with persistent T° above the seasonal mean. The local definition depends on the regional climatic context. Severe weather refers to any dangerous meteorological phenomenon with the potential to cause damage, serious social disruption, or loss of human life [WMO]. Types of severe weather phenomena vary, depending on the latitude, altitude, topography, and atmospheric conditions.
    23. 23. To overcome the complexities of SM & weather, a 5-point road map: • Identify a 1-dimensional time flux of information from the SM world • Detect every local statistical linear association of this flux in a parametric (physical) space-time representation (a time-spatial grid of data) • Map the significance in previously determined classes • Verify the pattern against observations • Confirm with semantics and text mining • Community analysis of SM streams to detect user filters
    24. 24. Target and Products. Stakeholders: • forecasters • institutional stakeholders • EM communities • media agents. Products: • DNKT, a semantic-based SM stream metric • The significant areas of association between the SM time vector (DNKT) and the coupled time-gridded data stack of weather parameters = spatial associative map • A semantic analysis of the Twitter stream: clustering, word clouds, SNA improvements. Target: detect areas where it's worth focusing attention, also for communication purposes.
    25. 25. Data used. Heat wave period considered (7-13 April 2011). Social: - Using the Twitter API, 6069 key-tagged (CALDO-AFA-SETE, i.e. heat, sultriness, thirst) tweets collected through the geosearch service for the Italian area - Retweets and replies included (full volume stream). Climate & Weather (7-10 April 2011): - Urban daily maximum T° - Daily gridded data (lon 5-20 E, lat 35-50 N): WRF-ARW model daily T°max data (9 km box)
    26. 26. Semantized Twitter stream metrics. DNKT: "daily number of key-tagged tweets". DNKT shows time coherence with the daily profiles of areal-averaged temperature. Critical days are identified as the numerical neighbours of the peaks (7-8-9 April): social "heat days".
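
A minimal sketch of how a DNKT-like series could be computed from a list of tweets, written in Python for illustration (the slides state that the original work was done in R). The tweet format, the function names and the "top n days" rule used to pick the days around the peak are assumptions; the toy tweets are invented.

```python
import re
from collections import Counter
from datetime import date

# Key-tags listed on the "Data used" slide.
KEY_TAGS = {"caldo", "afa", "sete"}


def daily_key_tagged_counts(tweets):
    """Count, per day, the tweets whose text contains at least one key-tag.

    `tweets` is an iterable of (day, text) pairs (an assumed format).
    """
    counts = Counter()
    for day, text in tweets:
        words = set(re.findall(r"\w+", text.lower()))
        if words & KEY_TAGS:
            counts[day] += 1
    return counts


def peak_days(counts, n_days):
    """Return the n_days days with the highest counts, in calendar order."""
    top = sorted(counts, key=counts.get, reverse=True)[:n_days]
    return sorted(top)


# Invented toy tweets, not the 6069 tweets used in the slides.
tweets = [
    (date(2011, 4, 6), "oggi pioggia"),
    (date(2011, 4, 7), "che caldo oggi"),
    (date(2011, 4, 8), "troppo caldo"),
    (date(2011, 4, 8), "afa a milano"),
    (date(2011, 4, 8), "ho sete"),
    (date(2011, 4, 9), "caldo e sole"),
    (date(2011, 4, 9), "#caldo a firenze"),
    (date(2011, 4, 10), "sole"),
]
dnkt = daily_key_tagged_counts(tweets)
critical_days = peak_days(dnkt, n_days=2)
```

In the slides the key-tag filter is applied at collection time through the search service, so a local filter like this one would only be needed on an unfiltered stream.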
    27. 27. The associative map as a tool. Inputs: semantic-based social stream in 1D time space (DNKT) and weather information layers in 2D space × time. Method: statistically based linear association, verified pixel by pixel. Output: geographic associative map (2D space).
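
The pixel-by-pixel association described on the slide above can be sketched as follows, in Python for illustration. The slides do not say which significance test was used, so this sketch swaps in an exact two-sided permutation test on Pearson's r, which is practical only for short series and small grids. All function names and the toy data are invented.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant series: no linear association can be measured
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Exact two-sided permutation p-value for Pearson's r (small n only)."""
    observed = abs(pearson_r(x, y))
    total = 0
    as_extreme = 0
    for shuffled in permutations(x):
        total += 1
        if abs(pearson_r(shuffled, y)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


def associative_map(dnkt, grid_stack):
    """Return a 2D map of p-values, one per grid cell.

    grid_stack[t][i][j] is the weather value of cell (i, j) on day t,
    and dnkt[t] is the social count on the same day.
    """
    rows = len(grid_stack[0])
    cols = len(grid_stack[0][0])
    p_map = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            cell_series = [layer[i][j] for layer in grid_stack]
            p_map[i][j] = permutation_p_value(dnkt, cell_series)
    return p_map


# Invented toy data: 6 days on a 1 x 2 grid of daily maximum temperatures.
dnkt = [120, 480, 900, 760, 300, 150]
grid_stack = [
    [[18.0, 21.0]],
    [[23.0, 19.5]],
    [[28.5, 20.0]],
    [[27.0, 21.5]],
    [[21.0, 19.0]],
    [[19.0, 20.5]],
]
p_map = associative_map(dnkt, grid_stack)
```

In this toy grid the first cell follows the tweet counts closely and the second does not, so only the first would be mapped as significantly associated. A parametric test on r would scale better to a full 9 km grid than exhaustive permutation.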
    28. 28. Impacted areas in evidence. It's a weather map under X-rays: the Twitter stream is used as a "contrast medium" to visualize impacted areas. This is not a Twitter map.
    29. 29. Associative maps fit well, where & when: urban maximum T° over 28 °C on 9 April.
    30. 30. Semantic analytics. - Corpus creation: DNKT classification by heat-wave peak days: heat days (7-8-9 April), no-heat days (6-10-11 April). - Term word clouds (min word frequency > 30), heat days vs no-heat days; clustering of associated terms; term frequency ranking comparison. - Hashtag word clouds, heat days vs no-heat days. R Stat 15.2; packages used: tm (Feinerer and Hornik, 2012) & wordcloud (Fellows, 2012).
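
The term-frequency step of the analysis above can be sketched as follows. The authors used the R packages tm and wordcloud; this Python version is only an illustration of the counting logic, with an invented function name and invented toy tweets. It excludes the key-tags and hashtags from the term counts and keeps terms whose count is strictly above a threshold, mirroring the "frequency > 30" rule with a smaller value for the toy data. No stop-word removal is done here.

```python
import re
from collections import Counter

# Key-tags excluded from the term counts, as on the word-cloud slide.
KEY_TAGS = {"caldo", "afa", "sete"}


def frequent_terms(texts, min_frequency):
    """Return (term, count) pairs with count > min_frequency, most frequent first."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"#?\w+", text.lower()):
            if token.startswith("#"):
                continue  # hashtags are analysed separately in the slides
            if token not in KEY_TAGS:
                counts[token] += 1
    kept = [(term, n) for term, n in counts.items() if n > min_frequency]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Invented toy corpora standing in for the heat-day and no-heat-day tweets.
heat_day_texts = ["oggi troppo caldo", "troppo sole oggi", "caldo a #milano oggi"]
no_heat_day_texts = ["oggi sole", "sole e vento"]

heat_terms = frequent_terms(heat_day_texts, min_frequency=1)
no_heat_terms = frequent_terms(no_heat_day_texts, min_frequency=1)
```

Comparing the two ranked lists side by side gives the kind of heat versus no-heat ranking comparison that the slide lists.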
    31. 31. Word clouds of terms (excluding the key-tags caldo-afa-sete): heat days vs no-heat days.
    32. 32. Term association clustering. Heat days: "heat" is THE conversation topic. No-heat days: "heat" is marginal to the conversation topic.
    33. 33. heat days
    34. 34. Term frequency ranking (no-heat days, N=2608 | heat days, N=3461). 1st: oggi 6.0% | oggi 8.3%. 2nd: sole 5.5% | troppo 7.7%. 3rd: troppo 4.1% | sole 5.9%. (oggi = today, sole = sun, troppo = too much)
    35. 35. Hashtag word clouds: heat days vs no-heat days.
    36. 36. Semantic results. On peak days: - widening of the lexical base during "heat critical days", with heat as a conversation topic - the ranking of terms (e.g. adjectives such as "troppo"!) is useful to detect changes in communication during climatic stress - geographic names appear in the term and hashtag word sets ("#milano"!). This fits with recent research on "social media contribution to situational awareness during emergencies".
    37. 37. Snow events: SNA of key-tagged social media streams (#firenzeneve). Begin 10 Feb 2013, end 11 Feb 2013. The graph metrics of SM streams are dynamic. Graph centrality analysis of media and institutions may provide very useful parameters for weather event follow-up.
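
As an illustration of the graph centrality idea on the slide above, the sketch below computes in-degree centrality on a directed mention/retweet graph: the number of distinct accounts pointing at a node, divided by the number of other nodes. The slide does not say which centrality measure was used, so this choice is an assumption, and the account names and edges are invented.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """In-degree centrality for a directed graph given as (source, target) pairs.

    Repeated edges and self-mentions are ignored.
    """
    nodes = set()
    incoming = defaultdict(set)
    for source, target in edges:
        nodes.update((source, target))
        if source != target:
            incoming[target].add(source)
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    return {node: len(incoming[node]) / (len(nodes) - 1) for node in nodes}


# Invented toy edges: (author, account mentioned or retweeted).
edges = [
    ("user_a", "meteo_agency"),
    ("user_b", "meteo_agency"),
    ("user_c", "meteo_agency"),
    ("user_b", "meteo_agency"),  # repeated mention, counted once
    ("user_c", "local_news"),
    ("user_a", "local_news"),
]
centrality = in_degree_centrality(edges)
```

Recomputing such a measure on successive time windows of the stream would show how the role of media and institutional accounts changes during an event.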
    38. 38. Conclusions. - Methodology for a social "X-ray" of a weather event: a semantized SM stream can act as a "contrast medium" to understand the social impact of severe weather events. - The methodology of social geosensing mining is able to map severe weather impacts and overcome the weaknesses inside social media data. Weather is a key emergency context where it's worth working on community resilience, also with the help of insightful social content.
    39. 39. Reproducible R code: socialsensing. Code & Data Recipes in …
    40. 40. #thanks Contacts: Alfonso Crisci & Valentina Grasso. Mail: a.crisci@ibimet.cnr.it, grasso@lamma.rete.toscana.it. Twitter: @alf_crisci
    41. 41. #nowquestions (slowly please, if possible)