Towards Context-Aware Search and Analysis on Social Media Data



Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine-readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to subject to context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.

Published in: Education

  1. Towards Context-Aware Search and Analysis on Social Media Data
     Leon Derczynski, Bin Yang (杨彬), Christian S. Jensen
  2. Evolution of communication
     [Figure: increasingly machine-readable information over time: functional utterances → vowels → velar closure (consonants) → speech → new modality: writing → digital text → e-mail → social media]
  3. Social Media = Big Data
     Gartner 3V definition: (1) Volume, (2) Velocity, (3) Variety.
     High volume and velocity of messages: Twitter has ~20 000 000 users per month, who write ~500 000 000 messages per day.
     Massive variety: stock markets; earthquakes; social arrangements; … Bieber.
  4. What is machine-readable now?
     Messages now contain
     - not only linguistic content
     - but also: links (e.g. URIs), topic markers (e.g. hashtags), meta-information
     What kind of meta-information? User profile (including home location); images; messages replied to; message language; time of message; location of message.
  5. What resources do we have now?
     Large, content-rich, linked, digital streams of human communication.
     We transfer knowledge via communication, so sampling communication gives a sample of human knowledge: you've only done that which you can communicate.
     The metadata (time, place, imagery) gives a richer resource → a sampling of human behaviour.
  6. What can we do with this resource?
     Context increases the data's richness, and increased richness enables novel applications.
     Time and place are interesting parts of message context:
     (1) What kinds of applications are there?
     (2) What are the practical challenges?
  7. Temporal Context
     Messages have timestamps. Two temporal retrieval scenarios:
     (1) historical analyses
     (2) emerging data
  8. Historical search
     Ability to retrieve from archives: longitudinal query mode [0].
     Retrieve information on:
     ● the lifecycle of socially connected groups
     ● precursors to events, analysed post hoc
     [Figure: timeline, 2008 to 2011]
     [0] Weikum et al. 2011: Longitudinal analytics on web archive data: It's about time. Proc. CIDR.
  9. Historical search
     Retrospective analyses into cause and effect: "There's a dead crow in my garden."
     Social media mentions of dead crows predict West Nile Virus (WNV) in humans [1].
     [1] Sugumaran & Voss 2012: Real-time spatio-temporal analysis of West Nile Virus using Twitter data. Proc. Intl. Conference on Computing for Geospatial Research and Applications.
  10. Emerging search
      Data emerging at high velocity: 185 000 documents per minute, giving a high temporal density.
      Search over this information enables:
      ● live coverage of events
      ● real-time identification of emerging events [2]
      [2] Cohen et al. 2011: Computational journalism: A call to arms to database researchers. Proc. CIDR.
  11. Temporal indexing
      What are our requirements?
      ● high-frequency document creation
      ● temporal cross-sections of varying size
      ● time-sensitive TF/IDF: stopwords are fluid
      How can we do this? An open challenge:
      ● tree indexing is hard to distribute
      ● maybe with adaptive multi-resolution grids?
  12. Spatial Context
      Demand for spatial information: 20% of all Google searches; 53% of Bing mobile searches.
      Heterogeneous spatial context sources:
      ● GPS locations (most reliable)
      ● origin bounding boxes (e.g. city)
      ● user profile text (???) [3]
      ● the author's friends' locations [4]
      [3] Hecht et al. 2011: Tweets from Justin Bieber's Heart: The Dynamics of the "Location" Field in User Profiles. Proc. ACM CHI.
      [4] Rout et al. 2013: Where's @wally? A Graph Based Method for Geolocating Users in Social Networks. Proc. ACM Hypertext.
  13. Spatial Keyword Search
      How can we query a set of social media messages? Treat them as a set of objects, each having text and a location.
      Query parameters: query text; query location.
      Given a query and a set of messages, rank by similarity:
      ● text similarity (cosine, Siamese learning net, oriented PCA)
      ● separating distance (Haversine, Manhattan, eco-routed)
      Blend the two with a balancing coefficient (just like conventional spatial keyword search).
  14. Spatial Keyword Search: example
      Query: "good bar in north copenhagen", issued from a location on the map, around which a query region is established.
      Five candidate messages (A to E) are ranked by a blend of location and textual similarity:
      Msg  Message text                                Location  Text
      A    So drunk last night at @BarSyv              0.7       0.6
      B    Out shoe shopping!!! #louboutintime         0.9       0.0
      C    Who pays $9 for a beer?!                    0.6       0.5
      D    wow found cphs greatest cocktail bar lol    0.1       1.0
      E    Traffic. Traffic everywhere. Need a drink.  0.4       0.2
  15. Continuous Spatial Queries
      The social media scenario is characterised by streaming data, with new spatial objects constantly appearing.
      Two new spatial keyword query types:
      ● Static Continuous (SCSKQ): fixed query location; tracks newly appearing objects
      ● Moving Continuous (MCSKQ): query location transits a locus; result updated with new objects
      Novel part: fresh objects are continuously introduced.
  16. Location Diversity
      Location data is unreliable, and the reliability of location data is also unreliable.
      "There are known knowns... we also know there are known unknowns... but there are also unknown unknowns." (Donald Rumsfeld)
      Text mentions require disambiguation:
      ● in profiles
      ● in messages
      ● in queries
      The requirement is to rank vague points given a vague query.
  17. Willingness to travel
      Determines the useful search radius.
      Based on mode of transport: [Figure: travel distances per transport mode: 14.9 km, 22.0 km, 40.6 km, 61.5 km, >100 km]
      Different for varying classes of point of interest?
      Spatio-temporal (ST) social media is a huge dataset: data collection is easy, and it is useful for e.g. town planning.
  18. Spatio-temporal Challenges
      We've seen temporal and spatial challenges; let's combine them!
      Given all these spatio-temporal utterances, what can we do?
      - Spatial context gives relevance from physical or travel proximity
      - Temporal context gives relevance from recency and from history
      Adding text to the spatio-temporal points gives explicit semantic context.
      Not only are spatio-temporal patterns in the data, we are told what they mean!
  19. Topic-based Retrieval
      Retrieving results on a topic is useful: "Tell me about X".
      Specific terms vary between places and over time.
      [Figure: 2007, England: "jelly"; 2011, US English: …]
      Spatio-temporally sensitive indexing?
  20. Sentiment Monitoring
      Measure how attitudes change over time and across locations.
      ● Business uses: where to send marketing
      ● Political uses: data-driven democratic... campaigning
      ● Governance uses: what citizens' priorities are in a region
      The temporal dimension enables tracking of trends and reactions.
      [Figure: sentiment map; red = upbeat, blue = complaint; no normalisation for vocality!]
  21. Local Computational Journalism
      Social media is quick; social media is uncurated: citizen journalism.
      News has a relevance scope: recency and proximity.
      Different events are relevant in different contexts: rain in London vs. rain in Addis Ababa.
      Automatic event detection [5], and also reporting!
      [5] Ritter et al. 2012: Open domain event extraction from Twitter. Proc. ACM SIGKDD.
  22. Summary
      Social media is a rich source of big data: a small sampling of all human discourse.
      It comes with temporal and spatial context.
      Context-aware search and analysis is very demanding!
      - novel, powerful applications
      - a wide variety of domains
      - an open set of challenges
  23. Thank you!
      Thank you for listening! Do you have any questions?
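Slide 11 (Temporal indexing) asks for time-sensitive TF/IDF because "stopwords are fluid": a term that saturates one period carries little information there, yet may be highly discriminative in another. A minimal sketch, assuming messages are hypothetical `(timestamp, tokens)` pairs and that document frequency is simply recomputed for each query window (a brute-force stand-in for a real temporal index):

```python
import math
from collections import Counter


def windowed_tfidf(messages, start, end):
    """TF-IDF for messages with start <= timestamp < end.

    messages: iterable of (timestamp, tokens) pairs (a hypothetical format).
    Returns a list of (timestamp, {term: weight}) for the in-window messages.
    """
    window = [(t, toks) for t, toks in messages if start <= t < end]
    n_docs = len(window)
    # Document frequency is counted inside the window only, so a term that
    # appears in every message of this period gets IDF log(1) = 0 here.
    df = Counter()
    for _, toks in window:
        df.update(set(toks))
    results = []
    for t, toks in window:
        tf = Counter(toks)
        weights = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        results.append((t, weights))
    return results


messages = [
    (1, ["earthquake", "now"]),
    (2, ["earthquake", "scary"]),
    (3, ["earthquake", "here"]),
    (10, ["earthquake", "lunch"]),
    (11, ["coffee", "lunch"]),
]
# "earthquake" is in every message of [0, 5), so it behaves like a stopword there...
print(windowed_tfidf(messages, 0, 5)[0][1]["earthquake"])  # 0.0
# ...but it is discriminative in [10, 12), where only one message uses it.
print(windowed_tfidf(messages, 10, 12)[0][1]["earthquake"])
```

Recomputing frequencies costs time linear in the window size for every query; precomputing them per time slice, at high ingest rates and for windows of any size, is the open indexing challenge the slide names.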
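The "temporal cross-sections of varying size" and "multi-resolution grids" of slide 11 can be illustrated with a much simpler structure: message counts kept at several fixed time resolutions, so that a range query sums a few coarse buckets instead of many fine ones. This is a sketch assuming hypothetical day/hour/minute resolutions on a single machine; it is neither adaptive nor distributed, which are the hard parts the slide leaves open:

```python
from collections import Counter

# Bucket widths in seconds, coarsest first (a hypothetical choice). Each width
# is a multiple of the next, so coarse buckets tile exactly into finer ones.
RESOLUTIONS = (86_400, 3_600, 60)


class MultiResolutionCounter:
    """Counts messages per day, hour and minute bucket."""

    def __init__(self):
        self.levels = {r: Counter() for r in RESOLUTIONS}

    def add(self, ts):
        """Record one message with integer Unix timestamp ts (seconds)."""
        for r in RESOLUTIONS:
            self.levels[r][ts // r] += 1

    def count(self, start, end):
        """Number of messages with start <= ts < end.

        start and end must be multiples of the finest resolution, so the
        range can be tiled exactly by whole buckets.
        """
        finest = RESOLUTIONS[-1]
        if start % finest or end % finest:
            raise ValueError("range must align to the finest resolution")
        total, t = 0, start
        while t < end:
            # Use the coarsest bucket that starts at t and fits in the range;
            # the finest bucket always fits, so the loop always advances.
            for r in RESOLUTIONS:
                if t % r == 0 and t + r <= end:
                    total += self.levels[r][t // r]
                    t += r
                    break
        return total
```

A query covering one day plus three minutes touches four buckets (one day, three minutes) rather than 1 443 minute buckets.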
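Slide 12 (Spatial Context) lists location sources in rough order of reliability. One plausible way to use them is a fallback chain that takes the most reliable source available and records which one was used, so that later ranking can discount weaker evidence. The message field names (`gps`, `place_bbox`, `profile_location`) and the `geocode` callback are hypothetical, and the coordinate-wise median of friends' locations is a crude stand-in, not the method of Rout et al. [4]:

```python
from statistics import median
from typing import Callable, Optional, Sequence, Tuple

LatLon = Tuple[float, float]


def resolve_location(
    msg: dict,
    geocode: Callable[[str], Optional[LatLon]],
    friend_locations: Sequence[LatLon] = (),
) -> Optional[Tuple[LatLon, str]]:
    """Return ((lat, lon), source) from the most reliable available source,
    or None. All message field names here are hypothetical."""
    # 1. Device GPS coordinates: the most reliable source.
    if msg.get("gps") is not None:
        return msg["gps"], "gps"
    # 2. Centre of the origin bounding box (e.g. a city): (south, west, north, east).
    bbox = msg.get("place_bbox")
    if bbox is not None:
        south, west, north, east = bbox
        return ((south + north) / 2, (west + east) / 2), "bbox"
    # 3. Free-text profile location, which may not name a real place at all.
    text = msg.get("profile_location")
    if text:
        loc = geocode(text)
        if loc is not None:
            return loc, "profile"
    # 4. Crude stand-in for social-graph methods: median of friends' locations.
    if friend_locations:
        lat = median(p[0] for p in friend_locations)
        lon = median(p[1] for p in friend_locations)
        return (lat, lon), "friends"
    return None
```

The bounding-box centre is naive near the antimeridian, and a fuller version would attach an uncertainty estimate to each source; slide 16 (Location Diversity) points out that even that reliability is itself unreliable.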
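Slide 13 (Spatial Keyword Search) ranks messages by a blend of text similarity and separating distance. The sketch below picks one option from each list on the slide (bag-of-words cosine similarity and Haversine distance) and assumes a balancing coefficient `alpha`, plus a hypothetical linear normalisation that turns distance into a similarity in [0, 1] using a cut-off `max_km`:

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against h drifting just above 1 through rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def cosine(text_a, text_b):
    """Cosine similarity of lower-cased bag-of-words vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def spatial_keyword_rank(query_text, query_loc, messages, alpha=0.5, max_km=10.0):
    """Rank (text, (lat, lon)) messages by
    alpha * spatial_similarity + (1 - alpha) * text_similarity.

    Spatial similarity falls linearly from 1 at the query location to 0 at
    max_km and beyond (a hypothetical normalisation).
    """
    scored = []
    for text, loc in messages:
        spatial = max(0.0, 1.0 - haversine_km(query_loc, loc) / max_km)
        score = alpha * spatial + (1 - alpha) * cosine(query_text, text)
        scored.append((score, text))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With `alpha = 1` the ranking is purely spatial and with `alpha = 0` purely textual; `max_km` is the kind of parameter the willingness-to-travel figures on slide 17 could inform.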
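The worked example on slide 14 can be replayed directly. Reading the two score columns as location similarity and text similarity in [0, 1], and assuming a linear blend `alpha * location + (1 - alpha) * text`, the ranking shifts as the weight moves between the two:

```python
# (message text, location similarity, text similarity) from the slide 14 example.
CANDIDATES = {
    "A": ("So drunk last night at @BarSyv", 0.7, 0.6),
    "B": ("Out shoe shopping!!! #louboutintime", 0.9, 0.0),
    "C": ("Who pays $9 for a beer?!", 0.6, 0.5),
    "D": ("wow found cphs greatest cocktail bar lol", 0.1, 1.0),
    "E": ("Traffic. Traffic everywhere. Need a drink.", 0.4, 0.2),
}


def blended_ranking(alpha):
    """Order message labels by alpha * location + (1 - alpha) * text, best first."""
    scores = {label: alpha * loc + (1 - alpha) * txt
              for label, (_, loc, txt) in CANDIDATES.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(blended_ranking(0.6))  # ['A', 'C', 'B', 'D', 'E']
print(blended_ranking(0.3))  # ['D', 'A', 'C', 'B', 'E']
```

Weighting location more heavily puts the nearby bar message A first; weighting text more heavily promotes D, the best textual match, despite its distance.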
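Slide 15 (Continuous Spatial Queries) describes queries whose results must be updated as fresh objects arrive. In the static case (SCSKQ) the query location is fixed, so each object's score never changes once computed, and a size-k min-heap suffices: a new object enters the result only if it beats the weakest current member. A sketch under those assumptions; `score_fn` is a hypothetical parameter (for instance a blended spatial/textual score), and objects never expire:

```python
import heapq
import itertools


class StaticContinuousTopK:
    """Maintain the k best-scoring objects seen so far for a fixed query."""

    def __init__(self, k, score_fn):
        self.k = k
        self.score_fn = score_fn
        self._heap = []  # min-heap of (score, seq, obj); the root is the weakest kept result
        self._seq = itertools.count()  # tie-breaker, so objects themselves are never compared

    def push(self, obj):
        """Process one newly arriving object; return True if the result changed."""
        entry = (self.score_fn(obj), next(self._seq), obj)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if entry[0] > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the weakest, insert the new one
            return True
        return False

    def results(self):
        """Current top-k objects, best first (earlier arrival wins ties)."""
        return [obj for _, _, obj in sorted(self._heap, key=lambda e: (-e[0], e[1]))]
```

The moving case (MCSKQ) breaks the key assumption: when the query location moves, previously computed scores change, so the result cannot be maintained from the heap alone and candidates must be rescored, or retrieved from a spatial index, as the query moves.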
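Slide 18 (Spatio-temporal Challenges) derives relevance from physical proximity and from recency. One simple way to combine the two, assuming independent exponential decays with hypothetical half-lives, is to multiply a spatial factor by a temporal factor:

```python
def st_relevance(distance_km, age_hours, dist_half_km=5.0, age_half_hours=6.0):
    """Spatio-temporal relevance between 0 and 1.

    Relevance halves for every dist_half_km of distance from the query and
    for every age_half_hours of message age; both half-lives are
    hypothetical tuning parameters.
    """
    if distance_km < 0 or age_hours < 0:
        raise ValueError("distance and age must be non-negative")
    spatial = 0.5 ** (distance_km / dist_half_km)
    temporal = 0.5 ** (age_hours / age_half_hours)
    return spatial * temporal
```

Under the default half-lives, a message 5 km away posted 6 hours ago scores 0.5 × 0.5 = 0.25. A multiplicative form means a message must be both reasonably near and reasonably fresh to rank well; an additive blend, like the one used for spatial keyword search, would instead let one strong factor compensate for a weak one.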
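Slide 20 (Sentiment Monitoring) notes that its map has "no normalisation for vocality": a region that posts more dominates raw counts. A sketch of a normalised alternative, assuming each message already carries a hypothetical `upbeat`/`complaint` label from some upstream classifier, reports the upbeat share per (region, time bucket) together with the volume behind it:

```python
from collections import defaultdict


def sentiment_share(messages, bucket_hours=24):
    """Upbeat share and message volume per (region, time bucket).

    messages: iterable of (region, timestamp_hours, label) where label is
    "upbeat" or "complaint" (hypothetical output of an upstream classifier).
    Returns {(region, bucket): (upbeat_share, volume)}; reporting the share
    rather than raw counts keeps a very vocal region from dominating.
    """
    counts = defaultdict(lambda: [0, 0])  # [upbeat, total] per cell
    for region, ts, label in messages:
        cell = counts[(region, int(ts // bucket_hours))]
        if label == "upbeat":
            cell[0] += 1
        cell[1] += 1
    return {key: (upbeat / total, total) for key, (upbeat, total) in counts.items()}
```

Keeping the volume next to the share lets a map dim cells whose share rests on only a handful of messages.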
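Slide 21 (Local Computational Journalism) calls for automatic, locally scoped event detection: rain is news in Addis Ababa in a way it is not in London. A very simple illustration, not the open-domain extraction method of Ritter et al. [5], flags a term as bursting at a location when its count in the current window is several standard deviations above that location's own history; the threshold and minimum count are hypothetical defaults:

```python
from statistics import mean, pstdev


def is_burst(history, current, z_threshold=3.0, min_count=5):
    """Flag a term as bursting at one location.

    history: counts of the term in previous time windows at this location.
    current: count of the term in the latest window at this location.
    """
    # Ignore tiny counts and locations with no baseline yet.
    if current < min_count or not history:
        return False
    mu = mean(history)
    # A perfectly flat history has zero spread; fall back to 1 to avoid
    # dividing by zero.
    sigma = pstdev(history) or 1.0
    return (current - mu) / sigma >= z_threshold
```

Because the baseline is kept per location, the same window count of "rain" can be routine in one city and a burst in another.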