Towards Context-Aware Search and Analysis on Social Media Data

Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine-readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to subject to context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.

Published in Education

Transcript

  • 1. Towards Context-Aware Search and Analysis on Social Media Data
    Leon Derczynski, Bin Yang 杨彬, Christian S. Jensen
  • 2. Evolution of communication
    Functional utterances
    Vowels
    Velar closure: consonants
    Speech
    New modality: writing
    Digital text
    E-mail
    Social media
    (Side annotation: Increased machine-readable information?)
  • 3. Social Media = Big Data
    Gartner 3V definition:
      1. Volume
      2. Velocity
      3. Variety
    High volume & velocity of messages:
      Twitter has ~20 000 000 users per month
      They write ~500 000 000 messages per day
    Massive variety: Stock markets; Earthquakes; Social arrangements; … Bieber
  • 4. What is machine-readable now?
    Messages now contain
      - not only linguistic content
      - but also:
          Links (e.g. URI)
          Topic markers (e.g. hashtags)
          Meta-information
    What kind of meta-information?
      User profile (including home location)
      Images
      Messages replied to
      Message language
      Time of message
      Location of message
  • 5. What resources do we have now?
    Large, content-rich, linked, digital streams of human communication
    We transfer knowledge via communication
    Sampling communication gives a sample of human knowledge
      You've only done that which you can communicate
    The metadata (time – place – imagery) gives a richer resource:
      → A sampling of human behaviour
  • 6. What can we do with this resource?
    Context increases the data's richness
    Increased richness enables novel applications
    Time and Place are interesting parts of message context
      1. What kinds of applications are there?
      2. What are the practical challenges?
  • 7. Temporal Context
    Messages have timestamps
    Two temporal retrieval scenarios:
      1. Historical analyses
      2. Emerging data
  • 8. Historical search
    Ability to retrieve from archives: Longitudinal query mode [0]
    Retrieve information on:
      ● Lifecycle of socially connected groups
      ● Analyse precursors to events, post-hoc
    (Timeline figure: 2008 to 2011)
    0. Weikum et al. 2011: Longitudinal analytics on web archive data: It’s about time, Proc. CIDR
  • 9. Historical search
    Retrospective analyses into cause and effect
      "There's a dead crow in my garden"
    Social media mentions of dead crows predict WNV in humans [1]
    1. Sugumaran & Voss 2012: Real-time spatio-temporal analysis of West Nile Virus using Twitter Data, Proc. Intl conference on Computing for Geospatial Research and Applications
  • 10. Emerging search
    Data emerging at high velocity: 185 000 documents per minute
    Gives a high temporal density
    Search over this info enables:
      ● Live coverage of events
      ● Realtime identification of emerging events [2]
    2. Cohen et al. 2011: Computational journalism: A call to arms to database researchers, Proc. CIDR
  • 11. Temporal indexing
    What are our requirements?
      ● High-frequency document creation
      ● Temporal cross-sections of varying size
      ● Time-sensitive TF/IDF: stopwords are fluid
    How can we do this? - Open challenge
      ● Tree indexing hard to distribute
      ● Maybe with adaptive multi-resolution grids?
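The "time-sensitive TF/IDF" requirement on slide 11 can be illustrated with a small sketch. This is not the authors' method (the slide calls the problem an open challenge); it is one simple possibility, assuming a single in-memory sliding window, whitespace tokenisation, and messages arriving in timestamp order. The class name `WindowedIdf` and the smoothed IDF formula are choices made for this illustration.

```python
import math
from collections import Counter, deque


class WindowedIdf:
    """Inverse document frequency over the most recent `window_seconds` of messages.

    Hypothetical helper: assumes messages are added in non-decreasing
    timestamp order and that splitting on whitespace is good enough.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.docs = deque()        # (timestamp, set of terms), oldest first
        self.doc_freq = Counter()  # term -> number of in-window messages containing it

    def add(self, timestamp, text):
        terms = set(text.lower().split())
        self.docs.append((timestamp, terms))
        self.doc_freq.update(terms)
        self._expire(timestamp)

    def _expire(self, now):
        # Drop messages that have fallen out of the window and undo their counts.
        while self.docs and self.docs[0][0] <= now - self.window_seconds:
            _, old_terms = self.docs.popleft()
            self.doc_freq.subtract(old_terms)
            for term in old_terms:
                if self.doc_freq[term] <= 0:
                    del self.doc_freq[term]

    def idf(self, term):
        # Smoothed IDF: terms common in the current window score low.
        n_docs = len(self.docs)
        return math.log((1 + n_docs) / (1 + self.doc_freq.get(term, 0))) + 1.0


idx = WindowedIdf(window_seconds=60)
idx.add(0, "rain in london")
idx.add(10, "rain again")
print(idx.idf("rain") < idx.idf("london"))   # "rain" is the common term for now
idx.add(100, "sunny in london")              # the two older messages expire
print(idx.idf("rain") > idx.idf("london"))   # "rain" is rare again
```

A term that behaves like a stopword in one window ("rain" during a storm) regains weight once the window moves on, which is the fluidity the slide refers to. Distributing such an index, the harder part named on the slide, is not addressed here.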
  • 12. Spatial Context
    Demand for spatial information:
      20% of all Google searches
      53% of Bing mobile searches
    Heterogeneous spatial context sources
      GPS locations (most reliable)
      Origin bounding boxes (e.g. city)
      User profile text??? [3]
      Author's friends' locations [4]
    3. Hecht et al. 2011: Tweets from Justin Bieber’s Heart: The Dynamics of the “Location” Field in User Profiles, Proc. ACM CHI
    4. Rout et al. 2013: Where's @wally? A Graph Based Method for Geolocating Users in Social Networks, Proc. ACM Hypertext
  • 13. Spatial Keyword Search
    How can we query a set of social media messages?
    Treat as a set of objects, each having
      Text
      Location
    Query parameters:
      Query text
      Query location
    Given query and set of messages, rank by similarity:
      Text similarity (Cosine, Siamese Learning Net, Oriented PCA)
      Separating distance (Haversine, Manhattan, Eco-routed)
    Blend this with a balancing coefficient (just like conventional spatial keyword search)
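The ranking described on slide 13 can be sketched using two of the measures it names, cosine text similarity and Haversine distance. The slide does not give the blending formula, so the linear blend below is an assumption, as are the names `alpha` (the balancing coefficient) and `max_km` (the distance at which proximity drops to zero) and the bag-of-words tokenisation.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of whitespace-tokenised bag-of-words vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def blended_score(query_text, query_loc, msg_text, msg_loc, alpha=0.5, max_km=10.0):
    """Assumed linear blend: alpha weights proximity, (1 - alpha) weights text."""
    distance = haversine_km(*query_loc, *msg_loc)
    proximity = max(0.0, 1.0 - distance / max_km)  # 1 at the query point, 0 beyond max_km
    return alpha * proximity + (1 - alpha) * cosine_similarity(query_text, msg_text)


# Hypothetical coordinates, roughly central Copenhagen.
print(blended_score("good bar", (55.70, 12.55), "found a good cocktail bar", (55.69, 12.56)))
```

Swapping in another distance (Manhattan, routed) or a learned text similarity only changes the two helper functions; the blend itself stays the same.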
  • 14. Spatial Keyword Search
    Query: good bar in north copenhagen
    Issued from location
    Five candidate messages
    Query region established
    Rank by blend of location and textual similarity
    (Map figure: candidate messages labelled A to E)

    Message                                         Location  Text
    A  So drunk last night at @BarSyv               0.7       0.6
    B  Out shoe shopping!!! #louboutintime          0.9       0.0
    C  Who pays $9 for a beer?!                     0.6       0.5
    D  wow found cphs greatest cocktail bar lol     0.1       1.0
    E  Traffic. Traffic everywhere. Need a drink.   0.4       0.2
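To see how the blend affects the ranking of the five candidates on slide 14, the scores from the table can be combined directly. The location and text values are copied from the slide; the linear blend and the coefficient name `alpha` (weight on location similarity) are assumptions, since the slide does not state the weighting used.

```python
# (location similarity, text similarity) per candidate message, from the slide.
CANDIDATES = {
    "A": (0.7, 0.6),
    "B": (0.9, 0.0),
    "C": (0.6, 0.5),
    "D": (0.1, 1.0),
    "E": (0.4, 0.2),
}


def rank(candidates, alpha):
    """Return message ids ordered by alpha * location + (1 - alpha) * text, best first."""
    scored = {
        mid: alpha * loc + (1 - alpha) * text
        for mid, (loc, text) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


for alpha in (0.2, 0.5, 0.8):
    print(alpha, rank(CANDIDATES, alpha))
```

With a text-heavy blend (`alpha = 0.2`) the distant cocktail-bar message D comes first, giving the order D, A, C, E, B. With a location-heavy blend (`alpha = 0.8`) the nearby but off-topic message B comes first, giving B, A, C, E, D. At `alpha = 0.5` message A leads and C and D are effectively tied.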
  • 15. Continuous Spatial Queries
    Social media scenario characterised by:
      Streaming data
      New spatial objects constantly appearing
    Two new spatial keyword query types:
      Static Continuous (SCSKQ)
        - Fixed query location
        - Tracks newly appearing objects
      Moving Continuous (MCSKQ)
        - Query location transits locus
        - Result updated with new objects
    Novel part: fresh objects continuously introduced
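A static continuous query of the kind slide 15 describes (fixed query location, result set tracking newly appearing objects) might be maintained roughly as follows. This is a sketch under simplifying assumptions, not the SCSKQ algorithm itself: planar coordinates with Euclidean distance, keyword overlap as the text score, a non-empty keyword set, and no expiry of old results. The class name and its parameters are hypothetical.

```python
import heapq
import math


class StaticContinuousQuery:
    """Keeps the top-k matching messages for a fixed location as new messages stream in."""

    def __init__(self, location, keywords, radius, k=3, alpha=0.5):
        self.location = location        # (x, y) in arbitrary planar units
        self.keywords = set(keywords)   # assumed non-empty
        self.radius = radius
        self.k = k
        self.alpha = alpha              # assumed weight on proximity
        self._heap = []                 # min-heap of (score, arrival_seq, message_id)
        self._seq = 0

    def _score(self, text, location):
        distance = math.dist(self.location, location)
        if distance > self.radius:
            return None                 # outside the query region
        overlap = len(self.keywords & set(text.lower().split())) / len(self.keywords)
        if overlap == 0:
            return None                 # no query keyword present
        proximity = 1.0 - distance / self.radius
        return self.alpha * proximity + (1 - self.alpha) * overlap

    def on_message(self, message_id, text, location):
        """Feed one newly arrived message; return True if the result set changed."""
        score = self._score(text, location)
        if score is None:
            return False
        self._seq += 1
        entry = (score, self._seq, message_id)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if entry > self._heap[0]:       # better than the current weakest result
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def results(self):
        """Current top-k message ids, best first."""
        return [mid for _, _, mid in sorted(self._heap, reverse=True)]


query = StaticContinuousQuery(location=(0.0, 0.0), keywords=["bar", "cocktail"], radius=10.0, k=2)
query.on_message("m1", "great cocktail bar", (1.0, 0.0))
query.on_message("m2", "shoe shopping", (0.5, 0.0))
query.on_message("m3", "bar is ok", (5.0, 0.0))
query.on_message("m4", "cocktail bar here", (0.0, 0.0))
print(query.results())
```

A moving continuous query would additionally have to rescore the retained results whenever the query location changes, which this sketch does not attempt.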
  • 16. Location Diversity
    Location data unreliable
    Reliability of location data... is also unreliable
      "There are known knowns.. we also know there are known unknowns.. but there are also unknown unknowns" – Donald Rumsfeld
    Text mentions require disambiguation
      ● In profile
      ● In messages
      ● In queries
    Requirement is to rank vague points given vague query
  • 17. Willingness to travel
    Determines useful search radius
    Based on mode of transport: 14.9km 22.0km 40.6km 61.5km >100km
    Different for varying classes of Point Of Interest?
    ST Social media = huge dataset
      Easy data collection
      Useful for e.g. town planning
  • 18. Spatio-temporal Challenges
    We've seen temporal and spatial challenges; let's combine!
    Given all these spatio-temporal utterances, what can we do?
      - Spatial gives relevance from physical or travel proximity
      - Temporal gives relevance from recency and history
    Adding text to the spatio-temporal points gives explicit semantic context
    Not only are ST patterns in the data, we are told what they mean!
  • 19. Topic-based Retrieval
    Retrieving results on a topic is useful; "Tell me about X"
    Specific terms vary between places and over time
      en.wikipedia.org/wiki/President_of_the_United_States: 2007, 2011
      Jelly: England English, US English
    … Spatio-temporally sensitive indexing?
  • 20. Sentiment Monitoring
    Measure how attitudes change over time and over location
    Business uses: where to send marketing
    Political uses: data-driven democratic.. campaigning
    Governance uses: what are citizen priorities in a region
    Temporal dimension enables tracking of trends and reactions
    (Map figure: red = upbeat; blue = complaint. - no normalisation for vocality!)
  • 21. Local Computational Journalism
    Social media is quick
    Social media is uncurated
    Citizen Journalism
    News has relevance scope:
      Recency
      Proximity
    Different events relevant in different contexts:
      Rain in London
      Rain in Addis Ababa
    Automatic event detection [5] - and also reporting!
    5. Ritter et al. 2012: Open domain event extraction from Twitter, Proc. ACM SIGKDD
  • 22. Summary
    Social media is a rich source of big data
    A small sampling of all human discourse
    It comes with temporal and spatial context
    Context-aware search and analysis is very demanding!
      - Novel, powerful applications
      - Wide variety of domains
      - An open set of challenges
  • 23. Thank you!
    Thank you for listening! Do you have any questions?