SocialSensor Project: Sensing User Generated Input for Improved Media Discovery and Experience  - Social Multimedia Crawling & Mining - EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams
  • Benefits: (i) Intelligent extraction of objects and events from the social Web, (ii) multimodal indexing and organization, (iii) personalized access and presentation of content (incl. media delivery and caching), and (iv) concrete and real integration of the social dimension of the current Web.
  • ----- Meeting notes (03.04.12 14:41) ----- In the course of the project we have interviewed a considerable number of journalists and executives from some of the world's biggest media outlets like CNN, the BBC, The New York Times and others... Here are some of the quotes. Click 3 times (until all 3 quotes are visible). But journalists are not only describing the positive side. There are also huge challenges. And you can see from the following slide what is most challenging...
  • ----- Meeting notes (03.04.12 16:44) ----- Or we can turn it the other way round: We have a known source whom the reporter trusts...
  • Transcript

    • 1. SocialSensor: Sensing User Generated Input for Improved Media Discovery and Experience Social Multimedia Crawling & Mining EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams Dr. Yiannis Kompatsiaris, Project Coordinator Samos 2013 Summit on Digital Innovation for Government, Business and Society
    • 2. Overview • Motivation • Objectives • Architecture • Use Cases and Requirements • News – Social Multimedia Crawling & Mining • Infotainment – EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams • Conclusions
    • 3. What is SocialSensor? • 3-year FP7 European Integrated Project – http://www.socialsensor.eu • Members: CERTH, ATC (Greece), Deutsche Welle, University Koblenz, Research Center for Artificial Intelligence (Germany), The City University London, Alcatel-Lucent Bell Labs, JCP Consult (France), University of Klagenfurt (Austria), IBM Israel, Yahoo Iberia + Robert Gordon University Aberdeen (UK) • 1.5 years into the project (completed: user requirements, use case scenarios, architecture, and implementation of the first R&D components and prototypes; currently: evaluation and second round)
    • 4. Motivation: Social Networks as Sensors • Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users' interests) • Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections • Mine the data and their relations and exploit them in the right context • Scalable mining and indexing approaches that take into account both the content and the social context of social networks
    • 5. Relevant Applications • Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using Flickr for prediction and forecast. International Conference on Multimedia (MM '10), ACM. • The Federal Emergency Management Agency plans to engage the public more in disaster response by sharing data and leveraging reports from mobile phones and social media • “…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on Twitter before it hits you…”
    • 6. Objective: SocialSensor quickly surfaces trusted and relevant material from social media – with context. Diagram components: massive social media and unstructured web; social media mining; aggregation & indexing; DySCOs (content, location, time, behaviour, usage, social context); personalised access for News and Infotainment; ad-hoc P2P networks
    • 7. The SocialSensor Vision: SocialSensor quickly surfaces trusted and relevant material from social media – with context. • “quickly”: in real time • “surfaces”: automatically discovers, clusters and searches • “trusted”: automatic support in the verification process • “relevant”: to the users, personalized • “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web) • “social media”: across many relevant social media platforms • “with context”: location, time, sentiment, influence, trust
    • 8. Conceptual Architecture and Main Components (diagram layers: data collection/crawling of public and in-project data, storage, indexing, mining, search & recommendation, user modelling & presentation, semantic middleware) • Real-time dynamic topic and event clustering • Trend, popularity and sentiment analysis • Calculate trust/influence scores around people • Personalized search, access & presentation based on social network interactions • Semantic enrichment and discovery of services
    • 9. DySCO concept • Integrate social content mining, search and intelligent presentation in a personalized, context- and network-aware way, through the new concept of Dynamic Social COntainers (DySCOs) • Composite objects containing a number of items (e.g. articles, tweets, images, videos) • Focused on a particular topic of interest (e.g. an event, a story) • Contain all available information about the topic • Metadata can be added dynamically • Ability to search for DySCOs by matching “DySCO features” with “search features” • Ability to find and recommend similar DySCOs
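The DySCO slide stays at the concept level. As a rough illustration only, a container and the two retrieval operations it mentions could be modelled as below. The class `DySCO`, its fields, and the functions `search` and `similar` are hypothetical, and representing both "DySCO features" and "search features" as term-count bags compared by cosine similarity is an assumption, not the project's actual design.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DySCO:
    """Hypothetical Dynamic Social COntainer: items grouped around one topic."""
    topic: str
    items: list = field(default_factory=list)           # e.g. tweet, article, image IDs
    metadata: dict = field(default_factory=dict)        # free-form, extendable at any time
    features: Counter = field(default_factory=Counter)  # assumed: term -> weight

    def add_item(self, item_id: str, terms: list) -> None:
        """Attach an item and fold its terms into the container's features."""
        self.items.append(item_id)
        self.features.update(terms)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(dyscos: list, query_terms: list, k: int = 3) -> list:
    """Rank DySCOs by how well their features match the search features."""
    query = Counter(query_terms)
    return sorted(dyscos, key=lambda d: cosine(d.features, query), reverse=True)[:k]


def similar(target: DySCO, dyscos: list, k: int = 3) -> list:
    """Recommend the k DySCOs most similar to `target`, excluding itself."""
    others = [d for d in dyscos if d is not target]
    return sorted(others, key=lambda d: cosine(d.features, target.features), reverse=True)[:k]
```

Because metadata is a plain dictionary, new context (for example a location) can be attached to a container at any time without changing its type, which mirrors the "metadata can be added dynamically" point.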
    • 10. Use Cases: News
    • 11. “It has changed the way we do news” (MSN) • “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC) • “Social media is transforming the way we do journalism” (New York Times) Source: picture alliance / dpa
    • 12. #15
    • 13. Source: Getty Images • “It's really hard to find the nuggets of useful stuff in an ocean of content” (BBC) • “Things that aren't relevant crowd out the content you are looking for” (MSN) • “The filters aren't configurable enough” (CNN)
    • 14. Verification was simpler in the past... Source: Frank Grätz
    • 15. An example: BBC Verification Procedure for Arab Spring Coverage • Referencing locations against maps and existing images, in particular geo-located ones. • Working with our colleagues in BBC Arabic and BBC Monitoring to ascertain that accents and language are correct for the location. • Searching for the original source of the upload/sequences as an indicator of date. • Examining weather reports and shadows to confirm that the conditions shown fit the claimed date and time. • Maintaining lists of previously verified material to act as a reference for colleagues covering the stories. • Checking scenery, weaponry, vehicles and licence plates against those known for the given country.
    • 16. News application • Real-time search/browsing of news items crawled from different social media • Automatically discovered trending topics • Web analytics • Sentiment scores for topics
    • 17. Alethiometer • Measures the degree of truth behind tweets • Overall trust score for a tweet • Various Contributor, Content and Context validity metrics
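The Alethiometer slide does not say how the Contributor, Content and Context metrics are combined into the overall score. A minimal sketch, assuming every metric is already normalised to [0, 1], averages the metrics within each category and then takes a weighted mean across categories. The function name `trust_score`, the equal default weights, and any metric names are hypothetical.

```python
from statistics import mean

# Hypothetical equal weights: the slide does not state how categories are combined.
DEFAULT_WEIGHTS = {"contributor": 1 / 3, "content": 1 / 3, "context": 1 / 3}


def trust_score(metrics: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Combine validity metrics into one trust score in [0, 1].

    `metrics` maps a category ("contributor", "content", "context") to a
    dict of metric name -> value, where every value is assumed to lie in [0, 1].
    Categories without any metrics are skipped and the remaining weights are
    renormalised, so missing evidence does not count as zero trust.
    """
    score, used_weight = 0.0, 0.0
    for category, weight in weights.items():
        values = list(metrics.get(category, {}).values())
        if not values:
            continue
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"{category} metrics must lie in [0, 1]")
        score += weight * mean(values)  # per-category average, then weighted
        used_weight += weight
    return score / used_weight if used_weight else 0.0
```

Renormalising over the categories that are present is a design choice: a tweet from a brand-new account with no contributor history is then judged on content and context alone rather than being penalised outright.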
    • 18. Social Multimedia Crawling & Mining. Case study: #OccupyGezi. E. Schinas, S. Papadopoulos, I. Tsampoulatidis, K. Iliakopoulou, Y. Kompatsiaris
    • 19. Multimedia crawling & mining • Monitor/query multiple sources for shared media content: Twitter, Facebook, Flickr, YouTube, etc. • Multiple indexing schemes: – text-based (Solr) – visual content-based (SURF+VLAD for feature extraction, ADC for similarity-based indexing) • Clustering: – geo-spatial (BIRCH) – visual (SCAN) • Web-based presentation of results
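For the visual index, SURF descriptors are aggregated with VLAD before ADC-based indexing. The sketch below covers only the VLAD aggregation step, in plain Python for readability: each local descriptor is hard-assigned to its nearest codebook centroid and its residual is accumulated in that centroid's slot. The signed power normalisation followed by L2 normalisation is a common VLAD variant and is an assumption here; the codebook would normally be learned offline (e.g. by k-means), and the function name `vlad` is hypothetical.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad(descriptors, centroids, alpha=0.5):
    """Aggregate local descriptors (e.g. 64-D SURF vectors) into one VLAD vector.

    descriptors: list of D-dimensional vectors from one image.
    centroids:   K codebook centres (assumed learned offline, e.g. by k-means).
    Returns a K*D vector of accumulated residuals, power- and L2-normalised.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # hard-assign the descriptor to its nearest visual word
        nearest = min(range(k), key=lambda i: _sq_dist(desc, centroids[i]))
        for j in range(d):
            acc[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    # signed power normalisation dampens bursty components
    flat = [math.copysign(abs(v) ** alpha, v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm else flat
```

The result has a fixed length K·D regardless of how many descriptors an image yields, which is what makes it suitable for compression and approximate nearest-neighbour indexing.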
    • 20. Crawling & mining system deployment (diagram). Components: StreamManager (Twitter, Facebook, Flickr, YouTube, RSS, Instagram); MongoDBWrapper; TextIndexer (Solr); MediaFetcher and FeatureExtractor (HDFS); Social Focused Crawler (HDFS, Nutch); VLAD FeatureIndexer (HDFS, IVFADC); Data Mining (visual clustering, geo clustering, statistics); web server; APIs (1)-(4). The components are spread over hosts 160.xx.xx.207, 160.xx.xx.58, 160.xx.xx.107, 160.xx.xx.187, 160.xx.xx.191 and 160.xx.xx.116.
    • 21. #OccupyGezi • Monitors: keywords: gezipark, taksimgezipark, Taksim, Taksim Gezi Park; location: Istanbul • Current statistics:
    • 22. Geographical spread of event • The event does not appear to be localized (as several official Turkish news sources claimed); it spreads all over Turkey and even to major cities abroad
    • 23. Different granularities
    • 24. Trending media by use of clustering
    • 25. Visual Memes
    • 26. Statistics
    • 27. Use Cases: Infotainment
    • 28. Capturing & mining large-scale events • Large-scale events are attended by thousands of people and captured by mobile devices in the form of status updates, photos, ratings, etc. • Challenge: – organize information around entities of interest – extract meaningful insights and obtain informative summaries • EventSense framework
    • 29. Infotainment • Thessaloniki International Film Festival – 80,000 viewers / 100,000 visitors in 10 days – 150 films, 350 screenings • Fête de la Musique Berlin – 100,000 visitors every year – 5,000 musicians
    • 30. ThessFest • Thessaloniki International Film Festival • Support for Twitter/comment usage within the app • Ratings and comments per film • Feedback aggregation – votes – tweets • Real-time feedback to the organisation and visitors
    • 31. ThessFest • Gather “realistic” user requirements • Early showcase and evaluation of SocialSensor technologies at real-world event scale • Engage users and create an informed user base • TDF14, 15 + TIFF53 – 1400+ users – 40K+ user sessions – positive response to social media • Next version – updated features based on the SocialSensor prototype
    • 32. Fête de la Musique Berlin app • FETEberlin in App Store and Google Play • More than 100K visitors • About 5K musicians • More than 5K app downloads • App features: browse and filter the detailed program; interactive maps and routing; social sharing; artist and stage details; social monitoring • Main benefits for attendees: visitors can browse through maps and don't get lost, as stages are numerous; the event schedule is always available, also per stage – very useful when the server was down and the online schedule was inaccessible
    • 33. Fête de la Musique Berlin app
    • 34. FETEberlin Facts & User Feedback
      Store          Unique users   Sessions   Frequency of use
      App Store      2904           13751      2.5 sessions per day
      Google Play    2210           12097      2.9 sessions per day
      Total          5114           25848      Avg: 2.7 sessions per day
      Future plans: • Enhanced event and visitor engagement • Send last-minute updates • Create buzz around the event and make users the event ambassadors • Gain insightful knowledge on the impact the event made via social media analysis – organize better events
    • 35. EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams. Case study: Thessaloniki International Film Festival. E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, L. Boudakidis
    • 36. Entity Detection • Entities are defined as lists of properties: – a film consists of a title, a description, and the names of its director(s)/actors • Matching status updates (tweets) to entities relies on representing both as weighted term vectors, comparing them with cosine similarity, and thresholding the result. Symbols in the slide's weighting formula: m: message (tweet); f: feature (term); M: set of all event messages; boost(f): boosting factor applied when f is a named entity
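The exact weighting formula is not recoverable from the transcript. The sketch below assumes a common choice that fits the listed symbols: a tf-idf weight computed over the event message set M, multiplied by boost(f) when f is a named entity, followed by cosine similarity and a threshold. All function names, the boost value of 2.0, and the default threshold of 0.2 (picked from inside the (0.1, 0.3) range examined on the tweet-film matching slide) are assumptions.

```python
import math
from collections import Counter


def idf_table(messages):
    """IDF over M, the set of all event messages (each a list of terms)."""
    n = len(messages)
    df = Counter(t for m in messages for t in set(m))
    return {t: math.log(n / c) for t, c in df.items()}


def weight_vector(terms, idf, named_entities, boost=2.0):
    """Assumed weighting: tf(f) * idf(f), multiplied by boost(f) for named entities."""
    tf = Counter(terms)
    return {
        f: c * idf.get(f, 0.0) * (boost if f in named_entities else 1.0)
        for f, c in tf.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match_entities(message, entities, idf, named_entities, threshold=0.2):
    """Return IDs of entities whose similarity to the message reaches the threshold,
    best match first. `entities` maps an entity ID to its property terms
    (title, description, director and actor names)."""
    m_vec = weight_vector(message, idf, named_entities)
    scores = {
        eid: cosine(m_vec, weight_vector(props, idf, named_entities))
        for eid, props in entities.items()
    }
    hits = [eid for eid, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda eid: scores[eid], reverse=True)
```

Terms that occur in every event message (such as the festival hashtag) get an IDF of zero under this weighting, so they cannot by themselves link a tweet to every film.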
    • 37. Topic detection • For each new message, find its Nearest Neighbour (NN) using Locality Sensitive Hashing (LSH) • If the similarity exceeds an empirically selected threshold, assign the message to the NN's topic; otherwise create a new topic • Single-message clusters are discarded as outliers • Cluster merging is conducted as a post-processing step to compensate for topic over-segmentation • Similar approach to Petrovic et al. (2010): S. Petrovic, M. Osborne, V. Lavrenko. Streaming first story detection with application to Twitter. In Human Language Technologies (HLT '10), pages 181–189, Stroudsburg, PA, USA, 2010. ACL
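A minimal sketch of the online assignment step, assuming random-hyperplane LSH over sparse term-count vectors (a standard LSH family for cosine similarity). The class name, the number of tables and bits, and the 0.5 threshold are hypothetical, and the cluster-merging post-processing step is omitted.

```python
import math
import random
from collections import Counter, defaultdict


class LSHTopicDetector:
    """Online topic assignment with random-hyperplane LSH (hypothetical sketch)."""

    def __init__(self, n_tables=4, n_bits=8, threshold=0.5, seed=0):
        self.n_tables, self.n_bits = n_tables, n_bits
        self.threshold = threshold          # assumed; the slide tunes it empirically
        self.seed = seed
        self._planes = {}                   # (table, bit, term) -> Gaussian component
        self.buckets = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []                   # message index -> term-count vector
        self.topic_of = []                  # message index -> topic id
        self.n_topics = 0

    def _plane(self, table, bit, term):
        # Lazily draw the hyperplane component for `term`; a string seed keeps it reproducible.
        key = (table, bit, term)
        if key not in self._planes:
            self._planes[key] = random.Random(f"{self.seed}:{table}:{bit}:{term}").gauss(0.0, 1.0)
        return self._planes[key]

    def _signature(self, vec, table):
        # One bit per hyperplane: which side of the plane the vector falls on.
        sig = 0
        for b in range(self.n_bits):
            proj = sum(c * self._plane(table, b, t) for t, c in vec.items())
            sig = (sig << 1) | int(proj >= 0)
        return sig

    @staticmethod
    def _cosine(a, b):
        dot = sum(c * b[t] for t, c in a.items() if t in b)
        na = math.sqrt(sum(c * c for c in a.values()))
        nb = math.sqrt(sum(c * c for c in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, terms):
        """Assign one message (list of terms) to a topic and return the topic id."""
        vec = Counter(terms)
        sigs = [self._signature(vec, t) for t in range(self.n_tables)]
        # Candidates: earlier messages sharing a bucket in at least one table.
        candidates = {i for t, s in enumerate(sigs) for i in self.buckets[t].get(s, [])}
        best, best_sim = None, 0.0
        for i in sorted(candidates):
            sim = self._cosine(vec, self.vectors[i])
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            topic = self.topic_of[best]     # join the nearest neighbour's topic
        else:
            topic = self.n_topics           # otherwise start a new topic
            self.n_topics += 1
        idx = len(self.vectors)
        self.vectors.append(vec)
        self.topic_of.append(topic)
        for t, s in enumerate(sigs):
            self.buckets[t][s].append(idx)
        return topic

    def topics(self, min_size=2):
        """Group messages by topic, discarding single-message clusters as outliers."""
        groups = defaultdict(list)
        for idx, topic in enumerate(self.topic_of):
            groups[topic].append(idx)
        return {t: ids for t, ids in groups.items() if len(ids) >= min_size}
```

LSH only narrows the search to a few candidate buckets; the exact cosine is still computed on those candidates, so the threshold decision itself is not approximate.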
    • 38. Sentiment detection (1/2) • Build positive/negative sentiment classifiers using emoticons • Build a neutral classifier using the positive/negative classifiers • Feature extraction: – remove stop words, emoticons and terms occurring only once; trim repeated letters – negation terms (“not”, “isn't”) are attached to subsequent terms to form new unigrams (e.g. “nothappy”) – treat user mentions, URLs, punctuation, repeated letters and all-caps words as additional features
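The feature extraction steps can be sketched with regular expressions. The stop-word and negation lists, the emoticon pattern, the choice to shorten runs of three or more letters to two, and all function names are illustrative assumptions. Removing terms that occur only once is a corpus-level step, so it is a separate helper.

```python
import re
from collections import Counter

# Illustrative word lists; the actual lists used in the project are not given.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}
NEGATIONS = {"not", "no", "never", "isn't", "wasn't", "don't", "didn't"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
EMOTICON_RE = re.compile(r"[:;=][-o*']?[()\[\]DPp/\\]")  # e.g. :) ;-( =D :/
REPEAT_RE = re.compile(r"([a-zA-Z])\1{2,}")               # 3+ identical letters


def extract_features(text):
    """Return (unigram features, binary special features) for one tweet."""
    words = re.findall(r"[A-Za-z]+", text)
    special = {
        "has_mention": bool(MENTION_RE.search(text)),
        "has_url": bool(URL_RE.search(text)),
        "has_exclamation": "!" in text,
        "has_question": "?" in text,
        "has_repeated_letters": bool(REPEAT_RE.search(text)),
        "has_all_caps": any(len(w) > 1 and w.isupper() for w in words),
    }
    # Strip URLs, mentions and emoticons (emoticons served as training labels).
    for pattern in (URL_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    # Trim letter runs: "sooooo" -> "soo" (keeping two is an assumption).
    text = REPEAT_RE.sub(r"\1\1", text.lower())
    features, negate = [], False
    for tok in re.findall(r"[a-z']+", text):
        if tok in NEGATIONS:
            negate = True                  # attach to the next content word
        elif tok not in STOP_WORDS:
            features.append("not" + tok if negate else tok)
            negate = False
    return features, special


def drop_rare(corpus_features, min_count=2):
    """Corpus-level step: remove terms that occur fewer than `min_count` times."""
    counts = Counter(f for feats in corpus_features for f in feats)
    return [[f for f in feats if counts[f] >= min_count] for feats in corpus_features]
```

Stop words between a negation and its target are skipped without resetting the negation, so "not the best" still yields "notbest".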
    • 39. Sentiment detection (2/2) • Naïve Bayes classifier per sentiment • P(f|c) estimated using maximum likelihood (ML) with Laplace correction • Probability estimates for special features (e.g. user mentions) using a Bernoulli model with Laplace correction • Classification by maximum log-likelihood • Neutral messages: – Mutual Information (MI) – MI of features – sentiment intensity of the message – thresholding for the decision
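A minimal sketch of the classifier described on this slide: one likelihood model per sentiment class, using a multinomial model with add-one (Laplace) correction for unigrams and a Bernoulli model with Laplace correction for binary special features, and predicting the class with the maximum log-likelihood. The class and method names are hypothetical; ignoring unseen terms is one of several reasonable choices, and the MI-based neutral-message decision is not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial NB over unigrams plus Bernoulli NB over binary special features."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # class -> term -> count
        self.total_terms = Counter()             # class -> total number of terms
        self.flag_counts = defaultdict(Counter)  # class -> flag -> #docs with flag set
        self.docs = Counter()                    # class -> #training docs
        self.vocab = set()
        self.flags = set()

    def fit(self, samples):
        """samples: iterable of (terms, flags_dict, label)."""
        for terms, flags, label in samples:
            self.docs[label] += 1
            self.term_counts[label].update(terms)
            self.total_terms[label] += len(terms)
            self.vocab.update(terms)
            for name, on in flags.items():
                self.flags.add(name)
                if on:
                    self.flag_counts[label][name] += 1
        return self

    def log_likelihood(self, terms, flags, label):
        v = len(self.vocab)
        ll = 0.0
        for t in terms:
            if t in self.vocab:  # unseen terms carry no evidence here
                # P(f|c): ML estimate with Laplace (add-one) correction
                ll += math.log((self.term_counts[label][t] + 1) / (self.total_terms[label] + v))
        n = self.docs[label]
        for name in self.flags:
            # Bernoulli P(flag=1|c) with Laplace correction
            p = (self.flag_counts[label][name] + 1) / (n + 2)
            ll += math.log(p if flags.get(name) else 1 - p)
        return ll

    def predict(self, terms, flags):
        """Return the class with the maximum log-likelihood."""
        return max(self.docs, key=lambda c: self.log_likelihood(terms, flags, c))
```

Adding the log prior of each class to `log_likelihood` would turn this into a maximum a posteriori decision; with balanced, emoticon-labelled training sets the two coincide.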
    • 40. Evaluation • Case study: 53rd Thessaloniki International Film Festival (TIFF53), Nov 2-11, 2012 • 168 films included in the program (titles, descriptions in Greek and English) • 3974 tweets using #tiff53 • Manual annotation regarding: – film – sentiment (pos/neg/neut) • Additional data using the ThessFest mobile app: – #bookmarks per film (number of times a user added the film to their schedule) – #ratings + avg. rating per film
    • 41. Tweet-film matching • film = <title, description, directors, actors> • Multiple entity representations using Greek/English/both, uni-/bi-grams • Similarity threshold sensitivity analysis (figure): pooling multiple representations; threshold ∈ (0.1, 0.3)
    • 42. Topic analysis • Top-10 topics • Manual inspection of clusters: – 53.8% of topic titles considered informative – 98.5% of clusters were found to be “clean” • Topics in time
    • 43. Sentiment analysis • Training (using emoticons and the Twitter API) – 800K positive & negative tweets for English – 12K positive & negative tweets for Greek • Tuning (of the threshold) – manually annotated dataset from the Thessaloniki Documentary Festival (a similar event) – 325/73/553 tweets (pos/neg/neut) in English and 781/216/781 in Greek • Testing – 324/33/724 tweets (pos/neg/neut) in English and 901/315/1667 in Greek – best accuracy (English) ~0.75 – performance in Greek much poorer than in English → need for a richer training corpus
    • 44. Aggregation & summarization (1/2) • Legend: #T: number of tweets; Pol: polarity of film tweets; Subj: subjectivity of film tweets; R: average rating; #R: number of ratings; #F: number of times the film was bookmarked • Films with positive polarity are rated higher. • Films that are tweeted a lot are also more likely to be rated. • Films that are tweeted a lot are also more likely to be added to users' bookmarks.
    • 45. Aggregation & summarization (2/2) • Most active & influential Twitter accounts (+ sentiment per user) • Most shared photos (+ number of retweets)
    • 46. Conclusions • Great interest in both use cases • In news, social media have transformed both news generation and consumption • Social media data mining can provide interesting results in many applications • Not all data are always available (e.g. user queries, Facebook) – infrastructure and policy issues • Technical challenges – fusion (multi-modality, context), real-time processing, noise, big data, aggregation (web, Linked Open Data) • Application challenges – user engagement, visualization, becoming part of existing workflows, privacy, copyright, commercialization
    • 47. Thank you!
