Lecture @ International Hellenic University
Thessaloniki, 8 May 2014
Social Media Crawling and Mining

Social Media Crawling and Mining Seminar (Motivation Part)

Published in: Technology, Business

  1. Lecture @ International Hellenic University, Thessaloniki, 8 May 2014
     Social Media Crawling and Mining: Motivation – Use Cases
     Symeon (Akis) Papadopoulos, Manos Schinas, Katerina Iliakopoulou, Yiannis Kompatsiaris
     Information Technologies Institute (ITI), Centre for Research & Technologies Hellas (CERTH)
  2. IHU SocialSensor Seminar – May 2014
     Introduction
     • Motivation
     • Example Applications
     • Conceptual Architecture
     • Challenges
  3. http://www.puzzlemarketer.com/digital-social-brands-in-60-seconds/ (Apr, 2012)
  4. Social Networks as Real-Life Sensors
     • Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users' interests)
     • Widespread smartphone and mobile device penetration provides real-time, location-based user feedback
     • Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections
     • Present the results efficiently for a variety of applications (news, marketing, entertainment)
  5. Pope Francis, Pope Benedict
     • 2007: iPhone release
     • 2008: Android release
     • 2010: iPad release
     http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  6. Social Networks as Graphs
     • social web as a graph
     • nodes = twitter users
     • edges = retweets on #jan25 hashtag
     • announcement of Mubarak’s resignation
     http://gephi.org/2011/the-egyptian-revolution-on-twitter/
  7. Social Networks as Graphs
     “Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts”
     • Emotions, health and sexual relationships depend not just on our connections (e.g. their number) but on our position and structure in the social graph
       – Central
       – Hub
       – Outlier
       – Transitivity (connections between friends)
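The structural notions on slides 6 and 7 (central nodes, hubs, transitivity) can be made concrete with a small sketch. It assumes a retweet graph given as (retweeter, author) pairs and treats it as undirected; the user names and the function names `build_graph`, `degree_centrality` and `transitivity` are illustrative and do not come from the lecture.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(retweets):
    """Build an undirected adjacency map from (retweeter, author) pairs."""
    graph = defaultdict(set)
    for a, b in retweets:
        if a != b:  # ignore self-retweets
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def transitivity(graph):
    """Share of connected triples that are closed (friends of a node who are also connected)."""
    closed = 0
    triples = 0
    for neighbours in graph.values():
        # Every pair of neighbours forms a triple centred on this node.
        for u, v in combinations(sorted(neighbours), 2):
            triples += 1
            if v in graph[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical retweet pairs, for illustration only.
retweets = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")]
graph = build_graph(retweets)
centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub], transitivity(graph))
```

In this toy graph the node with the highest degree centrality plays the "hub" role and a node with a single connection is closer to an "outlier"; real analyses would typically use a graph library and directed, weighted edges.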
  8. Examples - Science
     Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using flickr for prediction and forecast. International Conference on Multimedia (MM '10), ACM.
     “…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”
  9. Example – News (Boston bombing)
     “Following the Boston Marathon bombings, one quarter of Americans reportedly looked to Facebook, Twitter and other social networking sites for information, according to The Pew Research Center. When the Boston Police Department posted its final “CAPTURED!!!” tweet of the manhunt, more than 140,000 people retweeted it.”
     “Authorities have recognized that one of the first places people go in events like this is to social media, to see what the crowd is saying about what to do next”
     “I have been following my friend's Facebook [account] who is near the scene and she is updating everyone before it even gets to the news”
  10. Events - Festivals
      http://www.eventmanagerblog.com/uploads/2012/12/event-technology-infographic.jpg
  11. Conceptual Architecture (diagram)
      • Stages: CRAWLING, INDEXING, MINING, RANKING, VISUALIZATION
      • Component labels, in the order they appear in the diagram: API Wrapper, Website Wrapper, Scheduler, Visual Indexing, Near-duplicates, Text Indexing, Media Fetcher, SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts, Relevance, Diversity, Popularity, Veracity, Crawling Specs, Sources, Interaction, Responsiveness, Aggregation, Aesthetics
  12. Challenges – Content (Mining)
      • Multi-modality: e.g. image + tags
      • Rich social context: spatio-temporal, social connections, relations and social graph
      • Inconsistent quality: noise, spam, ambiguity, fake, propaganda
      • Huge volume: massively produced and disseminated
      • Multi-source: may be generated by different applications and user communities
      • Also connected to other sources (e.g. LOD, web)
      • Dynamic: fast updates, real-time
  13. Policy – Licensing – Legal challenges
      • Fragmented access to data
        – Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
        – Different data collection/crawling policies
      • Limitations imposed by API providers (“Walled Gardens”)
      • Full access to data impossible or extremely expensive (e.g. see data licensing plans for GNIP and DataSift)
      • Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact in Twitter)
      • Constant change of model and ToS of social APIs
        – No backwards compatibility, additional development costs
      • Ephemeral nature of content
      • Social search results often lead to removed content, resulting in inconsistent and unreliable referencing
      • User Privacy & Purpose of use
      • Fuzzy regulatory framework regarding mining user-contributed data
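One common way to cope with the fragmented access described on slide 13 is to hide each source behind a shared wrapper interface that also enforces that source's request budget. The sketch below only illustrates that idea under stated assumptions: the classes `SourceWrapper`, `FakeMicroblogWrapper` and `FakePhotoWrapper`, the `Item` fields and the raw payload shapes are all hypothetical, and no real platform API is called.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Source-independent representation of one social media item (hypothetical fields)."""
    source: str
    item_id: str
    author: str
    text: str


class SourceWrapper(ABC):
    """Common interface; each subclass hides one source's payload format and limits."""
    name = "unknown"

    def __init__(self, max_requests_per_window):
        self.max_requests_per_window = max_requests_per_window
        self._used = 0

    def reset_window(self):
        """Call when the provider's rate-limit window rolls over."""
        self._used = 0

    def search(self, query):
        if self._used >= self.max_requests_per_window:
            return []  # request budget exhausted: skip instead of violating the limit
        self._used += 1
        return [self._normalize(raw) for raw in self._fetch_raw(query)]

    @abstractmethod
    def _fetch_raw(self, query):
        """Return source-specific raw records for the query."""

    @abstractmethod
    def _normalize(self, raw):
        """Convert one raw record into an Item."""


class FakeMicroblogWrapper(SourceWrapper):
    name = "fake-microblog"

    def _fetch_raw(self, query):
        # Stand-in for a real API call; the payload shape is invented.
        return [{"id": 1, "user": "alice", "body": f"post about {query}"}]

    def _normalize(self, raw):
        return Item(self.name, str(raw["id"]), raw["user"], raw["body"])


class FakePhotoWrapper(SourceWrapper):
    name = "fake-photo"

    def _fetch_raw(self, query):
        return [{"photo_id": "p9", "owner": "bob", "title": f"{query} photo"}]

    def _normalize(self, raw):
        return Item(self.name, raw["photo_id"], raw["owner"], raw["title"])


def search_all(wrappers, query):
    """Query every source through the shared interface and merge the results."""
    items = []
    for wrapper in wrappers:
        items.extend(wrapper.search(query))
    return items
```

A call such as `search_all([FakeMicroblogWrapper(1), FakePhotoWrapper(1)], "festival")` returns two normalized items on the first call and none on the second, until `reset_window()` is called. The design keeps source-specific changes (new ToS, changed payloads) confined to one subclass.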
  14. Social Sensor Project
      Use Cases
  15. SocialSensor Project Objective
      SocialSensor quickly surfaces trusted and relevant material from social media – with context.
      Diagram labels: DySCO; behaviour, location, time, content, usage, social context; Massive social media and unstructured web; Social media mining; Aggregation & indexing; News - Infotainment; Personalised access; Ad-hoc P2P networks
  16. The SocialSensor Vision
      SocialSensor quickly surfaces trusted and relevant material from social media – with context.
      • “quickly”: in real time
      • “surfaces”: automatically discovers, clusters and searches
      • “trusted”: automatic support in verification process
      • “relevant”: to the users, personalized
      • “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web)
      • “social media”: across all relevant social media platforms
      • “with context”: location, time, sentiment, influence
  17. Conceptual Architecture and Main components
      Diagram labels: SEMANTIC MIDDLEWARE; Public Data; In-project Data; SEARCH & RECOMMENDATION; USER MODELLING & PRESENTATION; INDEXING; MINING; STORAGE; DATA COLLECTION / CRAWLING
      • Real time dynamic topic and event clustering
      • Trend, popularity and sentiment analysis
      • Calculate trust/influence scores around people
      • Personalized search, access & presentation based on social network interactions
      • Semantic enrichment and discovery of services
  18. Use Cases
      • NEWS
        – Casual News application: Casual News Readers
        – Professional News application: Journalists, Editors, etc.
      • INFOTAINMENT
        – EventLiveDashboard: Festival organizers
        – Social Media Walls: Festival attendants
  19. “It has changed the way we do news” (MSN)
      “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC)
      “Social media is transforming the way we do journalism” (New York Times)
      Source: picture alliance / dpa
  20. “It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)
      “Things that aren’t relevant crowd out the content you are looking for” (MSN)
      “The filters aren’t configurable enough” (CNN)
      Source: Getty Images
  21. Verification was simpler in the past...
      Source: Frank Grätz
  22. Infotainment
      • Events with large numbers of visitors
      • Thessaloniki International Film Festival
        – 80,000 viewers / 100,000 visitors in 10 days
        – 150 films, 350 screenings
      • Discovery and presentation of relevant aggregated social media
        – Trending Topics
        – Sentiment
        – Tweet – film matching
        – Visualization (Social Walls)
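The "Trending Topics" and "Tweet – film matching" items on slide 22 can be illustrated with a deliberately naive sketch. It assumes a tweet matches a film when all words of the title occur in the tweet or when a hashtag equals the title with spaces removed; the film titles, tweets and function names are invented for the example, and this is not the matching method used in the project.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hashtags(text):
    """Lowercased hashtags without the leading '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]


def match_films(tweet, film_titles):
    """Return the titles whose words all occur in the tweet, or whose
    space-free form appears as a hashtag."""
    words = set(tokens(tweet))
    tags = set(hashtags(tweet))
    matched = []
    for title in film_titles:
        title_tokens = tokens(title)
        if not title_tokens:
            continue
        if set(title_tokens) <= words or "".join(title_tokens) in tags:
            matched.append(title)
    return matched


def trending(tweets, top_n=3):
    """Most frequent hashtags across a batch of tweets."""
    counts = Counter(tag for tweet in tweets for tag in hashtags(tweet))
    return counts.most_common(top_n)


# Hypothetical programme and tweets, for illustration only.
films = ["The Quiet Harbor", "Night Train"]
tweets = [
    "Loved The Quiet Harbor tonight! #tiff",
    "Queue for #NightTrain is huge #tiff",
    "Great weather in Thessaloniki",
]
for tweet in tweets:
    print(tweet, "->", match_films(tweet, films))
print(trending(tweets, top_n=1))
```

Exact word matching like this produces false positives for short or common titles and misses misspellings, so a realistic matcher would likely add fuzzy matching and context such as screening times and venues.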
  23. Other Application Areas
      • Science
        – Sociology, machine learning (machine as a teacher), computer vision (annotation)
      • Tourism – Leisure – Culture
        – Off-the-beaten-path POI extraction
      • Marketing
        – Brand monitoring, personalised ads
      • Prediction
        – Politics: election results
      • News
        – Topics, trends, event detection
      • Others
        – Environment, emergency response, energy saving, etc.
  24. Conclusions – Further topics
      • Social media data useful in many applications
      • Not all data always available (e.g. user queries, fb)
        – Infrastructure
        – Policy - Privacy issues
      • Real-time and scalable approaches
        – Efficiency of semantics and analysis vs. performance vs. infrastructure
      • Fusion of various modalities
        – Content, social, temporal, location
      • Verification & Linking other sources (web, Linked Open Data)
      • Visualization - Interfaces
      • Applications and commercialization
      • User engagement
  25. Reusable results
      • Starting point: http://www.socialsensor.eu/results
        – Deliverables
        – Publications
        – Datasets
        – Software
        – e-letter: http://stcsn.ieee.net/e-letter/vol-1-no-3
      • Open-source projects (Apache License v2): https://github.com/socialsensor
        – Data collection (stream-manager, storm-focused-crawler)
        – Indexing (framework-client, multimedia-indexing)
        – Mining (topic-detection, multimedia-analysis, community-evolution-analysis, social-event-detection)
  26. European Centre for Social Media
      • Topics
        – Social media analytics
        – Verification
        – Visualisation
        – Applications in different domains
      • Activities
        – Listings of projects, results, institutions, events
        – Community building
        – Support/organise events
        – Common social media presence (e.g. LinkedIn)
        – Funding from subscriptions, training, commercialisation
        – Supporting projects: SocialSensor, Reveal, MULTISENSOR, PHEME, DecarboNet, MWCC, uComp
        – Website: http://www.socialmediacentre.eu/
        – Research-academic: STCSN http://stcsn.ieee.net/
  27. Thank you for your attention!
      ikom@iti.gr
      http://mklab.iti.gr
