Thank you! http://mklab.iti.gr
Scalability Challenges • Network, crawling, data collection • Streaming data • Users • High numbers ...
Some Statistics
Datasift Architecture
Datasift processing • Process the whole firehose: 250+ million tweets/day • 40+ services run in the system • handling the f...
Datasift statistics • Current peak delivery of 120,000 tweets per second (260 Mbit bandwidth) • Performs 250+ million sen...
Frameworks • MapReduce (Hadoop) • Computation distribution • Batch processing of huge datasets • Parallel p...
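
The MapReduce model named on the Frameworks slide can be illustrated with a single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the photo records are hypothetical, and a real job would distribute the map and reduce calls across machines.

```python
from collections import defaultdict

def map_phase(record):
    """Emit one (tag, 1) pair per tag of a photo record."""
    for tag in record["tags"]:
        yield tag.lower(), 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one tag."""
    return key, sum(values)

def run_job(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input records
records = [
    {"tags": ["Barcelona", "cathedral"]},
    {"tags": ["barcelona", "gaudi"]},
    {"tags": ["cathedral"]},
]
tag_counts = run_job(records)
```

Because each record is mapped independently and each key is reduced independently, both phases can be split across workers without changing the result.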
Scalability Processing Approaches • Sampling • Dimensionality reduction (e.g. VLAD) • Local computations • Ite...
Image Representation Approaches • Bag-Of-Words (BOW) • The most widely used • Memory usage and search time are usually...
Experimental Results (holidays dataset) • Columns: method, k, descriptor dimension, MAP • BOW 1K...
Experimental Results (holidays dataset) • Columns: method, k, descriptor D, D’ (pca), MAP • BOW ...

Social media mining and multimedia analysis research and applications


This talk presents research and applications in social media mining and multimedia analysis. Social media sharing websites host billions of images and videos, which have been annotated and shared among friends, or published in groups that cover a specific topic of interest. Users annotate and comment on the content in the form of tags, ratings, preferences and so on, and they do so on a daily basis. This gives social media data sources an extremely dynamic nature that reflects topics of interest, events and the evolution of community opinion and focus.

The talk will present research challenges and activities and will focus on multi-modal graph-based community detection methods for social media mining, concept detection and event detection. Clusttour, a mobile and web application integrating research results with appropriate interface design, will be demonstrated as a relevant use case. The talk will also include approaches for object/region classifiers learned using the self-training paradigm, with loosely annotated training samples automatically selected from social media.

Published in: Technology, Business

Social media mining and multimedia analysis research and applications

  1. SOCIAL MEDIA MINING AND MULTIMEDIA ANALYSIS: RESEARCH AND APPLICATIONS • Yiannis Kompatsiaris, Informatics and Telematics Institute, Centre for Research and Technology - Hellas • http://mklab.iti.gr • University of Surrey, CVSSP Seminar, Guildford, 31 July, 2012
  2. Contents • Introduction • Emergent Semantics from Social Media: Opportunities and Challenges; Applications • Research Approaches: Community detection in Social Media; Social media “teacher” of the machine; Concept detection • SocialSensor Applications • Conclusions - Issues
  3. Social networks and media • Users upload, tag, share, connect and search • Over 800 million unique users visit YouTube each month • Over 3 billion hours of video are watched each month on YouTube • 72 hours of video are uploaded to YouTube every minute • Emphasis is on uploading, visualization of results and interfaces • User engagement • Single media item analysis • Usage of the collective nature of social networks
  4. Web 2.0 Content • Multi-modality: e.g. image + tags, image + video • Rich (social) context: spatio-temporal, social connections, relations and social graph • Huge volume: massively produced and shared • Dynamic: fast updates, real-time, streaming feeds • Multi-source: may be generated by different applications and user communities, e.g. delicious, StumbleUpon and reddit are all social bookmarking sites • Also connected to other sources (e.g. LOD, web) • Inconsistent quality: noise, spam, ambiguity
  5. [Figure: context of a shared media item. Labels: Comments, Favs, Time, Tags, Caption, User Profile]
  6. Social Web as a graph
  7. Social web as a graph • announcement of Mubarak’s resignation • nodes = twitter users • edges = retweets on #jan25 hashtag • http://gephi.org/2011/the-egyptian-revolution-on-twitter/
  8. Blogosphere as a graph • nodes = blogs • edges = hyperlinks • technical / gadgets • society / politics • http://datamining.typepad.com/gallery/blog-map-gallery.html
  9. Two main directions • 1. Improve access to social media: tag refinement, suggestion, propagation, concept detection; results apply to single media items • 2. Extract implicit information, capture emergent semantics: exploit explicit and implicit relations; not explicitly identifiable by users; data mining, Collective Intelligence • Scalable approaches taking into account the content and social context of social networks
  10. Tags everywhere • Sharing, describing content and search
  11. Very low precision
  12. Very low recall
  13. Can we improve things? • By combining information from many photos and tags, it seems that we can • Stable patterns in tagging systems over time
  14. “Single” media item analysis • Use features of a large number of similar content items: e.g. visual and textual features and similarity; tag refinement, suggestion, propagation, concept detection
  15. Social Networks and Collective Intelligence • Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users’ interests) • Web 2.0 data consists of individually rare but collectively frequent events and topics • Potential for much more if we mine the data and their relations and exploit them in the right context • Search and discovery of meaningful topics, entities, points of interest, social connections and events, rather than search for isolated or directly connected social media items
  16. Social Networks and Collective Intelligence • “If a group has a means of aggregating different opinions, the group collective solution may well be smarter than even the smartest person’s solution” • Conditions: Diversity (large-scale); Independence; Aggregation • Motivation for best guess • Gamification
  17. Social Networks and Collective Intelligence • “Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts” • Emotions, health and sexual relationships do not depend just on our connections (e.g. the number of them) but on our position and structure in the social graph: Central – Hub; Outlier; Transitivity (connections between friends)
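
The structural notions on this slide (hub, outlier, transitivity) can be made concrete with two standard graph measures: the degree of a node and its local clustering coefficient, i.e. the fraction of a node's neighbour pairs that are themselves connected. The following is a minimal sketch; the toy friendship graph and the function names are hypothetical.

```python
from itertools import combinations

def degree(graph, node):
    """Number of direct connections; hubs have a high degree."""
    return len(graph[node])

def local_clustering(graph, node):
    """Fraction of the node's neighbour pairs that are connected to each other."""
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical undirected friendship graph as adjacency sets
graph = {
    "ann": {"bob", "cat", "dan", "eve"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
    "eve": {"ann"},
}
hub_degree = degree(graph, "ann")
hub_transitivity = local_clustering(graph, "ann")
```

In this toy graph "ann" is a hub with many connections but low transitivity, while "bob" has few connections that are all mutually connected.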
  18. Extraction of implicit information • Trace Flickr users from a chronologically ordered set of geographically referenced photos • Who are the Italians and who are the Americans? • MIT SENSEABLE CITY LAB, “The World’s Eyes”
  19. What else can we do? • Contribute to our understanding of the world • Tags that are “representative” for a geographical area: 1. Clustering of photos (K-means, based on their location [Kennedy07]); 2. Rank each cluster’s tags; 3. Get tags above a certain threshold • Figure: representative tags for San Francisco [Kennedy07]
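
The three steps on this slide (cluster photos by location, rank each cluster's tags, keep tags above a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the method of [Kennedy07]: the TF-IDF-style score, the threshold value, the fixed initial centroids, the use of plain Euclidean distance on latitude/longitude, and the photo data are all hypothetical choices.

```python
import math
from collections import Counter

def nearest(point, centroids):
    """Index of the closest centroid (squared Euclidean distance on lat/lon)."""
    return min(range(len(centroids)),
               key=lambda c: (point[0] - centroids[c][0]) ** 2
                           + (point[1] - centroids[c][1]) ** 2)

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return [nearest(p, centroids) for p in points]

def representative_tags(photos, labels, threshold):
    """Rank each cluster's tags by (share of cluster photos) * log(N / photos with tag)."""
    total = len(photos)
    doc_freq = Counter(tag for _, _, tags in photos for tag in tags)
    result = {}
    for cluster in sorted(set(labels)):
        members = [tags for (_, _, tags), label in zip(photos, labels) if label == cluster]
        term_freq = Counter(tag for tags in members for tag in tags)
        scores = {tag: (count / len(members)) * math.log(total / doc_freq[tag])
                  for tag, count in term_freq.items()}
        ranked = sorted(scores, key=lambda tag: (-scores[tag], tag))
        result[cluster] = [tag for tag in ranked if scores[tag] >= threshold]
    return result

# Hypothetical geotagged photos: (latitude, longitude, tags)
photos = [
    (37.8199, -122.4783, {"goldengate", "bridge", "sanfrancisco"}),
    (37.8205, -122.4790, {"goldengate", "bridge"}),
    (37.8190, -122.4775, {"bridge", "sanfrancisco"}),
    (37.8270, -122.4230, {"alcatraz", "prison", "sanfrancisco"}),
    (37.8265, -122.4225, {"alcatraz", "sanfrancisco"}),
    (37.8275, -122.4235, {"alcatraz", "island"}),
]
points = [(lat, lon) for lat, lon, _ in photos]
labels = kmeans(points, centroids=[points[0], points[3]])
representative = representative_tags(photos, labels, threshold=0.65)
```

A tag such as "sanfrancisco" that is spread over both clusters scores low everywhere, while tags concentrated in one area pass the threshold for that area only.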
  20. Sensors and automatically generated user content • Uses the GPS in cellular phones to gather traffic information, process it, and distribute it back to the phones in real time • online, real-time data processing • privacy-preservation • data efficiency, i.e. not requiring excessive cellular network • Mobile Century Project: http://traffic.berkeley.edu/mobilecentury.html
  21. Applications • Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using Flickr for prediction and forecast. International Conference on Multimedia (MM ’10), ACM • Federal Emergency Management Agency plans to engage the public more in disaster response by sharing data and leveraging reports from mobile phones and social media • Gogobot: Travel Discovery Goes Social And Visual. “The service raised $4 million in funding (Google CEO Eric Schmidt is one of the investors)…This is a $100 billion a year industry in the U.S. It’s something like $350 billion worldwide.”
  22. Social Media as real-time Sensors • “…if you’re more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”
  23. Applications • Science: sociology, machine learning, computer vision (annotation) • Tourism – Leisure – Culture: off-the-beaten-path POI extraction • Marketing: brand monitoring, personalised ads • E-Gov and e-participation: direct citizen feedback (fixmystreet app) • News: topics, trends, event detection • Others: environment, emergency response, energy saving, etc.
  24. Research Fields and Issues • Statistical analysis, machine learning, data mining, pattern recognition, social network analysis: clustering • Image, text, video feature extraction and analysis • Representation, modeling, data reduction: graph theory • Fusion techniques • Stream processing and real-time architectures • Performance, scalability • Multi-disciplinarity (sociologists) • Security, privacy
  25. Social Media Community Detection
  26. Examples of Social Media networks • Folksonomy (Delicious): Mika, P. (2005) Ontologies Are Us: A Unified Model of Social Networks and Semantics. Proceedings of the 4th International Semantic Web Conference (ISWC 2005), Springer Berlin / Heidelberg, pp. 522-536 • MetaGraph (Digg): Lin, Y., Sun, J., Castro, P., Konuru, R., Sundaram, H., and Kelliher, A. (2009) MetaFac: community discovery via relational hypergraph factorization. Proceedings of KDD ’09, ACM, pp. 527-536
  27. What is a community in a network? • Group of vertices that are more densely connected to each other than to the rest of the network • Multiple definitions to quantify communities • Fortunato S. (2010) Community detection in graphs. Physics Reports 486: 75-174 • S. Papadopoulos, Y. Kompatsiaris, A. Vakali, P. Spyridonos. “Community Detection in Social Media”. In Data Mining and Knowledge Discovery, DOI: 10.1007/s10618-011-0224-z • Figure labels: intra-community edge, inter-community edge
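
The definition above contrasts intra-community and inter-community edges. A minimal sketch of counting both for a given partition is shown below; the toy graph (two triangles joined by one edge) and the function name are hypothetical.

```python
def edge_counts(edges, community_of):
    """Count edges inside communities and edges crossing community boundaries."""
    intra = inter = 0
    for u, v in edges:
        if community_of[u] == community_of[v]:
            intra += 1
        else:
            inter += 1
    return intra, inter

# Hypothetical graph: two triangles connected by a single bridge edge
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
intra, inter = edge_counts(edges, community_of)
```

A good partition, in the sense of the slide, has many intra-community edges relative to inter-community ones; here six edges stay inside communities and one crosses.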
  28. Subgraphs • k-clique: each node is connected to all other k-1 nodes (figure: k=3 (triangle), k=4, k=5) • N-clique: N is the length of the path allowed to all other members (figure: N=2 (star)) • k-core: all vertices have degree at least k (figure: 0-core, 1-core, 2-core, 3-core, 4-core)
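
The k-core definition on this slide (all vertices have degree at least k within the subgraph) leads to a simple pruning procedure: repeatedly remove vertices with fewer than k surviving neighbours. A minimal sketch on a hypothetical toy graph:

```python
def k_core(graph, k):
    """Return the vertex set of the k-core by iteratively pruning low-degree vertices."""
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Degree is counted only among vertices that have not been pruned yet.
            if len(graph[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive

# Hypothetical graph: a triangle (a, b, c), a pendant vertex d, an isolated vertex e
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
cores = {k: k_core(graph, k) for k in range(4)}
```

The cores are nested: each (k+1)-core is contained in the k-core, which matches the concentric layout of the slide's figure.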
  29. Approach illustration (1/2) • Two-step process plus a final characterization step • 1st step: (µ, ε)-core detection • 2nd step: local expansion • 3rd step: characterization of remaining vertices as hubs or outliers
  30. Approach illustration (2/2) • Structural similarity + local expansion (highly efficient and scalable approach) • Not necessary to know the number of clusters • Noise resilient (not all nodes need to be part of a community) • Generic approach adaptable to many applications (depending on node – edge representation) • S. Papadopoulos, Y. Kompatsiaris, A. Vakali. “A Graph-based Clustering Scheme for Identifying Related Tags in Folksonomies”. In Proceedings of DaWaK ’10, Springer-Verlag, 65-76
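
The first step, (µ, ε)-core detection based on structural similarity, can be sketched as below. The slides do not spell out the formulas, so this assumes the definitions of the SCAN algorithm (which the numerical results slide names): the similarity of two vertices is the size of the overlap of their closed neighbourhoods divided by the geometric mean of the neighbourhood sizes, and a vertex is a core if at least µ vertices of its closed neighbourhood reach similarity ε. The graph and parameter values are hypothetical.

```python
import math

def closed_neighbourhood(graph, v):
    """Neighbours of v plus v itself."""
    return graph[v] | {v}

def structural_similarity(graph, u, v):
    gu, gv = closed_neighbourhood(graph, u), closed_neighbourhood(graph, v)
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def is_core(graph, v, mu, eps):
    """v is a (mu, eps)-core if at least mu vertices in its closed neighbourhood are eps-similar to it."""
    similar = [w for w in closed_neighbourhood(graph, v)
               if structural_similarity(graph, v, w) >= eps]
    return len(similar) >= mu

# Hypothetical graph: two triangles joined by edge 2-3, plus vertex 6 hanging off vertex 2
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 6},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
    6: {2},
}
cores = {v for v in graph if is_core(graph, v, mu=3, eps=0.7)}
```

Vertex 6 is not a core and is not strongly similar to any core, so in the third step it would be left to be characterized as a hub or an outlier rather than forced into a community.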
  31. LYCOS iQ Tag Network • Computers: a densely interconnected community • History: a star-shaped community
  32. Hybrid photo clustering
  33. [Figure: example photo clusters, labelled “landmark” and “event”]
  34. Photo clustering results • Most clusters correspond to landmarks or events • EVENTS: baptism, conference • LANDMARKS: castles
  35. Sample results: [Visual] vs. [Tag] vs. [Visual + Tag] (panels: VISUAL, HYBRID, TAG)
  36. Numerical results • Table 1. Cluster quality comparison between SCAN and k-means approaches. The performance is evaluated separately on visual and tag-based features and for multiple values of K. We could not include k-means with K = 3M in the tag cluster comparison because the large number of K led to an estimated execution time of over a week.
      Column groups: geospatial cluster coherence (md, sd; m stands for meters) and subjective cluster quality (P, R, F). The header of the last column did not survive extraction.

      Cluster type  Clustering method (number of clusters)  md (m)   sd (m)   P      R      F      (no header)
      Visual        SCAN_VIS (560)                           357.1   1185.7   1.000  0.110  0.199  1.000
      Visual        KM_VIS,1M (560)                         2470.0   1734.4   0.806  0.324  0.462  0.226
      Visual        KM_VIS,2M (1,120)                       2249.7   1893.7   0.899  0.294  0.443  0.544
      Visual        KM_VIS,3M (1,680)                       2183.1   2027.4   0.929  0.271  0.420  0.719
      Tag           SCAN_TAG-C (1,774)                       767.4   1712.0   0.898  0.253  0.394  0.642
      Tag           SCAN_TAG-LSI (4,027)                     456.3   1151.1   0.950  0.182  0.306  0.820
      Tag           KM_TAG,1M (4,027)                        766.8   1762.7   0.848  0.307  0.451  0.564
      Tag           KM_TAG,2M (8,054)                        563.2   1528.7   0.903  0.258  0.401  0.707

      • For 29 landmark clusters, the automatically generated cluster center fell on average within 80 meters of the actual landmark position
      • Excerpt shown on the slide (first column): “… more precise than the ones produced by k-means clustering. In terms of the GCC measure, the SCAN-produced clusters are clearly superior to the k-means ones, which indicates better geographical focus and thus better correspondence to landmarks and events (which are usually highly localized). The difference in GCC is especially pronounced for visual clusters. The actual GCC performance of k-means clustering …”
      • Excerpt shown on the slide (second column): “… similarity graphs. We found that the best information-retrieval performance is achieved by use of the hybrid similarity graph. More specifically, the F-measure of the HYB image clusters was 28.5 percent higher than the one of VIS clusters and 19.8 percent higher than the one of TAG-C clusters. The interannotator agreement for these results was substantial, because in all cases the obtained κ-statistic values …”
      • S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
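
The P, R and F values in Table 1 can be cross-checked, assuming F is the usual harmonic mean of precision and recall (the slide does not state the formula). The sketch below recomputes F for three rows copied from the table; small differences are expected because the published P and R are already rounded.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of the F column)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (P, R, reported F) copied from Table 1
rows = {
    "SCAN_VIS": (1.000, 0.110, 0.199),
    "KM_VIS,1M": (0.806, 0.324, 0.462),
    "SCAN_TAG-C": (0.898, 0.253, 0.394),
}
differences = {name: abs(f_measure(p, r) - reported)
               for name, (p, r, reported) in rows.items()}
```

The recomputed values agree with the reported ones to within rounding, which also illustrates why SCAN_VIS has a low F despite perfect precision: its recall is only 0.110.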
  37. clusttour.gr application • PHOTOS + METADATA (tags: sagrada familia, cathedral, barcelona; taken: 12 May 2009; lat: 41.4036, lon: 2.1743) • SPATIAL CLUSTERING + TEMPORAL ANALYSIS • COMMUNITY DETECTION (VISUAL, TAG, HYBRID) • CLASSIFICATION TO LANDMARKS/EVENTS by duration and #users / #photos: [2 years, 50 users / 120 photos] vs. [1 day, 2 users / 10 photos] • S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
  38. [Application screenshot. Labels: diverse set of area photos; photo cluster summary; time slices; original photo metadata; photo clusters ranked by popularity; area tags]
  44. Available on AppStore • http://clusttour.gr/itunes
  45. Social Media “teacher” of the machine
  46. Self training + Social Media • Manually labelled data: high quality annotations, expensive to generate • Train classifier • Apply on unlabelled data (crowdsourcing – social media; visual + textual) • Enhance training set with unlabelled data based on the classifier’s decision
  47. Challenges • Region instead of image annotation: e.g. tags are global annotations, while local ones are needed • Imperfect segmentation • Adaptive size region selection • Visual and textual similarity ambiguity • Fusion of scores
  48. Proposed approach: adapted self training for region selection • Initial models are trained using labelled regions • The models are applied on regions extracted from loosely tagged images (obtained at almost no cost); regions that are relatively too small to be useful are dismissed • Tags add an extra layer of confidence in the selection process; semantic relatedness between concepts and tags is calculated using either WordNet or a modified version of Google Similarity Distance • Select regions based on visual and textual information and use them to enhance the positive training set
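
The selection step described above can be sketched as a filter over candidate regions: dismiss regions that are too small, fuse the visual score of the initial model with the tag relatedness, and keep the best-scoring regions as new positive samples. Everything concrete here is an assumption made for illustration: the `Region` fields, the product as fusion rule, the thresholds and the example values are hypothetical, and the relatedness scores are given as inputs rather than computed with WordNet or the Google Similarity Distance.

```python
from dataclasses import dataclass

@dataclass
class Region:
    image_id: str
    area_fraction: float    # region area divided by image area
    visual_score: float     # confidence of the initial visual model, in [0, 1]
    tag_relatedness: float  # relatedness of the image tags to the concept, in [0, 1]

def select_regions(regions, min_area=0.05, min_score=0.5):
    """Dismiss small regions, fuse visual and textual evidence, keep confident ones."""
    kept = [r for r in regions if r.area_fraction >= min_area]
    scored = [(r.visual_score * r.tag_relatedness, r) for r in kept]
    selected = [(score, r) for score, r in scored if score >= min_score]
    selected.sort(key=lambda pair: -pair[0])  # most confident first
    return [r for _, r in selected]

# Hypothetical candidate regions for one concept
candidates = [
    Region("img1", 0.30, 0.90, 0.80),  # large, visually and textually supported
    Region("img2", 0.02, 0.95, 0.90),  # too small, dismissed
    Region("img3", 0.40, 0.90, 0.20),  # tags do not support the concept
    Region("img4", 0.25, 0.70, 0.90),
]
new_positives = select_regions(candidates)
```

The selected regions would then be added to the positive training set before retraining, which is the self-training loop of the previous slides.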
  49. Dismiss non-informative regions • Combine visual and textual information • Use the selected samples to enhance the positive training set
  50. Experimental Setup • SAIAPR TC-12 dataset (ImageCLEF): 20k manually labelled images split into 3 subsets • train: 14k images (used for testing the proposed approach directly) – 70% • validation: 2k images (used as the initial training set) – 10% • test: 4k images (used for evaluation) – 20% • MIRFlickr-1m: 1 million loosely tagged images (used for selecting regions to enhance the initial classifiers) • E. Chatzilari, S. Nikolopoulos, Y. Kompatsiaris, J. Kittler. Multi-Modal Region Selection Approach for Training Object Detectors, ICMR 2012, Hong Kong - China, June 2012
  51. Performance Comparison of Retrained models • The configuration incorporating both visual and textual information exhibits the highest performance in 44 out of the 63 examined concepts, compared to 4 for the typical self training configuration and 15 for the configuration based on the initial classifiers. • The proposed approach for adaptive region dismissal (ARD) greatly increases the performance of the resulting classifiers. • Table (columns: Validation, Visual, Visual*Textual): Without ARD: 4.9, 6; With ARD: 5.7, 5.1, 7
  52. Current work • Application to global (whole image) annotation • Introduction of visual ambiguity for improved selection of training samples • Learning of concepts which are visually similar and co-occur in images, e.g. “sea” – “sky” • Do not select such training samples
  53. Semi-supervised machine learning for concept detection
  54. Concept Detection • Use of similarity graph structure for machine learning • Exploit multi-modal information through different fusion techniques
  55. Spectral Graph Clustering • Example: values of second eigenvector of normalized Laplacian matrix
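
The idea on this slide, reading cluster membership off the second eigenvector of the normalized Laplacian, can be sketched without external libraries. The assumptions are stated here and in the comments: the symmetric normalization L = I - D^(-1/2) A D^(-1/2) is used, the graph is connected, the eigenvector is found by power iteration with deflation of the trivial eigenvector, and the toy graph is hypothetical. A real system would use a sparse eigensolver instead.

```python
import math

def second_eigenvector(adjacency, iterations=500):
    """Eigenvector for the second-smallest eigenvalue of L = I - D^-1/2 A D^-1/2.

    Assumes a connected, undirected graph given as a dense 0/1 adjacency matrix.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    m = [[adjacency[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # Trivial eigenvector of L (eigenvalue 0) is proportional to sqrt(degree).
    norm = math.sqrt(sum(degree))
    trivial = [math.sqrt(d) / norm for d in degree]
    x = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        # Deflation: remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(x, trivial))
        x = [a - dot * b for a, b in zip(x, trivial)]
        # Multiply by (M + I) / 2, whose eigenvalues are non-negative, so the
        # iteration converges to the second-largest eigenvector of M.
        x = [0.5 * (sum(m[i][j] * x[j] for j in range(n)) + x[i]) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    return x

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1

vector = second_eigenvector(adjacency)
side = [value > 0 for value in vector]  # the sign splits the graph in two
```

Thresholding the eigenvector values at zero separates the two triangles, which is the kind of split the slide's example plot illustrates.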
  56. Fusion (1)
  57. Fusion (2)
  58. MIR-Flickr Experimental Results • 25000 images + labels, 38 concepts
  59. Proposed Approach vs. Hare and Lewis, 2010
  60. Proposed Approach vs. Guillaumin et al., 2010
  61. Other relevant approaches • S. Nikolopoulos, E. Giannakidou, I. Kompatsiaris, I. Patras, and A. Vakali, “Combining multi-modal features for social media analysis”, in book Social Media Modeling and Computing, Springer 2011: pLSA-based aspect models to define a latent semantic space where heterogeneous types of information can be effectively combined • Georgios Petkos, Symeon Papadopoulos, Yiannis Kompatsiaris, “Social Event Detection using Multimodal Clustering and Integrating Supervisory Signals”, ICMR 2012 • E. Spyromitros-Xioufis, S. Papadopoulos, I. Kompatsiaris, G. Tsoumakas, I. Vlahavas. “An Empirical Study on the Combination of SURF Features with VLAD Vectors for Image Search”, WIAMIS 2012, Dublin, Ireland, May 2012
  62. VLAD+SIFT vs. VLAD+SURF • Accuracy vs. dimensionality • VLAD+SURF improves on VLAD+SIFT and FV+SIFT across all dimensions in both Holidays and Oxford datasets • Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries • SIFT corresponds to PCA-reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011
  63. SocialSensor: Applications and Use Cases • http://www.socialsensor.eu
  64. “Social media is transforming the way we do journalism” (New York Times) • “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC) • “It has changed the way we do news” (MSN) • Source: picture alliance / dpa
  66. 66. “It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)
    “Things that aren’t relevant crowd out the content you are looking for” (MSN)
    “The filters aren’t configurable enough” (CNN)
    Source: Getty Images
  67. 67. Verification was simpler in the past... Source: Frank Grätz
  68. 68. Infotainment
    Events with large numbers of visitors
    Thessaloniki International Film Festival
    80,000 viewers / 100,000 visitors in 10 days
    150 films, 350 screenings
    Discovery and presentation of relevant aggregated social media (e.g. film ratings from tweets)
  69. 69. Conclusions and Issues
    •  Social media data mining provides interesting results in many applications
    •  Not all data is always available (e.g. user queries, Facebook)
       •  Infrastructure
       •  Policy issues
    •  Scalability and real-time approaches
    •  Fusion of various modalities
       •  Content, social, temporal, location
    •  Linking other sources (web, Linked Open Data)
    •  Applications and commercialization
       •  Proven functionality for the organization
       •  User engagement
  70. 70. Colleagues
    •  Dr. Symeon Papadopoulos
       •  Community detection
       •  Graph-based concept detection
       •  Visual Features
    •  Dr. Georgios Petkos
       •  Multimodal event detection
    •  Dr. Spiros Nikolopoulos
       •  pLSA fusion
    •  Elisavet Chatzilari (PhD Student)
       •  Social media for learning
    •  Lefteris Spyromitros (PhD Student)
       •  Visual Features
    •  Juxhin Bakalli and Manos Schinas
       •  Applications development (Clusttour and ThessFest)
    •  Prof. Athina Vakali (Informatics Dept, AUTh)
       •  Collaboration in Community Detection / Clusttour
  71. 71. Thank you! http://mklab.iti.gr
  72. 72. Scalability Challenges
    •  Network, crawling, data collection
    •  Streaming data
    •  Users
    •  High numbers of users
    •  Processing (e.g. NLP, clustering, etc.)
    •  Links
    •  Web sites
    •  Retweets, mentions, etc.
    •  Multimedia content (e.g. images, YouTube videos)
  73. 73. Some Statistics
  74. 74. Datasift Architecture
  75. 75. Datasift processing
    •  Processes the whole firehose: 250+ million tweets/day
    •  40+ services run in the system
    •  Handling the firehose
    •  Low-latency natural language processing and entity extraction on tweets
    •  Low-latency in-line augmentation of tweets
    •  Low-latency handling of very large individual filters
    •  Keeping a history of the firehose by persisting the 1TB of data it sends each day
    •  Allowing analytics to be run on the history of the firehose
    •  Real-time billing
    •  Streaming filter results to 1000s of clients
    •  http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  76. 76. Datasift statistics
    •  Current peak delivery of 120,000 tweets per second (260Mbit bandwidth)
    •  Performs 250+ million sentiment analyses with sub-100ms latency
    •  1TB of augmented data (includes gender, sentiment, etc.) transits the platform daily
    •  Data filtering nodes can process up to 10,000 unique streams (with peaks of 8,000+ tweets running through them per second)
    •  Can do data lookups on 10,000,000+ username lists in real time
    •  Links augmentation performs 27 million link resolves and lookups, plus 15+ million full web page aggregations, per day
    •  http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
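As a rough sanity check on the Datasift figures, the quoted numbers can be turned into per-second and per-tweet averages. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits) and treats "250+ million" tweets per day as exactly 250 million. The variable names are illustrative and do not come from Datasift.

```python
# Back-of-the-envelope arithmetic on the figures quoted on the slides.
# Assumptions (not from the slides): decimal units, exactly 250 million tweets/day.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 250_000_000         # "250+ million" firehose tweets per day
bytes_per_day = 10**12               # "1TB" of augmented data per day
peak_tweets_per_second = 120_000     # quoted peak delivery rate
peak_bits_per_second = 260 * 10**6   # quoted "260Mbit" bandwidth

avg_tweets_per_second = tweets_per_day / SECONDS_PER_DAY
avg_bytes_per_tweet = bytes_per_day / tweets_per_day
peak_bytes_per_tweet = peak_bits_per_second / peak_tweets_per_second / 8

print(f"average ingest rate:    {avg_tweets_per_second:,.0f} tweets/s")
print(f"average augmented size: {avg_bytes_per_tweet:,.0f} bytes/tweet")
print(f"peak delivery payload:  {peak_bytes_per_tweet:,.0f} bytes/tweet")
```

Under these assumptions the average ingest rate is a little under 3,000 tweets per second, so the quoted peak delivery rate is roughly 40 times the average ingest rate.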
  77. 77. Frameworks
    •  MapReduce (Hadoop)
       •  Computation distribution
       •  Batch processing of huge datasets
       •  Parallel processing on large clusters of compute nodes
    •  Cassandra, Tokyo Cabinet
       •  Key-value stores
       •  Horizontal scaling for many users
       •  Huge data indexing
       •  Fault tolerance
       •  No sophisticated query possibilities
    •  MongoDB
       •  JSON native support
       •  Large-scale data storage
    •  Memcached
       •  Efficient caching
       •  Clustering
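To make the MapReduce item concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, counting hashtags in a handful of tweets. It only imitates the programming model in plain Python; it is not Hadoop code, and the function names and sample tweets are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(tweet):
    """Emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(word.lower(), 1) for word in tweet.split() if word.startswith("#")]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def count_hashtags(tweets):
    mapped = chain.from_iterable(map_phase(t) for t in tweets)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input.
tweets = [
    "Crowds in the square tonight #jan25 #egypt",
    "RT live updates #Jan25",
    "Great screening at the festival #film",
]
print(count_hashtags(tweets))
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data over the network; the sketch keeps everything in one process to show only the data flow.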
  78. 78. Scalability Processing Approaches
    •  Sampling
    •  Dimensionality reduction
       •  E.g. VLAD
    •  Local computations
    •  Iterative scanning/processing
       •  Stream-based
    •  Multi-level – Hierarchical
    •  Distributed
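One common way to combine the "sampling" and "stream-based" ideas is reservoir sampling, which keeps a fixed-size uniform sample of a stream whose length is not known in advance. The sketch below is a generic illustration (the classic "Algorithm R"), not a method taken from the talk; the function name and the synthetic tweet stream are assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen existing entry.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Synthetic stream standing in for incoming tweets.
sample = reservoir_sample((f"tweet-{n}" for n in range(100_000)), k=5, rng=random.Random(42))
print(sample)
```

Memory use stays at k items regardless of how long the stream runs, which is the property that makes this kind of sampling attractive for firehose-scale data.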
  79. 79. Image Representation Approaches
    Bag-Of-Words (BOW)
       •  The most widely used
       •  Memory usage and search time are usually prohibitive for 10M images
    Vector of Locally Aggregated Descriptors (VLAD)
       •  More accurate than BOW for the same representation size
       •  Cheaper to compute
       •  Dimensionality can be further reduced with PCA without noticeable impact on accuracy
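For readers unfamiliar with VLAD, the following is a minimal sketch of the basic aggregation step: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalised. It assumes the codebook has already been learned (for example with k-means) and omits refinements such as power normalisation. It uses plain Python lists to stay self-contained, whereas real code would normally use an array library. The toy centroids and descriptors are made up.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector.

    descriptors: list of d-dimensional local descriptors (lists of floats)
    centroids:   list of k d-dimensional codebook centres
    Returns a flat list of length k * d.
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual x - c for that centroid.
        for j in range(d):
            residuals[nearest][j] += x[j] - centroids[nearest][j]
    flat = [value for block in residuals for value in block]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: k = 2 centroids, d = 2 dimensions, 3 local descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
print(vlad(descriptors, centroids))
```

The output length is k × d, which matches the dimensions reported in the experimental results: 64 centroids with 128-dimensional SIFT descriptors give 64 × 128 = 8192 dimensions.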
  80. 80. Experimental Results (Holidays dataset)
    method   k      descriptor   dimension   MAP
    BOW      1K     SIFT-pca     1K          40.1
    BOW      20K    SIFT-pca     20K         43.7
    BOW      200K   SIFT-pca     200K        54.0
    VLAD     64     SIFT-pca     4096        55.6
    VLAD     64     SIFT         8192        55.2
    VLAD     128    SIFT         16384       56.7
    VLAD     64     SURF         4096        63.2
    VLAD     128    SURF         8192        65.6
    Fisher   64     SIFT-pca     4096        59.7
    Fisher   256    SIFT-pca     16384       62.5
    Eleftherios Spyromitros-Xioufis 1/8/12
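The MAP column is mean average precision. As a reminder of how that measure is computed, here is a small generic sketch. It is not the evaluation script behind these results, the relevance lists are invented, and it assumes the slide values are reported on a 0-100 scale.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision for one query.

    ranked_relevance: booleans, best-ranked result first.
    total_relevant:   number of relevant items for the query; defaults to
                      the number of relevant items present in the ranking.
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


def mean_average_precision(queries):
    """Mean of the per-query average precisions."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Two hypothetical queries with four ranked results each.
queries = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2
    [False, True, False, False],  # AP = (1/2) / 1
]
print(f"MAP = {100 * mean_average_precision(queries):.1f}")
```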
  81. 81. Experimental Results (Holidays dataset)
    method   k     descriptor   D      D’ (pca)   MAP
    BOW      20K   SIFT-pca     20K    512        44.9
    BOW      20K   SIFT-pca     20K    128        45.2
    BOW      20K   SIFT-pca     20K    64         44.4
    VLAD     64    SIFT-pca     4096   512        59.8
    VLAD     64    SIFT-pca     4096   128        55.7
    VLAD     64    SIFT-pca     4096   64         55.3
    VLAD     64    SURF         4096   512        63.4
    VLAD     64    SURF         4096   128        58.6
    VLAD     64    SURF         4096   64         55.6
    Fisher   64    SIFT-pca     4096   512        61.0
    Fisher   64    SIFT-pca     4096   128        56.5
    Fisher   64    SIFT-pca     4096   64         52.0
    Eleftherios Spyromitros-Xioufis 1/8/12
