ESSIR 2013 - IR and Social Media


Social media sites (referred to by some as Web 2.0) allow their users to interact with each other, for example by collecting and sharing so-called user-generated content. This content can be simple bookmarks, but also blogs, images, and videos. Social media support co-creation: processes in which customers (or users, if you prefer) do not just consume but play an active role in defining and shaping the end product. Well-known examples include Six Degrees, LiveJournal, Digg, Epinions, Myspace, Flickr, YouTube, LinkedIn, and Pinterest. Of course, today's internet giants Facebook and Twitter are key recent developments. Finally, Wikipedia should not be overlooked: it is a major resource in many language technologies, including information retrieval!

The second part of the lecture looks into the opportunities for information retrieval research. Social media platforms tend to provide access to user profiles, connections between users, the content these users publish or share, and how they react to each other's content through commenting and rating. Also, the large majority of social media platforms allow their users to categorize content by means of tags (or, in direct communication, through hashtags), resulting in collaborative forms of information organization known as folksonomies. However, social media also pose a challenge for information retrieval research: the many platforms vary in functionality, and we have very little understanding of clearly desirable features such as combining tag usage and ratings in content recommendation! A unifying approach based on random walks will be discussed to illustrate how we can answer some of these questions [1], but clearly the area offers ample opportunity to make your own mark.

In the final part of the lecture I will briefly touch upon an even wider range of opportunities, where data derived from social media form a key component in enabling new research and insights. I will review a few important results from research centered on Wikipedia, Facebook and Twitter data, as well as a diverse range of new information sources, including the geographic and temporal information derived from images and tweets, product reviews and comments on YouTube videos, and how URL shorteners may give a view on what is popular on the web.

[1] Maarten Clements, Arjen P. De Vries, and Marcel J. T. Reinders. 2010. The task-dependent effect of tags and ratings on social media access. ACM Trans. Inf. Syst. 28, 4, Article 21 (November 2010), 42 pages. http://doi.acm.org/10.1145/1852102.1852107


Transcript

  • 1. 9th European Summer School in Information Retrieval, September 4th, 2013
      ◦ http://bit.ly/ESSIR13IRSocMedia
      ◦ IR and Social Media
      ◦ Arjen P. de Vries, arjen@acm.org
      ◦ Centrum Wiskunde & Informatica; Delft University of Technology; Spinque B.V.
  • 2. On slideshare, IR = Investor Relations
  • 3. Social Media
      ◦ Noun: social media (plural only)
      ◦ Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet. The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.
  • 4. http://www.webanalyticsworld.net/2010/11/history-of-social-media-infographic.html
  • 5. Social Media
      ◦ “Social bookmarking” sites
      ◦ “User generated content”
      ◦ Images (Flickr) and videos (YouTube, Vimeo), but also blogs
      ◦ Social network services
      ◦ Twitter, Facebook
  • 6. Not just one beast!
  • 7. IR and Social Media?
  • 8. Red Hot Chili Peppers
  • 9. “Rock group” in author’s metadata... Organisation in groups may help disambiguate query! More implicit metadata...
  • 10. Information Science: “Search for the fundamental knowledge which will allow us to postulate and utilize the most efficient combination of [human and machine] resources”
      ◦ M.E. Senko. Information systems: records, relations, sets, entities, and things. Information Systems, 1(1):3–13, 1975.
  • 11. Core Questions
      ◦ How to represent information?
      ◦ The information need and search requests
      ◦ The objects to be shown in response to an information request
      ◦ How to match information representations?
  • 12. IR and Social Media
      ◦ Richer information representations!
  • 13. Richer representations
      ◦ User profiles
      ◦ User name, full name, description, image, homepage URL, etc.
      ◦ Connections between users
      ◦ Networks of friends, followers, etc.
      ◦ Comments/reactions
      ◦ Endorsing and sharing
  • 14. Q: Web ancient social media?
  • 15. (C) 2008, The New York Times Company. Anchor text: “continue reading”
  • 16. Not a lot of info to represent the page…
      ◦ A fan’s Hyves page: Kyteman's HipHop Orchestra: www.kyteman.com
      ◦ Luxor Theater ticket sales: 22 May - Kyteman's hiphop Orkest - www.kyteman.com
      ◦ Kluun.nl: The Kyteman site
      ◦ Blog Rockin’ Beats: The 21-year-old Kyteman (trumpeter, composer and producer Colin Benders) worked for three years on his debut: the Hermit sessions.
      ◦ Jazzenzo: ...a performance by the popular Kyteman’s Hiphop Orkest
  • 17. ‘Co-creation’
      ◦ Social Media:
      ◦ Consumer becomes a co-creator
      ◦ ‘Data consumption’ traces
      ◦ In essence: many new sources to play the role of anchor text
      ◦ Tags and/or ratings
      ◦ Tweets
      ◦ Comments, reviews
  • 18. Potential Benefits for IR
      ◦ Expand content representation
      ◦ Reduce the vocabulary gap(s) between creators of content, indexers, and users
      ◦ More diverse views on the same content
  • 19. Potential Benefits for IR
      ◦ Relevance depends on user context
      ◦ User task
      ◦ User knowledge
  • 20. Potential Benefits for IR
      ◦ Relevance depends on user context
      ◦ User task
      ◦ User knowledge
      ◦ Social media provide an opportunity to make much better assumptions about user context
      ◦ A specific user’s context
      ◦ The variety of user contexts that may exist
  • 21. Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders. The task dependent effect of tags and ratings on social media access. TOIS 28, 4, article 21 (November 2010), 42 pages.
  • 22. LibraryThing
  • 23. LibraryThing
      ◦ Items
      ◦ People
      ◦ Tags
      ◦ Ratings
      ◦ See also: http://www.macle.nl/tud/LT/
  • 24. Synonyms
  • 25. Synonyms
  • 26. Examples
      ◦ Humour
      ◦ Classic
  • 27. LibraryThing
      ◦ Items
      ◦ People
      ◦ Tags
      ◦ Ratings
      ◦ See also: http://www.macle.nl/tud/LT/
  • 28. Search with Random Walk
      ◦ Present nodes according to the estimated probability that a random walk starting from the (task-dependent) starting nodes would end at that node
      ◦ E.g., tag suggestion starts in a tag node; personalized search starts in tag and user nodes
  • 29. Tagging Relationships
  • 30. An item recommendation walk
  • 31. Ratings
      ◦ Ratings may enhance the graph, or just be used for evaluation
  • 32. Personalized Search
      ◦ Assume a user who types a single tag as query
  • 33. Personalized Search
  • 34.
      ◦ A soft clustering effect smoothly relates similar concepts before converging to the background probability
  • 35.
      ◦ Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user
      ◦ So, content that matches the user’s preference is more likely to be found first
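
The random-walk ranking of slides 28 to 35 can be illustrated with a small sketch. This is not the model from the Clements et al. paper: it assumes an unweighted tripartite graph built from (user, item, tag) triples, a fixed number of steps, and a self-transition probability `stay`, all of which are choices made here for illustration. The toy data and the names `build_graph`, `random_walk` and `rank_items` are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected graph linking users, items and tags that co-occur."""
    graph = defaultdict(lambda: defaultdict(float))
    for user, item, tag in triples:
        u, i, t = ("user", user), ("item", item), ("tag", tag)
        for a, b in ((u, i), (i, t), (u, t)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def random_walk(graph, start_nodes, steps=3, stay=0.5):
    """Probability of being at each node after `steps` steps from the start nodes."""
    prob = {node: 1.0 / len(start_nodes) for node in start_nodes}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            neighbours = graph.get(node, {})
            if not neighbours:
                nxt[node] += p  # isolated node: all mass stays put
                continue
            nxt[node] += stay * p  # self-transition slows down convergence
            total = sum(neighbours.values())
            for other, weight in neighbours.items():
                nxt[other] += (1.0 - stay) * p * weight / total
        prob = dict(nxt)
    return prob


def rank_items(graph, start_nodes, steps=3, stay=0.5):
    """Rank item nodes by the probability that the walk ends there."""
    prob = random_walk(graph, start_nodes, steps, stay)
    items = [(node[1], p) for node, p in prob.items() if node[0] == "item"]
    return sorted(items, key=lambda pair: pair[1], reverse=True)


# Invented toy data: "java" is used for a programming book and a travel guide.
graph = build_graph([
    ("ann", "java_programming_book", "java"),
    ("ann", "python_cookbook", "programming"),
    ("bob", "java_island_travel_guide", "java"),
    ("bob", "bali_guide", "travel"),
])

# Personalized search: the walk starts in both the query tag and the user.
personal = rank_items(graph, [("tag", "java"), ("user", "ann")])
# Non-personalized search: the walk starts in the query tag only.
plain = dict(rank_items(graph, [("tag", "java")]))
```

In this toy graph the walk that starts only in the tag node cannot tell the two "java" items apart, while adding the user node as a second starting point moves the programming book to the top for `ann`.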
  • 36. Common System Designs
  • 37. Analysis results
      ◦ Allowing all users to tag all available content improves retrieval tasks
      ◦ Combining tags and ratings may improve both search and recommendation tasks
  • 38. Ternary relation lost!
      ◦ The UIT matrix represents a ternary relation that is lost when creating the three UI, IT and UT matrices
  • 39. Ternary relation lost!
      ◦ The UIT matrix represents a ternary relation that is lost when creating the three UI, IT and UT matrices
      ◦ Potentially a problem if tags express an opinion about an item; e.g.,
      ◦ “poetry” can still describe the user, independent of the item
      ◦ “awful” requires knowing which item the term belongs to
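
The loss of the ternary relation can be made concrete with a small counter-example. The sketch below uses invented users, items and tags, and a hypothetical helper `project`; it shows two different sets of (user, item, tag) triples that yield exactly the same UI, IT and UT co-occurrence counts.

```python
from collections import Counter


def project(triples):
    """Project (user, item, tag) triples onto the three pairwise count matrices."""
    ui = Counter((u, i) for u, i, t in triples)
    it = Counter((i, t) for u, i, t in triples)
    ut = Counter((u, t) for u, i, t in triples)
    return ui, it, ut


# Invented toy data: the two worlds swap who called which book "awful".
world_a = [
    ("ann", "book1", "awful"), ("ann", "book2", "great"),
    ("bob", "book1", "great"), ("bob", "book2", "awful"),
]
world_b = [
    ("ann", "book1", "great"), ("ann", "book2", "awful"),
    ("bob", "book1", "awful"), ("bob", "book2", "great"),
]

same_projections = project(world_a) == project(world_b)
same_triples = set(world_a) == set(world_b)
```

Here `same_projections` is true while `same_triples` is false: from the three pairwise matrices alone it is impossible to recover which book `ann` considered awful.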
  • 40. Tags vs. rating
      ◦ Most tags do not deviate far from the mean rating
      ◦ Only a few tags are strongly correlated with opinion
      ◦ Note: poetry higher quality than chicklit
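
One way to look at how far a tag deviates from the mean rating is to compare the mean rating of the items carrying that tag with the overall mean. The sketch below assumes the input is a list of (tag, rating) pairs; the function name `tag_rating_deviation` and the numbers are invented for illustration and are not results from the lecture.

```python
from collections import defaultdict
from statistics import mean


def tag_rating_deviation(tagged_ratings):
    """Return, per tag, its mean rating minus the overall mean rating."""
    overall = mean(rating for _, rating in tagged_ratings)
    by_tag = defaultdict(list)
    for tag, rating in tagged_ratings:
        by_tag[tag].append(rating)
    return {tag: mean(ratings) - overall for tag, ratings in by_tag.items()}


# Invented toy ratings on a 1-5 scale.
toy = [
    ("fiction", 3.5), ("fiction", 4.0),
    ("awful", 1.0), ("awful", 1.5),
    ("classic", 4.5), ("classic", 5.0),
]
deviation = tag_rating_deviation(toy)
```

A tag whose deviation is far from zero, like the opinion tag in this toy data, is one where the tag itself carries rating information.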
  • 41. Metadata
      ◦ Scientific articles have many types of metadata associated with them:
      ◦ Abstract
      ◦ Author
      ◦ Booktitle
      ◦ Description
      ◦ Journal
      ◦ Tags
      ◦ Are all these types of metadata useful for item recommendation?
  • 42. Metadata
      ◦ According to Toine Bogers’ PhD thesis:
      ◦ Concatenate all metadata fields of the items in a single user’s profile into one large text field, and use an off-the-shelf IR model to match the profile against the metadata of the items: “Profile-centric Matching”
      ◦ Or, construct item profiles from the metadata of all users for that item, and apply an item-based collaborative filtering approach: “Item-based Hybrid Filtering”
      ◦ Author, description, tags, title, url, journal and booktitle all contribute
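
Profile-centric matching, as summarized on slide 42, can be sketched in a few lines. The slide only says "an off-the-shelf IR model"; the TF-IDF weighting with cosine similarity used here is a stand-in chosen for brevity, not necessarily the model used in the thesis. The function names and the toy articles are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf(docs):
    """docs maps a name to a token list; returns name -> {term: tf-idf weight}."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {
        name: {t: tf * math.log((n + 1) / df[t]) for t, tf in Counter(tokens).items()}
        for name, tokens in docs.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def flatten(item):
    """Concatenate all metadata fields of one item into a single text."""
    return " ".join(item.values())


def profile_centric_rank(profile_items, candidates):
    """Rank candidate items against the concatenated metadata of a user's items."""
    docs = {cid: tokenize(flatten(item)) for cid, item in candidates.items()}
    docs["__profile__"] = tokenize(" ".join(flatten(i) for i in profile_items.values()))
    vectors = tfidf(docs)
    profile = vectors.pop("__profile__")
    scored = [(cid, cosine(profile, vec)) for cid, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy metadata.
profile_items = {
    "p1": {"title": "random walks on folksonomy graphs", "tags": "tagging recommendation"},
}
candidates = {
    "c1": {"title": "tag recommendation in social bookmarking", "abstract": "folksonomy graphs"},
    "c2": {"title": "protein folding simulation", "abstract": "molecular dynamics"},
}
ranked = profile_centric_rank(profile_items, candidates)
```

Because every metadata field ends up in the same bag of words, adding or dropping a field (author, journal, tags) only changes the text that is concatenated, which makes it easy to test which fields contribute.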
  • 43. Finally: a recent case study
  • 44. Artist Popularity?
      ◦ Let’s ask widely used social media music platforms!
      ◦ I.e., query their APIs
  • 45. Artist Popularity (1-3)
      ◦ Top-5 popular artists in dataset
      ◦ Jan 21 – Mar 21
      ◦ 3-hourly timestamped popularity indices
  • 46. http://bit.ly/ESSIR13IRSocMedia
  • 47. Artist Popularity
  • 48. Artist Popularity (?!)
      ◦ Top-5 popular artists in dataset
      ◦ Jan 21 – Mar 21
      ◦ 3-hourly timestamped popularity indices
  • 49. The Black Keys
  • 50. The Black Keys
      ◦ Three Grammy awards received!
  • 51. The Black Keys
      ◦ Web responds, while service-based popularity index is static
  • 52. Implications
      ◦ An “artist popularity” index depends on the platform and its user population
      ◦ Web-based popularity – estimated via URL shortener’s API – “reacts” to real-world events
      ◦ Suitable as an academics’ search log replacement?
  • 53. Implications
      ◦ An “artist popularity” index depends on the platform and its user population
      ◦ Web-based popularity – estimated via URL shortener’s API – “reacts” to real-world events
      ◦ Suitable as an academics’ search log replacement?
      ◦ Q: What is the most useful popularity – one that changes dynamically or one that lasts?
  • 54. Many topics I skipped…
  • 55. Tweets about blip.tv
      ◦ “Twanchor text”
      ◦ E.g.: http://blip.tv/file/2168377
      ◦ Amazing
      ◦ Watching “World’s most realistic 3D city models?”
      ◦ Google Earth/Maps killer
      ◦ Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
      ◦ and ~120 more Tweets
  • 56. Wikipedia
      ◦ Wikipedia contains semantically very rich annotations:
      ◦ Wikipedia Categories
      ◦ Wikipedia Lists
      ◦ Times (1930, 1931, 1932, etc.)
      ◦ Names
      ◦ Disambiguation pages, etc.
      ◦ Note: DBpedia is just Wikipedia
  • 57. Wikipedia
      ◦ People have used Wikipedia edit history to look for events
  • 58. Geotags / POIs
      ◦ Many social media items carry explicit geo information
      ◦ Geotags are low-level “coordinates”
      ◦ POIs are high-level “point-of-interest” labels
      ◦ Applications
      ◦ Recommend geo-locations to people
      ◦ Predict POI tags from (tweet) text
      ◦ Predict where a user will go next
  • 59. Map text to locations
      ◦ Build a language model from all tags assigned to Flickr images that belong to a predefined grid cell
      ◦ Neighbouring cells used for smoothing (like hierarchic language models used previously for video / scene / shot)
      ◦ User frequency of a term in a location (instead of term frequency)
      ◦ Neil O’Hare and Vanessa Murdock. Modeling Locations with Social Media. Information Retrieval, February 2013, Volume 16, Issue 1, pp. 30-62
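
The three ingredients of slide 59 (one tag model per grid cell, smoothing with neighbouring cells, and user frequency instead of term frequency) can be sketched as follows. This is a simplified illustration, not the model of O’Hare and Murdock: the linear interpolation weight `lam`, the additive constant `eps`, the one-degree grid and all function names are assumptions made here, and the photos are invented.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, size=1.0):
    """Map a coordinate to a grid cell of `size` degrees."""
    return (math.floor(lat / size), math.floor(lon / size))


def build_cell_models(photos, size=1.0):
    """Count, per cell and tag, the number of distinct users (user frequency)."""
    users = defaultdict(lambda: defaultdict(set))
    for user, lat, lon, tags in photos:
        for tag in tags:
            users[cell_of(lat, lon, size)][tag].add(user)
    return {cell: {tag: len(us) for tag, us in tags.items()} for cell, tags in users.items()}


def neighbours(cell):
    row, col = cell
    return [(row + dr, col + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def estimate(counts, tag, vocab_size, eps=0.1):
    """Additively smoothed estimate of P(tag) from a count table."""
    return (counts.get(tag, 0) + eps) / (sum(counts.values()) + eps * vocab_size)


def score(models, cell, tags, vocab_size, lam=0.7):
    """Log-likelihood of the tags under the cell model, smoothed with its neighbours."""
    own = models.get(cell, {})
    around = defaultdict(int)
    for nb in neighbours(cell):
        for tag, count in models.get(nb, {}).items():
            around[tag] += count
    return sum(
        math.log(lam * estimate(own, t, vocab_size) + (1 - lam) * estimate(around, t, vocab_size))
        for t in tags
    )


def place(models, tags):
    """Return the known cell under which the tags are most likely."""
    vocab = {t for model in models.values() for t in model}
    return max(models, key=lambda cell: score(models, cell, tags, len(vocab)))


# Invented toy photos: (user, latitude, longitude, tags).
photos = [
    ("u1", 37.98, 23.73, ["athens", "acropolis"]),
    ("u2", 37.97, 23.72, ["athens", "parthenon"]),
    ("u3", 39.33, -82.10, ["athens", "ohio"]),
    ("u3", 39.33, -82.10, ["athens", "ohio", "university"]),
]
models = build_cell_models(photos)
greece = place(models, ["athens", "acropolis"])
ohio = place(models, ["athens", "ohio"])
```

Counting distinct users rather than raw tag occurrences means that the two photos uploaded by `u3` contribute the tag "athens" only once to the Ohio cell, so a single prolific uploader cannot dominate a location's model.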
  • 60. Placing Images: Easy http://www.flickr.com/photos/63666148@N00/3615989115/ Athens, Ohio or Athens, Greece?
  • 61. Placing Images: Hard Ballooning company in Ottawa
  • 62. Searching the Social Graph
      ◦ Search entities, and the relationships between them, in the (Facebook) social graph
      ◦ Clearly IR problems, but who has the data to work with?
      ◦ Michael Curtiss et al. Unicorn: A System for Searching the Social Graph. PVLDB, Vol. 6, No. 11
  • 63. Crawling
      ◦ How to get “the” data?
      ◦ Rate-limited APIs
      ◦ ToS HEADACHES!
  • 64. Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. ICWSM 2013
  • 65. Not IR yet, but… Interesting stuff nevertheless!
      ◦ de Volkskrant, March 13, 2013
      ◦ Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. PNAS 2013; published ahead of print March 11, 2013, doi:10.1073/pnas.1218772110
  • 66. Take home message(s)
  • 67. Take home message(s)
      ◦ Social media give us IR researchers access to a rich resource of context
      ◦ Including time & location!
  • 68. Take home message(s)
      ◦ Social media give us IR researchers access to a rich resource of context
      ◦ Including time & location!
      ◦ Gather the right data for your problem domain, and it may be a good alternative for not having the click data we all want so badly
  • 69. Take home message(s)
      ◦ Social media give us IR researchers access to a rich resource of context
      ◦ Including time & location!
      ◦ Gather the right data for your problem domain, and it may be a good alternative for not having the click data we all want so badly
      ◦ Various recommendation and retrieval tasks exist in social media – can one theory address all of these?
  • 70. C U @ #ECIR2014 ? !