Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis @ ISWC2008

Paper presented at the International Semantic Web Conference (ISWC) 2008.

  • This is preliminary research so there are many gaps to be filled and much future work to be done.
  • This is a snapshot of how I participate in the world of Web 2.0: I bookmark pages in delicious, record my listening habits in last.fm, and publish my photos using flickr.
  • One important trend that we can observe is that it’s increasingly common for users to maintain a profile on multiple social networking sites. (Ofcom published its report on April 2; the survey was carried out in September to October 2007.) The number of profiles is significantly higher for under-21s. If you participate in such sites, you are likely to have multiple profiles. This is intuitive: people realise the benefits quickly and often sign up to other sites to meet different requirements.
  • This is a representation of where we’re heading with this work. The activity elicited from each of the individual’s accounts tells us something different about their interests: Technorati and delicious highlight areas of interest on the web; flickr and facebook tell us about the events and places the person has been to; imdb and last.fm give us knowledge of the user’s music listening and movie watching habits.
  • We believe that providing such profiles of interest could help with recommendation: you could imagine providing your profile to a site like Amazon so that it can give you better recommendations. Such profiles could also be used to personalise searching. It’s all about providing users with a better experience without an overhead. Our idea is that this should happen automatically.
  • Another motivation for this work is consolidation and integration. People have information distributed across different sites, and it would be helpful to support them with an integrated view of this information. There are often activities in different sites that could be related via a common event or interest.
  • In case you’re not already familiar with tagging and folksonomies, here is an example from my delicious and flickr accounts. What’s great about tagging is that it copes well with multimedia content: web pages, videos, photos, blogs, music, etc.
  • Over time, the cumulative frequencies of the tags you use can be represented with a tag cloud. This gives a visual snapshot of the terms that you use most frequently. When we began this work, the first thing we did was develop a tool for viewing tag clouds from multiple domains. We noticed that many tags represented concepts that could be considered interests of the users. Hence, the motivation for our work is to exploit this tagging.
  • In the field of folksonomy analysis, it’s also important to consider the syntactic habits of tagging. In previous cross-folksonomy work, we discovered that tagging habits can be quite erratic: people use singular, plural, and gerund forms; compound nouns may be formed using "_", "-", or no space; synonyms can be used; and misspellings are also common.
  • Now I will move on to describe the architecture for our system. The aim is to start from a user URI (such as a homepage or blog) and create a FOAF file depicting the interests of the user via references to Wikipedia category URIs.
  • If you’re not familiar with Google’s Social Graph API, it’s pretty simple. In most Web 2.0 sites, you’re offered the opportunity to reference a URI that describes you.
  • The Tag Filtering engine converts a list of raw tags to a set of filtered tags. This is explained in more detail in the Hypertext paper, so I won’t go through it in great detail.
  • The result of the previous tag filtering is a set of tags and their associated frequencies. The weighting of a category is the sum of the frequencies of all the tags that match it. Matching of categories includes some simple stemming and pluralisation.
  • Explain anomalous spike: Due to unconventional tagging practice
  • This is more of an observation than an evaluation. We wanted to understand what kinds of interests can be extracted from delicious and flickr and how they differ. This is a nice result because it reflects our intuition about what we can learn from each domain.
  • One central argument behind this work is that more can be learnt about individuals by examining multiple profile activities. To evaluate this at a user level, we decided to measure the increase in categories generated in their user profile by adding Flickr information to delicious.
  • To evaluate the approach in terms of how well it identifies the relevant Wikipedia categories, we generated a random sample of 100 users and then randomly selected 1 tag from each of their delicious and Flickr profiles. We looked at their posts for the tag to establish accuracy. There were a couple of false positives: oracle (divinity or database) and labrador (dog or city in Canada).
  • Creating such a linked model of tags with references to dbpedia uris
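
The tagging variations described in the notes above (singular and plural forms, compound nouns joined with "_", "-", or no space) can be illustrated with a small normalisation sketch. This is not the Tag Filtering engine from the Hypertext paper; the function names and the two plural-stripping rules are assumptions made for illustration, and gerunds, synonyms, and misspellings are deliberately left unhandled.

```python
import re
from collections import Counter


def normalise_tag(raw_tag):
    """Lower-case a raw tag and remove '_', '-' and whitespace separators."""
    return re.sub(r"[_\-\s]+", "", raw_tag.strip().lower())


def naive_singular(tag):
    """Strip a plural ending with two crude rules (an illustrative assumption)."""
    if tag.endswith("ies") and len(tag) > 4:
        return tag[:-3] + "y"
    if tag.endswith("s") and not tag.endswith("ss") and len(tag) > 3:
        return tag[:-1]
    return tag


def filter_tags(raw_tags):
    """Count raw tags under their normalised, singularised form."""
    counts = Counter()
    for raw in raw_tags:
        key = naive_singular(normalise_tag(raw))
        if key:  # skip tags that consisted only of separators
            counts[key] += 1
    return counts


# Hypothetical raw tags, chosen to show the variations mentioned in the notes.
example = ["Web_Design", "web-design", "webdesign", "Holidays", "holiday", "blogs"]
print(filter_tags(example))
```

With these rules the three spellings of "web design" collapse into one filtered tag, as do "Holidays" and "holiday". A real filter would need a proper stemmer and a spelling check, since the crude rules mis-handle words such as "series".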

    1. 1. TAGora: Semiotic Dynamics of Online Social Communities (EU-IST-2006-034721)
          Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis
          Martin Szomszor, Harith Alani, Kieron O’Hara, Nigel Shadbolt (University of Southampton)
          Iván Cantador (Universidad Autonoma de Madrid)
    2. 2. Outline
          • Introduction and Motivation
              • Why is your folksonomy interaction useful?
              • How could it be exploited?
          • Architecture
              • Matching user accounts
              • Collecting Data
              • Tag Filtering
              • Profile Building
          • Experiment and Evaluation
          • Conclusions and Future Work
    3. 3. Introduction (slide graphic labels: Dream Theater, Metallica, Rush; delicious.com; http://slashdot.org/, http://news.bbc.co.uk/)
    4. 4. Increasing number of online identities
          • A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those who have a profile have at least two
          • Many predict that in the near future, individuals will have in excess of 10 profiles
              • [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
    5. 5. The Big Picture Profile of Interests delicious.com
    6. 6. Personalisation
          • Profiles could be exported to other sites to improve recommendation quality
          • Profiles could be used to support personalised searching
          • Better user experience
          (Diagram labels: Profile of Interests, delicious.com)
    7. 7. Consolidation and Integration
          Diagram labels (tags): currency, travel, hotels, cuba, cuba, holiday, 2008
          Diagram labels (resources): http://dbpedia.org/resource/Cuba, http://dbpedia.org/resource/Travel, http://dbpedia.org/resource/Holiday, http://dbpedia.org/resource/Category:Tourism
    8. 8. User Tagging delicious.com
    9. 9. Tag Clouds delicious.com
    10. 10. Tagging Variation
          Diagram labels: Raw Tags, Filtered Tags
          [1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
    11. 11. Architecture for Building Profiles of Interests
    12. 12. Account Correlation
          • Using Google’s Social Graph API
          Diagram labels: account, homepage, http://users.ecs.soton.ac.uk/mns2, delicious.com
    13. 13. Data Collection
          • Delicious
              • Custom Python scripts
          • Flickr
              • Using public API
          • Only public information is harvested
    14. 14. Tag Filtering Process
    15. 15. Creating User Profiles
          • Three-stage process:
              • Identify Wikipedia page
                  • London is matched with http://en.wikipedia.org/wiki/London
              • Extract Category list
                  • Host cities of the Summer Olympic Games | Host cities of the Commonwealth Games | London | 1st century establishments | British capitals | Capitals in Europe | Port cities and towns in the United Kingdom
              • Select representative Categories
                  • Only choose categories that match the tag string
                  • Excludes spurious categories such as:
                      • Host cities of the Summer Olympic Games
                      • Needs more sources
    16. 16. Profile of Interest
    17. 17. Experiment Setup
          • Bootstrapped using 667,141 delicious profiles obtained in previous work
          • Only accounts with a matching Flickr profile and > 50 distinct tags were added
          • Final list contains 1,392 users

                           Delicious    Flickr
          Total Posts      1,134,527    2,215,913
          Distinct Tags      138,028      307,182
    18. 18. Evaluation
          • Four evaluation procedures:
              • The performance of the tag filtering and matching to Wikipedia Entries
              • The difference between the most common categories found in delicious and Flickr
              • The amount learnt from merging profiles from the two folksonomies
              • The accuracy of matching tags to Wikipedia categories
    19. 19. Tag Filtering and Matching
    20. 20. Global Category View
          • What are the differences in the interests that are learnt from each domain?

          Delicious                         Flickr
          Wikipedia Category   Total Freq   Wikipedia Category   Total Freq
          Design                   69,215   Travel                   51,674
          Blogs                    68,319   Australia                51,617
          Music                    45,063   London                   46,623
          Photography              41,356   Festivals                42,504
          Tools                    35,795   Music                    40,943
          Video                    34,318   Cats                     38,230
          Arts                     29,966   Holidays                 37,610
          Software                 28,746   Family                   37,100
          Maps                     26,912   Japan                    36,513
          Teaching                 22,120   Concerts                 35,374
          Games                    21,549   Surnames                 34,947
          How-to                   19,533   Washington               33,924
          Technology               18,032   Given Names              32,843
          News                     17,737   Dogs                     32,206
          Humor                    15,816   Birthdays                22,290
    21. 21. Learning More About Users
          • How much more can we learn by using multiple profiles?
    22. 22. Category Matching
          • How good is the category matching?
          • Take 100 random users and choose 1 Delicious tag and 1 Flickr tag
          • Classify tag into one of 3 classes:
              • Correct
              • Unresolved (not matched to any category)
              • Ambiguous (Disambiguation required)

                      Correct   Unresolved   Ambiguous
          Delicious   66%       20%          14%
          Flickr      63%       25%          12%
    23. 23. Conclusions
          • We have proposed a novel method for the creation of Profiles of Interest by exploiting an individual’s tagging activities across two popular folksonomy sites
          • Frequently used tags often specify areas of interest but not always!
              • Common delicious tags are daily, toread, howto
              • Flickr tags often include names of people
          • Expanding the analysis across folksonomies increases the amount learnt
              • On average 15 new concepts per user
    24. 24. Future Work
          • Improve page matching
              • 22.5% of sample tags unresolved
          • Handle disambiguation
              • 13% of sample tags refer to ambiguous terms
                  • Cooccurrence networks
                  • Category hierarchy
          • Increase network coverage
              • Already have the data to include Last.fm
          • Understand which tags actually specify an interest of the individual
              • Filter out categories such as ‘Surname’
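
The category selection and weighting described on the Creating User Profiles slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Wikipedia lookup is replaced by a hard-coded dictionary, "match the tag string" is assumed to mean equality after lower-casing and crude plural stripping, and the tag frequencies and the "Felines" category are hypothetical. Only the London category list is taken from the slide.

```python
from collections import Counter


def simple_stem(text):
    """Lower-case and crudely strip a plural ending (illustrative assumption)."""
    text = text.strip().lower()
    if text.endswith("ies") and len(text) > 4:
        return text[:-3] + "y"
    if text.endswith("s") and not text.endswith("ss") and len(text) > 3:
        return text[:-1]
    return text


def select_categories(tag, categories):
    """Keep only the categories whose stemmed name equals the stemmed tag."""
    stem = simple_stem(tag)
    return [category for category in categories if simple_stem(category) == stem]


def build_profile(tag_frequencies, categories_for_tag):
    """Weight each selected category by the summed frequency of its matching tags."""
    profile = Counter()
    for tag, frequency in tag_frequencies.items():
        for category in select_categories(tag, categories_for_tag.get(tag, [])):
            profile[category] += frequency
    return profile


# Category list for the London page, as shown on the slide.
LONDON_CATEGORIES = [
    "Host cities of the Summer Olympic Games",
    "Host cities of the Commonwealth Games",
    "London",
    "1st century establishments",
    "British capitals",
    "Capitals in Europe",
    "Port cities and towns in the United Kingdom",
]

# Hypothetical filtered tags with frequencies, and a stand-in for the Wikipedia lookup.
tag_frequencies = {"london": 12, "cats": 5, "cat": 3}
categories_for_tag = {
    "london": LONDON_CATEGORIES,
    "cats": ["Cats", "Felines"],
    "cat": ["Cats", "Felines"],
}

profile = build_profile(tag_frequencies, categories_for_tag)
print(profile)
```

In this sketch the tag "london" keeps only the "London" category, so spurious entries such as "Host cities of the Summer Olympic Games" are dropped, and the singular and plural tags "cat" and "cats" both contribute to the weight of "Cats".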

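The measurement behind the Learning More About Users slide, the increase in categories when Flickr information is added to a delicious profile, might be computed along these lines. The function names and the two sample users are hypothetical; the sketch only assumes that each profile maps Wikipedia category names to weights.

```python
def new_categories(delicious_profile, flickr_profile):
    """Categories found in the Flickr profile that the delicious profile lacks."""
    return set(flickr_profile) - set(delicious_profile)


def average_gain(users):
    """Mean number of new categories per user over (delicious, flickr) profile pairs."""
    gains = [len(new_categories(delicious, flickr)) for delicious, flickr in users]
    return sum(gains) / len(gains) if gains else 0.0


# Hypothetical users; category names borrowed from the Global Category View slide.
users = [
    ({"Design": 10, "Music": 4}, {"Music": 2, "Travel": 7, "London": 3}),
    ({"Blogs": 5}, {"Cats": 1}),
]

print(average_gain(users))
```

Applied to the real data set, a calculation of this kind would yield the per-user figure quoted in the conclusions (on average 15 new concepts per user); the sample data here is far too small to reproduce it.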