Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis @ HT2009



Invited talk at the ACM Hypertext Conference 2009, Turin, Italy

  • This is preliminary research so there are many gaps to be filled and much future work to be done.
  • This is a snapshot of how I participate in the world of Web 2.0: I bookmark pages in delicious, record my listening habits, and publish my photos using flickr.
  • One important trend is that it is increasingly common for users to maintain profiles on multiple social networking sites. The Ofcom report was published on April 2; the survey was carried out in September and October 2007. The number of profiles is significantly higher for under-21s. If you participate in such sites, you are likely to have multiple profiles. This is intuitive: people realise the benefits quickly and often sign up to other sites to meet different requirements.
  • Over time, the cumulative frequencies of the tags you use can be represented with a tag cloud. This gives a visual snapshot of the terms that you use most frequently. When we began this work, the first thing we did was develop a tool for viewing tag clouds from multiple domains. We noticed that many tags represented concepts that could be considered interests of the users. Hence, the motivation for our work is to exploit this tagging
  • This is a representation of where we’re heading with this work. The activity elicited from each of the individual’s accounts tells us something different about their interests: Technorati and delicious highlight areas of interest on the web; flickr and facebook tell us about the events and places the person has been to; and imdb gives us knowledge of the user’s music listening and movie watching habits.
  • We believe that providing such profiles of interests could help with recommendation: you could imagine providing your profile to a site like Amazon so that it can give you better recommendations. Such profiles could also be used to personalise searching. It’s all about providing users with a better experience without an overhead. Our idea is that this should happen automatically.
  • Another motivation for this work is consolidation and integration. People have information distributed across different sites, and it would be helpful to support them with an integrated view of this information. There are often activities in different sites that could be related via a common event or interest.
  • In the field of folksonomy analysis, it’s also important to consider the syntactic habits of tagging. In previous cross-folksonomy work, we discovered that tagging habits can be quite erratic: people use singular, plural, and gerund forms; compound nouns may be formed using _, -, or no space; synonyms can be used; and misspellings are also common.
  • This is more of an observation than an evaluation. We wanted to understand what kinds of interests can be extracted from delicious and flickr and how they differ. This is a nice result because it reflects our intuition about what we can learn from each domain.

    1. TAGora: Semiotic Dynamics of Online Social Communities (EU-IST-2006-034721). Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis. Martin Szomszor, University of Southampton
    2. Outline
         • Introduction and Motivation
             • Why is your folksonomy interaction useful?
             • How could it be exploited?
         • Making Sense of Folksonomies
             • Distributed Contact Networks
             • Tag Filtering / Tag Senses
         • Profiles of Interests
         • Future Work
             • Disambiguation
             • Building Better Profiles of Interests
    3. Introduction (image labels: Dream Theater, Metallica, Rush)
    4. Increasing number of online identities
         • A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those that have a profile have at least two
             • [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
         • In the future, people will maintain an increasing number of online identities to meet different information sharing tasks and to connect with different communities
    5. Tag Clouds
    6. The Big Picture: Profile of Interests
    7. Personalisation: Profile of Interests
         • Profiles could be exported to other sites to improve recommendation quality
         • Profiles could be used to support personalised searching
         • Better user experience
    8. Consolidation and Integration (example tags: currency, travel, hotels, cuba, cuba holiday 2008)
    9. Tagging Variation: Raw Tags, Filtered Tags
         [1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
    10. Disconnected Identities (diagram labels: fan of, contact, friend, friend, #me)
    11. Making Sense of Folksonomies (diagram labels: Delicious, Flickr, Facebook, Identity Integration, Tag Integration, Tagging Semantics, …, FOAF, DBpedia + Wordnet)
    12. 1. Contact Integration (diagram labels: Delicious, Flickr, Facebook, Identity Integration, Tag Integration, Tagging Semantics, …, FOAF, DBpedia + Wordnet)
    13. SNS Contact Integration
    14. Consolidated Contact View (diagram label: #me)
         • Recommend new connections
    15. FOAF Representation of SNS Accounts: <owl#sameAs> <> <> <>, <>, … <>;
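The idea of linking a person's accounts with owl:sameAs can be sketched in code. The following is a minimal illustration, not the implementation used in the talk: it assumes the sameAs links are already available as pairs of account identifiers and groups them into consolidated identities with a union-find structure. The function name and the account identifiers are hypothetical.

```python
def consolidate_identities(same_as_pairs):
    """Group account identifiers connected by sameAs links (union-find sketch)."""
    parent = {}

    def find(account):
        # Register unseen accounts as their own root, then walk up to the root.
        parent.setdefault(account, account)
        while parent[account] != account:
            parent[account] = parent[parent[account]]  # path halving
            account = parent[account]
        return account

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)

    groups = {}
    for account in list(parent):
        groups.setdefault(find(account), set()).add(account)
    return list(groups.values())


# Hypothetical account identifiers, for illustration only.
links = [
    ("delicious:alice", "flickr:alice99"),
    ("flickr:alice99", "facebook:alice.smith"),
    ("delicious:bob", "flickr:bobby"),
]
identities = consolidate_identities(links)
```

Once accounts are grouped this way, the contacts of every account in a group can be merged into one view, which is the starting point for recommending new connections.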
    16. 2. Tag Integration (diagram labels: Delicious, Flickr, Facebook, Identity Integration, Tag Integration, Tagging Semantics, …, FOAF, DBpedia + Wordnet)
    17. Folksonomy Integration: Tag Heterogeneity (Web2.0 != Web_2.0)
    18. Folksonomy Integration: Tag Heterogeneity (Web2.0 and Web_2.0 linked by isFilteredTo)
    19. Tag Filtering
         • Find canonical form for each tag:
             • Use DBpedia entry labels as reference
                 • Compound terms separated by _
                     • second-life, second+life -> second_life
                 • Concatenated / camel case terms are expanded
                     • secondlife, SecondLife -> second_life
                 • International characters normalised:
                     • Caf%C3%A9 -> Cafe
         • Recommend spelling corrections
             • resaerch -> didYouMean research
         • Follow unambiguous redirections:
             • Humor, Funny -> Humour
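The filtering rules above can be sketched as a small normalisation function. This is an illustrative sketch under stated assumptions, not the project's code: the two lookup tables (`labels_by_concatenation`, `redirects`) are hypothetical stand-ins for the DBpedia label and redirect data, the function names are invented, and the output is lowercased for matching (the slide shows `Cafe`). Spelling correction is left out.

```python
import re
import unicodedata
from urllib.parse import unquote


def split_camel_case(tag):
    # Put a separator between a lowercase letter or digit and a following capital.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", tag)


def strip_accents(text):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def filter_tag(raw_tag, labels_by_concatenation, redirects):
    """Return a canonical form for raw_tag using hypothetical reference tables."""
    tag = strip_accents(unquote(raw_tag))
    tag = split_camel_case(tag)
    # Treat -, +, whitespace and _ as the same separator.
    tag = re.sub(r"[-+\s_]+", "_", tag).strip("_").lower()
    # Expand concatenated forms by looking up the separator-free string.
    tag = labels_by_concatenation.get(tag.replace("_", ""), tag)
    # Follow an unambiguous redirect if one is known.
    return redirects.get(tag, tag)


# Hypothetical reference data, for illustration only.
labels = {"secondlife": "second_life", "web2.0": "web_2.0"}
redirects = {"humor": "humour", "funny": "humour"}
```

With these tables, `second-life`, `second+life`, `secondlife`, and `SecondLife` all map to the same canonical tag, as do `Web2.0` and `Web_2.0`.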
    20. Diagram: schemas /tagging#
         • Legend: (f) = functional property; edge types: property, subclass
         • Classes: Tag, UserTag, DomainTag, GlobalTag, Tagger, Post, Resource, TagSegment, FinalTagSegment, CooccurrencInfo
         • Properties: hasUserFrequency, hasGlobalFrequency, hasDomainFrequency, rdfs:label, hasCooccurrenceInfo, hasCooccurrenceFrequency, cooccurringTag, hasPost, taggedResource, isFilteredTo, hasNextSegment (f), hasTagSequence (f), tagUsed (f), taggedOn, hasGlobalTag, hasDomainTag, usesTag
         • Datatypes: xsd:integer, xsd:string, xsd:datetime
    21. Linked Data View
    22. Linked Data View
    23. Linked Data View
    24. Linked Data View
    25. Finding Syntactic Variations
         sparql$ select ?x where { ?x <> <>}
         ┌────┐
         │ ?x │
         ├────┤
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         └────┘
         sparql$ select * where { ?x <> <>}
         ┌────┐
         │ ?x │
         ├────┤
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         │ <> │
         └────┘
    26. Tag Senses
         • What are the possible meanings for a tag?
         • We use two reference sets:
             • DBpedia
                 • Concepts
             • Wordnet
                 • Synsets
    27. Disambiguation Ontology
         • Schemas: schemas /tagging#, schemas /dbpedia#, schemas /disambiguation#
         • Legend: (f) = functional property; edge types: property, subclass
         • Classes: Tag, Resource, DbpediaSenseInfo, WordSense
         • Properties: senseWeight, dbpediaSense, hasDbpediaSenseInfo, didYouMean, hasWordnetSense
         • Datatypes: xsd:float
    29. DBpedia Extraction
         • Extract triples from XML dump
             • Calculate normalised title string
                 • Caf%C3%A9 -> cafe
             • Calculate concatenated title string
                 • Second_life -> secondlife
             • Extract disambiguation term from title
                 • Orange_(fruit)
             • Identify compound labels
                 • Second_Life -> Second, Life
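The title-processing steps listed above can be sketched as follows. This is a minimal sketch, assuming page titles arrive as percent-encoded strings with underscores; the function name and the dictionary keys are hypothetical and only loosely mirror the labels named on the slides.

```python
import re
import unicodedata
from urllib.parse import unquote


def describe_title(encoded_title):
    """Derive the label variants described on the slide from one page title."""
    title = unquote(encoded_title)

    # A trailing "_(term)" is treated as the disambiguation term.
    match = re.fullmatch(r"(.+?)_\((.+)\)", title)
    if match:
        base, disambiguation = match.group(1), match.group(2)
    else:
        base, disambiguation = title, None

    # Normalised form: accents removed, lowercased.
    decomposed = unicodedata.normalize("NFKD", base)
    normalised = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).lower()

    parts = [part for part in base.split("_") if part]
    return {
        "label": title,
        "normalised": normalised,
        "concatenated": normalised.replace("_", ""),
        "disambiguation_term": disambiguation,
        # Only titles with more than one part count as compound labels.
        "compound_parts": parts if len(parts) > 1 else [],
    }
```

For example, `describe_title("Second_Life")` yields the concatenated form `secondlife` and the parts `Second` and `Life`, while `describe_title("Orange_(fruit)")` yields the disambiguation term `fruit`.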
    30. DBpedia Extraction
         • Number of incoming links
         • Extract page redirects
         • Extract disambiguation links
             • Find primary disambiguation (e.g. Apple)
    31. DBpedia Extraction
         • Parse wiki text and extract terms:
             • Terms filtered using stop words (with some wiki-specific additions)
             • Store term frequencies
             • Store number of distinct terms in page
             • Store total term frequency
         • Can associate a vector of terms and weights to each possible sense
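The term extraction described above can be sketched in a few lines. This is an illustration only: the tokeniser (lowercase ASCII letter runs) and the stop-word list, including the wiki-specific additions `ref` and `nbsp`, are assumptions and not the lists used in the project.

```python
import re
from collections import Counter

# Hypothetical stop-word list with two wiki-markup additions.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "ref", "nbsp"}


def term_statistics(wiki_text, stop_words=STOP_WORDS):
    """Count terms in a page, ignoring stop words."""
    tokens = re.findall(r"[a-z]+", wiki_text.lower())
    frequencies = Counter(t for t in tokens if t not in stop_words)
    return {
        "term_frequencies": dict(frequencies),
        "distinct_terms": len(frequencies),
        "total_term_frequency": sum(frequencies.values()),
    }


stats = term_statistics("The apple is a fruit. Apple trees bear fruit in autumn.")
```

The resulting term frequencies form the weighted term vector that can be attached to each possible sense.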
    32. Diagram: DBpedia extraction schema
         • Classes: Resource, CompoundLabelSequence, FinalCompoundLabelSequence, TermFrequencyPair
         • Properties: hasCompoundLabelSequence (f), hasNextLabelSequence (f), hasCompoundLabel (f), isa, hasLabel, hasNormalisedLabel, hasConcatenatedLabel, hasDisambiguationTerm, hasTermFrequencyPair, hasTerm, hasTermFrequency, hasDisambiguation, hasPrimaryDisambiguation, hasTotalTermFrequency, hasTotalTerms
         • Datatypes: xsd:integer, xsd:string
    33. Profiles of Interests
         [2] Szomszor, M., Alani, H., Cantador, I., O'Hara, K. and Shadbolt, N. (2008). Semantic Modelling of User Interests based on Cross-Folksonomy Analysis. In: 7th International Semantic Web Conference (ISWC), October 26th - 30th, Karlsruhe, Germany.
    34. Global Category View
         • What are the differences in the interests that are learnt from each domain?

         Delicious                           Flickr
         Wikipedia Category   Total Freq     Wikipedia Category   Total Freq
         Design               69,215         Travel               51,674
         Blogs                68,319         Australia            51,617
         Music                45,063         London               46,623
         Photography          41,356         Festivals            42,504
         Tools                35,795         Music                40,943
         Video                34,318         Cats                 38,230
         Arts                 29,966         Holidays             37,610
         Software             28,746         Family               37,100
         Maps                 26,912         Japan                36,513
         Teaching             22,120         Concerts             35,374
         Games                21,549         Surnames             34,947
         How-to               19,533         Washington           33,924
         Technology           18,032         Given Names          32,843
         News                 17,737         Dogs                 32,206
         Humor                15,816         Birthdays            22,290
    35. Future Work
         • Given a set of possible senses, how can we choose the best match?
         • Folksonomy data can provide contextual information:
             • User tag-cloud
             • Cooccurrence Network
             • User Cooccurrence Network
         • Can abstract this information as a vector of terms and weights (context)
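One way to choose among candidate senses, given a context vector and a term vector per sense, is to rank the senses by cosine similarity. The slides do not say which measure is intended, so cosine similarity is an assumption here, and the sense names and weights below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_senses(context, senses):
    """Return (sense, score) pairs ordered from best to worst match."""
    scored = [(name, cosine(context, vector)) for name, vector in senses.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical context (e.g. from a user's tag cloud) and sense vectors.
context = {"mac": 3, "software": 2, "iphone": 1}
senses = {
    "apple_company": {"mac": 35, "computer": 20},
    "apple_fruit": {"fruit": 41, "tree": 12},
}
ranking = rank_senses(context, senses)
```

Here the company sense ranks first because it shares the term `mac` with the context, while the fruit sense shares no terms and scores zero.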
    36. Disambiguating Flickr Images
    37. Building Better Profiles
         • What tags correspond to interests?
             • Locations and topics are useful, but other terms are not
         • TF / IDF Approach
             • It’s not that useful to find out we are all interested in HTML
         • Making use of the Category hierarchy
             • If I’m interested in Facebook, Flickr, Delicious, etc., I can extrapolate the interest Online_Social_Networks
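Both ideas on this slide can be sketched briefly. The first function weights each user's tags with a basic TF-IDF formula, so that a tag everyone uses (such as HTML) scores zero; the second rolls specific interests up to a parent category. The exact weighting scheme, the function names, and the sample data are assumptions for illustration, not the method used in the project.

```python
import math
from collections import Counter


def tf_idf_profiles(user_tags):
    """Weight each user's tags by term frequency times inverse user frequency."""
    n_users = len(user_tags)
    # Number of users who used each tag at least once.
    user_frequency = Counter(
        tag for tags in user_tags.values() for tag in set(tags)
    )
    profiles = {}
    for user, tags in user_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        profiles[user] = {
            tag: (count / total) * math.log(n_users / user_frequency[tag])
            for tag, count in counts.items()
        }
    return profiles


def roll_up(interests, parent_category):
    """Sum interest weights under their parent category, where one is known."""
    rolled = Counter()
    for interest, weight in interests.items():
        rolled[parent_category.get(interest, interest)] += weight
    return dict(rolled)


# Hypothetical sample data.
user_tags = {
    "alice": ["html", "html", "cuba"],
    "bob": ["html", "metallica"],
    "carol": ["html", "flickr", "facebook"],
}
profiles = tf_idf_profiles(user_tags)
parents = {"Facebook": "Online_Social_Networks", "Flickr": "Online_Social_Networks"}
rolled = roll_up({"Facebook": 1.0, "Flickr": 2.0, "Cuba": 0.5}, parents)
```

In this sample every user has the tag `html`, so its weight is zero in every profile, while `cuba` keeps a positive weight for the one user who used it.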
    38. Diagram: example sense data
         • dbpedia:senseWeight 0.30628910807; term-frequency pair _:b9510f00000000a5 with dbpedia:hasTerm “mac”, dbpedia:hasTermFrequency 35
         • dbpedia:senseWeight 0.248912928; term-frequency pair _:b9510f00000000a5 with dbpedia:hasTerm “fruit”, dbpedia:hasTermFrequency 41
         • Other edge labels: dbpedia:hasDbpediaSenseInfo, dbpedia:sense, dbpedia:hasTermFrequencyPair, owl:sameAs