Technical Report, March 2010

Social Recommender Systems on IEML Semantic Space

Heung-Nam Kim, Andrew Roczniak, Pierre Lévy, Abdulmotaleb El Saddik
Collective Intelligence Lab, University of Ottawa
4 March 2010

H. N. Kim, A. Roczniak, P. Lévy, A. El-Saddik, "Social recommender systems on IEML semantic space," Collective Intelligence Lab, University of Ottawa, Technical Report, March 2010.
Social Recommender Systems on IEML Semantic Space

Heung-Nam Kim (1,2), Andrew Roczniak (1), Pierre Lévy (1), Abdulmotaleb El Saddik (2)
(1) Collective Intelligence Lab, University of Ottawa
(2) Multimedia Communication Research Lab, University of Ottawa

Abstract

In this report, we present two social recommendation methods that incorporate the semantics of tags: user-based semantic collaborative filtering and item-based semantic collaborative filtering. Social tagging is employed as an approach to grasp and filter users' preferences for items. In addition, we analyze the potential benefits of IEML models for social recommender systems in solving the polysemy, synonymy, and semantic interoperability problems, which are notable challenges in information filtering. Experimental results show that our methods offer significant advantages, both in improving recommendation quality and in dealing with polysemy, synonymy, and interoperability issues.

1 Introduction

The prevalence of social media sites brings considerable changes not only to people's life patterns but also to the generation and distribution of information. This social phenomenon has transformed the masses, who were once only consumers of information via mass media, into producers of information. However, as rich information is shared through social media sites, an amount of information that was not available before is growing exponentially with daily additions. Beyond finding the most attractive and relevant content, users face a great challenge in information overload. Recommender systems, which have emerged in response to these challenges, provide users with recommendations of items that are likely to fit their needs [17].
With the popularity of social tagging (also known as collaborative tagging or folksonomies), a number of researchers have recently concentrated on recommender systems with social tagging [10, 13, 20, 22, 25]. Because modern social media sites, such as Flickr, YouTube, Twitter, and Delicious, allow users to freely annotate their contents with any kind of descriptive words, also known as tags [6], users tend to use descriptive tags to annotate the contents they are interested in [13]. Recommender systems incorporating tags can alleviate limitations of traditional recommender systems, such as the sparsity problem and the cold start problem [1], and thus eventually offer promising possibilities for better generating personalized recommendations. Although these studies show reasonable promise of improving performance, they do not take into consideration the semantics of the tags themselves. Consequently, the lack of semantic information leads to fundamental problems: the polysemy and synonymy of tags, as clearly discussed in [6]. Without the semantics of the tags used by users, the systems cannot differentiate the various social interests of users who use the same tags. Furthermore, they cannot provide semantic interoperability, which is a notable challenge in cyberspace [11].

To address these issues, we introduce a new concept to capture the semantics of user-generated tags. We then propose two social recommendation methods that incorporate the semantics of tags: user-based semantic collaborative filtering and item-based semantic collaborative filtering. First, in the user-based method, we determine similarities between users by utilizing users' semantic-oriented tags, collectively called Uniform Semantic Locators (USLs), and subsequently identify semantically like-minded users for each user.
Finally, we recommend social items (e.g., text, pictures, videos) based on the social ranking of the items that are semantically associated with tags that like-minded users annotate. Second, in the item-based method, we determine similarities between items by utilizing USLs and identify semantically similar items for each item. Finally, we recommend social items based on the semantically similar items.
The main contributions of this study toward social recommender systems can be summarized as follows: 1) We present and formalize models for semantic-oriented social tagging that deal with the issues of polysemy, synonymy, and semantic interoperability. We also illustrate how the models can be adapted and applied to existing social tagging systems. 2) We propose methods of social recommendation in semantic space that aim to find semantically similar users/items and discover social items semantically relevant to users' needs.

The rest of this report is organized as follows: in the next section we review concepts related to collaborative filtering and survey recent studies applying social tagging to recommender systems. In Section 3, we present the models used in our study. We then describe our semantic models for social recommender systems and provide a detailed description of how the models are applied to item recommendations in Section 4. In Section 5, we present the effectiveness of our methods through experimental evaluations. Finally, we summarize our work.

2 Related Work

In this section, we summarize previous studies and position our study with respect to other related work in the area.

2.1 Collaborative Filtering

Following the proposal of GroupLens [16], automated recommendations based on Collaborative Filtering (CF) have seen the widest use. CF is based on the fact that "word of mouth" opinions of other people have considerable influence on buyers' decision making [10]. If advisors have preferences similar to the buyer's, he/she is much more likely to be affected by their opinions. In CF-based recommendation schemes, two approaches have mainly been developed: user-based approaches [4, 16, 18] and item-based approaches [5, 17]. Usually, user-based and item-based CF systems involve two steps.
First, the neighbor group is determined using a variety of similarity computation methods, such as Pearson correlation-based similarity and cosine-based similarity: for user-based CF, this is the group of users who have preferences similar to the target user, called the k nearest neighbors; for item-based CF, it is the set of items similar to the target item, called the k most similar items. This step is an important task in CF-based recommendation because different neighbor users or items lead to different recommendations [18]. Once the neighborhood is generated, in the second step, prediction values for particular items, estimating how much the target user is likely to prefer them, are computed based on the group of neighbors. The more similar a neighbor is to the target user or the target item, the more influence he/she or it has in calculating a prediction value. After predicting how much the target user will like particular items not previously rated by him/her, the top-N item set, the ordered set of items with the highest predicted values, is identified and recommended. The target user can then give feedback on whether he/she actually likes the recommended top-N items, or on how much he/she prefers them as scaled ratings.

2.2 Social Tagging in Recommender Systems

Social tagging is the practice of allowing any user to freely annotate content with any kind of arbitrary keywords (i.e., tags) [6]. Social media sites with social tagging have become tremendously popular in recent years. Consequently, the area of recommender systems with social tagging (folksonomy) has become an active and growing topic of study. These studies can be broadly divided into three topics: tag suggestions, social search, and social recommendations.

With the growing popularity of tags, many researchers have proposed new applications for recommender systems that suggest suitable tags during folksonomy development. In [21], a tag recommender system built on Flickr's dataset is presented, based on an analysis of how users annotate photos and what information is contained in the tagging.
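The two-step CF process described in Section 2.1 can be sketched as follows. This is a minimal illustration with made-up ratings, using cosine similarity as one of the measures the section mentions; it is not the implementation used in this report.

```python
# Sketch of classic user-based CF: (1) find k nearest neighbors by a
# similarity measure, (2) predict a rating as the similarity-weighted
# average of the neighbors' ratings. Data below is purely illustrative.
from math import sqrt

ratings = {  # user -> {item: rating}
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i2": 2, "i3": 5},
    "u3": {"i2": 1, "i3": 4},
}

def cosine(a, b):
    common = set(a) & set(b)
    if not common:
        return 0.0  # no co-rated items: similarity is not computable
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den

def k_nearest_neighbors(target, k):
    sims = [(cosine(ratings[target], ratings[v]), v)
            for v in ratings if v != target]
    return sorted(sims, reverse=True)[:k]

def predict(target, item, neighbors):
    # Similarity-weighted average of the neighbors' ratings for `item`.
    num = sum(sim * ratings[v][item] for sim, v in neighbors if item in ratings[v])
    den = sum(sim for sim, v in neighbors if item in ratings[v])
    return num / den if den else 0.0

nns = k_nearest_neighbors("u1", 2)
print(predict("u1", "i3", nns))
```

Ranking all unrated items by such predicted values and keeping the highest-scored ones yields the top-N recommendation set described above.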
In [9], three classes of algorithms for tag recommendation are presented and evaluated: an adaptation of user-based CF, the graph-based FolkRank [8] algorithm, and simple methods based on tag counts. Xu et al. [23] propose an algorithm for collaborative tag suggestion that employs a reputation score for each user based on the quality of the tags the user has contributed. In [14], a new semantic tagging system, SemKey, is proposed to combine semantic technologies with the collaborative tagging paradigm in a way that can be highly beneficial to both areas.
Differently from our aims, the purpose of these studies is basically to recommend appropriate tags for assisting the user in annotation-related tasks. Our approach takes a different stance: rather than offering tag recommendations, our aim is to find like-minded users via tag semantics and to identify personal resources semantically relevant to user needs.

Research has also been very active in information retrieval using social tagging. In [8], the authors present a formal model and a new search algorithm for folksonomies called FolkRank. FolkRank is applied not only to find communities within the folksonomy but also to recommend tags and resources. In [24], a social ranking mechanism is proposed to answer users' queries, aiming to transparently improve content search based on emergent tag semantics. It exploits user similarity and tag similarity based on past tagging activity. In [19], the ContextMerge algorithm is introduced to efficiently support user-centric search in social networks, dynamically including related users in the execution. The algorithm adopts a two-dimensional expansion: social expansion considers the strength of relations among users, and semantic expansion considers the relatedness of different tags. In [2], two algorithms are proposed, SocialSimRank and SocialPageRank. The former calculates the similarity between tags and user queries, whereas the latter captures page popularity based on its annotations. All these works attempt to improve users' searches by incorporating social annotations into query expansion. Differing from these works, our goal is to automatically identify resources, without users' queries, that are likely to fit their needs.

Other researchers have studied the same area as our study. In [10], the authors propose a collaborative filtering method via collaborative tagging.
First, they determine similarities between users with social tags and subsequently identify latent tags for each user to recommend items via a naïve Bayes approach. Tso-Sutter et al. [22] propose a generic method that allows tags to be incorporated into CF algorithms by reducing the three-dimensional correlations to three two-dimensional correlations and then applying a fusion method to re-associate them. A similar approach is presented in [25] as well, in order to provide improved recommendations to users. Although these studies give reasonable promise of improving performance, they do not take the semantics of tags into consideration. Consequently, the lack of semantic information imposes limitations, such as the polysemy and synonymy of tags, on identifying similar users with user-generated tags. We believe the semantic information of tags can be helpful not only for better grasping users' interests but also for enhancing the quality of recommendations. The recent literature also focuses on semantic recommender systems, which share our goal. Unlike our approach, however, existing work on semantic recommender systems relies on a prefixed ontology and uses technologies from the Semantic Web. The state of the art in semantic recommender systems is well analyzed in [15].

3 IEML Models

To aid understanding of our semantic approach, this section briefly explains preliminary concepts of the Information Economy MetaLanguage (IEML) that will be exploited in the following sections of this report. The IEML research program promotes a radical innovation in the notation and processing of semantics. IEML is a regular language and a symbolic system for the notation of meaning. It is "semantic content oriented" rather than "instruction oriented" like programming languages or "format oriented" like data standards. IEML provides new methods for semantic interoperability, semantic navigation, collective categorization, and self-referential collective intelligence [11]. The IEML research program is compatible with the major standards of the Web of data and is in tune with current trends in social computing.

3.1 IEML Overview

IEML expressions are built from a syntactically regular combination of six symbols, called primitives. In IEML, a sequence is a succession of 3^l single primitives, where l ∈ {0, 1, 2, 3, 4, 5, 6}; l is called the layer of the sequence. For each layer, sequences have, respectively, lengths of 1, 3, 9, 27, 81, 243, and 729 primitives [12]. From a syntactic point of view, any IEML expression is nothing else than a set of sequences.
As there is a distinct semantic for each distinct sequence, there is also a distinct semantic for each distinct set of sequences. In general, the meaning of a set of sequences corresponds to the union of the meanings of the sequences in that set. The main result is that any algebraic operation that can be performed on sets in general can also be performed on semantics (significations) once they are expressed in IEML. An IEML dictionary provides the correspondence between IEML sequences and natural language descriptors of IEML expressions. The terms of the dictionary belong to layers 0-3. There are rules to create inflected words from these terms, to create sentences from inflected words, and to create relations between sentences by using some terms as conjunctions. Given these rules, it is possible to express any network of relations between sentences by using sequences up to layer 6. Notation, syntax, semantics, and examples of IEML are presented in [12]; due to a lack of space, we refer the reader to [12] for more details.

3.2 IEML Language Model

We present the model of the IEML language, along with the model of semantic variables.

Let $\Sigma$ be a nonempty and finite set of symbols, $\Sigma = \{S, B, T, U, A, E\}$. Let a string $s$ be a finite sequence of symbols chosen from $\Sigma$. The length of this string is denoted by $|s|$. The empty string $\varepsilon$ is a string with zero occurrences of symbols, and its length is $|\varepsilon| = 0$. The set of all strings of length $k$ composed of symbols from $\Sigma$ is defined as $\Sigma^k = \{s : |s| = k\}$. Note that $\Sigma^0 = \{\varepsilon\}$ and $\Sigma^1 = \{S, B, T, U, A, E\}$. Although $\Sigma$ and $\Sigma^1$ are sets containing exactly the same members, the former contains symbols and the latter strings. The set of all strings over $\Sigma$ is defined as $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \dots$

A useful operation on strings is concatenation, defined as follows. For all $s_i = a_1 a_2 a_3 a_4 \dots a_i \in \Sigma^*$ and $s_j = b_1 b_2 b_3 b_4 \dots b_j \in \Sigma^*$, $s_i s_j$ denotes the string concatenation such that $s_i s_j = a_1 a_2 a_3 a_4 \dots a_i b_1 b_2 b_3 b_4 \dots b_j$ and $|s_i s_j| = i + j$. The IEML language over $\Sigma$ is a subset of $\Sigma^*$, $L_{IEML} \subseteq \Sigma^*$:

    L_{IEML} = \{ s \in \Sigma^* \mid |s| = 3^l,\ 0 \le l \le 6 \}    (1)
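Membership in $L_{IEML}$ per Eq. (1) can be checked mechanically: a string belongs to the language exactly when all its symbols are primitives and its length is a power of three up to 729. A minimal sketch, with function and variable names of our own choosing:

```python
# Check whether a string over {S, B, T, U, A, E} is a valid IEML sequence
# (length 3**l for some layer l in 0..6), and recover its layer.
PRIMITIVES = set("SBTUAE")
LAYER_OF = {3 ** l: l for l in range(7)}  # 1:0, 3:1, 9:2, ..., 729:6

def layer(s):
    """Return the layer of a semantic sequence, or None if s is not in L_IEML."""
    if set(s) <= PRIMITIVES and len(s) in LAYER_OF:
        return LAYER_OF[len(s)]
    return None

print(layer("S"))          # layer 0
print(layer("SBT"))        # layer 1
print(layer("SB"))         # not in L_IEML: length 2 is not a power of 3
```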
3.3 Model of Semantic Sequences

Definition 1 (Semantic sequence). A string $s$ is called a semantic sequence if and only if $s \in L_{IEML}$.

To denote the $n$-th primitive of a sequence $s$, we use a superscript $n$, where $1 \le n \le 3^l$, and write $s^n$. Note that for any sequence $s$ of layer $l$, $s^n$ is undefined for any $n > 3^l$. Two semantic sequences are distinct if and only if one of the following holds: i) their layers are different; ii) they are composed of different primitives; iii) their primitives do not follow the same order. For any $s_a$ and $s_b$:

    s_a \ne s_b \iff \exists n : s_a^n \ne s_b^n \ \lor\ |s_a| \ne |s_b|    (2)

Let us now consider binary relations between semantic sequences in general. These are obtained by performing a Cartesian product of two sets (the Cartesian product of two sets X and Y is written X × Y = {(x, y) | x ∈ X, y ∈ Y}). For any sets of semantic sequences $X$, $Y$, where $s_a \in X$, $s_b \in Y$, and using Equation 2, we define four binary relations whole ⊆ X × Y, substance ⊆ X × Y, attribute ⊆ X × Y, and mode ⊆ X × Y as follows:

    whole     = \{(s_a, s_b) \mid s_a = s_b\}
    substance = \{(s_a, s_b) \mid s_a^n = s_b^n \ \land\ |s_a| = 3|s_b|,\ 1 \le n \le |s_b|\}
    attribute = \{(s_a, s_b) \mid s_a^{n+|s_b|} = s_b^n \ \land\ |s_a| = 3|s_b|,\ 1 \le n \le |s_b|\}    (3)
    mode      = \{(s_a, s_b) \mid s_a^{n+2|s_b|} = s_b^n \ \land\ |s_a| = 3|s_b|,\ 1 \le n \le |s_b|\}

Any two semantic sequences that are equal are in a whole relationship. In addition, any two semantic sequences that share specific subsequences may be in a substance, attribute, or mode relationship. For any two semantic sequences $s_a$ and $s_b$, if they are in one of the above relations, then we say that $s_b$ plays a role w.r.t. $s_a$ and we call $s_b$ a seme of the sequence.

Definition 2 (Seme of a sequence). For any semantic sequences $s_a$ and $s_b$, if $(s_a, s_b) \in$ whole ∪ substance ∪ attribute ∪ mode, then $s_b$ plays a role w.r.t. $s_a$ and $s_b$ is called a seme.

We can now group distinct semantic sequences together into sets. A useful grouping is based on the layer of those semantic sequences.
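The four sequence relations amount to simple positional checks: $s_b$ is the substance, attribute, or mode of $s_a$ when $s_a$ is one layer above $s_b$ and the corresponding third of $s_a$ equals $s_b$. A minimal sketch (strings as sequences, 0-indexed slices standing in for the 1-indexed primitives):

```python
# Sequence-level relations of Eq. (3): whole, substance, attribute, mode.
def whole(sa, sb):
    return sa == sb

def _third(sa, sb, offset):
    # sa must be exactly one layer above sb (3x the length), and the
    # third of sa starting at `offset` must equal sb.
    return len(sa) == 3 * len(sb) and sa[offset:offset + len(sb)] == sb

def substance(sa, sb):
    return _third(sa, sb, 0)              # first third of sa

def attribute(sa, sb):
    return _third(sa, sb, len(sb))        # middle third of sa

def mode(sa, sb):
    return _third(sa, sb, 2 * len(sb))    # last third of sa

def is_seme(sa, sb):
    # Definition 2: sb plays a role w.r.t. sa.
    return whole(sa, sb) or substance(sa, sb) or attribute(sa, sb) or mode(sa, sb)

print(substance("SBT", "S"), attribute("SBT", "B"), mode("SBT", "T"))
```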
3.4 Model of Semantic Categories and Catsets

A category of $L_{IEML}$ is a subset such that all strings of that subset have the same length:

    c = \{ s_i, s_j \in L_{IEML} \mid |s_i| = |s_j| \}    (4)

Definition 3 (Semantic category). A semantic category $c$ is a set containing semantic sequences of the same layer.

The layer of any category $c$ is exactly the layer of the semantic sequences included in that category. The set of all categories of layer $l$ is given as the powerset (the powerset of S is the set of all subsets of S, including the empty set ∅) of the set of all strings of layer $l$ of $L_{IEML}$:

    C_l = \mathrm{Powerset}(\{ s \in L_{IEML} \mid |s| = 3^l \})    (5)

Two categories are distinct if and only if they differ by at least one element. For any $c_a$ and $c_b$:

    c_a \ne c_b \iff c_a \not\subseteq c_b \ \lor\ c_b \not\subseteq c_a    (6)

A weaker condition can be applied to categories of distinct layers (since two categories are different if their layers are different) and is written as:

    l(c_a) \ne l(c_b) \implies c_a \ne c_b    (7)

where $l(c_a)$ and $l(c_b)$ denote the layers of categories $c_a$ and $c_b$, respectively. Analogously to sequences, we consider binary relations between any categories $c_i$ and $c_j$ where $l(c_i), l(c_j) \ge 1$. For any sets of categories $X$, $Y$, where $c_a \in X$, $c_b \in Y$, we define four binary relations whole^C ⊆ X × Y, substance^C ⊆ X × Y, attribute^C ⊆ X × Y, and mode^C ⊆ X × Y as follows:

    whole^C     = \{(c_a, c_b) \mid c_a = c_b\}
    substance^C = \{(c_a, c_b) \mid \exists s_a \in c_a, \exists s_b \in c_b, (s_a, s_b) \in substance\}
    attribute^C = \{(c_a, c_b) \mid \exists s_a \in c_a, \exists s_b \in c_b, (s_a, s_b) \in attribute\}    (8)
    mode^C      = \{(c_a, c_b) \mid \exists s_a \in c_a, \exists s_b \in c_b, (s_a, s_b) \in mode\}
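The category-level relations of Eq. (8) lift the sequence relations existentially: two categories are related when some pair of their member sequences is related. A minimal sketch, with categories modeled as Python sets of sequence strings (the sequence-level substance check is reproduced inline so the example is self-contained):

```python
# Category-level relations (Eq. 8) and category distinctness (Eq. 6).
def seq_substance(sa, sb):
    # sb is the first third of sa, one layer below it.
    return len(sa) == 3 * len(sb) and sa[: len(sb)] == sb

def cat_substance(ca, cb):
    # (ca, cb) is in substance^C when some member pair is in substance.
    return any(seq_substance(sa, sb) for sa in ca for sb in cb)

def distinct(ca, cb):
    # Eq. (6): categories are distinct iff they differ by at least one element.
    return not (ca <= cb and cb <= ca)

c1 = {"SBT", "UAE"}
c2 = {"S", "U"}
print(cat_substance(c1, c2), distinct(c1, c2))
```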
For any two categories $c_a$ and $c_b$, if they are in one of the above relations, $(c_a, c_b) \in$ whole^C ∪ substance^C ∪ attribute^C ∪ mode^C, then we say that $c_b$ plays a role with respect to $c_a$ and $c_b$ is called a seme of the category.

A catset is a set of distinct categories of the same layer, as defined in Definition 4.

Definition 4 (Catset). A catset $\kappa$ is a set containing categories such that $\kappa = \{c_n \mid \forall i, j : c_i \ne c_j,\ l(c_i) = l(c_j)\}$.

The layer of a catset is given by the layer of any of its members: if some $c \in \kappa$, then $l(\kappa) = l(c)$. Note that a category $c$ can be written as $c \in C_l$, while a catset $\kappa$ can be written as $\kappa \subseteq C_l$. All standard set operations, such as union and intersection ($\cup$ and $\cap$), can be performed on catsets of the same layer.

3.5 Model of Uniform Semantic Locator

A USL is composed of up to seven catsets of different layers, as follows:

Definition 5 (Uniform Semantic Locator, USL). A USL $\mu$ is a set containing catsets of different layers such that $\mu = \{\kappa_n \mid \forall i, j : l(\kappa_i) \ne l(\kappa_j)\}$.

Note that since there are seven distinct layers, a USL can have at most seven members. All standard set operations, such as union and intersection ($\cup$ and $\cap$), on USLs are always performed on sets of categories (and therefore on sets of sequences), layer by layer. Since at each layer $l$ there are $|C_l|$ distinct catsets, the whole semantic space is defined by the tuple Ł = ⟨C0, C1, C2, C3, C4, C5, C6⟩.

In the IEML notation of USLs, the categories are separated by "/". Table 1 shows an example of USLs used as *tags for "Wikipedia" and "XML" and a layer-by-layer English translation of those USLs. A *tag holds the place of an IEML expression by suggesting its meaning rather than uttering the IEML expression. The meaning of a *tag has to be understood from the singular place that its corresponding IEML expression occupies in the network of IEML semantic relations [12].
Table 1. An example of USLs in semantic space [12].

Tag: *Wikipedia
  USL                                                           Semantics in English
  L0: (U: + S:)                                                 knowledge networks
  L1: (d.) / (t.)                                               truth / memory
  L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)         get one's bearings in knowledge / act for the sake of society / synthesize / organized knowledge / collective creation
  L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-')                       opening public space / encyclopedia
  L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-' ,)   collective intelligence encyclopedia in cyberspace / volunteers producing didactic material on any subject

Tag: *XML
  USL                                                           Semantics in English
  L0: (A: + S:)                                                 document networks
  L1: (b.)                                                      language
  L2: (we.g.-) / (we.b.-)                                       unify the documentation / cultivate information system
  L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-')            establishing norms and standards / cyberspace / meeting information needs
  L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)   guarantee compatibility of data through same formal structure

4 Social Recommender Systems on IEML Semantic Space

Fig. 1. System overview of social recommender systems based on IEML semantic tagging: a user-based semantic collaborative filtering and an item-based semantic collaborative filtering.

In this section, we present two social recommendation methods that incorporate USLs. We apply two well-known collaborative filtering approaches in our study: a user-based approach and an item-based approach. Fig. 1 illustrates the overall process of each method in two phases: a neighborhood formation phase and an item recommendation phase. In the neighborhood formation phase, we first compute similarities between users (for the user-based method) or between items (for the item-based method) by utilizing USLs. Thereafter, we generate a set of semantically like-minded users for each user and a set of semantically similar items for each item. Based on the neighborhood, in the recommendation phase, we predict a social ranking of items to decide which items to recommend.

4.1 Semantic Models in Bipartite Space

Fig. 2. Bipartite space of social tagging space and USL semantic space.

The social tagging in our study is free-for-all, allowing any user to annotate any item with any tag [6]. Therefore, given a list of r users Ũ = {u1, u2, …, ur}, a list of m tags Ť = {t1, t2, …, tm}, and a list of n items Ĩ = {i1, i2, …, in}, the social tagging, or folksonomy, F is a tuple F = ⟨Ũ, Ť, Ĩ, Y⟩, where Y ⊆ Ũ × Ť × Ĩ is a ternary relationship called tag assignments [9]. More conceptually, the triplets of the tagging space can be represented as a three-dimensional data cube. Beyond the tagging space, in our study there is another space, where the tags are connected to USLs according to their semantics. We call this space the IEML semantic space, as illustrated in Fig. 2. Note that a USL $\mu$ is composed of catsets (Definition 5), where each catset $\kappa$ consists of semantic categories $c$ (Definition 4). Therefore, an extended formal definition of the folksonomy, called a semantic folksonomy, is defined as follows:
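The two relations can be represented directly as sets of triples, which makes the later set definitions easy to evaluate. A minimal sketch with hypothetical users, tags, and items (none of these names come from the report):

```python
# Sketch of a (semantic) folksonomy as plain data: Y holds (user, tag, item)
# tag assignments; N holds (user, tag, usl) links into the semantic space.
Y = {("alice", "xml", "media1"), ("bob", "wikipedia", "media2")}
N = {("alice", "xml", "usl_xml"), ("bob", "wikipedia", "usl_wiki")}

def tags_of(user, item):
    # T_u^h: the set of tags that `user` attached to `item`.
    return {t for (u, t, i) in Y if u == user and i == item}

def usls_of(user):
    # N_u^*: the set of all USLs linked to tags `user` has used.
    return {usl for (u, t, usl) in N if u == user}

print(tags_of("alice", "media1"), usls_of("alice"))
```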
Definition 6 (Semantic Folksonomy). Let Ł = ⟨C0, C1, C2, C3, C4, C5, C6⟩ be the whole semantic space. A semantic folksonomy is a tuple SF = ⟨Ũ, Ť, Ĩ, Y, Ň⟩, where Ň is a ternary relationship such that Ň ⊆ Ũ × Ť × Ł.

From semantic folksonomies, we present a formal description of two models, a semantic user model and a semantic item model, which are used in our social recommender system.

4.1.1 Semantic Model for the User

From semantic folksonomies, a semantic user model is defined as follows:

Definition 7 (Semantic User Model). Given a user u ∈ Ũ, a formal description of a user model for user u, Mu, follows: Mu = ⟨Ťu, Ňu⟩, where Ťu = {(t, i) ∈ Ť × Ĩ | (u, t, i) ∈ Y} and Ňu = {(t, μ) ∈ Ť × Ł | (u, t, μ) ∈ Ň}.

For clarity, some definitions of the sets used in this work are introduced. We define, for a given user u, the set of all tags that user u has used: T*u := {t ∈ Ť | ∃i ∈ Ĩ: (t, i) ∈ Ťu}. Therefore, the set of all USLs of user u can be defined as N*u := {μ ∈ Ł | ∃t ∈ T*u: (t, μ) ∈ Ňu}. For a certain item h, we define the set of tags with which user u annotated item h: Thu := {t ∈ Ť | Ťu ∩ ({t} × {h}) ≠ ∅}. Accordingly, the set of all USLs of user u for item h can be defined as Nhu := {μ ∈ Ł | ∃t ∈ Thu: (t, μ) ∈ Ňu}. As stated previously, all standard set operations, such as union and intersection on USLs, are always performed on sets of categories, layer by layer. Therefore, for a given user model Mu for user u, with respect to the semantic space, the tags that user u has used to annotate a certain item h can be represented as the union of the USLs $\mu \in N_u^h$:

    USL_u^h = \bigcup_{l=0}^{6} USL_u^h(l), \quad USL_u^h(l) = \bigcup_{\,l(c_n) = l,\ c_n \in \kappa,\ \kappa \in \mu,\ \mu \in N_u^h} c_n    (9)

Likewise, all tags that user u has used are represented as the union of the USLs $\mu \in N_u^*$:

    USL_u^* = \bigcup_{i=1}^{n} USL_u^i = \bigcup_{l=0}^{6} USL_u^*(l), \quad USL_u^*(l) = \bigcup_{\,l(c_n) = l,\ c_n \in \kappa,\ \kappa \in \mu,\ \mu \in N_u^*} c_n    (10)
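Because set operations on USLs are always performed layer by layer, Eqs. (9) and (10) reduce to a per-layer union of categories. A minimal sketch, under the assumption that a USL is represented as a dict mapping layer to a set of categories (categories as frozensets of sequence strings):

```python
# Layer-wise union of USLs, as in Eqs. (9)-(10): collect each user's
# categories layer by layer across all of his/her USLs.
def usl_union(usls):
    out = {l: set() for l in range(7)}  # one bucket per layer 0..6
    for usl in usls:
        for l, cats in usl.items():
            out[l] |= cats
    return out

# Two illustrative USLs for the same user (made-up sequences).
u1 = {0: {frozenset({"S"})}, 1: {frozenset({"SBT"})}}
u2 = {0: {frozenset({"U"})}}
merged = usl_union([u1, u2])
print(len(merged[0]), len(merged[1]))   # categories at layers 0 and 1
```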
Fig. 3. Conceptual user models with respect to the IEML semantic space.

4.1.2 Semantic Model for the Item

From semantic folksonomies, a semantic item model is defined as follows:

Definition 8 (Semantic Item Model). Given an item i ∈ Ĩ, a formal description of a semantic item model for item i, M(i), follows: M(i) = ⟨Ť(i), Ň(i)⟩, where Ť(i) = {(u, t) ∈ Ũ × Ť | (u, t, i) ∈ Y} and Ň(i) = {(t, μ) ∈ Ť × Ł | (u, t, μ) ∈ Ň}.

For clarity, we introduce some definitions of the sets from the item perspective. We define, for a given item i, the set of all tags with which any user has annotated item i: T*i := {t ∈ Ť | ∃u ∈ Ũ: (u, t) ∈ Ť(i)}. Therefore, the set of all USLs of item i can be defined as N*i := {μ ∈ Ł | ∃t ∈ T*i: (t, μ) ∈ Ň(i)}. For a certain user v, we define the set of tags with which user v annotated item i: Tiv := {t ∈ Ť | Ť(i) ∩ ({v} × {t}) ≠ ∅}. Accordingly, the set of all USLs of user v for item i can be defined as Niv := {μ ∈ Ł | ∃t ∈ Tiv: (t, μ) ∈ Ň(i)}. As stated previously, all standard set operations, such as union and intersection on USLs, are always performed on sets of categories, layer by layer. Therefore, for a given item model M(i) for item i, with respect to the semantic space, the tags that a certain user v has used to annotate item i can be represented as the union of the USLs $\mu \in N_v^i$:

    USL_v^i = \bigcup_{l=0}^{6} USL_v^i(l), \quad USL_v^i(l) = \bigcup_{\,l(c_n) = l,\ c_n \in \kappa,\ \kappa \in \mu,\ \mu \in N_v^i} c_n    (11)
Likewise, all tags that have been used to annotate item i are represented as the union of the USLs $\mu \in N_*^i$:

    USL_*^i = \bigcup_{u=1}^{r} USL_u^i = \bigcup_{l=0}^{6} USL_*^i(l), \quad USL_*^i(l) = \bigcup_{\,l(c_n) = l,\ c_n \in \kappa,\ \kappa \in \mu,\ \mu \in N_*^i} c_n    (12)

Fig. 4. Conceptual item models with respect to the IEML semantic space.

4.2 User-based Semantic Collaborative Filtering

In this section, we describe a social recommendation method based on the semantic user model (Definition 7). The basic idea of user-based semantic collaborative filtering starts from the assumption that a certain user is likely to prefer items that like-minded users have annotated with tags similar to the tags he/she used. Therefore, we first look into the set of like-minded users who have tagged a target item and then compute how semantically similar they are to the target user, called the user-user semantic similarity. Based on the semantically similar users, the semantic social ranking of the item is computed to decide whether or not to recommend it.

4.2.1 Generating Semantically Nearest Neighbors

One of the most important tasks in CF recommender systems is neighborhood formation, i.e., identifying a set of users who have similar taste, often called the k nearest neighbors. Those users can be directly defined as a group of connected friends in social networks, such as followings on Twitter, connections on Twine, people on Delicious, friends on Facebook, and so on. However, most users have insufficient connections with their friends. In addition, finding like-minded users in current social network services still relies on manually browsing networks of friends or on keyword searches. Thus, this form of establishing neighbors becomes a time-consuming and ineffective process if we take into consideration the huge number of available people in the network [3].

As mentioned in Section 2.1, typical collaborative filtering methods adopt a variety of similarity measurements to determine similar users automatically. However, this also encounters a serious limitation, namely the sparsity problem [1, 10]. It is often the case that there is no intersection between two users, and hence the similarity is not computable at all. Even when the computation of similarity is possible, it may not be very reliable, because insufficient information is processed. To deal with this problem, recent studies have determined similarities between users by using user-generated tags that reflect users' characteristics [10, 13, 20, 22, 25]. However, there still remain limitations that should be treated, such as noisy, polysemous, and synonymous tags. To this end, our study identifies the nearest neighbors by using the Uniform Semantic Locators (USLs) of each user for more valuable and personalized analyses.

We define semantically similar users as a group of users whose IEML categories of interest are close to those of the target user. The semantic similarity between two users, u and v, can be computed as the sum of layer similarities from layer 0 to layer 6. Formally, the semantic user similarity measure is defined as:

    semUSim(u, v) = \alpha \cdot \sum_{l=0}^{6} simULayer_l(u, v)    (11)

where $\alpha$ is a normalizing factor such that the layer weights sum to unity, and $simULayer_l(u, v)$ denotes the layer similarity of the two users at layer l. The layer similarity between two USL sets is defined as the size of the intersection divided by the size of the union of the USL sets.
In other words, it is determined by computing the weightedJaccard coefficient of two USL sets. Formally, the layer similarity is given by: (l  1) | USL* (l )  USL* (l ) | simULayer (u , v )  l  u v (12) 7 | USLu (l )  USLv (l ) | * *where USL* (l ) u and USL* (l ) v refer to the union of USLs for user u and v at layer l, 0  l 6, respectively. Here we give more layer weights at higher layer when computing the 17
semantic user similarity. That is, the intersections of higher layers contribute more than the intersections of lower layers. When \lambda is set to 0.25 for normalization, the similarity value between two users is in the range of 0 to 1. The higher the score, the more similar a user is to the target user. Finally, for a given user u in Ũ, the k users with the highest semantic similarity are identified as the semantically k nearest neighbors, such that:

    SSN_k(u) = \underset{v \in \tilde{U} \setminus \{u\}}{\operatorname{argmax}^{k}} \; semUSim(u, v)    (13)

To illustrate a simple example of computing the semantic user similarity, consider the following three users: Alice, Bob, and Nami. Alice annotated Media1 with the tags "Community" and "XML", Bob annotated Media2 with the tags "Wikipedia" and "OWL", and Nami annotated Media3 with the tags "Web of data" and "Folksonomy". In addition, consider the USLs of each tag as shown in Fig. 5.

[Fig. 5 An example of tag assignments and USLs for computing the semantic user similarity: the per-layer USLs (L0-L5) of the tags "Community", "XML", "Wikipedia", "OWL", "Web of data", and "Folksonomy".]

Now, we compute the semantic similarity between Alice and Bob. Table 2 shows the union of the two USLs for the "Community" and "XML" tags and for the "Wikipedia" and "OWL" tags, respectively.
Table 2 The union of USLs for Alice and Bob

USL*_Alice (the union of Alice's USLs):
    L0: (A:+B:) / (A:+S:)
    L1: (k.) / (m.) / (b.)
    L2: (k.o.-) / (p.a.-) / (we.g.-) / (we.b.-)
    L3: (s.o.-a.a.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-')
    L4: -
    L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)

USL*_Bob (the union of Bob's USLs):
    L0: (U:+S:) / (A:+S:)
    L1: (d.) / (t.) / (b.)
    L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-) / (we.g.-) / (we.h.-)
    L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-')
    L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',) / (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',)
    L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)

From these two USL sets, we can compute the USL layer similarity layer by layer. For example, in the case of layer 0, the size of the intersection of the two USL sets is 1 (i.e., (A:+S:), which means documentary networks), and the size of the union of the USL sets is 3 (i.e., (A:+B:), (A:+S:), and (U:+S:)). The layer weight of layer 0 is approximately 0.143 (i.e., 1/7). Consequently, the layer similarity at layer 0 is calculated as simULayer_0(Alice, Bob) = 1/7 × 1/3 ≈ 0.048. In the case of layer 1, the size of the intersection of the two USL sets is 1, whereas the size of the union of the USL sets is 5, and thus simULayer_1(Alice, Bob) = 2/7 × 1/5 ≈ 0.057. In a similar fashion, it is possible to compute that simULayer_2(Alice, Bob) = 0.043, simULayer_3(Alice, Bob) = 0.163, simULayer_4(Alice, Bob) = 0, simULayer_5(Alice, Bob) = 0.857, and simULayer_6(Alice, Bob) = 0. Finally, from layer 0 to layer 6, the semantic similarity between Alice and Bob, semUSim(Alice, Bob), is computed as the sum of the layer similarities: semUSim(Alice, Bob) = λ × (0.048 + 0.057 + 0.043 + 0.163 + 0.857) = 0.25 × 1.168 = 0.292.
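The layer-by-layer computation above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the USL sets are toy stand-ins with only two layers populated, and all function and variable names are hypothetical.

```python
LAMBDA = 0.25  # normalizing factor: the layer weights (l+1)/7 for l = 0..6 sum to 4

def sim_u_layer(usl_u, usl_v, l):
    """Layer similarity of Eq. (12): weighted Jaccard coefficient at layer l."""
    a, b = usl_u.get(l, set()), usl_v.get(l, set())
    if not (a or b):
        return 0.0
    return (l + 1) / 7 * len(a & b) / len(a | b)

def sem_u_sim(usl_u, usl_v):
    """Semantic user similarity of Eq. (11): weighted sum over layers 0..6."""
    return LAMBDA * sum(sim_u_layer(usl_u, usl_v, l) for l in range(7))

# Toy per-layer USL sets (layer -> set of category strings).
alice = {0: {"A:+B:", "A:+S:"}, 1: {"k.", "m.", "b."}}
bob   = {0: {"U:+S:", "A:+S:"}, 1: {"d.", "t.", "b."}}

# Layer 0: weight 1/7, Jaccard 1/3; layer 1: weight 2/7, Jaccard 1/5.
print(round(sem_u_sim(alice, bob), 3))  # 0.026
```

Because only two of the seven layers are populated here, the toy score is much smaller than the 0.292 of the full worked example; the mechanics are the same.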
In the same way, the semantic similarities between Alice and Nami and between Bob and Nami are calculated as semUSim(Alice, Nami) = 0.11 and semUSim(Bob, Nami) = 0.132, respectively. This means that Alice is semantically more similar to Bob than to Nami.

4.2.2 Item Recommendation via Semantic Social Ranking

Once we have identified a group of semantically nearest neighbors, the final step is prediction, that is, attempting to estimate how a certain user would prefer unseen items. In our study, the basic idea of uncovering relevant items starts from assuming
that a target user is likely to prefer items that have been tagged by semantically similar users. Items tagged by users similar to the target user should be ranked higher. We label this prediction strategy social ranking based on the semantic user model. Formally, the semantic social ranking score of the target user u for the target item h, denoted SUR(u, h), is obtained as follows:

    SUR(u, h) = \sum_{v \in SSN_k(u)} \frac{|USL_u^* \cap USL_v^h|}{|USL_v^h|} \cdot semUSim(u, v)    (14)

where SSN_k(u) is the set of k nearest neighbors of user u grouped by the semantic user similarity, USL_v^h is the union of USLs connected to the tags that user v has assigned to item h, and semUSim(u, v) denotes the semantic similarity between user u and user v.
   Once the ranking scores of the target user for the items he/she has not previously tagged are computed, the items are sorted in descending order of the score SUR(u, h). Two strategies can then be used to select the items relevant to user u. First, all items whose ranking scores exceed a predefined threshold are recommended to user u. Second, a set of top-N ranked items that have obtained the highest scores is identified for user u, and those items are recommended.

[Fig. 6 An example of computing the semantic social ranking of Alice for Media2 and Media3: the union of Alice's USLs, the USLs of Bob and Nami for Media2 and Media3, and the resulting intersection and union sizes.]
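As a sketch, Eq. (14) can be implemented as follows. This is an illustration under assumed data structures (flattened USL category sets per neighbor and item), not the authors' code; the similarity weights 0.292 and 0.11 are taken from the worked example, while the category names are made up to reproduce its set sizes.

```python
def sur(target_usl, neighbors, item):
    """Semantic social ranking of Eq. (14): sum, over the k nearest neighbors,
    of the USL overlap ratio for the item weighted by the user similarity."""
    score = 0.0
    for sim, item_usls in neighbors:
        usl_v_h = item_usls.get(item, set())  # neighbor's USLs for this item
        if usl_v_h:
            score += len(target_usl & usl_v_h) / len(usl_v_h) * sim
    return score

# Toy categories reproducing the sizes of the worked example:
# |USL*_Alice ∩ USL^M2_Bob| = 6 with |USL^M2_Bob| = 9, and
# |USL*_Alice ∩ USL^M2_Nami| = 3 with |USL^M2_Nami| = 8.
alice   = {f"c{i}" for i in range(10)}
bob_m2  = {f"c{i}" for i in range(6)} | {"x1", "x2", "x3"}
nami_m2 = {f"c{i}" for i in range(3)} | {f"y{i}" for i in range(5)}
neighbors = [(0.292, {"M2": bob_m2}), (0.110, {"M2": nami_m2})]

print(round(sur(alice, neighbors, "M2"), 3))  # 0.236, matching SUR(Alice, M2)
```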
Consider the same situation as in the example described in the previous section, and assume that Alice is the target user to whom the system should recommend items and that the neighbors of Alice are Bob and Nami. We want to answer the question: which items should the system recommend to Alice? To provide the answer, we compute the semantic social ranking scores for the items that Alice has not previously tagged (e.g., Media2 and Media3). First, let us calculate the social ranking of Alice for Media2. To this end, we first compute the number of intersection categories between Alice and each of her neighbors for Media2. In the case of Bob, he annotated Media2 with the tag "OWL". As can be seen in Fig. 6, the number of intersection categories is 6, i.e., |USL*_Alice ∩ USL^{M2}_Bob| = 6, and the size of the union of Bob's USLs for Media2 is 9, i.e., |USL^{M2}_Bob| = 9. In the case of Nami, who annotated Media2 with the tag "Web of data", the number of intersection categories is 3, i.e., |USL*_Alice ∩ USL^{M2}_Nami| = 3, and the size of the union of Nami's USLs for Media2 is 8, i.e., |USL^{M2}_Nami| = 8. Finally, the semantic ranking score of Alice for Media2 is computed as the weighted sum of the numbers of intersection categories, using the semantic similarity as the weight:

    SUR(Alice, M2) = (6/9) × 0.292 + (3/8) × 0.11 = 0.236

Second, let us calculate the semantic ranking score of Alice for Media3. In this case, there are no intersections between Alice and Bob at any layer, i.e., |USL*_Alice ∩ USL^{M3}_Bob| = 0. In the case of Nami, the number of intersection categories is 4, i.e., |USL*_Alice ∩ USL^{M3}_Nami| = 4. Consequently, the ranking score for Media3 becomes SUR(Alice, M3) = 0.049. Considering the two social media items, Media2 and Media3, the system predicts that Media2 is more likely to fit Alice's needs than Media3.

4.3 Item-based Semantic Collaborative Filtering

In this section, we explain an item perspective of the social recommendation method by using the semantic item model (Definition 8). With respect to item-based collaborative filtering, we first look into the set of similar items that the target user has tagged and then compute how semantically similar they are to the target item, called the semantic item-item similarity. Based on the semantically similar items, we recommend
relevant items to the target user by capturing how he/she annotated the similar items.

4.3.1 Generating Semantically Similar Items

We define semantically similar items as a group of items tagged with categories of IEML close to those of the target item. The semantic similarity between two items, i and j, can be computed as the weighted sum of layer similarities from layer 0 to layer 6. Formally, the semantic item similarity measure is defined as:

    semISim(i, j) = \lambda \sum_{l=0}^{6} simILayer_l(i, j)    (15)

where \lambda is a normalizing factor such that the layer weights sum to unity, and simILayer_l(i, j) denotes the layer similarity of the two items at layer l. The layer similarity between two USL sets is defined as the weighted Jaccard coefficient of the two USL sets. Formally, the layer similarity is given by:

    simILayer_l(i, j) = \frac{l+1}{7} \cdot \frac{|USL_i^*(l) \cap USL_j^*(l)|}{|USL_i^*(l) \cup USL_j^*(l)|}    (16)

where USL_i^*(l) and USL_j^*(l) refer to the union of USLs for items i and j at layer l, 0 <= l <= 6, respectively. Here we give more layer weight at a higher layer when computing the semantic item similarity. That is, the intersections of higher layers contribute more than the intersections of lower layers. When \lambda is set to 0.25 for normalization, the similarity value between two items is in the range of 0 to 1. The higher the score, the more similar an item is to the target item. Finally, for a given item i in Ĩ, the k items with the highest semantic similarity are identified as the semantically k most similar items, such that:

    SSI_k(i) = \underset{j \in \tilde{I} \setminus \{i\}}{\operatorname{argmax}^{k}} \; semISim(i, j)    (17)
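The item-side measures mirror the user-side ones. Below is a sketch of Eqs. (15)-(17) under the same toy representation (per-layer sets of category strings); the item profiles and all names are illustrative assumptions, not the actual IEML data structures.

```python
import heapq

LAMBDA = 0.25  # layer weights (l+1)/7 for l = 0..6 sum to 4; 0.25 normalizes to 1

def sem_i_sim(usl_i, usl_j):
    """Semantic item similarity of Eqs. (15)-(16): weighted Jaccard per layer."""
    total = 0.0
    for l in range(7):
        a, b = usl_i.get(l, set()), usl_j.get(l, set())
        if a or b:
            total += (l + 1) / 7 * len(a & b) / len(a | b)
    return LAMBDA * total

def ssi_k(target, items, k):
    """SSI_k(i) of Eq. (17): the k items most semantically similar to target."""
    scored = [(sem_i_sim(items[target], usl), j)
              for j, usl in items.items() if j != target]
    return [j for _, j in heapq.nlargest(k, scored)]

# Toy item profiles (layer -> set of categories).
items = {
    "Media1": {0: {"A:+S:"}, 1: {"b."}},
    "Media2": {0: {"A:+S:"}, 1: {"b.", "d."}},
    "Media3": {0: {"U:+S:"}, 1: {"k."}},
}
print(ssi_k("Media1", items, 1))  # ['Media2']
```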
4.3.2 Item Recommendation via Semantic Social Ranking

Once we have identified a group of semantically similar items, the final step is prediction, that is, attempting to estimate how a certain user would prefer unseen items. In our study, the basic idea of discovering relevant items starts from assuming that a target user is likely to prefer items that are semantically similar to the items he/she has tagged before. We call this prediction strategy social ranking based on the semantic item model. Formally, the prediction value of the target user u for the target item i, denoted SIR(u, i), is obtained as follows:

    SIR(u, i) = \sum_{j \in SSI_k(i)} \frac{|USL_i^* \cap USL_u^j|}{|USL_u^j|} \cdot semISim(i, j)    (18)

where SSI_k(i) is the set of k most similar items of item i grouped by the semantic item similarity, USL_u^j is the union of USLs connected to the tags that user u has assigned to item j, and semISim(i, j) denotes the semantic similarity between item i and item j.
   Once the ranking scores of the target user for the items he/she has not previously tagged are computed, the items are sorted in descending order of the value SIR(u, i). Finally, a set of top-N ranked items that have obtained the highest scores is identified for user u, and those items are recommended.

5 Experimental Evaluation

In this section, we present experimental evaluations of the proposed approach and compare its performance against that of the benchmark algorithms.

5.1 Evaluation Design and Metrics

The experimental data comes from BibSonomy, a collaborative tagging application that allows users to organize and share scholarly references. The dataset used in this study is the p-core at level 5 from BibSonomy [26]. The p-core of level 5 means that each user, tag, and item has/occurs in at least 5 posts [9]. The original dataset
contains several useless tags and system tags, such as "r", "!", and "system:imported", so we cleaned those tags in the experiments. Table 3 briefly describes our dataset.

Table 3 Characteristics of the BibSonomy dataset (p-core at level 5)

    |Ũ|    |Ĩ|    |Ť|    |Ł|    |Y|      |Ň|      # of posts
    116    361    400    325    9,996    3,783    2,494

To evaluate the performance of the recommendations, we randomly divided the dataset into a training set and a test set. For each user u, we randomly selected 20% of the items that he had previously posted and used those as the test set; the remaining 80% of the items he had previously posted were used as the training set. To ensure that our results are not sensitive to the particular training/test partitioning for each user, we used a 5-fold cross-validation scheme [7]. Therefore, the result values reported in the experiment section are the averages over all five runs.
   To measure the performance of the recommendations, we adopted precision and recall, which judge how relevant a set of ranked recommendations is for the user [7]. Precision measures the ratio of the items in a list of recommendations that are also contained in the test set to the number of items recommended, and recall measures the ratio of the items in a list of recommendations that are also contained in the test set to the overall number of items in the test set. Precision and recall for a given user u in the test set are given by:

    precision(u) = \frac{|Test(u) \cap TopN(u)|}{|TopN(u)|}, \quad recall(u) = \frac{|Test(u) \cap TopN(u)|}{|Test(u)|}    (19)

where Test(u) is the item list of user u in the test set and TopN(u) is the top-N recommended item list for user u. Finally, the overall precision and recall for all users in the test set are computed by averaging the personal precision(u) and recall(u). However, precision and recall are often in conflict with each other. Generally, increasing the number of recommended items tends to decrease precision but increase recall [18].
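For concreteness, the two per-user measures can be sketched as follows; this is an illustrative snippet with made-up item lists, not the evaluation harness used in the experiments.

```python
def precision_recall(test, top_n):
    """Per-user precision and recall of a top-N recommendation list."""
    hits = len(set(test) & set(top_n))
    return hits / len(top_n), hits / len(test)

# Made-up data: 5 held-out test items, 10 recommendations, 3 of them hits.
test_items  = ["a", "b", "c", "d", "e"]
recommended = ["a", "b", "c", "x1", "x2", "x3", "x4", "x5", "x6", "x7"]
p, r = precision_recall(test_items, recommended)
print(p, r)  # 0.3 0.6
```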
Therefore, to consider both of them in judging the quality of recommendations, we use the standard F1 metric, which combines precision and recall into a single number:
    F1 = \frac{2 \times precision \times recall}{precision + recall}    (20)

In order to compare the performance of our algorithms, we implemented a user-based CF algorithm in which the similarity is computed by the cosine-based similarity (denoted UCF) [18], an item-based CF algorithm that employs the cosine-based similarity (denoted ICF) [5], and a most-popular-tags approach based on the tags a user has already used (denoted MPT) [9]. Top-N recommendation via the semantic social ranking of the user-based method (denoted SUR) and of the item-based method (denoted SIR) was evaluated in comparison with these benchmark algorithms.

5.2 Experiment Results

In this section, we present detailed experimental results. The performance evaluation is divided into two dimensions: the influence of the number of neighbors on the performance is investigated first, and then the quality of item recommendations is evaluated in comparison with the benchmark methods.

5.2.1 Experiments with Neighborhood Size

As noted in a number of previous studies, the neighborhood size has a significant impact on the recommendation quality of neighborhood-based algorithms [5, 10, 17, 18]. Therefore, we varied the neighborhood size k from 5 to 80. In the case of UCF and SUR, the parameter k denotes the number of nearest users, whereas it is the number of most similar items for ICF and SIR. Even though MPT is not affected by the neighborhood size at all, we also report its result for the purpose of comparison. In this experiment, we set the number of recommended items N to 10 (i.e., top-10) for each user in the test set.
   Fig. 7 shows how the precisions and recalls of the four methods change as the neighborhood size grows. UCF improves in precision as the neighborhood size increases from 5 to 20; after this value, precision decreases slightly. With respect to recall, the curve of the graph for UCF tends to be flat.
In the case of ICF, precision and recall improve until a neighborhood size of 40; beyond this point, further increases of the size have a negative
influence on performance. With respect to SIR, we observe that precision and recall tend to improve slightly as the value of k is increased from 5 to 30 and to 40, respectively. However, after this point, any further increase leads to worse results. For SUR, the results obtained look different. Unlike UCF, ICF, and SIR, it can be observed from the curves of the graphs that SUR is strongly affected by the size of the neighborhood. The two charts for SUR reveal that increasing the neighborhood size has detrimental effects on both metrics. In other words, we found that SUR provides better precision and recall when the neighborhood size is relatively small. For example, in terms of both precision and recall, increasing the neighborhood size much beyond 20 yielded worse results than when the size was 5. This result indicates that superfluous users can negatively impact the recommendation quality of our method.

Fig. 7 Precision and recall with respect to increasing the neighborhood size

We continued by examining F1 values in order to compare the methods with each other and to select the best neighborhood size for each method. Fig. 8 depicts the F1 variation of the four methods as the neighborhood size increases. SUR outperforms MPT and ICF at all neighborhood size levels, whereas it provides improved performance over UCF as the neighborhood size increases from 5 to 40; beyond this point, F1 is poorer for SUR than for UCF. Examining the best value of each method, the F1 of UCF is 0.107 (k=20), the F1 of ICF is 0.099 (k=40), the F1 of MPT is 0.0825, the F1 of SUR is 0.119 (k=10), and the F1 of SIR is 0.114 (k=30). This result implies that SUR can provide better performance than the other methods even when data is sparse or the available data for users is relatively insufficient. In practice, CF recommender systems make a trade-off between recommendation quality and real-time performance efficiency by pre-selecting a number of neighbors. In
consideration of both quality and computation cost, we selected 20, 40, 10, and 30 as the neighborhood sizes of UCF, ICF, SUR, and SIR, respectively, in the subsequent experiments.

Fig. 8 F1 value with respect to increasing the neighborhood size

5.2.2 Comparisons with Other Methods

To experimentally evaluate the performance of top-N recommendation, we calculated the precision, recall, and F1 obtained by UCF, ICF, MPT, SUR, and SIR while varying the number of recommended items N from 2 to 10 in increments of 2. Since in practice users tend to click on items with higher ranks, we only examined a small number of recommended items.
   Fig. 9 depicts the precision-recall plot, showing how the precisions and recalls of the five methods change as the value of N increases. The data points on the graph curves refer to the number of recommended items; that is, the first point of each curve represents the case of N=2 (i.e., top-2), whereas the last point is the case of N=10 (i.e., top-10). As can be observed from the graph, the curves of all methods tend to descend. This phenomenon implies that increasing the number of recommended items tends to decrease precision but increase recall. With respect to precision, in the cases of N=2 and N=4, SIR demonstrates the best performance. However, SUR demonstrates the best performance
as N is increased. With respect to recall, SUR outperforms the other four methods on all occasions.

Fig. 9 Precision and recall as the value of the number of recommended items N increases

Fig. 10 Comparisons of F1 values as the number of recommended items N increases

Let us now focus on the F1 results. Fig. 10 depicts the results of F1, showing how SUR and SIR outperform the other methods. As shown, both methods show considerably improved performance compared to UCF, ICF, and MPT. For example, SUR achieves 0.1%, 1.6%, and 1.5% improvements in the case of top-2 (N=2), whereas SIR achieves 0.3%, 1.9%, and 1.7% improvements, compared to UCF, ICF, and MPT, respectively. When comparing the results achieved by SUR and SIR, the
recommendation quality of the former is superior to that of SIR as N is increased. On average over the five cases, SUR obtains 0.6%, 1.8%, 2.6%, and 0.3% improvements compared to UCF, ICF, MPT, and SIR, respectively.

Fig. 11 Comparisons of precision, recall, and F1 for cold-start users and active users

We further examined the recommendation performance for users who had few posts, namely cold-start users, and users who had many posts, namely active users, in the training set. A CF-based recommender system is generally unable to make high-quality recommendations for cold-start users, compared to the case of active users, which is pointed out as one of its limitations. We selectively considered two subsets of users: those who have fewer than 6 posts (21 users) and those who have more than 25 posts (21 users). For the two groups, we calculated the precision, recall, and F1 within the top-10 of the ranked result sets obtained by UCF, ICF, MPT, and SUR. Fig. 11 shows those results for the cold-start users (left) and active users (right). As we can see from the graphs, the results demonstrate that the F1 values of the cold-start group are considerably low compared to those of the active group. These results are caused by the fact that it is hard to analyze the users' propensity for posting because they do not have enough information (items or tags). Nevertheless, comparing the results achieved by SUR and the benchmark algorithms on the cold-start dataset, the precision, recall, and F1 values of the former are superior to those of the other methods. For example, SUR obtains 11.9%, 14.3%, and 2.4% improvements in recall compared to UCF, ICF, and MPT, respectively. In terms of F1, SUR outperforms UCF, ICF, and MPT by 2.2%, 2.6%, and 0.4%, respectively. Only MPT achieves comparable results. This result indicates that utilizing tagging information can help alleviate the cold-start user problem and thus improve the quality
of item recommendations. With respect to the active dataset, it can be observed that SUR provides better performance than ICF and MPT on all occasions. Comparing the F1 obtained by UCF and SUR, the difference appears insignificant. Although the precision of SUR is slightly worse than that of UCF on the active dataset, the proposed method notably provides better quality than the benchmark methods overall. That is, SUR can provide suitable items not only to cold-start users but also to active users. Comparing the results achieved by MPT on the cold-start and active datasets, an interesting observation emerges: the simple tag-based recommendation approach works well enough for cold-start users, compared to UCF and ICF; in the other scenario, however, the superfluous tags of active users can instead introduce noise.
   We conclude from these comparison experiments that our approaches can provide consistently better quality of recommendations than the other methods. Furthermore, we believe that the results of the proposed approach will become even more practically significant on large-scale Web 2.0 frameworks.

6 Concluding Remarks

Looking toward the future of the social Web, in this report we have presented semantic models addressing the interoperability challenges that face semantic technology. We also proposed two collaborative filtering methods that apply the semantic models, and we analyzed the potential benefits of IEML for social recommender systems. As noted in our experimental results, our methods can successfully enhance the performance of item recommendations. Moreover, we also observed that our methods can provide items more suited to user interests, even when the number of recommended items is small. The main contributions of this study can be summarized as follows: 1) Our methods can address traditional stumbling blocks such as polysemy, synonymy, data sparseness, the cold-start problem, and semantic interoperability.
2) Our methods can also offer trustworthy items semantically relevant to a user's needs, because capturing the semantics of user-generated tags makes it easier not only to grasp his/her preferences but also to make recommendations to him/her.
Acknowledgment

The work was mainly funded since 2009 by the Canada Research Chair in Collective Intelligence at the University of Ottawa.

References

1. Adomavicius, G., Tuzhilin, A. (2005) Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6): 734-749
2. Bao, S., Wu, X., Fei, B., Xue, G., Su, Z., Yu, Y. (2007) Optimizing web search using social annotations. In: Proceedings of the 16th International Conference on World Wide Web, pp. 501-510
3. Bonhard, P., Sasse, A. (2006) 'Knowing me, knowing you' - using profiles and social networking to improve recommender systems. BT Technology Journal 24(3): 84-98
4. Breese, J. S., Heckerman, D., Kadie, C. (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 43-52
5. Deshpande, M., Karypis, G. (2004) Item-based top-N recommendation algorithms. ACM Transactions on Information Systems 22(1): 143-177
6. Golder, S. A., Huberman, B. A. (2006) Usage patterns of collaborative tagging systems. Journal of Information Science 32(2): 198-208
7. Herlocker, J. L., Konstan, J. A., Terveen, L. G., Riedl, J. T. (2004) Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1): 5-53
8. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G. (2006) Information retrieval in folksonomies: Search and ranking. In: Proceedings of the 3rd European Semantic Web Conference, pp. 411-426
9. Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G. (2008) Tag recommendations in social bookmarking systems. AI Communications 21(4): 231-247
10. Kim, H.-N., Ji, A.-T., Ha, I., Jo, G.-S. (2009) Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation.
Electronic Commerce Research and Applications, doi: 10.1016/j.elerap.2009.08.004
11. Lévy, P. (2009) Toward a self-referential collective intelligence: Some philosophical background of the IEML research program. In: Proceedings of the 1st International Conference on Computational Collective Intelligence - Semantic Web, Social Networks & Multiagent Systems, pp. 22-35
12. Lévy, P. (2010) From social computing to reflexive collective intelligence: The IEML research program. Information Sciences 180(1): 71-94
13. Li, X., Guo, L., Zhao, Y. (2008) Tag-based social interest discovery. In: Proceedings of the 17th International Conference on World Wide Web, pp. 675-684
14. Marchetti, A., Tesconi, M., Ronzano, F. (2007) SemKey: A semantic collaborative tagging system. In: Proceedings of the Tagging and Metadata for Social Information Organization Workshop at the 16th International Conference on World Wide Web
15. Peis, E., Morales-del-Castillo, J. M., Delgado-López, J. A. (2008) Semantic recommender systems: Analysis of the state of the topic. number 6. Accessed 15 Dec 2009
16. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J. (1994) GroupLens: An open architecture for collaborative filtering of netnews. In: Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175-186
17. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the Tenth International World Wide Web Conference, pp. 285-295
18. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2000) Analysis of recommendation algorithms for e-commerce. In: Proceedings of the ACM Conference on Electronic Commerce, pp. 158-167
19. Schenkel, R., Crecelius, T., Kacimi, M., Michel, S., Neumann, T., Parreira, J. X., Weikum, G. (2008) Efficient top-k querying over social-tagging networks. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 523-530
20. Siersdorfer, S., Sizov, S. (2009) Social recommender systems for web 2.0 folksonomies. In: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, pp. 261-270
21. Sigurbjörnsson, B., van Zwol, R. (2008) Flickr tag recommendation based on collective knowledge. In: Proceedings of the 17th International Conference on World Wide Web, pp. 327-336
22. Tso-Sutter, K. H. L., Marinho, L. B., Schmidt-Thieme, L. (2008) Tag-aware recommender systems by fusion of collaborative filtering algorithms. In: Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 1995-1999
23. Xu, Z., Fu, Y., Mao, J., Su, D.
(2006) Towards the Semantic Web: Collaborative tag suggestions. In: Proceedings of the Collaborative Web Tagging Workshop at the 15th International Conference on World Wide Web
24. Zanardi, V., Capra, L. (2008) Social ranking: Uncovering relevant content using tag-based recommender systems. In: Proceedings of the 2008 ACM Conference on Recommender Systems, pp. 51-58
25. Zhang, Z.-K., Zhou, T., Zhang, Y.-C. (2010) Personalized recommendation via integrated diffusion on user-item-tag tripartite graphs. Physica A: Statistical Mechanics and its Applications 389(1): 179-186
26. Knowledge and Data Engineering Group (2007) University of Kassel: Benchmark Folksonomy Data from BibSonomy, version of April 30th, 2007. Accessed 15 Dec 2009