Tutorial on personalization techniques. Covers user profile creation, document modeling techniques (including LSI and PLSI) and the use of semantics in personalization
4. Why Personalization ?
• Scale of the web is limiting its utility
• There is too much information
• Consumer has to do all the work to use the web
• Search engines and portals provide the same results for
different personalities, intentions and contexts
• Personalization can be the solution
• Customize the web for individuals by
• Filtering out irrelevant information
• Identifying relevant information
4 22 January 2008
5. Some quotes from NY bits
•I am married with a house. Why do I see so many
ads for online dating sites and cheap mortgages?
Should I be happy that I see those ads? It means
Internet advertisers still have no idea who I am.
6. Personalization
• Goal – Provide users what they need without requiring
them to ask for it explicitly
• Steps
• Generate useful, actionable knowledge about users
• Use this knowledge for personalizing an application
• User centric data model – Data must be attributable to
specific user
• Two kinds
• Business Centric : Amazon, Ebay
• Consumer Centric
• Personalization requires User Profiling
7. Applications of Personalization
• Interface Personalization
• E.g. Go directly to the web page of interest instead of
site home page
• Content personalization
• Filtering (News, blog articles, videos etc)
• Ratings based recommendations
• Amazon, Stumbleupon
• Search
• Text, images, stories, research papers
• Ads
• Service Personalization
8. Why is personalization hard ?
• Server side personalization – Sites do not see all
data
• E.g. a user might visit Expedia and Orbitz; Expedia doesn't know what the user did on Orbitz
• Difficult to get user context
• User needs to agree to cookies or login
• Site profiles are not portable
• Some standards are emerging (Attention profile markup
language)
• Privacy
9. Personalization example 1 (Routing
queries)
[Figure: routing queries — Google Alerts and the Google News page]
15. Outline
• User Profile creation
• Profile Privacy
• Evaluating and managing user profiles
• Personalizing search
16. User profile information
• Two kinds of information
• Factual (Explicit)
• Behavioral (Implicit)
• Factual – Geographic, Demographic,
Psychographic information
• E.g. age is 25 years, searched for Lexus, lives in Bangalore
• Behavioral – Describes behavioral activities (visits
finance sites, buys gadgets)
17. Client side versus Server side profiles
Server side
• Has queries and clickstreams from multiple users
• Doesn't see all the user data
• No way for users to aggregate and reuse the profiles that different websites (Google, Yahoo, ..) build using their data
• Privacy is a big problem
• Server cycles have to be shared; however, some computations can be done once and reused
Client side
• No access to clickstreams of multiple users
• Sees all user data
• Possible for user to aggregate and reuse their attentional information
• Strong privacy model
• Can access the full compute power at the client
18. Desired profile characteristics
• Represent multiple interests
• Adapt to changing user interests
• Incorporate contextual information
19. Using User profiles to personalize services
[Diagram: explicit and implicit info → Data Collection → Profile Constructor → User Profile → Profile-to-Content Matching (against content such as search queries, news, video) → personalized services for the user]
Diagram adapted from Gauch et al., Chapter 2, The Adaptive Web, Springer LNCS 4321
20. User Profiling approaches
• Broadly two approaches
• IR approach
• User interests derived from text (documents/search
queries)
• Machine learning approach
• Model user based on positive and negative examples of
his interests
• Problems
• Getting labeled samples
• High dimensional feature space
21. Profile building Steps
• Authenticate the user
• Select information to build profile from and archive the information if necessary (e.g. web pages might get flushed from the IE cache)
• Build/Refresh/Expand/Prune the profile
• Use it in an application
• Evaluate the profile
22. Authenticating the user
• Users need to be authenticated in order to attribute
data to a particular user for profile creation
• Identifying single user
• Login
• Cookies
• IP address (when it is static)
• Identifying different users on same machine
• Login
• Biometrics
23. Explicit user information collection
• Ask the user for
• static information
• Name, age, residence location, hobbies, interests etc
• Google personalization – found explicit information to be noisy
• People specified literature as one of their interests but did not make
a single related search
• Matchmine – presents examples (movies, TV shows, music, blog topics) and asks the users to explicitly rate them
• Ratings
• Netflix, Stumbleupon (thumbs up/down)
• In general, people do not like to give explicit information
frequently
• Recent research (Jian Hu, WWW 2007) showed good results for gender and age prediction based on users' browsing behavior
25. Implicit user information collection
• Data sources
• Web pages, documents, search queries, location
• Information from applications (Media players, Games)
• Data collection techniques
• Desktop based
• Browser cache
• Proxy servers
• Browser plugins
• Server side
• Web logs
• Search logs
26. How much implicit info to use ?
• Teevan (SIGIR 2005) constructed two profiles
• One with only search queries
• Other using all information on desktop
• Findings
• Richer information => better profile
• All docs better than only recent docs better than only
web pages better than only search queries better than
no personalization
• Drawback with implicit info – cannot collect info
about user dislikes
27. Stereotypes
• Generalizations from communities of users
• Characteristics of group of users
• Stereotypes alleviate the bootstrap problem
• Construction of stereotypes
• Manual – e.g. Bangalore user will be interested in IT
• Automatic method
• Clustering – Similar profiles are clustered and common
characteristics extracted
28. How Acxiom delivers personalized ads
(source - WSJ)
• Acxiom has accumulated a database of about 133 million households and
divided it into 70 demographic and lifestyle clusters based on information
available from public sources.
• A person gives one of Acxiom’s Web partners his address by buying
something, filling out a survey or completing a contest form on one of the
sites.
• Acxiom checks the address against its database and places a “cookie,” or
small piece of tracking software, embedded with a code for that person’s
demographic and behavioral cluster on his computer hard drive.
• When the person visits an Acxiom partner site in the future, Acxiom can use
that code to determine which ads to show
• Through another cookie, Acxiom tracks what consumers do on partner Web
sites
29. Profile representation
• Bag of words (BOW)
• Use words in user documents to represent user interests
• Issues
• Words appear independent of page content (“Home”, “page”)
• Polysemy (word has multiple meanings e.g. bank)
• Synonymy (multiple words have same meanings e.g. joy, happiness)
• Large profile sizes
• Concepts (e.g. DMOZ)
• Use existing ontology maintained for free
• Issues
• Too large (about 6 lakh, i.e. 600,000, DMOZ nodes); the ontology has to be drastically pruned for use
• Need to build classifiers for each DMOZ node
30. Word based term vector profiles
• Profile represented as sets of words tf*idf weighted
• Could use one long profile vector or different
vectors for different topics (sports, health, finance)
• Documents converted to same representation,
matched with keyword vectors using cosine
similarity
• Should take structure of the document into account
(ignore html tags, email header vs body)
31. Word based hierarchical profiles
User Profile:10
  Research:5
    IR:3
      Search:2
      ...
    DB:2
  Sports:3.5
    Soccer:2
    Others:1.5
  Sex:1.5
(The number after each node is the support of that interest.)
Support decreases from high to low level, and from left to right
We are thankful to Yabo (Arber) Xu from Simon Fraser University for kindly allowing us to use slides numbered 31, 37, 38, 39 from his WWW 07 presentation.
32. Building word based hierarchical
profiles
• Build a (word, document) map for each word occurring in the corpus
• Order words by amount of support
• Support of a word = number of documents in which
word appears
• For each word
• Decide whether to merge with another word (using some
measure of similarity)
• Decide whether to make one word the child of other
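The support computation above can be sketched as follows; the corpus and words are made-up illustrations, not the slides' data:

```python
from collections import defaultdict

# Toy corpus standing in for the user's document collection (hypothetical data).
docs = {
    "d1": ["search", "ir", "ranking"],
    "d2": ["ir", "search", "index"],
    "d3": ["soccer", "goal"],
}

# Build a (word -> set of documents) map.
word_docs = defaultdict(set)
for doc_id, words in docs.items():
    for w in words:
        word_docs[w].add(doc_id)

# Support of a word = number of documents in which it appears.
support = {w: len(d) for w, d in word_docs.items()}

# Order words by decreasing support before the merge / parent-child decisions.
ordered = sorted(support, key=support.get, reverse=True)

assert support["search"] == 2 and support["soccer"] == 1
assert ordered[0] in ("search", "ir")  # the two support-2 words come first
```

The merge and parent-child decisions themselves are covered on the next slide.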
34. Term similarity and Parent-child terms
• Words that cover the same document sets are similar
• Jaccard measure
Sim(w1, w2) = |D(w1) ∩ D(w2)| / |D(w1) ∪ D(w2)|
• Parent child terms
• A specific term is a child of a more general term if it frequently occurs
with a general term (but the reverse is not true)
• Word w2 is taken as child of term w1 if P(w1|w2) > some_threshold
• e.g. Terms “Soccer” and “Badminton” might co-occur with the term
“Sport” but not the other way around
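Both tests can be sketched in a few lines; the per-word document sets and the threshold value below are hypothetical:

```python
# Hypothetical document sets for each word; D[w] = set of documents containing w.
D = {
    "sport":     {"d1", "d2", "d3", "d4"},
    "soccer":    {"d1", "d2"},
    "badminton": {"d3"},
}

def jaccard(w1, w2):
    """Sim(w1, w2) = |D(w1) ∩ D(w2)| / |D(w1) ∪ D(w2)|"""
    return len(D[w1] & D[w2]) / len(D[w1] | D[w2])

def is_child(w2, w1, threshold=0.8):
    """w2 is a child of w1 if P(w1 | w2) = |D(w1) ∩ D(w2)| / |D(w2)| > threshold."""
    return len(D[w1] & D[w2]) / len(D[w2]) > threshold

assert jaccard("sport", "soccer") == 0.5
assert is_child("soccer", "sport")        # soccer always co-occurs with sport
assert not is_child("sport", "soccer")    # but not the other way around
```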
35. Personalization and Privacy
• Studies have shown that
• People are comfortable sharing preferences (favourite TV
show, snack etc.), demographic and lifestyle information
• People not comfortable sharing financial and purchase
related information
• Facebook fiasco because of reporting “Your friends bought …”
• Financial rewards (even small amounts) encourage
disclosure
• People parted with valuable information for Singapore
$15
37. What and How much to Reveal? - 1
User Profile:10
  Research:5
    IR:3
      Search:2
      ...
    DB:2
  Sports:3.5
    Soccer:2
    Others:1.5
  Sex:1.5
(Interests lower in the hierarchy are more specific; some interests, e.g. Sex, are more sensitive.)
Manual Option – Absolute privacy guarantee, but requires a lot of user
intervention
38. What and How much to Reveal? - 2
User Profile U → indicator of a user's possible interests
Term t → indicator of a possible interest, with P(t) = Sup(t)/|D|
The amount of information for an interest t:
I(t) = log(1/P(t)) = log(|D|/Sup(t))
→ an indication of the specificity and sensitivity of an interest
H(U) – the amount of information carried by U:
H(U) = Σ_t P(t) × I(t)
Two privacy parameters:
• minDetail – protect t with P(t) < minDetail
• expRatio = H(U_exp)/H(U)
The more detail we expose, the higher the expRatio.
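These quantities are straightforward to compute. The sketch below uses the supports from the example profile tree with an assumed |D| = 10; natural log is used, which does not affect the ratio:

```python
import math

# Hypothetical profile: term -> support, with |D| = 10 documents,
# mirroring the profile tree used on these slides.
D_size = 10
support = {"Research": 5, "Sports": 3.5, "Sex": 1.5, "IR": 3, "DB": 2, "Soccer": 2}

def P(t):
    return support[t] / D_size

def I(t):
    return math.log(1 / P(t))   # specificity/sensitivity of interest t

def H(terms):
    return sum(P(t) * I(t) for t in terms)

min_detail = 0.3
# Protect every t with P(t) < minDetail; expose the rest.
exposed = [t for t in support if P(t) >= min_detail]
exp_ratio = H(exposed) / H(support)

assert all(P(t) >= min_detail for t in exposed)
assert 0 < exp_ratio <= 1   # exposing less detail lowers the ratio
```

Raising min_detail shrinks the exposed set and drives exp_ratio down, which is exactly the privacy/personalization trade-off the slide describes.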
39. What and How much to Reveal? - 3
On the example profile above:
• minDetail = 0.5 → expRatio = 44%
• minDetail = 0.3 → expRatio = 69%
The minDetail and expRatio parameters allow a balance between privacy and personalization.
40. Profile portability
• Move the profile to a central server
• Claria PersonalWeb, Google-Yahoo-Microsoft
• Provision to delete search queries, visited pages
• No control over which part of the profile can be used
• Have a client side component that reconstructs the
profile on the client using server side info
(Matchmine)
• Attention Profile markup language
• Allows explicit and implicit information to be stored (as
XML) and provided to web services
42. Application-independent evaluation of
the profile
• Stability
• Number of profile elements that do not change over the evaluation
cycle
• Precision
• How many items in the profile does the user agree with as
representative of his interests ?
• Does the user agree with the strength of the interest ?
• Do interests at deeper levels of the hierarchy have less precision
compared to interests at higher levels ?
• Which data source (bookmarks, search keywords, web pages) is better?
• Bookmarks were not very representative of user interests in our study
43. Profile evaluation
Sample evaluation of one profiling algorithm:
• Profiles are stable (Figure 1: stability, stability_alpha and stability_date vs. number of web pages in cache)
• Profile elements with high support have high precision (Figure 2: precision for Support > 5, 3 < Support < 5, and Support < 3)
• Profile elements at all levels of the hierarchy have similar precision (Figure 3: percentage in profile and precision across hierarchy levels 1–6)
44. Managing the profile
• Profiles may need to be expanded (bootstrapped) or
pruned
• Allowing users to manually edit their profiles to add/delete
topics of interest was found to make performance worse
(Jae-wook Ahn, WWW 2007)
• Adding and deleting topics to profile harmed system performance
• Deleting topics harmed performance four times more compared to
adding topics
• Some agents learn short term and long term profiles
separately using different techniques (K-NN for short term
interests, Naïve Bayes for long term interests)
46. Personalized search
• Search can be personalized based on
• User profile
• Current working context
• Past search queries
• Server side clickstreams
• Personalized Pagerank
• Determining user intent is hard (e.g. the query "visa")
47. A generic personalized search algorithm
using a user profile
• Inputs- User profile, Search query
• Output – A results vector reordered by the user’s
preference
• Steps
• Send the query to a search engine
• Results[] = A vector of the search engine’s results
• For each item i in Results[] calculate the preference
Pref [i] = α *Similarity(Results[i] , User Profile)
+ (1- α)*SearchEngineRank
• Sort Results[] using Pref [i] as the comparator
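The loop above can be sketched as follows. The similarity function and profile representation are placeholders, and turning the engine's rank position into a score in (0, 1] is an assumption of this sketch — the slide leaves SearchEngineRank unspecified:

```python
def rerank(results, profile, similarity, alpha=0.5):
    """Reorder search results by alpha * Similarity + (1 - alpha) * engine score."""
    pref = []
    for rank, item in enumerate(results, start=1):
        # Assumed normalization: rank 1 -> 1.0, rank 2 -> 0.5, ...
        engine_score = 1.0 / rank
        score = alpha * similarity(item, profile) + (1 - alpha) * engine_score
        pref.append((score, item))
    return [item for _, item in sorted(pref, reverse=True)]

# Illustrative profile and similarity: fraction of profile terms in the snippet.
profile = {"aviation", "emissions"}
sim = lambda doc, prof: len(set(doc.split()) & prof) / max(len(prof), 1)

results = ["google doubleclick deal", "eu aviation emissions deal", "ftc clears deal"]
reranked = rerank(results, profile, sim, alpha=0.8)
assert reranked[0] == "eu aviation emissions deal"
```

With alpha near 1 the profile dominates; with alpha near 0 the original engine ordering is kept.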
48. Current working context – JIT retrieval
• Context includes time, location, applications
currently running, documents currently opened, IM
status
• Use profile and current context to provide relevant
(and just-in-time) information
• Blinkx toolbar – provides relevant news, video and Wikipedia articles within different applications (Microsoft Word, IE browser)
• Intersect interests from the overall profile with current context to get the contextual profile
• Context can also be used in query expansion
49. Personalization based on Search history
• Use query-to-query similarity to suggest results that
satisfied past queries
• Create user profiles from past queries/snippets from
search results clicked
• Misearch (Gauch et al. 2004) creates weighted concept hierarchies based on ODP as the reference concept hierarchy
• Compute degree of similarity between search engine result
snippets (title and text summaries) and user profile as
sim(user_i, doc_j) = Σ (k = 1 to n) wp_ik × wd_jk
• wp_ik = weight of concept k in profile i
• wd_jk = weight of concept k in document j
50. Personalization by clickthrough data analysis
– CubeSVD (Jian-Tao Sun, WWW 2005)
• Search engine has tuples of the form (User, Query, Visited
page)
• Multiple tuples constitute a tensor (generalization of matrix
to higher dimensions)
• Higher order SVD (HOSVD) performs SVD on tensor
• The reconstructed tensor is a tuple of the form (User,
Query, web page, p)
• Where p is the probability that the user posing the query will visit
the web page
• Recommend pages with highest value of p
• Computationally intensive but HOSVD can be done offline
• Need to recompute to account for new clickthrough data
51. Topic sensitive pagerank (Haveliwala
2002)
• For top 16 ODP categories, create a pagerank vector
• Each web page/document d has multiple ranks
depending on what the topic of interest j is
• For a query q, compute P(Cj|q) ∝ P(Cj) × P(q|Cj)
• Intuition: If a topic is more probable given a query, the
topic specific rank should have more say in the final
rank
• Compute the query-sensitive rank as
rank(d) = Σ_j P(Cj | q) × rank_jd
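The combination step is a weighted sum; a minimal sketch, in which the topic probabilities and per-topic rank scores are made-up illustrations:

```python
def query_sensitive_rank(p_topic_given_q, topic_ranks):
    """rank(d) = sum over topics j of P(Cj | q) * rank_{j,d}."""
    some_topic = next(iter(topic_ranks.values()))
    return {
        d: sum(p_topic_given_q[j] * ranks[d] for j, ranks in topic_ranks.items())
        for d in some_topic
    }

# For the ambiguous query "visa": P(Cj | q) (hypothetical values).
p = {"Finance": 0.9, "Travel": 0.1}
# Topic-specific PageRank scores per document (hypothetical values).
topic_ranks = {
    "Finance": {"visa.com": 0.8, "wikitravel.org": 0.1},
    "Travel":  {"visa.com": 0.2, "wikitravel.org": 0.9},
}

final = query_sensitive_rank(p, topic_ranks)
assert final["visa.com"] > final["wikitravel.org"]
```

Because the finance topic is far more probable for this query, the finance-specific ranks dominate the final ordering, which is the intuition stated above.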
52. Topics
• Overview of Personalization
• User Profile creation
• Personalizing Search
• Document modeling
• Recommender System
• Semantics in Personalization
54. Under this topic
• Document representation
• Document analysis using
• Latent Semantic Analysis (LSA)
• Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
• Support Vector Machine (SVM): A machine learning
algorithm
55. Document representation
• Term vector
• Document is represented as vector of terms
• Each dimension corresponds to a separate term
• Several methods of computing the weights of the terms
• Binary weighting: 1 if the word appears in the document
• Most well known is TF*IDF
tf_i,j = n_i,j / Σ_k n_k,j
idf_i = log( |D| / |{d_j : t_i ∈ d_j}| )
tfidf_i,j = tf_i,j × idf_i
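The three formulas translate directly to code; the toy corpus below is an illustration, not the slides' example:

```python
import math

# Toy corpus: document id -> list of tokens (hypothetical data).
docs = {
    "g1": ["google", "ftc", "doubleclick", "acquisition"],
    "g2": ["google", "doubleclick", "acquisition"],
    "e1": ["eu", "aviation", "emissions"],
}

def tf(term, doc_id):
    # tf_i,j = n_i,j / sum_k n_k,j
    words = docs[doc_id]
    return words.count(term) / len(words)

def idf(term):
    # idf_i = log(|D| / |{d_j : t_i in d_j}|)
    df = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / df)

def tfidf(term, doc_id):
    return tf(term, doc_id) * idf(term)

# "google" appears in 2 of 3 documents, "eu" in only 1, so "eu" is more discriminative.
assert idf("eu") > idf("google")
assert abs(tfidf("google", "g1") - (1 / 4) * math.log(3 / 2)) < 1e-12
```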
56. Computing similarity
sim(A, B) = cosine(θ) = (A · B) / (‖A‖ × ‖B‖) = Σ A_i × B_i / ( √(Σ A_i²) × √(Σ B_i²) )
Jaccard coefficient = J(A, B) = |A ∩ B| / |A ∪ B|
Dice's coefficient = D(A, B) = 2|A ∩ B| / (|A| + |B|)
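All three measures are a few lines each; the sets and vectors below are made-up illustrations:

```python
import math

def cosine(a, b):
    # (A . B) / (||A|| * ||B||) for equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def jaccard(A, B):
    # |A ∩ B| / |A ∪ B| for sets.
    return len(A & B) / len(A | B)

def dice(A, B):
    # 2|A ∩ B| / (|A| + |B|) for sets.
    return 2 * len(A & B) / (len(A) + len(B))

A = {"google", "doubleclick", "deal"}
B = {"doubleclick", "deal", "privacy"}
assert jaccard(A, B) == 0.5                 # 2 shared / 4 total
assert dice(A, B) == 4 / 6
assert abs(cosine([1, 1, 0], [1, 1, 1]) - 2 / math.sqrt(6)) < 1e-12
```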
57. Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition
• g2: Google Closes In on DoubleClick Acquisition
• g3: FTC clears Google DoubleClick deal
• g4: US regulator clears DoubleClick deal
• g5: DoubleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
• Underlined words appear in more than one document
59. Retrieval example
• Query (or Profile) q = “Google Acquisition”
• Query vector q = [1 0 0 0 1 0 0 0 0 0]'
• Cosine similarity of the query to the documents
S = [g1: 0.634, g2: 0.816, g3: 0.447, g4: 0, g5: 0, e1: 0, e2: 0, e3: 0]
• What about the documents g4 and g5?
• Problem of data sparsity
60. Under this topic
• Document representation
• Document analysis using
• Latent Semantic Analysis (LSA)
• Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
• Support Vector Machine (SVM): A machine learning algorithm
61. Latent Semantic Analysis (LSA)
• If you search for "Tata Nano", aren't documents containing "People's Car" also relevant?
• How can a machine understand that?
• Analyze the collection of documents
• Documents that contain "Tata Nano" generally contain "People's Car" as well
• The covariance of these two dimensions is high
• LSA finds such correlations using a technique from linear algebra
62. LSA
• Transforms the term document matrix into a relation
between the
• terms and some concepts,
• relation between those concepts and the documents
• Concepts are the dimensions of maximum variance
• Removes the dimensions with low variance
• Reduction in feature space
• Term document matrix becomes denser
63. Singular Value Decomposition
X = T S D'
• X is the t × d term-document matrix (t terms, d documents)
• T is t × m, D' is m × d; T and D are orthonormal matrices
• S is an m × m diagonal matrix of singular values σ1 ≥ σ2 ≥ … ≥ σm > 0
• m is the rank of the matrix X
64. Reduced SVD
X_k = T_k S_k D_k'
- Choose the k largest singular values (σ1 … σk)
- Choose the corresponding k columns of T and D
- Then construct X_k (t × d, from T_k: t × k, S_k: k × k, D_k': k × d)
- X_k is the best rank-k approximation of X in terms of the Frobenius norm
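The truncation can be sketched with numpy, whose svd returns the factors with singular values already in decreasing order; the matrix below is toy data, not the slides' example:

```python
import numpy as np

# Small term-document matrix (hypothetical data).
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
])

# X = U diag(s) Vt, corresponding to T, S, D' above.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
# Keep the k largest singular values and the corresponding columns/rows.
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the Frobenius error of the rank-k truncation equals the
# Euclidean norm of the discarded singular values.
err = np.linalg.norm(X - Xk)
assert abs(err - np.linalg.norm(s[k:])) < 1e-9
assert np.linalg.matrix_rank(Xk) <= k
```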
65. Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition
• g2: Google Closes In on DoubleClick Acquisition
• g3: FTC clears Google DoubleClick deal
• g4: US regulator clears DoubleClick deal
• g5: DoubleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
• Query (or Profile) q = “Google Acquisition”
70. LSA Example
Sim = D_q × S_2² × D_2'
(the query vector is first folded into the 2-dimensional LSA space as D_q)
71. Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition [1.284]
• g2: Google Closes In on DoubleClick Acquisition [0.936]
• g3: FTC clears Google DoubleClick deal [1.426]
• g4: US regulator clears DoubleClick deal [0.891]
• g5: DoubleClick deal brings greater focus on privacy [0.697]
• e1: EU Agrees to Reduce Aviation Emissions [0.035]
• e2: Aviation to be included in EU emissions trading [0.035]
• e3: EU wants tougher green aviation laws [0.152]
• Underlined words appear in more than one document
72. Under this topic
• Document representation
• Document analysis using
• Latent Semantic Analysis (LSA)
• Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
• Support Vector Machine (SVM): A machine learning algorithm
73. Probabilistic Latent Semantic Analysis
(PLSA)
• If we know the document collection contains two
topics can we do better?
• Can we estimate
• Probability( topic | document) ?
• Probability( word | topic) ?
• If we can also estimate Probability( topic | query) then we
can compute the document to query similarity
• PLSA is a statistical technique to estimate those probabilities from a collection of documents
74. Probabilistic Latent Semantic Analysis
(PLSA)
• Dyadic data: Two (abstract) sets of objects, X ={x1,
..,xm} and Y ={y1, … ,yn} in which observations are
made of dyads(x,y)
• Simplest case: observation of co-occurrence of x and y
• Other cases may involve scalar weight for each
observation
• Examples:
• X = Documents, Y =Words
• X = Users, Y =Purchased Items
• X = Pixels, Y =Values
75. PLSA
• Document consists of topics and words in the document are generated
based on those topics
• Generative model (asymmetric): (d_i, w_j) is generated as follows
• pick a document d_i with probability P(d_i),
• pick a topic z_k with probability P(z_k | d_i),
• generate a word w_j with probability P(w_j | z_k)
P(d_i, w_j) = P(d_i) P(w_j | d_i)
P(w_j | d_i) = Σ (k = 1 to K) P(w_j | z_k) P(z_k | d_i)
(Graphical model: D → Z → W, with parameters P(d_i), P(z_k | d_i), P(w_j | z_k))
76. PLSA
• Parameters: P(d_i), P(z_k | d_i), P(w_j | z_k)
• P(d_i) is proportional to the number of times the document is observed and can be computed independently
• P(z_k | d_i) and P(w_j | z_k) can be estimated using the Expectation Maximization (EM) algorithm
P(D, W) = Π (i = 1 to N) Π (j = 1 to M) P(d_i, w_j)^n(d_i, w_j)
L = Σ (i = 1 to N) Σ (j = 1 to M) n(d_i, w_j) ln P(d_i, w_j)
N = number of documents; M = number of distinct words
77. PLSA: EM steps
• E-step:
P(z_k | d_i, w_j) = P(w_j | z_k) P(z_k | d_i) / Σ (l = 1 to K) P(w_j | z_l) P(z_l | d_i)
• M-step:
P(w_j | z_k) = Σ (i = 1 to N) n(d_i, w_j) P(z_k | d_i, w_j) / Σ (m = 1 to M) Σ (i = 1 to N) n(d_i, w_m) P(z_k | d_i, w_m)
P(z_k | d_i) = Σ (j = 1 to M) n(d_i, w_j) P(z_k | d_i, w_j) / n(d_i)
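A minimal sketch of these EM steps, run on a subset of the news example's dyadic (document, word) pairs; every count n(d, w) is 1, and the random initialization is seeded for repeatability:

```python
import math
import random

pairs = [("g1", "google"), ("g1", "acquisition"), ("g2", "google"),
         ("g2", "acquisition"), ("e1", "eu"), ("e1", "aviation"),
         ("e2", "eu"), ("e2", "aviation")]
docs = sorted({d for d, _ in pairs})
words = sorted({w for _, w in pairs})
K = 2
rng = random.Random(0)

def rand_dist(n):
    xs = [rng.random() for _ in range(n)]
    s = sum(xs)
    return [x / s for x in xs]

p_w_z = {z: dict(zip(words, rand_dist(len(words)))) for z in range(K)}  # P(w|z)
p_z_d = {d: rand_dist(K) for d in docs}                                 # P(z|d)

def log_likelihood():
    return sum(math.log(sum(p_w_z[z][w] * p_z_d[d][z] for z in range(K)))
               for d, w in pairs)

prev = log_likelihood()
for _ in range(50):
    # E-step: posterior P(z | d, w)
    post = {}
    for d, w in pairs:
        denom = sum(p_w_z[z][w] * p_z_d[d][z] for z in range(K))
        post[(d, w)] = [p_w_z[z][w] * p_z_d[d][z] / denom for z in range(K)]
    # M-step: re-estimate P(w|z) and P(z|d) from the posteriors
    for z in range(K):
        totals = {w: sum(p[z] for (d2, w2), p in post.items() if w2 == w)
                  for w in words}
        norm = sum(totals.values())
        p_w_z[z] = {w: totals[w] / norm for w in words}
    for d in docs:
        n_d = sum(1 for d2, _ in pairs if d2 == d)
        p_z_d[d] = [sum(p[z] for (d2, _), p in post.items() if d2 == d) / n_d
                    for z in range(K)]
    cur = log_likelihood()
    assert cur >= prev - 1e-9   # EM never decreases the data log-likelihood
    prev = cur
```

After convergence p_z_d holds the P(z|d) estimates and p_w_z the P(w|z) estimates; with real data the counts n(d, w) would weight both M-step sums.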
78. PLSA Example
• Dyadic data in our example:
(g1, google), (g1, green), (g1, ftc), (g1, doubleclick), (g1, acquisition),
(g2, google), (g2, doubleclick), (g2, acquisition),
(g3, ftc), (g3, clear), (g3, google), (g3, doubleclick), (g3, deal),
(g4, clear), (g4, doubleclick), (g4, deal),
(g5, doubleclick), (g5, deal),
(e1, eu), (e1, aviation), (e1, emission),
(e2, aviation), (e2, eu), (e2, emission),
(e3, eu), (e3, green), (e3, aviation)
79. PLSA Example
• After 20 iterations of EM algorithm
[Tables of the estimated values of P(z_k | d_i) and P(w_j | z_k)]
80. PLSA Example
• Query q = “Google Acquisition”
• Steps
• Keep P(wj |zk) fixed.
• Estimate P(zk |q) using EM steps
• Then compute cosine similarity of the vector P(Z|q) to the
P(Z|d)
P(z_k | q): Z1 = 1, Z2 = 0
81. Example
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition [1.0]
• g2: Google Closes In on DoubleClick Acquisition [1.0]
• g3: FTC clears Google DoubleClick deal [1.0]
• g4: US regulator clears DoubleClick deal [1.0]
• g5: DoubleClick deal brings greater focus on privacy [1.0]
• e1: EU Agrees to Reduce Aviation Emissions [0.0]
• e2: Aviation to be included in EU emissions trading [0.0]
• e3: EU wants tougher green aviation laws [0.0]
• Underlined words appear in more than one document
82. Under this topic
• Document representation
• Document analysis using
• Latent Semantic Analysis (LSA)
• Probabilistic Latent Semantic Analysis (PLSA)
• Document Classification
• Support Vector Machine (SVM): A machine learning algorithm
84. Document classification with SVM
• We will concentrate on binary classification
• {sports, not sports}, {interesting, not interesting} etc
• In general {+1,-1} also called {positive, negative}
• SVM is a supervised machine learning technique. It learns
the pattern from a training set
• Training set
• A set of documents with labels belonging to {+1, -1}
• SVM tries to draw a hyperplane that best separates the
positive and negative data in the training set
85. Support Vector Machine (SVM)
• A Machine learning algorithm
• SVM was introduced in COLT-92 by Boser, Guyon and
Vapnik.
• Initially popularized in the NIPS community, now an
important and active field of all Machine Learning
Research
• Successful applications in many fields (text, bioinformatics,
handwriting, image recognition etc.)
86. SVM – Maximum margin separation
[SVM illustration (maximum margin separation) by Bülent Üstün, Radboud Universiteit]
87. Mapping to higher dimension for non-
separable data
P1 = (0,0) → +1
P2 = (0,1) → −1
P3 = (1,1) → +1
P4 = (1,0) → −1
Map to a higher dimension: x = (x1, x2) → φ(x) = (x1², x2², x1·x2)
P1 → (0,0,0) → +1
P2 → (0,1,0) → −1
P3 → (1,1,1) → +1
P4 → (1,0,0) → −1
88. The XOR example
SVM uses kernel trick to map data to higher
dimensional feature space without incurring
much computational overhead
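The XOR points above become linearly separable after the quadratic feature map. The sketch below verifies this with a hand-picked hyperplane in the mapped space (it is not learned by an SVM; the weights are an illustration):

```python
# XOR labelling of the four corner points.
points = {(0, 0): +1, (0, 1): -1, (1, 1): +1, (1, 0): -1}

def phi(x1, x2):
    # phi(x1, x2) = (x1^2, x2^2, x1*x2), the map from the previous slide.
    return (x1 * x1, x2 * x2, x1 * x2)

# Hand-picked separating hyperplane w.z + b = 0 in the 3-d feature space.
w, b = (-1.0, -1.0, 2.0), 0.5

for (x1, x2), label in points.items():
    z = phi(x1, x2)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    assert (score > 0) == (label > 0)   # all four points correctly separated
```

No hyperplane in the original 2-d space separates XOR, but one line of feature mapping makes it trivial; the kernel trick gets the same effect without materializing φ(x).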
92. Classification
• Broadly three approaches
• Content Based Recommendation
• Collaborative Filtering
• Hybrid approach
93. Content based recommendation
• Utility of an item for a user is determined based on
the items preferred by the user in the past
• Applies similar techniques as introduced in the
document modeling part
94. Basic Approach
• Create and represent the user profile from the items rated by the user in
the past
• A popular choice of profile representation is vector of terms weighted
based on TF*IDF
• Represent the item in the same format
• A news item can be represented using (TF*IDF) term vector
• For movies, books one needs to get sufficient metadata to represent the
item in vector format
• Define a similarity measure to compute the similarity between the
profile and the item
• Popular choice is cosine similarity
• Advanced machine learning techniques can also be applied to do the matching
• Recommend most similar items
95. Problems with content based
recommendation
• Knowledge engineering problem
• How do you describe multimedia, graphics, movies,
songs
• Recommendation shows limited diversity
• New user problem
• It requires a large number of ratings from the user to generate quality recommendations
96. Collaborative filtering
• Recommends items that are liked in the past by other users
with similar tastes
• Quite popular in e-commerce sites, like Amazon, eBay
• Can recommend various media types, text, video, audio,
Ads, products
98. Advantages
• Does not have the knowledge engineering
problem
• Both user and items can be represented using just ids
• Often recommendation shows good amount of
diversity
99. Lets learn C.F. with an example
         Ran   Casablanca   Ben Hur   Tomb Raider   MI-II   Air Force One
Jane     5     5            -         -             ?       2
Bill     -     2            3         -             4       -
Tom      -     2            -         2             5       5
Cathy    3     3            -         -             1       1
What rating will Jane possibly give to MI-II?
100. Normalizing the ratings
• All users won’t give equal rating even if they all equally liked/disliked an
item
• Normalize the ratings: r′_u,i = r_u,i − r̄_u (where r̄_u is user u's mean rating)
         Ran   Casablanca   Ben Hur   Tomb Raider   MI-II   Air Force One
Jane     1     1            -         -             -       -2
Bill     -     -1           0         -             1       -
Tom      -     -1.5         -         -1.5          1.5     1.5
Cathy    1     1            -         -             -1      -1
101. Similarity between users
• Who are the other users with tastes similar to Jane's?
• Each row of the matrix is a vector representing the user
• Compute cosine similarity between the users
Jane's similarity to: Bill −0.289, Tom −0.612, Cathy 0.816
102. Compute probable rating
• The predicted rating is the user's mean rating plus the similarity-weighted, mean-centered ratings given by the other users
• Sometimes only the top N most similar users are taken
r_u,i = r̄_u + Σ (v ∈ V) sim(u, v) × (r_v,i − r̄_v) / Σ (v ∈ V) |sim(u, v)|
• Jane will rate MI-II as 4 + [(−0.289 × 1) + (−0.612 × 1.5) + (0.816 × −1)] / (0.289 + 0.612 + 0.816) ≈ 2.82
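The whole worked example fits in a short script. The movie-to-user assignment of some ratings is partly inferred from the flattened slide table, but the means, similarities and the final prediction reproduce the numbers above:

```python
import math

# Raw ratings: user -> {movie: rating}.
ratings = {
    "Jane":  {"Ran": 5, "Casablanca": 5, "Air Force One": 2},
    "Bill":  {"Casablanca": 2, "Ben Hur": 3, "MI-II": 4},
    "Tom":   {"Casablanca": 2, "Tomb Raider": 2, "MI-II": 5, "Air Force One": 5},
    "Cathy": {"Ran": 3, "Casablanca": 3, "MI-II": 1, "Air Force One": 1},
}

# Normalize: subtract each user's mean rating.
mean = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
centered = {u: {m: x - mean[u] for m, x in r.items()} for u, r in ratings.items()}

def cosine(u, v):
    # Missing ratings are treated as 0 in the dot product.
    dot = sum(centered[u][m] * centered[v].get(m, 0.0) for m in centered[u])
    nu = math.sqrt(sum(x * x for x in centered[u].values()))
    nv = math.sqrt(sum(x * x for x in centered[v].values()))
    return dot / (nu * nv)

others = ["Bill", "Tom", "Cathy"]
sims = {v: cosine("Jane", v) for v in others}

# Similarity-weighted prediction of Jane's rating for MI-II.
pred = mean["Jane"] + (
    sum(sims[v] * centered[v]["MI-II"] for v in others)
    / sum(abs(sims[v]) for v in others)
)

assert abs(sims["Cathy"] - 0.816) < 0.001
assert abs(pred - 2.82) < 0.01
```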
103. Remarks
• There is another popular version of the above technique where, instead of user-to-user similarity, item-to-item similarity is computed
• Rating prediction is based on the similarity to the items rated by the
user
• The above mentioned methods are known as memory
based techniques
• It has the disadvantage that it requires more online computations
104. Model based technique
• A model is learnt using the collection of ratings as training
set
• Prediction is done using the model
• More offline computing and less online computing
105. Model based technique
• A simple model
r_u,i = E(r_u,i) = Σ (r ∈ R) r × Pr(r_u,i = r | r_u,s′, s′ ∈ I_u)
106. Model based technique
• Recent research tries to model the recommendation process
with more complex probabilistic models
(Graphical model: u → z; both z and i → r)
P(r | u, i) = Σ_z P(r | z, i) × P(z | u)
• Parameters P(r|z,i) and P(z|u) can be estimated using EM
algorithm
107. Problems of C.F.
• New user problem
• New Item problem
• Sparsity problem
• A user rates only a few items
• Unusual user
• User whose tastes are unusual compared to the rest of
the population
108. Hybrid approaches
- Combining Collaborative and Content based methods
• Combining predictions of Content based method
and C.F.
• Implement separate content based and collaborative
filtering method
• Combine their predictions using
• Linear combination
• Voting schemes
• Alternatively select a prediction method based on some
confidence measure on the recommendation
109. Hybrid Approaches
• Adding content based characteristics into a C.F.
based method
• Maintain a content based profile for each user
• Use these content based profiles (not the commonly
rated items) to compute the similarity between users
• Then do C.F.
• Helps to overcome sparsity-related problems, as generally not many items are commonly rated by two users
110. Hybrid approaches
• Adding C.F. characteristics into a content based
method
• The most popular technique in this category is dimensionality reduction on a group of content based profiles
profiles
• Dimensionality reduction technique like LSA can improve
prediction quality by having compact representation of
profile
111. Future directions of research (Adomavicius et al.)
• Incorporating richer user and item profile in a unified
framework of different methods
• Using contextual information in recommendation
• Example: when recommending a vacation package, the system should consider
• User
• Time of the year
• With whom the user plans to travel
• Traveling conditions and restrictions at the time
• Multi-Criteria ratings
• E.g. three-criteria restaurant ratings: food, décor and service
112. Future directions of research
• Non-intrusiveness
• Flexibility
• Enabling end-users to customize recommendation
• Evaluation
• Empirical evaluation on test data that users choose to
rate
• Items that users choose to rate are likely to be biased
• Economics-oriented measures
113. References (Recommender System)
• Adomavicius, G., and Tuzhilin, A., "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions", IEEE Transactions on Knowledge and Data Engineering, 2005
115. Topic Outline
• Why use semantic information?
• Introduction to Ontology
• Formal Specification of an Ontology
• A Quick Overview of Semantic Web
• Techniques and Approaches
• Word Sense Disambiguation
• Semantic Profiles
• Constrained Spreading Activation
• Semantic Similarity
• Looking Ahead
116. News Example Revisited
• g1: Google Gets Green Light from FTC for DoubleClick Acquisition
• g2: Google Closes In on DoubleClick Acquisition
• g3: FTC clears Google DoubleClick deal
• g4: US regulator clears DoubleClick deal
• g5: DoubleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
117. News Example Modified
• g1: Apple Gets Green Light from FTC for TripleClick Acquisition
• g2: Apple Closes In on TripleClick Acquisition
• g3: FTC clears Apple TripleClick deal
• g4: US regulator clears TripleClick deal
• g5: TripleClick deal brings greater focus on privacy
• e1: EU Agrees to Reduce Aviation Emissions
• e2: Aviation to be included in EU emissions trading
• e3: EU wants tougher green aviation laws
• f1: Apple prices soaring high.
• f2: Increased apple rates causes concern to doctors.
• f3: Cost of 10 kg of apple to become Rs 1000 from 1 Feb.
(Diagram annotations: Apple — an IT company, like Google; TripleClick — an acquisition)
118. Semantics for Personalization
[Architecture diagram] A pipeline for semantics-based personalization:
• Data Collection — gathers explicit and implicit user info; implicit info is interpreted using domain knowledge
• Profile Constructor — represents profiles as meaningful domain concepts
• Profile-to-Content Matching — a semantics-based matching function matches the user profile against content (search queries, news, video, …)
• Personalized services — the generated profile is expanded using domain info, and documents are clustered based on better user groups
119. Techniques and Approaches
1. Implicit Information based on domain knowledge
• Word Sense Disambiguation
2. Represent Profiles as meaningful concepts
• Semantic Profiles
3. Semantics based Matching Function
• Semantic Distance
4. Expand the generated profile using domain info
• Constrained Spreading Activation
5. Cluster documents based on better User groups
• Social Semantic Networks
120. Word Sense disambiguation
[WordNet fragment] Two senses of “Jaguar”:
• Animal sense: Animal → Mammal → Carnivore → Feline → Big cat (hyponyms); Jaguar is a type of Big cat, same as Panther (synonym); meronyms include tail, fur, nail
• Vehicle sense: Transport → Vehicle → Motor Vehicle → Automobile → Car (hyponyms); Jaguar is a type of Car; a car contains accelerator, door, bumper, wheel
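A minimal Lesk-style sketch of the disambiguation idea: pick the sense whose gloss shares the most words with the ambiguous word's context. The tiny sense inventory below is a hypothetical stand-in for WordNet, built from the relations in the diagram above.

```python
# Hypothetical mini sense inventory (not real WordNet data).
SENSES = {
    "jaguar": {
        "animal": "big cat feline carnivore mammal tail fur",
        "car": "automobile motor vehicle wheel door bumper accelerator",
    }
}

def disambiguate(word, context):
    """Return the sense of `word` whose gloss overlaps the context most."""
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(disambiguate("jaguar", "the jaguar is a big cat with fur"))  # picks the "animal" sense
```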
121. Word Sense disambiguation
[WordNet fragment, extended with additional domain information] Two senses of “Apple”:
• Company sense: entity → abstract entity → group → organization → institution → company → Apple; additional domain information: employee, advisory board, stocks, revenue, sales tax, business acquisition
• Fruit sense: entity → substance → solid → food → fruit → apple; related: plant, tree, ripe, eat, skin, seed, pulp
122. Three level Conceptual Network
• Domain Ontology
• Co-occurrence
• synonyms
• hyponyms
• ..
• Hyperlinks
• Order of access
• Browsed together
•…
124. Views on Ontologies
[Diagram] A spectrum of ontology views, from lightweight front-end to expressive back-end: Topic Maps, Thesauri, Taxonomies, Ontologies, Semantic Networks, Extended ER-Models, Predicate Logic. Associated uses across the spectrum: navigation, information retrieval, query expansion, queries, sharing of knowledge, consistency checking, EAI, mediation, reasoning.
125. Structure of an Ontology
Ontologies typically have two components:
• Names for important concepts in the domain
• Elephant is a concept whose members are a kind of
animal
• Herbivore is a concept whose members are exactly
those animals who eat only plants or parts of plants
• Background knowledge/constraints on the
domain
• No individual can be both a Herbivore and a
Carnivore
126. A Simple Ontology
[Diagram] Class hierarchy: Person, Topic and Document are subclasses of Object; Student and Researcher are subclasses of Person; PhD Student is a subclass of Student. Example topics: Semantics and Ontology (related by “similar”). Relations: Topic described in Document; Document is about Topic; Person writes Document; Person knows Topic; Topic similar Topic.
127. Defining Ontology
[Gruber, 1993]
An Ontology is a formal (⇒ executable) specification of a shared (⇒ group of persons) conceptualization (⇒ about concepts) of a domain of interest (⇒ application & “unique truth”).
• Formal description of concepts and their relationships
• Strong basis in the family of First Order Logics (DL)
• Deductive inference based on ground truth of the domain
129. The Semantic Web Vision
Semantic web aims to transform WWW into a global database
“The semantic web is a web for computers”
130. Semantic web
Make web resources more accessible to automated processes
• Extend existing rendering markup with semantic markup
• Metadata annotations that describe the content/function of web-accessible resources
• Use Ontologies to provide vocabulary for annotations
• “Formal specification” is accessible to machines
• A prerequisite is a standard web ontology language
• Need to agree common syntax before we can share semantics
• Syntactic web based on standards such as HTTP and HTML
131. Semantic Web Layers
[Layer-cake diagram] XML provides user-definable, domain-specific markup; URIs provide globally unambiguous identifiers; the ontology layer provides context for vocabulary.
132. What is RDF ?
• RDF – resource description framework
• RDF is a data model
• Statement-based approach
• Subject/predicate/object triples – simple powerful unit
• All resources identified by URIs
• Triples create a directed labelled graph of
• object/attribute/value
• (semantic) relationships between objects
• RDF model is an abstract layer independent of XML
• XML serialization is supported
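The data model above can be sketched in a few lines: a graph is just a set of subject/predicate/object triples, and merging two RDF graphs is set union. The URIs below are illustrative.

```python
# Minimal sketch of the RDF data model: a graph as a set of triples.
graph = set()

def add(s, p, o):
    """Add one subject/predicate/object statement to the graph."""
    graph.add((s, p, o))

add("http://ex.org/presentation.ppt", "dc:creator", "http://ex.org/dave")
add("http://ex.org/presentation.ppt", "dc:date", "2005-09-23")
add("http://ex.org/dave", "org:email", "mailto:dave@example.org")

# Merging two RDF graphs is set union -- anyone can say anything about anything.
other = {("http://ex.org/dave", "foaf:name", "Dave")}
merged = graph | other

for s, p, o in sorted(merged):
    print(s, p, o)
```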
133. RDF Example
[Diagram] Resource ../presentation.ppt with properties: dc:creator → people.com/../dave_reynolds, dc:date → 2005-09-23, dc:description → “Some starter slides…”; the creator resource has org:email → mailto:dave.reynolds@hp.com

<rdf:Description rdf:about="allppt.com/presentation.ppt">
  <dc:creator rdf:resource="people.com/person/dave_reynolds"/>
</rdf:Description>

<rdf:Description rdf:ID="people.com/person/dave_reynolds">
  <org:email rdf:resource="mailto:dave.reynolds@hp.com"/>
</rdf:Description>

Enables easy merge of information
• Indirect metadata (anyone can say anything about anything)
• Extensibility (open world assumption, compositional)
134. RDF Schema
• Defines small vocabulary for RDF:
• Class, subClassOf, type
• Property, subPropertyOf
• domain, range
• Vocabulary can be used to define other vocabularies for your application domain
[Diagram] Veh:MotorVehicle rdfs:subClassOf rdfs:Resource; Veh:Van, Veh:Truck and Veh:PassengerVehicle are rdfs:subClassOf Veh:MotorVehicle; Veh:MiniVan is rdfs:subClassOf Veh:PassengerVehicle
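The key RDFS entailment — rdfs:subClassOf is transitive, so an instance belongs to every superclass — can be sketched over the vehicle taxonomy above:

```python
# child -> parent edges, mirroring the Veh: taxonomy in the diagram.
subclass_of = {
    "MiniVan": "PassengerVehicle",
    "PassengerVehicle": "MotorVehicle",
    "Van": "MotorVehicle",
    "Truck": "MotorVehicle",
}

def superclasses(cls):
    """All classes entailed for `cls` via transitive rdfs:subClassOf."""
    out = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        out.append(cls)
    return out

def types_of(instance_class):
    """An instance of a class is also an instance of every superclass."""
    return [instance_class] + superclasses(instance_class)

print(types_of("MiniVan"))
```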
135. OWL – Web Ontology Language
• A language to express an ontology
• An OWL ontology is an RDF graph
• A set of RDF triples
• Vocabulary extension: domain restrictions/truth
• Structure
• Ontology headers
• Class Axioms — important concepts of the domain
• Class Descriptions, Enumeration, Membership Restrictions
• Property Axioms
• Property Descriptions, Property Restrictions, Functional Spec
• Facts about individuals
137. The Syntax
Parent = Person with at least one child
<owl:Class rdf:ID="Parent">
  <owl:intersectionOf>
    <owl:Class rdf:about="#Person"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasChild"/>
      <owl:minCardinality>1</owl:minCardinality>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
139. SPARQL
• RDF Query Language
• Triples with unbound variables
• Protocol
• HTTP binding
• SOAP binding
• XML Results Format
• Easy to transform (XSLT, XQuery)
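A toy illustration of SPARQL's core idea — matching triple patterns with unbound variables (marked with a leading `?` here) against a triple store. The store and patterns are illustrative, not real SPARQL syntax.

```python
# A tiny triple store.
triples = [
    ("g1", "reportsOn", "Acquisition"),
    ("g2", "reportsOn", "Acquisition"),
    ("e1", "reportsOn", "Emissions"),
]

def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    b = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # unbound (or already-bound) variable
            if p in b and b[p] != t:
                return None
            b[p] = t
        elif p != t:                   # constant must match exactly
            return None
    return b

def query(pattern):
    """Return one binding set per matching triple."""
    results = []
    for t in triples:
        b = match(pattern, t, {})
        if b is not None:
            results.append(b)
    return results

print(query(("?doc", "reportsOn", "Acquisition")))
```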
140. Why Ontologies?
• Enable formalisation of user preferences
• Common underlying, interoperable representation
• Public vocabulary agreed & shared between different systems
• Better content matching & sharing across applications
• User interests can be matched to content meaning
• Using conceptual reasoning
• Richer, more precise, less ambiguous than keyword-based
• Provides adequate grounding for hierarchical representation
• coarse to fine-grained user interests
• Formal, computer-processable meaning of the concepts
142. Semantic Profiles
• User Profile as concepts
[Concept tree, weights derived from web pages visited by the user]
Top → Shopping (W=2), Science (W=0), Sports (W=1), …
Shopping → Books (W=1), Clothes (W=1); Sports → Soccer (W=1), Cricket (W=0)
• Books, Clothes and Soccer web pages visited by the user
How do we map documents/users to concepts?
143. Building concept profiles based on ODP
The Machine Learning Approach
• Step 1: Build an ODP classifier for selected ODP categories, training on the ODP categories and their documents
• Step 2: Run the user's web pages through the ODP classifier and add the resulting ODP concepts to the user profile
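One way to sketch the two steps is a simple centroid (Rocchio-style) text classifier; the slides don't fix a particular learner, and the category names and training documents below are toy stand-ins for real ODP data.

```python
from collections import Counter
import math

# Toy stand-in for ODP categories + documents.
training = {
    "Sports/Soccer": ["soccer goal match league", "soccer world cup goal"],
    "Computers/AI":  ["neural network learning model", "machine learning model data"],
}

# Step 1: build one centroid term-frequency vector per category.
centroids = {c: Counter(w for d in docs for w in d.split())
             for c, docs in training.items()}

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(page):
    """Step 2: assign a user's page to the closest category centroid."""
    v = Counter(page.split())
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

print(classify("goal in the soccer match"))
```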
145. Using Wikipedia to map documents to
concepts
Item: “Sony to slash PlayStation3 price”
Term vector representation: <sony:1>, <slash:1>, <playstation3:1>, <price:1>
Item: “Jittery Sony Knocks $100 Off PS3 Price Tag”
Term vector representation: <jittery:1>, <sony:1>, <knocks:1>, <ps3:1>, <price:1>, <tag:1>
A Search Approach: issue the item text as a query against an index of a Wikipedia dump; the titles of the retrieved articles become additional features.
Query “Sony to slash PlayStation3 price” retrieves: 1. PlayStation Network Platform, 2. PlayStation 2, 3. Ducks demo, 4. PlayStation 3, 5. PlayStation, 6. Ken Kutaragi, 7. PlayStation Portable, 8. Console manufacturer, 9. Sony Group, 10. Crystal Dynamics, 11. PlayStation 3 accessories, 12. …, 13. …
146. Profile: Words Vs Concepts
TF*IDF based user profile | Wikipedia based user profile
Search                    | Text Retrieval Conference
Home                      | HTML element
Help                      | Bank of America
News                      | Google search
Privacy                   | ICICI Bank
Google                    | IDBI Bank
Terms                     | Bank fraud
New                       | Artificial neural network
Page                      | Web crawler
Use                       | Web design
Web                       | Debit card
View                      | Extensible Markup Language
Results                   | Hewlett-Packard
Information               | Microsoft
Account                   | XHTML
                          | Demand account
147. Semantic Profiles
• Vector of weights – representing the intensity of user
interest for each concept (-1 to 1)
• Content also described by a set of weighted concepts
(0 to 1)
• Concept Profiles: Can express fine grained interests
• Interest in athletes who have won a gold medal
• Interest in IT companies which have acquired at least 3 companies in the last one year
• Only movies with either Amitabh or Shahrukh
151. Constrained Spreading Activation
• Cannot take ‘all’ related data
• Commonly used SA models
• Distance Constraint
• Fan-out Constraint
• Path Constraints
• App dependent inference rules
• Type of relationship
• Preferential paths
• Activation Constraint
• Threshold function at each single node level
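A sketch combining two of the constraints above — a distance constraint and an activation threshold — with decay along links; the concept graph and all parameter values are illustrative.

```python
# Directed concept graph (illustrative).
graph = {
    "Car": ["MotorVehicle", "Wheel"],
    "MotorVehicle": ["Vehicle"],
    "Vehicle": ["Transport"],
    "Wheel": [],
    "Transport": [],
}

def spread(seed, decay=0.5, max_dist=2, threshold=0.2):
    """Constrained spreading activation from a seed concept."""
    activation = {seed: 1.0}
    frontier = [(seed, 0)]
    while frontier:
        node, dist = frontier.pop(0)
        if dist >= max_dist:              # distance constraint
            continue
        for nbr in graph[node]:
            a = activation[node] * decay  # attenuate along each link
            if a < threshold:             # activation constraint
                continue
            if a > activation.get(nbr, 0.0):
                activation[nbr] = a
                frontier.append((nbr, dist + 1))
    return activation

print(spread("Car"))
```

Activation reaches MotorVehicle, Wheel and Vehicle but never Transport, which lies beyond the distance constraint.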
152. Learning preferences using semantic links
Two main ways of updating Concept History Stack
1. Interest Assumption Completion
• Add more potential user interests
• Based on Hierarchical relationships
• Threshold on value of pseudo-occurrence for insertion
• Nocc(C_supertype) = γ * Nocc(C_subtype), where γ < 1 is determined empirically
• Based on Semantic relationships
• All related concepts such that ∃ prop p, p(C, C_related)
• Pseudo-occurrence: Nocc(C_related) = αi * Nocc(C)
153. Learning preferences using semantic links
(contd)
2. Preference update by expansion
• Re-weighting over time
• Wnew(C_related) = Wold(C_related) + βi * Wnew(C)
• βi — semantic factor that depends on the level of semantic proximity
• Directly part of definition (Tbox)
• Related through inferred transitive relation (# such links matter)
• Notion of Semantic distance
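The two update rules on slides 152–153 can be sketched together; γ, βi, the toy ontology and the threshold are all illustrative choices, not values from the slides.

```python
# Toy ontology fragments (illustrative).
supertype = {"Jaguar": "BigCat", "BigCat": "Feline"}   # hierarchical links
related = {"Acquisition": ["Company"]}                 # semantic links via some property p

def complete_interests(nocc, gamma=0.5, threshold=0.4):
    """Interest assumption completion: propagate pseudo-occurrence counts
    gamma * Nocc(subtype) up the hierarchy, inserting above a threshold."""
    out = dict(nocc)
    for c, n in nocc.items():
        cur, val = c, n
        while cur in supertype:
            cur, val = supertype[cur], gamma * val
            if val >= threshold:
                out[cur] = max(out.get(cur, 0.0), val)
    return out

def expand_weights(w, beta=0.3):
    """Preference update by expansion: W_new(C_related) += beta * W_new(C)."""
    out = dict(w)
    for c, weight in w.items():
        for r in related.get(c, []):
            out[r] = out.get(r, 0.0) + beta * weight
    return out

print(complete_interests({"Jaguar": 2.0}))
print(expand_weights({"Acquisition": 0.8}))
```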
155. Similarity/Matching
• Cosine similarity
• U represents user preference
• D represents content object
• Dimension: #concepts in the ontology

$\mathrm{similarity}(U,D) = \cos(U,D) = \frac{U \cdot D}{\|U\|\,\|D\|} = \frac{\sum_i U_i D_i}{\sqrt{\sum_i U_i^2}\,\sqrt{\sum_i D_i^2}}$
156. Semantic distance d(x,y,c)
• Semantic distance between 2 nodes x and y is defined
with respect to a concept, c
• Example: a black cat and an orange cat
• very similar as instances of the category Animal, since their
common catlike properties would be the most significant for
distinguishing them from other kinds of animals.
• But in the category Cat, they would share their catlike properties
with all the other kinds of cats, and the difference in color would be
more significant.
• In the category BlackEntity, color would be the most relevant
property, and the black cat would be closer to a crow or a lump of
coal than to the orange cat.
162. Contextual Personalisation
• Finer, qualitative, context-sensitive activation of user preferences
• Notion of a Semantic Runtime Context
• Representation: Vector of concept weights
• Fuzzy semantic intersection between user
preferences and runtime context
• Using Constrained spreading activation
163. Semantic Social Networking
• Identify hidden links between users
• Similarity between user preferences
• Collaborative Recommender systems
• Use of global preferences alone is not correct
• Partial & strong similarities are very useful
• E.g. coinciding interest in cinema but drastically different taste in sports
165. Microformats
[Tag cloud] Metadata, social links, geo, outline, hResume, adr, licensing, tags
http://microformats.org
• Microformats are small bits of HTML that represent things like people, events,
tags, etc. in web pages.
• Building blocks that enable users to own, control, move, and share their data on
the Web.
• Microformats enable
• publishing of higher fidelity information on the Web,
• the fastest and simplest way to support feeds and APIs for your website.
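Since microformats live in ordinary HTML class attributes, extraction is plain HTML parsing. A sketch using Python's stdlib parser for two real hCard property names (`fn`, `org`); the markup itself is a toy example.

```python
from html.parser import HTMLParser

class MicroformatParser(HTMLParser):
    """Collect text of elements whose class is a known hCard property."""
    PROPS = ("fn", "org", "locality")

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for c in self.PROPS:
            if c in classes:
                self._current = c

    def handle_data(self, data):
        if self._current and data.strip():
            self.props[self._current] = data.strip()
            self._current = None

html = ('<div class="vcard"><span class="fn">Dave Reynolds</span>'
        '<span class="org">HP Labs</span></div>')
p = MicroformatParser()
p.feed(html)
print(p.props)
```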
167. eRDF
• A subset of RDF embedded into XHTML or HTML by using
common idioms and attributes.
• No new elements or attributes have been invented and the
usages of the HTML attributes are within normal bounds.
• This scheme is designed to work with CSS and other HTML
support technologies.
• HTML Embeddable RDF: all HTML Embeddable RDF is valid RDF, but not all RDF is Embeddable RDF
168. GRDDL
• Gleaning Resource Descriptions from Dialects of
Languages
• Obtaining RDF data from XHTML pages
• Explicitly associated transformation algorithms
(XSLT)
169. Acknowledgements
• Self-tuning Personalized Information Retrieval in an
Ontology-Based Framework, Pablo Castells, Miriam
Fernández, David Vallet, et al, OTM Workshop 2005
• An Approach for Semantic Search by Matching RDF Graphs,
Haiping Zhu, Jiwei Zhong, Jianming Li and Yong Yu
• Semantic Web Tutorials
170. Concluding Remarks
• Personalization: An upcoming area of technology
• Personalization aims at faster access to information to improve
user productivity
• Server-side vs client-side personalization
• Technologies
• Machine Learning techniques
• Semantic Web
• New Markup Languages
• Challenges
• Understanding the user behaviour, intentions, likes, …
• Relating human edited content to the profile