Harvesting Intelligence from User Interactions

Published in: Education, Technology, Business

  1. 1. HARVESTING INTELLIGENCE FROM USER INTERACTIONS Rajendra Akerkar
  2. 2. Outline: What is collective intelligence? Basic technical concepts behind collective intelligence. Many forms of user interaction. Example of how user interaction is converted into intelligence.
  3. 3. Web users are undergoing a transformation… Users are expressing themselves. This expression may take the form of sharing opinions on a product or a service through reviews or comments; sharing and tagging content; participating in an online community; or contributing new content. This increased user interaction and participation gives rise to data that can be converted into intelligence in your application. Using that intelligence to personalize a site for a user, to aid the user in searching and making decisions, and to make the application more sticky are cherished goals that web applications try to fulfill.
  4. 4. Wisdom of the Crowd. “Under the right circumstances, groups are extraordinarily intelligent, and are often smarter than the smartest people in them.” “If the process is sound, the more people you involve in solving a problem, the better the result will be.” A crowd’s collective intelligence will produce better results than those of a small group of experts if four basic conditions are met.
  5. 5. The “wise crowds” are valuable when they’re composed of individuals who have diverse opinions; when the individuals aren’t afraid to express their opinions; when there’s diversity in the crowd; and when there’s a way to aggregate all the information and use it in the decision-making process.
  6. 6. Collective intelligence: To effectively use the information provided by others to improve one’s application. When a group of individuals collaborate or compete with each other, intelligence or behavior that otherwise didn’t exist suddenly emerges.
  7. 7. A user may be influenced by other users either directly or through intelligence that the application derives by mining the data.
  8. 8. Collective intelligence of users is: The intelligence that’s extracted from the collective set of interactions and contributions made by users. The use of this intelligence to act as a filter for what’s valuable in your application for a user. This filter takes into account a user’s preferences and interactions to provide relevant information to the user. There are a huge number of ways this information can be processed and interpreted.
  9. 9. To apply collective intelligence in your application, you need to: 1. allow users to interact with your site and with each other, learning about each user through their interactions and contributions; 2. aggregate what you learn about your users and their contributions using some useful models; 3. leverage those models to recommend relevant content to a user.
  10. 10. Three components to harnessing intelligence: 1 – Allow users to interact, 2 – Learn about your users in aggregate, 3 – Personalise content using user interactions data and aggregate data.
  11. 11. Sources of information. Content-based: based on information about the item itself, usually keywords or phrases occurring in the item. Collaborative-based: based on the interactions of users.
  12. 12. Algorithms for applying intelligence correlate users with content and with each other, and need a common language to compute relevance between items, between users, and between users and items. Content-based relevance is anchored in the content itself (information retrieval systems). Collaborative-based relevance leverages the user interaction data to detect meaningful relationships. Unstructured text: to understand how metadata can be developed from unstructured text.
  13. 13. Abstracting types of content. Applications: users and items. Items? In social networking, a user is also a type of item. Metadata: professionally developed keywords, user-generated tags, keywords extracted by an algorithm after analyzing the text, ratings, popularity ranking, etc. Profile-based and user-action-based data. Metadata as a set of attributes that help qualify an item.
  14. 14. Sources for generating metadata about an item. Users and items have an associated vector of metadata attributes. The similarity or relevance between two users, two items, or a user and an item is measured by looking at the similarity between the two vectors.
  15. 15. Representing user information
  16. 16. Users provide a rich set of information
  17. 17. Generating intelligence: content-based analysis and collaborative filtering. Content-based analysis: to build a representation for the content. • Terms or phrases. • Terms are converted into their basic form by a process known as stemming. Terms with their associated weights (term vectors) then represent the metadata associated with the text. Similarity between two content items is measured by the similarity of their term vectors. Collaborative filtering: to use the information provided by the interactions of users to predict items of interest for a user. • To match a user’s metadata to that of other similar users and recommend items liked by them (language-independent methods). • E.g. users rate items, so a CF approach finds patterns in the way items have been rated by the user and other users to find additional items of interest for the user. • Amazon, Netflix, and Google.
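The term-vector idea on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the sample texts are invented, weights are plain term frequencies, and stemming is skipped for brevity.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Unit-length term-frequency vector (assumption: no stemming or stop words)."""
    terms = re.findall(r"[a-z]+", text.lower())
    counts = Counter(terms)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}

def similarity(vec_a, vec_b):
    """Dot product of two normalized sparse vectors, i.e. their cosine."""
    return sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())

# Hypothetical content items
doc_a = term_vector("collective intelligence from user interactions")
doc_b = term_vector("intelligence derived from user ratings")
doc_c = term_vector("cooking pasta at home")
```

Here `similarity(doc_a, doc_b)` is positive because the two texts share terms, while `similarity(doc_a, doc_c)` is zero because they share none.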
  18. 18. Collaborative filtering: memory-based and model-based. Memory-based: a similarity measure is used to find similar users and then make a prediction using a weighted average of the ratings of the similar users. Model-based: to build a model for prediction using a variety of approaches: linear algebra, probabilistic methods, neural networks, clustering, latent classes, and so on. A collaborative filtering algorithm usually works by searching a large group of people and finding a smaller set with tastes similar to yours. Collecting Preferences; Recommending Items; Matching Products; Item-Based Filtering.
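A memory-based prediction of the kind described on this slide might look like the sketch below. The users, items, and ratings are hypothetical, and cosine similarity over co-rated items is just one possible similarity measure.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "cat": {"a": 1, "b": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += weight
    return num / den if den else None

predicted = predict("ann", "d")
```

Because "ann" rates more like "bob" than like "cat", the prediction for item "d" lands closer to bob's rating of 4 than to cat's rating of 2.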
  19. 19. Harnessing Collective Intelligence to transform from content-centric to user-centric applications. Prior to the user-centric revolution, many applications put little emphasis on the user. These applications, known as content-centric applications, focused on the best way to present the content and were generally static from user to user and from day to day. User-centric applications leverage Collective Intelligence to fundamentally change how the user interacts with the Web application. User-centric applications make the user the center of the web experience and dynamically reshuffle the content based on what’s known about the user and what the user explicitly asks for.
  20. 20. User-centric applications are composed of the following four components. Core competency: The main reason why a user comes to the application. Community: Connecting users with other users of interest, social networking, finding other users who may provide answers to a user’s questions. Leveraging user-generated content: Incorporating generated content and interactions of users to provide additional content to users. Building a marketplace: Monetizing the application by product and/or service placements and showing relevant advertisements.
  21. 21. Classifying User – Generated Information
  22. 22. Concept of a dataset. Densely populated dataset: • It has more rows than columns. • The dataset is richly populated. Clustering and building a predictive model: e.g. similar users according to age and/or sex might be a good predictor of the number of minutes a user will spend on the site. • Age is a good predictor. • The number of minutes spent decreases linearly with age. • A simple linear model: minutes spent = 50 – age of user.
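As a rough sketch of how such a linear model could be recovered from data, the snippet below fits a least-squares line. The four (age, minutes) pairs are invented so that they lie exactly on the slide's line, minutes spent = 50 – age.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that follow minutes = 50 - age exactly
ages = [20, 25, 30, 40]
minutes = [30, 25, 20, 10]

slope, intercept = fit_line(ages, minutes)

def predict_minutes(age):
    """Predicted minutes on site for a user of the given age."""
    return slope * age + intercept
```

With real, noisy data the fitted slope and intercept would only approximate the slide's values of -1 and 50.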
  23. 23. Concept of a dataset. • Set of users who viewed any of the videos on our site within the timeframe. High-dimensional, sparsely populated: a generalization of the term vector representation. This representation is useful to find similar users and is known as the User-Item matrix: users are represented as rows, and the videos are represented as columns. Properties: more rows than columns, richly populated.
  24. 24. Users are represented as columns and the videos as rows. Users who have viewed this video have also viewed these other videos. Properties: the number of columns is large; sparsely populated, with nonzero entries in a few columns; multidimensional vector.
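The two orientations of the dataset (users as rows, then videos as rows) can be sketched with sparse dictionaries of sets. The viewing log and the `also_viewed` helper are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical viewing log: (user, video) pairs
views = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v3"), ("u3", "v2")]

user_item = defaultdict(set)  # rows are users, columns are videos
item_user = defaultdict(set)  # transposed: rows are videos, columns are users
for user, video in views:
    user_item[user].add(video)
    item_user[video].add(user)

def also_viewed(video):
    """Other videos seen by viewers of `video`, ranked by co-view count."""
    counts = defaultdict(int)
    for user in item_user[video]:
        for other in user_item[user]:
            if other != video:
                counts[other] += 1
    return sorted(counts, key=lambda v: (-counts[v], v))
```

Storing only the nonzero entries keeps the representation compact when the matrix is sparsely populated.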
  25. 25. Forms of user interaction: need to quantify the quality of the interaction. Rating and voting interaction: explicit in the user’s intent; a way of getting feedback on how well the user liked the item; quantifiable and can be used directly. Voting is similar to rating. However, a vote can have only two values: 1 for a positive vote and -1 for a negative vote. Interactions such as clicks are noisy: the intent of the user isn’t known perfectly and is implicit.
  26. 26. Persistence of ratings. Entities: User and Items. user_item_rating is a mapping table that has a composite key, consisting of the user ID and content ID. The cardinality between the entities shows that: • Each user may rate 0 or more items. • Each rating is associated with only one user. • An item may contain 0 or more ratings. • Each rating is associated with only one item. digg.com allows users to contribute and vote on interesting articles. What are the top 10 rated items?
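One way the schema on this slide might be persisted is sketched below with Python's built-in sqlite3 module. Only the `user_item_rating` table name and its composite key come from the slide; the `users` and `items` table names, the column names, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT);
-- Mapping table: one row per (user, item) pair, enforced by the composite key
CREATE TABLE user_item_rating (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    item_id INTEGER NOT NULL REFERENCES items(item_id),
    rating  INTEGER NOT NULL,
    PRIMARY KEY (user_id, item_id)
);
""")

# Hypothetical sample data
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "John"), (2, "Jane"), (3, "Doe")])
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "Photo1"), (2, "Photo2"), (3, "Photo3")])
conn.executemany("INSERT INTO user_item_rating VALUES (?, ?, ?)",
                 [(1, 1, 3), (1, 2, 4), (2, 1, 2), (2, 2, 2),
                  (3, 1, 1), (3, 2, 3), (3, 3, 5)])

# "What are the top 10 rated items?" answered by average rating
top_rated = conn.execute("""
    SELECT i.title, AVG(r.rating) AS avg_rating, COUNT(*) AS num_ratings
    FROM user_item_rating AS r
    JOIN items AS i ON i.item_id = r.item_id
    GROUP BY r.item_id
    ORDER BY avg_rating DESC
    LIMIT 10
""").fetchall()
```

The composite primary key means a second rating by the same user for the same item is rejected rather than stored twice. A production query would probably also require a minimum number of ratings before an item can appear in the list.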
  27. 27. Forwarding a link: Similar to voting, forwarding the content to others can be considered a positive vote for the item by the user.
  28. 28. Bookmarking and saving By bookmarking URLs, a user is explicitly expressing interest in the material associated with the bookmark. URLs that are commonly bookmarked bubble up higher in the site.
  29. 29. Purchasing items: when users purchase items, they cast an explicit vote of confidence in the item. Amazon (item-to-item recommendation engine). Users that buy similar items can be correlated. Items that have been bought by other users can be recommended to a user …
  30. 30. Click-stream: news.google.com personalisation. When a list of items is presented to a user …, a clicked item counts as a positive vote. Looking at whether an item was visited and the time spent on it provides useful information. You can also gather useful statistics from this data: • What is the average time a user spends on a particular item? • For a user, what is the average time spent on any given article?
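The two statistics asked for on this slide can be computed from a click log along these lines. The log format (user, item, seconds spent) and its contents are assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical click log: (user, item, seconds spent on the item)
clicks = [
    ("john", "a1", 30),
    ("john", "a2", 90),
    ("jane", "a1", 50),
    ("doe", "a1", 10),
]

def average_by(records, key_index):
    """Average time spent, grouped by the field at key_index (0=user, 1=item)."""
    totals = defaultdict(lambda: [0.0, 0])
    for record in records:
        bucket = totals[record[key_index]]
        bucket[0] += record[2]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

avg_time_per_item = average_by(clicks, 1)  # average time users spend on an item
avg_time_per_user = average_by(clicks, 0)  # average time a user spends per article
```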
  31. 31. Reviews: Opinions and tastes are often expressed through reviews and recommendations. These have the greatest impact on other users when they’re unbiased, the reviews are from similar users, and they’re from a person of influence. Just like voting for articles at Digg, other users can endorse a reviewer or vote on the reviewer’s reviews.
  32. 32. Converting user interaction into intelligence. User interaction gets converted into a dataset for learning. Example: three users who’ve rated photographs. There are a number of ways to transform raw ratings from users into intelligence: aggregate all the ratings about the item and provide the average • to create a Top 10 Rated Items list • constantly promoting the popular content.
  33. 33. Given the set of data, we answer two questions in our example: What is the set of related items for a given item? For a user, who are the other users that are similar to the user? Three approaches: • cosine-based similarity, • correlation-based similarity, and • adjusted-cosine-based similarity.
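The second and third approaches differ from plain cosine only in what gets subtracted before the dot product, which the sketch below tries to make concrete. The rating matrix is hypothetical (rows are photos, columns are three users); correlation here means Pearson correlation, and adjusted cosine subtracts each user's mean rating.

```python
import math

# Hypothetical ratings: rows are photos, columns are three users
photos = {
    "Photo1": [3, 2, 1],
    "Photo2": [4, 2, 3],
    "Photo3": [2, 4, 5],
}

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def correlation(a, b):
    """Pearson correlation: cosine after subtracting each row's own mean."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

def adjusted_cosine(name_a, name_b):
    """Cosine after subtracting each user's (column's) mean rating."""
    user_means = [sum(col) / len(col) for col in zip(*photos.values())]
    a = [x - m for x, m in zip(photos[name_a], user_means)]
    b = [x - m for x, m in zip(photos[name_b], user_means)]
    return cosine(a, b)

pearson_12 = correlation(photos["Photo1"], photos["Photo2"])
adjusted_12 = adjusted_cosine("Photo1", "Photo2")
```

Subtracting the user mean compensates for users who rate everything high or everything low, which is why the adjusted score can differ sharply from the plain cosine.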
  34. 34. Cosine-based similarity computation takes the dot product of two vectors. To learn about the photos, we transpose the matrix: a row corresponds to a photo, while the columns (users) correspond to dimensions that describe the photo. Normalize the values for each of the rows by dividing each of the cell entries by the square root of the sum of the squares of the entries in that row. The similarity between Photo 1 and Photo 2 is computed as (0.8018 * 0.7428) + (0.5345 * 0.3714) + (0.2673 * 0.557) = 0.943.
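The arithmetic on this slide can be checked with a few lines of code. The raw rating rows (3, 2, 1) and (4, 2, 3) are not stated in the transcript; they are inferred here because normalizing them reproduces the quoted values.

```python
import math

# Raw rating rows inferred from the normalized values quoted on the slide;
# each column is one of the three users.
photo1 = [3, 2, 1]
photo2 = [4, 2, 3]

def normalize(row):
    """Divide each entry by the square root of the sum of squares of the row."""
    length = math.sqrt(sum(x * x for x in row))
    return [x / length for x in row]

n1 = normalize(photo1)  # approximately [0.8018, 0.5345, 0.2673]
n2 = normalize(photo2)  # approximately [0.7428, 0.3714, 0.5571]

# Dot product of the normalized rows is the cosine similarity
similarity = sum(a * b for a, b in zip(n1, n2))
```

Rounded to three decimals, `similarity` matches the 0.943 given on the slide.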
  35. 35. Item-to-item similarity table. What is the set of related items for a given item? According to the table, Photo1 and Photo2 are very similar. To determine similar users: associated with each user is a vector, where the rating associated with each item corresponds to a dimension in the vector; the analysis process is similar to calculating the item-to-item similarity table. User-to-user similarity table.
  36. 36. Intelligence from other forms of user interactions. How do other forms of user interaction get transformed into metadata? Approaches: content-based and collaboration-based.
  37. 37. Content-based approach: metadata is associated with each item. This term vector could be created by analyzing the content of the item or using tagging information by users. The term vector consists of keywords or tags with a relative weight associated with each term. As the user saves content, visits content, or writes recommendations, she inherits the metadata associated with each
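One reading of "inheriting metadata" is that a user's profile accumulates the term vectors of the items she interacts with. The sketch below follows that reading; the item names, terms, weights, and the simple additive rule are all assumptions.

```python
from collections import defaultdict

# Hypothetical item term vectors: keyword or tag -> relative weight
item_terms = {
    "article1": {"python": 0.8, "web": 0.6},
    "article2": {"web": 0.6, "search": 0.8},
    "article3": {"cooking": 1.0},
}

def user_profile(visited_items):
    """Sum the term vectors of every item the user saved or visited."""
    profile = defaultdict(float)
    for item in visited_items:
        for term, weight in item_terms[item].items():
            profile[term] += weight
    return dict(profile)

def score(profile, item):
    """Relevance of an item to a user: dot product of the two term vectors."""
    return sum(w * profile.get(term, 0.0) for term, w in item_terms[item].items())

profile = user_profile(["article1"])
```

A user who visited only article1 then scores article2 above article3, because article2 shares the "web" term with her profile.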
  38. 38. Collaboration-based approach: analysis of data collected by bookmarking, saving an item, or recommending an item yields a sparsely populated dataset. What are other items that have been bookmarked by other users who bookmarked the same articles as a specific user? When the user is John, the answer is Article 3: Doe has bookmarked Article 1 and also Article 3. What are the related items based on the bookmarking patterns of the users?
  39. 39. Collaboration-based approach. Here it is useful to invert the dataset: the users correspond to the dimensions of the vector for an article. Similarities between two items are measured by computing the dot product between them. The normalized matrix is … The item-to-item similarity matrix based on this data is … LEARNING: if someone bookmarks Article 1, you should recommend Article 3 to the user, and if the user bookmarks Article 2, you should also recommend Article 3.
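The inverted-dataset computation can be sketched as below. The bookmarks for John and Doe follow the slides; Jane's bookmarks are an assumption chosen to be consistent with the stated conclusion, since the original table is not preserved in the transcript.

```python
import math

bookmarks = {
    "John": {"Article 1"},
    "Jane": {"Article 2", "Article 3"},  # assumed, not stated in the transcript
    "Doe": {"Article 1", "Article 3"},
}
users = sorted(bookmarks)
articles = sorted(set().union(*bookmarks.values()))

def article_vector(article):
    """Inverted view: one dimension per user, normalized to unit length."""
    row = [1.0 if article in bookmarks[user] else 0.0 for user in users]
    length = math.sqrt(sum(x * x for x in row))
    return [x / length for x in row]

vectors = {article: article_vector(article) for article in articles}

def item_similarity(a, b):
    """Dot product of the normalized article vectors."""
    return sum(x * y for x, y in zip(vectors[a], vectors[b]))

def most_similar(article):
    """The other article with the highest similarity to the given one."""
    others = [a for a in articles if a != article]
    return max(others, key=lambda other: item_similarity(article, other))
```

Under these assumed bookmarks, both Article 1 and Article 2 have Article 3 as their most similar item, in line with the slide's recommendation.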
  40. 40. A similar analysis can be performed by using information from the items the user saves, purchases, and recommends. You can further refine your analysis by using data only from users that are similar to a user based on user-profile information.
  41. 41. Summary Metadata associated with users and items can be used to derive intelligence in the form of building recommendation engines and predictive models for personalization, and for enhancing search.
  42. 42. References: S. Alag, Collective Intelligence in Action, Manning, 2009. H. Marmanis, D. Babenko, Algorithms of the Intelligent Web, Manning, 2009. T. Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O’Reilly. Wang, Jun, Arjen P. de Vries, and Marcel J.T. Reinders. Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion. 2006. http://ict.ewi.tudelft.nl/pub/jun/sigir06_similarityfuson.pdf
  43. 43. Thank you! Rajendra Akerkar, Vestlandsforsking, Sogndal, NORWAY. E-mail: rak@vestforsk.no URL: www.tmrfindia.org/ra.html
