KDD for Personalization

tutorial on KDD for personalization

  1. KDD for Personalization
     PKDD 2001 Tutorial, September 6, 2001
     Bamshad Mobasher - DePaul University, Chicago
     Bettina Berendt - Humboldt University Berlin
     Myra Spiliopoulou - Leipzig Graduate School of Management

     Web Personalization
     • The Problem
       – dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
     • Personalization vs. Customization
       – In customization, the user controls and customizes the site or the product based on his/her preferences
       – customization is usually manual, but sometimes semi-automatic based on a given user profile
       – Personalization is done automatically based on the user’s actions, the user’s profile, and (possibly) the profiles of others with “similar” profiles
     PKDD 2001 Tutorial: “KDD for Personalization” [I-2]
  2. Customization Example: my.yahoo.com [I-3]

     Personalization Example: amazon.com [I-4]
  3. A simplified scheme for personalization [I-5]
     • user selects information object(s)
       – what kind? document etc.; query
       – how? request, specification; rating
     • system recommends other information object(s) related to them
       – why? similarity (syntactic/semantic); co-occurrence in other users' navigation histories; co-occurrence in the user's other navigation histories

     Know Thy Customer: Knowledge is Power [I-6]
     “Relationships based on customer insight propel an organization from simply treating customers efficiently to treating them relative to their needs, preferences, and value potential. ... Knowing the customer is paramount in today's marketplace, where the customer has more options, greater flexibility and higher expectations. ...”
     John […] (Accenture), in […]
  4. Customer knowledge implies [I-7]
     1.) Acquisition of customer data
     2.) Analysis of customer data
     3.) Action in accordance with the gained insights

     Acquisition of customer data [I-8]
     Customer data are recordings of
     • preferences
     • transactions
     • pre-sales contacts
     • after-sales support
     • demographic information
     Some of these data
       – may be purchased from third parties
       – may lie in multiple separate databases that serve completely different purposes
       – are of varying quality with respect to error rates, reliability, coverage, representativeness
     → Data Preparation
  5. Analysis of customer data [I-9]
     Data analysis should address questions like:
     • Which users will become customers?
     • Which customers will return again?
     • Who is more likely to respond to a promotion action?
     • Who would be interested in cross-sale/up-sale suggestions?
     closely related to questions like:
     • Is the Web site appropriately designed to serve the organisation's goals?
     • Are the customers satisfied?
     • Are the customers satisfied enough to come again?
     • Are the customers satisfied enough to become promoters of the site?
     → Data Mining

     Action in accordance with the gained insights [I-10]
     • Alignment of the marketing policy
     • Alignment of the supply chain, including after-sales support
     • Adjustment of the web site
       – static site re-design
       – Browsing/Navigation suggestions
       – Recommendations on the page
       – Intelligent assistance
       – Personalized layout and content
     The time lag between insight and action should be minimized.
  6. The action should create value [I-11]
     • for the customer
     • for the organisation

     A short excursion on value creation [I-12]
     In B2C e-commerce, it is not sufficient to
     • offer an existing product through the Internet
     • digitize part/all of the merchandizing chain
     • introduce a brilliant new product in the market
     The product must bring value to
     • win the customer → Customer Conversion
     • retain the customer → Customer Retention
  7. The model of Kuhlen considers the following types of value [32] [I-13]
     (1) comparative
     (2) improving efficiency
     (3) improving effectivity
     (4) integrative
     (5) organisational
     (6) strategic
     (7) innovative

     From acquisition to action [I-14]
     • There is no lack of data.
       – Clickstream data accumulate at a tremendous pace.
       – Demographic data can be acquired.
       – Customer profiles are available or can be acquired.
     • There is no lack of methodologies for data analysis.
     • The ability to exploit the data increases at a much slower pace, and the number of personalized Web sites is not really large.
     • The tolerable elapsed time between acquisition and action is below 1 […].
  8. Personalization: An HCI perspective [I-15]
     = does personalization increase usability?
     A Web site’s usability is high if users
     - achieve their goals / perform their tasks in little time,
     - do so with a low error rate,
     - experience high subjective satisfaction.
     Usability testing:
     - qualitative and quantitative methods
     - experts and "normal" users
     - questionnaires and experiments
     Usability is a special concern on the Web because, unlike with other products / software, "users experience usability first and pay later" (Nielsen [49] [B12]).

     Data Preparation for Personalization [DP-1]
  9. Web Usage Mining [DP-2]
     • Discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers
     • Typical Sources of Data
       – automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
       – e-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)
       – user profiles and/or user ratings
       – meta-data, page attributes, page content, site structure

     What’s in a Typical Server Log?
     Format: <ip_addr> <base_url> - <date> <method> <file> <protocol> <code> <bytes> <referrer> <user_agent>
     203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
     203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
     203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
     203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:25 -0600] "GET /Calls/Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
     203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"
     203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
     203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
     203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
     203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
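Entries in the log format above can be split into fields with a regular expression. The following is a minimal sketch, assuming fields are separated by single spaces exactly as in the sample entries; the names `LOG_PATTERN` and `parse_entry` are illustrative and do not come from the tutorial.

```python
import re
from datetime import datetime

# One named group per field of the log format shown on the slide.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<host>\S+) (?P<ident>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_entry(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Convert the textual fields that later steps need as numbers or times.
    entry["time"] = datetime.strptime(entry.pop("date"), "%d/%b/%Y:%H:%M:%S %z")
    entry["code"] = int(entry["code"])
    entry["bytes"] = int(entry["bytes"])
    return entry

sample = ('203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] '
          '"GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" '
          '"Mozilla/4.06 [en] (Win95; I)"')
entry = parse_entry(sample)
```

Returning `None` for non-matching lines keeps malformed entries visible to the data-cleaning step instead of silently guessing at their fields.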
  10. The Web Usage Mining Process [DP-4]
      Figure (flow diagram): Raw Usage Data → Preprocessing (with Content and Structure Data) → Preprocessed Clickstream Data → Pattern Discovery → Rules, Patterns, and Statistics → Pattern Analysis → "Interesting" Rules, Patterns, and Statistics

      Usage Data Preprocessing [DP-5]
      Figure (flow diagram): Raw Usage Data → Data Cleaning → User/Session Identification → Page View Identification → Path Completion → Server Session File → Episode Identification → Episode File; further elements shown: Site Structure and Content, Usage Statistics
  11. Data Preprocessing for Web Usage Mining [DP-6]
      • Data cleaning
        – remove irrelevant references and fields in server logs
        – remove references due to spider navigation
        – remove erroneous references
        – add missing references due to caching (done after sessionization)
      • Data integration
        – synchronize data from multiple server logs
        – integrate e-commerce and application server data
        – integrate meta-data (e.g., content labels)
        – integrate demographic / registration data

      Data Preparation for Web Usage Mining (Cooley, Mobasher, Srivastava, 1999 [15]) [DP-7]
      • Data Transformation
        – user identification
        – sessionization / episode identification
        – pageview identification
          • a pageview is a set of page files and associated objects that contribute to a single display in a Web browser
      • Data Reduction
        – sampling and dimensionality reduction (ignoring certain pageviews / items)
      • Identifying User Transactions (i.e., sets or sequences of pageviews, possibly with associated weights)
  12. User and Session Identification: Need for Reliable Usage Data [DP-8]
      • Validity of results in Web usage mining is affected by the ability to:
        – distinguish among different users of a site
        – reconstruct the activities of the users within the site
      • Difficulties in obtaining reliable usage data
        – proxy servers and anonymizers
        – rotating IP addresses for connections through ISPs
        – missing references due to caching
        – inability of servers to distinguish among different visits

      Identifying Users and Sessions [DP-9]
      • Server log L is a list of log entries, each containing timestamp, host identifier, URL request (including URL stem and query), referrer, agent, cookie, etc.
      • User identification and sessionization
        – a user activity log is a sequence of log entries in L belonging to the same user
        – user identification is the process of partitioning L into a set of user activity logs
        – the goal of sessionization is to further partition each user activity log into sequences of entries, one sequence per user visit
  13. Sessionization Heuristics [DP-10]
      • Real vs. Constructed Sessions
        – Conceptually, the log L is partitioned into an ordered collection of “real” sessions R
        – Each heuristic h partitions L into an ordered collection of “constructed sessions” C_h
        – The ideal heuristic h*: C_h* = R
      • Two Basic Types of Sessionization Heuristics
        – Time-oriented heuristics
        – Navigation-oriented heuristics

      Time-Oriented Heuristics [DP-11]
      • Consider boundaries on time spent on individual pages or in the entire site during a single visit
        – Boundaries can be based on a maximum session length or on a maximum time allowable for each pageview
        – Additional granularity can be obtained by applying different boundaries to different (types of) pageviews
      h1: Given t0, the timestamp of the first request in a constructed session S, and a threshold θ, the request with timestamp t is assigned to S iff t - t0 ≤ θ.
      h2: Given t1, the timestamp of a request in constructed session S, and a threshold δ, the next request with timestamp t2 is assigned to S iff t2 - t1 ≤ δ.
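The two time-oriented heuristics h1 and h2 differ only in the timestamp they compare against. A minimal sketch, assuming the input is the sorted list of request timestamps of a single user; the function names are illustrative.

```python
def sessionize_h1(timestamps, theta):
    """h1: a request joins session S iff t - t0 <= theta, t0 = first request of S."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][0] <= theta:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # threshold exceeded: open a new session
    return sessions

def sessionize_h2(timestamps, delta):
    """h2: a request joins session S iff t2 - t1 <= delta, t1 = previous request of S."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= delta:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

# Request times of one user in minutes, already sorted.
times = [0, 8, 16, 24, 32, 60]
h1_sessions = sessionize_h1(times, theta=30)
h2_sessions = sessionize_h2(times, delta=10)
```

On this input h1 cuts the session once its total length exceeds 30 minutes, while h2 keeps the first five requests together because no gap between consecutive requests exceeds 10 minutes.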
  14. Navigation-Oriented Heuristics [DP-12]
      • Take the linkage between pages into account
        – “linkage” can be based on site topology (e.g., split a session at a request that could not have been reached from previous requests in the session)
        – or can be usage-based (using referrers in log entries)
          • usually more restrictive than topology-based heuristics and more difficult to implement in frame-based sites
      href: Given two consecutive requests p and q, with p belonging to constructed session S, q is assigned to S if the referrer for q was previously invoked in S, or if the referrer for q is “undefined” and tq - tp ≤ ∆ (the time delay ∆ allows for proper loading of frameset pages).

      Measures for Sessionization Accuracy (Berendt, Mobasher, Spiliopoulou, 2001 [7]) [DP-13]
      • A heuristic h maps entries in the log L into elements of constructed sessions, such that:
        – (a) each entry in L is mapped to exactly one element of a constructed session
        – (b) the mapping is order-preserving
      • Measures quantify the successful mappings of real sessions to constructed sessions
        – a measure M evaluates a heuristic h based on the differences between C_h and R
        – each measure assigns to h a value M(h) ∈ [0,1] so that M(h*) = 1
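The href rule can be sketched as follows. This is an illustration under stated assumptions: requests are `(timestamp, url, referrer)` tuples of a single user sorted by time, an undefined referrer is represented by `None`, and a referrer counts as "previously invoked in S" when it equals the URL of an earlier request in the current session.

```python
def sessionize_href(requests, max_delay):
    """Split one user's requests into sessions using the referrer-based rule."""
    sessions = []
    for t, url, ref in requests:
        if sessions:
            current = sessions[-1]
            prev_t = current[-1][0]
            seen = {u for _, u, _ in current}  # URLs already invoked in S
            referred = ref is not None and ref in seen
            frame_load = ref is None and t - prev_t <= max_delay
            if referred or frame_load:
                current.append((t, url, ref))
                continue
        sessions.append([(t, url, ref)])  # otherwise start a new session
    return sessions

# Times in seconds; "/x" never occurs in the log, so the last request opens a new session.
log = [(0, "/", None), (5, "/a", "/"), (7, "/frame", None),
       (400, "/b", "/a"), (900, "/c", "/x")]
href_sessions = sessionize_href(log, max_delay=10)
```

Note that the request at time 400 stays in the first session despite the long pause, because the rule looks only at the referrer once one is defined.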
  15. Measures for Sessionization Accuracy [DP-14]
      • Categorical and Gradual Measures
        – categorical measures: based on the number of real sessions that are reconstructed by the heuristic
        – gradual measures: based on the degree to which the real sessions are reconstructed by the heuristic

      Categorical Measures [DP-15]
      • Based on the notion of “complete reconstruction”
        – a real session is completely reconstructed if all its elements are contained in the same constructed session
        – the measure M_cr(h) is the ratio of the number of real sessions completely reconstructed in C_h to the total number of real sessions |R|
  16. Categorical Measures [DP-16]
      • Derived categorical measures:
        – M_crs considers only completely reconstructed real sessions whose first element is also the first element of a constructed session
        – M_cre considers only completely reconstructed real sessions whose last element is also the last element of a constructed session
        – M_crse considers only completely reconstructed real sessions with correct starts and ends
          • in the absence of overlapping real sessions for individual users, this gives the number of constructed sessions that are identical to the corresponding real sessions

      Gradual Measures [DP-17]
      • Allow for measuring partial overlaps between real and constructed sessions
        – the degree of overlap between a real session r and a constructed session c, deg_o(r,c), is the number of elements they have in common divided by the total number of elements in r
        – the degree of overlap for a real session r is the maximum deg_o(r,c) over all constructed sessions c
        – the measure M_o(h) is the average degree of overlap over all real sessions
        – if a real session is completely reconstructed, its overlap degree is 1
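The measures M_cr and M_o follow directly from their definitions. A minimal sketch, assuming each session is a list of unique request identifiers; the function names `m_cr` and `m_o` are illustrative.

```python
def m_cr(real, constructed):
    """Share of real sessions whose elements all lie in one constructed session."""
    complete = sum(any(set(r) <= set(c) for c in constructed) for r in real)
    return complete / len(real)

def m_o(real, constructed):
    """Average over real sessions of the best overlap degree |r ∩ c| / |r|."""
    best = [max(len(set(r) & set(c)) / len(r) for c in constructed) for r in real]
    return sum(best) / len(best)

real = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
constructed = [[1, 2, 3, 4], [5], [6, 7], [8, 9]]
cr = m_cr(real, constructed)
ov = m_o(real, constructed)
```

Only the first real session is completely reconstructed here, so M_cr is 1/3; the other two are each half covered by their best-matching constructed session, which gives M_o = (1 + 0.5 + 0.5) / 3.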
  17. Gradual Measures [DP-18]
      • To take the size of the constructed session into account, we define the degree of similarity
        – deg_s(r,c) = | r ∩ c | / | r ∪ c |
        – M_s(h) is the average degree of similarity over all real sessions
        – if a real session is completely reconstructed, its similarity degree is 1

      Which Measures? [DP-19]
      • The choice of measures depends on the goals of usage analysis, for example:
        – “complete reconstruction” may be appropriate for clustering and association-based analyses (it correctly shows the set of pages accessed together)
          • it also preserves the sequential order of accesses, so it can be used for the analysis of users’ navigational behavior
        – M_crs: useful for analyzing access to entry points
        – M_cre: useful for analyzing access to exit points
        – overlap-based measures can be useful for comparing the overall effectiveness of sessionization heuristics in grouping pages or objects
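The degree of similarity is the Jaccard coefficient of the two sessions. The sketch below assumes, by analogy with the degree of overlap, that the similarity degree of a real session is the maximum of deg_s(r,c) over all constructed sessions; the slide does not state this explicitly, and the function names are illustrative.

```python
def deg_s(r, c):
    """Degree of similarity |r ∩ c| / |r ∪ c| of a real and a constructed session."""
    r, c = set(r), set(c)
    return len(r & c) / len(r | c)

def m_s(real, constructed):
    """Average, over real sessions, of the best similarity degree (assumed: max over c)."""
    best = [max(deg_s(r, c) for c in constructed) for r in real]
    return sum(best) / len(best)

real = [[1, 2, 3], [4, 5]]
constructed = [[1, 2, 3], [4], [5, 6]]
similarity = m_s(real, constructed)
```

The first real session is reproduced exactly (similarity 1), the second is split, and its best match `[4]` shares one of two elements, so the result is (1 + 0.5) / 2.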
  18. Which Sessionization Heuristics? [DP-20]
      • The choice of sessionization heuristic depends on the characteristics of the data
        – if individual users visit the site in short but temporally dense sessions, h2 may perform better than h1
        – in cases where timestamps are not reliable (e.g., when using integrated data across many log files), href may be a better choice for sessionization
        – referrer-based heuristics tend to perform worse in highly dynamic, frame-based sites

      Comparison of Sessionization Heuristics [DP-21]
      • cookies used to identify unique users
      • server-generated session variable used to identify “real” sessions
      • site was frame-based and highly dynamic
      • thresholds of 30 and 10 minutes were used for h1 and h2, respectively
      • href performed poorly, due to propagated errors in misclassified frameset references
      • 30% of users had multiple IP addresses (coming from behind proxy servers)
      Figure: bar chart comparing h1-30, h2-10 and h-ref on the measures M_o, M_crse, M_cr, M_crs, M_cre and M_s (value axis from 0.50 to 1.00)
  19. Mechanisms for User Identification [DP-22]
      • IP Address + Agent
        – Description: assume each unique IP address/Agent pair is a unique user
        – Privacy concerns: low
        – Advantages: always available; no additional technology required
        – Disadvantages: not guaranteed to be unique; defeated by rotating IPs
      • Embedded Session Ids
        – Description: use dynamically generated pages to associate an ID with every hyperlink
        – Privacy concerns: low to medium
        – Advantages: always available; independent of IP addresses
        – Disadvantages: cannot capture repeat visitors; additional overhead for dynamic pages
      • Registration
        – Description: user explicitly logs in to the site
        – Privacy concerns: medium
        – Advantages: can track individuals, not just browsers
        – Disadvantages: many users won't register; not available before registration
      • Cookie
        – Description: save an ID on the client machine
        – Privacy concerns: medium to high
        – Advantages: can track repeat visits from the same browser
        – Disadvantages: can be turned off by users
      • Software Agents
        – Description: program loaded into the browser that sends back usage data
        – Privacy concerns: high
        – Advantages: accurate usage data for a single site
        – Disadvantages: likely to be rejected by users

      Impact of User Identification Heuristics [DP-23]
      These experiments show the impact of using the IP+Agent heuristic for user identification on sessionization heuristics (as compared to cookies).
      Figure: two bar charts, h1-30-real vs. h1-30-ipa and h-ref-real vs. h-ref-ipa, over the sessionization accuracy measures (value axis from 0.30 to 1.00)
  20. Inferring User Transactions from Sessions [DP-24]
      • Observation: reference lengths follow an exponential distribution
      • Page types correlate with page reference lengths
      • Page types: navigational, content, or hybrid
      • Can automatically classify pages as navigational or content using statistical modeling
      • A transaction can be defined as an intra-session path ending in a content page, or as a set of content pages in a session
      Figure: histogram of reference lengths (secs), with regions marked for navigational pages and content pages

      Path Completion [DP-25]
      • Refers to the problem of inferring missing user references due to caching.
      • Effective path completion requires extensive knowledge of the link structure within the site.
      • Referrer information in server logs can also be used in disambiguating the inferred paths.
      • The problem gets much more complicated in frame-based sites.
  21. Path Completion - An Example [DP-26]
      User’s navigation path: A => B => D => E => D => B => C
      Figure: site graph with the pages A, B, C, D, E, F
      Logged requests (URL, Referrer):
        A   --
        B   A
        D   B
        E   D
        C   B
      • There may be multiple candidates for completing the path. For example, consider the two paths E => D => B => C and E => D => B => A => C.
      • In this case, the referrer field allows us to partially disambiguate. But what about E => D => B => A => B => C?
      • One heuristic: always take the path that requires the fewest […]

      Integrating E-Commerce Events [DP-27]
      • Either product oriented or visit oriented
      • Not necessarily a one-to-one correspondence with user actions
      • Used to track and analyze conversion of browsers to buyers
      • The major difficulty for e-commerce events is defining and implementing the events for a site
        – however, in contrast to clickstream data, getting reliable preprocessed data is not a problem
      • Another major challenge is the successful integration with clickstream data
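The example can be reproduced with a simple backtracking rule. This sketch makes two assumptions that go beyond the slide: cached pages are revisited only through the browser's back button, and the missing references are filled in by walking back through the pages visited so far until the referrer of the next logged request is reached. The function name is illustrative.

```python
def complete_path(requests):
    """requests: list of (url, referrer) pairs in log order; referrer None if absent.
    Returns the navigation path with inferred back-button references inserted."""
    path = []   # completed navigation path
    stack = []  # pages on the current forward path, like a browser history stack
    for url, referrer in requests:
        if referrer is not None and referrer in stack:
            # Walk back until the referring page is on top, recording each
            # page that must have been re-displayed from the cache.
            while stack[-1] != referrer:
                stack.pop()
                path.append(stack[-1])
        stack.append(url)
        path.append(url)
    return path

logged = [("A", None), ("B", "A"), ("D", "B"), ("E", "D"), ("C", "B")]
completed = complete_path(logged)
```

For the logged requests of the example, the request for C with referrer B triggers two backward steps (to D, then to B), which recovers the user's full path.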
  22. Product-Oriented Events [DP-28]
      • Product View
        – Occurs every time a product is displayed on a pageview
        – Typical types: Image, Link, Text
      • Product Click-through
        – Occurs every time a user “clicks” on a product to get more information
          • Category click-through
          • Product detail or extra detail (e.g. large image) click-through
          • Advertisement click-through

      Product-Oriented Events (continued) [DP-29]
      • Shopping Cart Changes
        – Shopping Cart Add or Remove
        – Shopping Cart Change: quantity or other feature (e.g. size) is changed
      • Product Buy or Bid
        – A separate buy event occurs for each product in the shopping cart
        – Auction sites can track bid events in addition to the product purchases
  23. Content and Structure Preprocessing [DP-30]
      • Processing the content and structure of the site is often essential for successful usage analysis
      • Two primary tasks:
        – determine what constitutes a unique page file (i.e., pageview)
        – represent content and structure of the pages in a quantifiable form

      Content and Structure Preprocessing (continued) [DP-31]
      • Basic elements in content and structure processing
        – creation of a site map
          • captures linkage and frame structure of the site
          • also needs to identify script templates for dynamically generated pages
        – extracting important content elements in pages
          • meta-information, keywords, internal and external links, etc.
        – identifying and classifying pages based on their content and structural characteristics
  24. Quantifying Content and Structure [DP-32]
      • Static Pages
        – All of the information is contained within the HTML files for a site
        – Each file can be parsed to get a list of links, frames, images, and text
        – Files can be obtained through the file system, or through HTTP requests from an automated agent (site spider)

      Quantifying Content and Structure (continued) [DP-33]
      • Dynamic Pages
        – Pages do not exist until they are created in response to a specific request
        – Relevant information can come from a variety of sources: templates, databases, scripts, HTML, etc.
        – Three methods of obtaining content and structure information:
          • Series of HTTP requests from a site mapping tool
          • Compile information from internal sources
          • Content server tools
  25. Integrating content and structure I [DP-34]
      Domain knowledge: content
      - purpose: group pages by their content
      - method: analyze text, meta-tags, and/or URL (query string)
      - grouping by classification or clustering
      Concept hierarchies
      Figure: example of a content-based concept hierarchy, with the levels
        Entertainment
        Performing Arts, Music, ...
        Artists, Genres, New Releases, ...
        Blues, Jazz, New Age, ...

      Integrating content and structure II [DP-35]
      Content profiles from feature clusters
      1. vector space model: each unique word in the corpus = one dimension; each page(view) is a vector with a non-zero weight for each word in that page(view) and zero weight for all other words
      2. feature - pageview matrix (note: "feature" = word, "pageview" because of frames)
               music   jazz   artist   ...
         pv1   1.00    0.80   0.05
         pv2   1.00    0.00   0.70
         ...
      3. features as weighted vectors of pageviews
         jazz = [ <pv1,0.80>, <pv2,0.00>, ... ]
      4. group features -> feature clusters -> content profiles
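Steps 2 and 3 amount to reading the feature-pageview matrix column by column. A minimal sketch using the weights from the slide; the cosine similarity at the end is one common choice for comparing feature vectors before clustering and is an assumption here, not something the slide prescribes.

```python
import math

pageviews = ["pv1", "pv2"]
features = ["music", "jazz", "artist"]
# Rows are pageviews, columns are features (weights taken from the slide).
matrix = [
    [1.00, 0.80, 0.05],
    [1.00, 0.00, 0.70],
]

# Step 3: each feature becomes a weighted vector of pageviews (one matrix column).
feature_vectors = {
    f: [(pv, row[j]) for pv, row in zip(pageviews, matrix)]
    for j, f in enumerate(features)
}

def cosine(f1, f2):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    a = [w for _, w in feature_vectors[f1]]
    b = [w for _, w in feature_vectors[f2]]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

music_jazz = cosine("music", "jazz")
```

Features whose vectors point in similar directions occur in the same pageviews, which is what step 4 exploits when grouping features into clusters.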
  26. Integrating content and structure III [DP-36]
      Structure
      - purpose: group pages by their hyperlink structure
      - ex. page types in Pirolli et al. [54] [B24] and Cooley et al. [15] [B20]: head, navigation, content, look-up, personal
      - ex. path distance to a reference page: A.html, B.html (dA = 1), C.html (dA = 2)
      - structure as weighted vector of page(view)s
        S = [ <A.html,0>, <B.html,1>, <C.html,0>, ... ] (only B content page)
        S = [ <A.html,0>, <B.html,1>, <C.html,3>, ... ] (path distances)
      - grouping by classification or clustering

      Relating content and structure to mined usage I: Content/structure mining as pre-/post-processing steps [DP-37]
      Ex. online catalog search (Berendt & Spiliopoulou [8, 6] [B18, B17]):
      1. service-based concept hierarchy: which query options?
      Figure: concept hierarchy with the node labels "Info on schools"; "indiv. school", "list of schools", ...; "1 parameter", "2 par.s", "3 parameters"; "Location", "Name", ..., "Location+Name", ...
  27. Relating content and structure to mined usage I (continued) [DP-38]
      2. discovering and comparing navigation patterns in classified pages
      Figure: part of a resulting WUM navigation pattern

      Relating content and structure to mined usage I (continued) [DP-39]
      Ex. WebSIFT Information Filter (from Cooley [14] [B19]):
      • Mined knowledge: general usage statistics; domain knowledge source: site structure
        – interesting belief example: the head page is not the most common entry point
      • Mined knowledge: general usage statistics; domain knowledge source: site content
        – interesting belief example: a page designed to provide content is being used as a navigation page
      • Mined knowledge: frequent itemsets; domain knowledge source: site structure
        – interesting belief example: a set of pages is frequently accessed together, but not directly linked
      • Mined knowledge: usage clusters; domain knowledge source: site content
        – interesting belief example: a usage cluster contains pages from multiple content categories
      => discover patterns at different levels of abstraction, discover deviations from intended usage
  28. Relating content and structure to mined usage II: Usage, content, and structure mining as 3 ways of deriving a common kind of representation [DP-40]
      Mobasher, Dai, Luo, Sun, & Zhu [44] [B22]
      - a vector of tuples <pageview,weight>:
        usage: sessions / visits, or parts of them (past + current)
        content: features
        structure: pages and their characteristics
      - unordered or ordered collections
      => identify clusters that are similar, where similarity is by usage, content, or structure

      Pattern Discovery for Personalization [PD-1]
      Myra Spiliopoulou, HHL
  29. We identify the following aspects of the personalization services, which can be envisaged as the result of pattern discovery: [PD-2]
      • Visibility
        – personal recommendation
        – silent dynamic adjustment
        – static page/site adjustment
      • Service element
        – (link to) page
        – application object
      • Matching based on
        – user profiles
        – user ratings
        – user behaviour
        – content of objects
      • Acquisition till action
        – all steps on-line
        – off-line pattern discovery & on-line matching

      Pattern Discovery: Adaptive web sites [PD-3]
      The approach of Perkowitz & Etzioni […]
      The IndexFinder consists of three phases:
      1. Log processing: establishment of sessions as sets of page requests
      2. Cluster mining: grouping of co-occurring non-linked pages with the help of the site graph
      3. Conceptual clustering:
        – The representative concept of each cluster is identified.
        – Cluster members not adhering to this concept are removed from the cluster.
        – Pages adhering to this concept and not appearing in the cluster are attached to the cluster.
  30. For each cluster, the IndexFinder presents to the Web designer [PD-4]
      • an index page with links to all pages of the cluster
      The Web designer decides
        – whether the new page should indeed be established
        – what its label should be
        – where it should be located in the site
      According to our categorization
      • Visibility: static page/site adjustment
      • Service element: page containing a single application object
      • Matching based on: user behaviour and page content
      • Acquisition till action: off-line pattern discovery

      Pattern Discovery for Recommendations: The collaborative filtering approach [PD-5]
      Main idea: The objects suggested to a user are those preferred by users similar to her.
      1. The user's transaction is matched against logged transactions.
      2. The matches are ranked.
      3. The best (set of) match(es) are selected.
      4. The objects that were shown in the selected transactions are ranked, excluding objects already seen.
      5. The objects with the highest ranking are shown to the user.
      All steps on-line.
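The five collaborative-filtering steps (match, rank, select, rank objects, show) can be sketched in a few lines. The similarity measure (Jaccard coefficient) and the object score (sum of the similarities of the selected transactions containing the object) are assumptions chosen for illustration; the slide does not fix either, and all names are hypothetical.

```python
def recommend_cf(active, logged, k=2, n=2):
    """Recommend up to n objects for the active transaction from logged transactions."""
    active = set(active)

    def sim(t):
        t = set(t)
        return len(active & t) / len(active | t)  # assumed: Jaccard similarity

    ranked = sorted(logged, key=sim, reverse=True)  # steps 1-2: match and rank
    best = ranked[:k]                               # step 3: select best matches
    scores = {}
    for t in best:                                  # step 4: rank unseen objects
        for obj in set(t) - active:
            scores[obj] = scores.get(obj, 0.0) + sim(t)
    # Step 5: highest-scoring objects first; ties broken alphabetically.
    return sorted(scores, key=lambda o: (-scores[o], o))[:n]

logged = [["a", "b", "c"], ["a", "b", "c", "d"], ["x", "y"], ["b", "d"]]
suggestions = recommend_cf(["a", "b"], logged)
```

Every call scans the whole log, which illustrates why this approach performs all steps on-line and why the data mining approach moves the pattern discovery off-line.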
  31. Pattern Discovery for Recommendations: The Data Mining approach [PD-6]
      Main idea: User similarity can be defined in terms of behaviour, interests and preferences, which can be modelled off-line.
      1. Pattern discovery over the logged data
      2. The contents of the user's transaction are matched against the discovered patterns.
      3. The matches are ranked.
      4. The objects associated with the best matches are ranked, excluding objects already seen.
      5. The objects with the highest ranking are shown to the user.
      so that (a) the voluminous logs […] and (b) on-line matching […]

      Pattern Discovery: Recommendations on correlated items [PD-7]
      The approach of Vucetic and Obradovic […]
      The recommendation problem is defined as: Given the ratings of the active user on a set of items, which will be her ratings on the remaining items?
      Main idea: The ratings of an item can be predicted from the ratings on correlated items.
      • Visibility: personal recommendation
      • Service element: application object
      • Matching based on: ratings of correlated items
      • Acquisition till action: off-line discovery of predictors for the impact of item correlation on ratings
  32. Methodology [PD-8]
      • The rating of an item given another item is approximated using a linear function (named "expert").
      • The average correlation among pairs of items is approximated using random sampling over the user ratings.
      • A weighting scheme is proposed to deal with the fact that users with similar preferences may provide different ratings for the same set of items.
      In this scheme
        – The linear experts for all pairs of items can be computed off-line.
        – The ratings for an active user are predicted from the set of pairs of items rather than from the set of user ratings.

      Pattern Discovery: Repeat-buying theory for personalization [PD-9]
      The approach of Geyer-Schulz et al. […]
      Main idea:
      (a) Recommendations are based on correlated products.
      (b) Correlations can be identified with Ehrenberg's repeat-buying theory,
      (c) after adjusting it to the particularities of anonymous user sessions.
      According to our categorization
      • Visibility: recommendation of information products
      • Service element: application object or URL
      • Matching based on: user preferences for application objects
      • Acquisition till action: off-line discovery of correlated application objects
  33. Ehrenberg's repeat-buying theory [PD-10]
        – predicts buyer behaviour from (a) penetration and (b) average purchase frequency of an item
        – by providing a reference model that characterizes repeated co-occurring purchases of items as random or not random
      where penetration refers to the preference of a customer for a brand, and average purchase frequency refers to repeated purchases of the item, ignoring characteristics of the item, amount of the item and size of the purchase as a whole.

      Assumptions of […] [PD-11]
        – The probability of r co-occurrences of two products in subsequent purchases follows a logarithmic series distribution.
        – Subsequent purchases of the same customer(s) can be observed as equivalent to a set of purchase sessions during the log period.
      Methodology
      • Computation of the frequency distributions of all co-occurrences of product pairs, counting one co-occurrence per session only
      • Elimination of distributions with a small number of observations
      • Elimination of the […] percentile of the repeat-buy pairs
      • Computation of the co-occurrence predictor for each pair, so that outliers for each predictor can be observed as correlated items.
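For reference, the logarithmic series distribution named in the assumptions has, in its standard textbook form, the probability P(r) = -q^r / (r ln(1 - q)) for r = 1, 2, ... and 0 < q < 1. The sketch below only evaluates this form; how the authors estimate q from the observed co-occurrence counts is not recoverable from this transcript, so the value used here is an arbitrary placeholder.

```python
import math

def lsd_pmf(r, q):
    """P(r) of the logarithmic series distribution, r = 1, 2, ..., 0 < q < 1."""
    return -(q ** r) / (r * math.log(1.0 - q))

def lsd_mean(q):
    """Expected number of co-occurrences under the distribution."""
    return -q / ((1.0 - q) * math.log(1.0 - q))

q = 0.5  # placeholder parameter, not a value from the tutorial
probabilities = [lsd_pmf(r, q) for r in range(1, 200)]
```

Pairs whose observed co-occurrence counts lie far in the tail of such a reference distribution are the outliers that the methodology treats as correlated items.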
  34. Pattern Discovery: Association mining for personalization [PD-12]
      Basic idea: match the left-hand side of rules with the active user session and recommend items in the rule’s consequent
      Essential to store patterns in efficient data structures
      • searching all rules in real time is computationally inefficient
      Ordering of accessed pages is not taken into account
      Good recommendation accuracy, but the main problem is “coverage”
      • high support thresholds lead to low coverage and may eliminate important but infrequent items from consideration
      • low support thresholds result in very large model sizes and a computationally expensive pattern discovery phase

      Association Mining - Basic Concepts [PD-13]
      We start with a set I of items and a set D of transactions. A transaction T is a set of items (a subset of I):
        I = { i1, i2, ..., im },  T ⊆ I
      An association rule is an implication on itemsets X and Y, denoted by X ==> Y, where
        X ⊆ I,  Y ⊆ I,  X ∩ Y = ∅
      The rule meets a minimum confidence of c, meaning that at least c% of the transactions in D that contain X also contain Y. In addition, a minimum support of s must be satisfied for each itemset. Writing |Z| for the number of transactions in D that contain the itemset Z:
        s ≤ |X ∪ Y| / |D|
        c ≤ |X ∪ Y| / |X|
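Support and confidence, and the matching of rule left-hand sides against the active session, can be sketched as follows. The rule representation as `(lhs, rhs, confidence)` tuples and the choice to score an item by the highest confidence among the matching rules are assumptions made for this illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions):
    """Confidence of the rule X ==> Y: |X ∪ Y| / |X| in transaction counts."""
    return support_count(x | y, transactions) / support_count(x, transactions)

def recommend_from_rules(session, rules):
    """rules: list of (lhs, rhs, confidence). Returns items ordered by score."""
    session = set(session)
    scores = {}
    for lhs, rhs, conf in rules:
        if lhs <= session:  # the rule's left-hand side matches the session
            for item in rhs - session:
                scores[item] = max(scores.get(item, 0.0), conf)
    return sorted(scores, key=lambda i: (-scores[i], i))

D = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
supp_ab = support_count({"A", "B"}, D) / len(D)
conf_a_b = confidence({"A"}, {"B"}, D)
rules = [({"A"}, {"B"}, conf_a_b),
         ({"A"}, {"C"}, confidence({"A"}, {"C"}, D)),
         ({"B", "C"}, {"A"}, confidence({"B", "C"}, {"A"}, D))]
items = recommend_from_rules(["A"], rules)
```

The linear scan over all rules in `recommend_from_rules` is exactly the real-time cost the slide warns about, which motivates storing patterns in a more efficient structure.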
  35. 35. Pattern Discovery: Associated items and users
  The approach of Lin, Alvarez & Ruiz [3…]
  Main idea:
  => Users are associated to each other in terms of how they rate items.
  => Items are associated to each other with respect to user preferences.
  Associations among items can be found off-line.
  Associations to the active user can be found on-line.
  According to our categorization
  Visibility: Personal recommendation
  Service element: application object
  Matching based on: associations among items and among users
  On-line discovery of assoc. rules with given RHS
  [PD-14]

  Methodology
  • Recommendations are subject to minimum confidence and minimum number of rules constraints.
  • The miner discovers association rules iteratively, until the desired number of rules is extracted. The support cutoff is adjusted in each iteration.
  • Rules concern both items and users:
    User1:like AND User2:dislike => TargetUser:like
    Item1:like AND Item2:like => TargetItem:like
  • Candidate items are computed from associations involving users similar to the active user. (on-line)
  • Scores of items are computed from associations that capture user preferences. (off-line)
  • The candidate items with the highest scores are suggested to the active user. (on-line)
  [PD-15]
  36. 36. Pattern Discovery: Association mining for personalization
  The approach of Mobasher, et al, 2001 [45]
  Main idea: avoid offline generation of all association rules; generate recommendations directly from itemsets
  • discovered frequent itemsets are stored in an “itemset graph” (an extension of the lexicographic tree structure of Agrawal, et al 1999 [2])
  • recommendation generation can be done in constant time by doing a directed search to a limited depth
  According to our categorization
  Visibility: Personal recommendations or silent dynamic adjustment
  Service element: pageview
  Matching based on: user behaviour
  [PD-16]

  Methodology:
  • Construct the Frequent Itemset Graph
    – each node at depth d in the graph corresponds to an itemset I of size d and is linked to the itemsets of size d+1 at level d+1 that contain I
    – the single root node at level 0 corresponds to the empty itemset
  • frequent itemsets are matched against a user’s active session S by performing a search of the graph to depth |S|
  • a recommendation r is an item at level |S|+1 whose recommendation score is the confidence of the rule S ==> r
  [PD-17]
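  A small Python sketch of the itemset-graph idea follows. It is not the authors' implementation: itemsets are found by brute-force enumeration (adequate only for toy data), the graph is a plain dictionary from each itemset to its frequent supersets one level down, and the function names and the `min_count` parameter are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force level-wise search; returns {itemset: occurrence count}."""
    items = sorted(set().union(*transactions))
    counts = {frozenset(): len(transactions)}  # root node: the empty itemset
    for size in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            n = sum(1 for t in transactions if candidate <= t)
            if n >= min_count:
                level[candidate] = n
        if not level:  # no frequent itemset of this size, so none larger
            break
        counts.update(level)
    return counts


def build_graph(counts):
    """Link every itemset at depth d to its frequent supersets at depth d+1."""
    children = {node: [] for node in counts}
    for node in counts:
        for item in node:
            # Every subset of a frequent itemset is frequent, so the parent exists.
            children[node - {item}].append(node)
    return children


def recommend(session, counts, children):
    """Score each item r one level below S by the confidence of S ==> r."""
    s = frozenset(session)
    if s not in counts:
        return {}
    return {next(iter(child - s)): counts[child] / counts[s]
            for child in children[s]}


# Hypothetical toy transactions.
D = [frozenset(t) for t in (["A", "B", "C"], ["A", "B"], ["A", "C"],
                            ["B", "C"], ["A", "B", "C"])]
counts = frequent_itemsets(D, min_count=2)
graph = build_graph(counts)
```

  With the toy data, `recommend({"A", "B"}, counts, graph)` yields item C with the confidence of the rule {A, B} ==> C.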
  37. 37. Pattern Discovery: Sequence mining for personalization
  Main idea: take the ordering of accessed items into account
  Two basic approaches
  • use contiguous sequences (e.g., Web navigational patterns)
  • use general sequential patterns
  Contiguous sequential patterns are often modeled as Markov chains and used for prefetching (i.e., predicting the next user access based on previously accessed pages)
  In the context of recommendations, they can achieve higher accuracy than other methods, but it may be difficult to obtain reasonable coverage
  [PD-18]

  Pattern Discovery: Sequence mining for personalization
  The Markov chain representation often leads to high space complexity due to model sizes
  Some solutions
  • selective Markov models (Deshpande, Karypis, 2000 [17]) use various pruning strategies to reduce the number of states (e.g., support or confidence pruning, error pruning)
  • longest repeating subsequences (Pitkow, Pirolli, 1999 []): similar to support pruning, used to focus only on significant navigational paths
  • increased coverage can be achieved by using all-Kth-order models (i.e., using all possible sizes for user histories)
  [PD-19]
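  The following Python sketch illustrates next-page prediction with Markov models of several orders, falling back from the longest matching history to shorter ones in the spirit of the all-Kth-order idea. The sessions, function names and the `max_order` parameter are illustrative assumptions; no pruning strategy from the cited papers is implemented.

```python
from collections import Counter, defaultdict


def train(sessions, max_order):
    """Count, for every context of 1..max_order consecutive pages,
    which page was requested next."""
    counts = defaultdict(Counter)
    for session in sessions:
        for k in range(1, max_order + 1):
            for i in range(len(session) - k):
                context = tuple(session[i:i + k])
                counts[context][session[i + k]] += 1
    return counts


def predict(history, counts, max_order):
    """Return (next_page, estimated_probability) using the longest context
    seen in training, or None if no context matches."""
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in counts:
            page, n = counts[context].most_common(1)[0]
            return page, n / sum(counts[context].values())
    return None


# Hypothetical navigation sessions.
SESSIONS = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "C"]]
model = train(SESSIONS, max_order=2)
```

  The number of contexts stored in `model` grows quickly with `max_order`, which is the space problem the pruning strategies on the slide address.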
  38. 38. Pattern Discovery: Sequence mining for personalization
  The approach of Gaul & Schmidt-Thieme [2…]
  Main idea:
  => Recommendations are based on frequent patterns of past behaviour.
  => A recommender is a predictor for a class of events.
  => The constellation of the recommenders for all classes returns the best recommendations for a given user history.
  According to our categorization
  Visibility: Recommendation
  Service element: URLs, site objects
  Matching based on: navigation histories and URL proximity
  Off-line training of classifiers (local recommender systems)
  [PD-20]

  Generic framework
  • with measures for the quality of a recommendation, taking the distance between candidate URLs into account
  • distinguishing between dynamic and static recommenders that do / do not take user histories into account
  • combining local recommender systems, each of which predicts a class of events, where a class can be one user history, a group of histories or the whole dataset.
  Thereby, a navigation history is
  – a set of events
  – a sequence of events
  – a more complex structure of co-occurring events
  [PD-21]
  39. 39. Pattern Discovery: Usage profiles for personalization
  The approach of Mobasher et al. [43, 42]
  Two types of usage profiles:
  • clusters of similar user transactions, enhanced by a weighting scheme that removes pages with support less than a minimum value, aggregating the members of a cluster into one representative profile
  • clusters of pages accessed together
  According to our categorization
  Visibility: Personal recommendation or silent dynamic adjustment
  Service element: pageview
  Matching based on: user behaviour, also page content
  Off-line discovery of aggregate profiles
  [PD-22]

  Aims:
  => achieve similar performance to on-line collaborative filtering
  => using a minimal number of pageviews for the active user
  Methodology
  • Preprocessing phase
    – Assignment of weights to the pageviews
    – Significance testing, based on page stay time
    – Normalization of pageview weights
  • PACT: Profile Aggregation based on Clustering Techniques
    1. Clustering of usage data to establish the aggregate profiles
    2. Materialization of the profiles as vectors of (page, weight) pairs
    3. Scan of the user's history by means of a sliding window that allows only a set of page accesses to be considered in the profile
    4. Matching the user session with a profile
    5. Match ranking
  [PD-23]
  40. 40. A Framework for Personalization Based on Aggregate Profiles: Offline Phase
  [PD-24]

  A Framework for Personalization Based on Aggregate Profiles: Online Phase
  Input from the batch process: Usage Profiles, Content Profiles
  • Match current user’s activity against the discovered profiles
  • Each recommended item is assigned a score based on
    – matching criteria and quality of aggregate profiles
    – “information value” of the item based on domain knowledge
  [PD-25]
  41. 41. Aggregate Profiles Based on Clustering Transactions (PACT)
  (Mobasher, et al, [42, 43])
  • Input
    – set of relevant pageviews in the preprocessed log: $P = \{p_1, p_2, \ldots, p_n\}$
    – set of user transactions: $T = \{t_1, t_2, \ldots, t_m\}$
    – each transaction is a pageview vector: $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$
  [PD-26]

  Aggregate Profiles Based on Clustering Transactions (PACT)
  • Transaction Clusters
    – each cluster contains a set of transaction vectors
    – for each cluster compute the centroid as cluster representative: $\vec{c} = \langle u_1^c, u_2^c, \ldots, u_n^c \rangle$
  • Aggregate Usage Profiles
    – a set of pageview-weight pairs: for transaction cluster $c \in C$, select each pageview $p_i$ such that $u_i^c$ (in the cluster centroid) is greater than a pre-specified threshold
  [PD-27]
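  The centroid and threshold steps can be sketched in a few lines of Python. The clustering itself is assumed to have been done already (for example with k-means); the pageview names, the binary weights and the threshold value are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of the transaction vectors in one cluster."""
    size = len(cluster)
    return [sum(t[i] for t in cluster) / size for i in range(len(cluster[0]))]


def aggregate_profile(cluster, pageviews, threshold):
    """Keep each pageview whose mean weight in the cluster exceeds the
    threshold; the result is a set of pageview-weight pairs."""
    return {p: u for p, u in zip(pageviews, centroid(cluster)) if u > threshold}


# Hypothetical cluster of four transactions over pageviews A..D (binary weights).
PAGEVIEWS = ["A", "B", "C", "D"]
CLUSTER = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
]
profile = aggregate_profile(CLUSTER, PAGEVIEWS, threshold=0.5)
```

  Here the centroid is (1.0, 0.75, 0.25, 0.25), so only pageviews A and B survive the 0.5 threshold.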
  42. 42. Example Aggregate Profiles
  • Example profiles based on the PACT method
    – Based on data from the Association for Consumer Research site:

  Weight | Pageview
  1.00 | Call for Papers
  0.67 | ACR News Special Topics
  0.67 | CFP: Journal of Psychology and Marketing I
  0.67 | CFP: Journal of Psychology and Marketing II
  0.67 | CFP: Journal of Consumer Psychology II
  0.67 | CFP: Journal of Consumer Psychology I

  Weight | Pageview
  1.00 | CFP: Winter 2000 SCP Conference
  1.00 | Call for Papers
  0.36 | CFP: ACR 1999 Asia-Pacific Conference
  0.30 | ACR 1999 Annual Conference
  0.25 | ACR News Updates
  0.24 | Conference Update
  [PD-28]

  Hypergraph-Based Clustering
  (Han, Karypis, Kumar, Mobasher, 1998 [26])
  • Construct a hypergraph from sets of related items
    – Each hyperedge represents a frequent itemset
    – The weight of each hyperedge can be based on the characteristics of frequent itemsets or association rules (e.g., support, confidence, interest, etc.)
  [PD-29]
  43. 43. Hypergraph-Based Clustering
  • Recursively partition the hypergraph so that each partition contains only highly connected items
    – Given a hypergraph, we find a k-way partitioning such that the weight of the hyperedges that are cut is minimized
    – The fitness of partitions is measured in terms of the ratio of the weights of cut edges to the weights of uncut edges within the partitions
    – The connectivity measures the percentage of edges within the partition with which the vertex is associated; it is used for filtering partitions
    – Vertices from partial edges can be added back to clusters based on a user-specified overlap factor
  [PD-30]

  Profiles Based on Hypergraph Clusters
  (Mobasher, Cooley, Srivastava, 1999 [41])
  • Input
    – input for clustering is the set of large itemsets from the association rule module
    – each itemset is a hyperedge (weights are a function of the interest of the itemset):
      $Interest(I) = \frac{support(I)}{\prod_{i \in I} support(i)}$
    – In practice one can use the log of the interest to keep a few highly frequent patterns from totally dominating
  [PD-31]
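  As a hedged illustration of the interest measure, the sketch below computes it for one itemset over toy transactions. The data and function names are assumptions; the hypergraph partitioning step itself is not shown.

```python
import math


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def interest(itemset, transactions):
    """support(I) divided by the product of the single-item supports.
    Values above 1 mean the items co-occur more often than independence
    would suggest."""
    expected = math.prod(support({i}, transactions) for i in itemset)
    return support(itemset, transactions) / expected if expected else 0.0


def log_interest(itemset, transactions):
    """Log-scaled variant, usable as a hyperedge weight.
    Assumes the itemset occurs at least once (interest > 0)."""
    return math.log(interest(itemset, transactions))


# Hypothetical transactions: A and B always appear together.
D = [frozenset(t) for t in (["A", "B"], ["A", "B"], ["C"], ["C"])]
```

  For the itemset {A, B} the support is 0.5 and each single-item support is 0.5, so the interest is 0.5 / 0.25 = 2.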
  44. 44. Profiles Based on Hypergraph Clusters
  • Aggregate Profiles (Item/Pageview Clusters)
    – the clustering program directly outputs a set of overlapping pageview clusters
    – the weight associated with pageview p in a cluster C is based on the connectivity value of p in the hypergraph partition:
      $conn(p, C) = \frac{|\{e \mid e \subseteq C,\ p \in e\}|}{|\{e \mid e \subseteq C\}|}$
  [PD-32]

  Recommendation Engine for Using Aggregate Profiles
  • Match user’s activity against discovered profiles
    – a sliding window over the active session to capture the current user’s “short-term” history depth
    – profiles and the active session are treated as vectors
    – the matching score is computed based on the similarity between vectors (e.g., normalized cosine similarity)
  • Recommendation scores are based on
    – matching score to aggregate profiles
    – “information value” of the recommended item (e.g., link distance of the recommendation to the active session)
    – recommendations are contributed by multiple profiles
  [PD-33]
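  The connectivity weight can be sketched as follows, assuming each hyperedge is simply a set of pageviews and every hyperedge counts equally. The hyperedges and the cluster are hypothetical.

```python
def conn(p, cluster, hyperedges):
    """Share of the hyperedges lying entirely inside `cluster` that contain p."""
    inside = [e for e in hyperedges if e <= cluster]
    if not inside:
        return 0.0
    return sum(1 for e in inside if p in e) / len(inside)


# Hypothetical hyperedges (frequent itemsets) and one pageview cluster.
EDGES = [frozenset(e) for e in (["A", "B"], ["A", "C"], ["B", "C"],
                                ["A", "B", "C"], ["C", "D"])]
CLUSTER = frozenset(["A", "B", "C"])
weights = {p: conn(p, CLUSTER, EDGES) for p in sorted(CLUSTER)}
```

  Four of the five hyperedges lie inside the cluster and each of A, B and C appears in three of them, so every weight is 0.75.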
  45. 45. Active Session Window
  • Example: session window of size 5 over the active user session
    A.html → B.html → C.html → D.html → E.html → D.html → F.html
  • Associating weight with items in the active session:
    – assigned by the site owner based on perceived importance
    – based on recency (recent pages weighted higher) or time spent on pages
    – based on page types (e.g., content v. navigational)
  [PD-34]

  Example: Recommendations Based on PACT
  Example profiles:
  PROFILE 0: 1.00 D.html, 0.50 A.html, 0.50 C.html, 0.50 E.html
  PROFILE 1: 1.00 A.html, 0.50 B.html, 0.50 C.html, 0.50 D.html, 0.50 E.html, 0.50 F.html
  PROFILE 2: 0.75 B.html, 0.75 F.html, 0.50 A.html, 0.50 C.html, 0.25 D.html
  Current user session U: A.html => B.html => C.html => E.html
  Assume a session window size of 3 and unit weights, using (cosine) similarity between the active session and each profile:
  Sim(U, P0) = (0.5+0.5) / SQRT(1.75 * 3) = 0.44
  Sim(U, P1) = (0.5+0.5+0.5) / SQRT(2.25 * 3) = 0.58
  Sim(U, P2) = (0.75+0.5) / SQRT(1.69 * 3) = 0.56
  Candidate recommendations:
  P0: D.html (SQRT(0.44*1.00) = 0.66), A.html (SQRT(0.44*0.50) = 0.47)
  P1: A.html (SQRT(0.58*1.00) = 0.76), D.html (SQRT(0.58*0.50) = 0.54), F.html (SQRT(0.58*0.50) = 0.54)
  P2: F.html (SQRT(0.56*0.75) = 0.65), A.html (SQRT(0.56*0.50) = 0.53), D.html (SQRT(0.56*0.25) = 0.37)
  [PD-35]
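  The matching and scoring steps of this example can be reproduced with a short Python sketch. Evaluating the cosine formula directly on the listed profile weights gives similarities of roughly 0.44, 0.58 and 0.56 for profiles 0, 1 and 2. Keeping the maximum score when several profiles propose the same page is an assumption of this sketch, as are the function names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse page-weight vectors (dicts)."""
    dot = sum(w * v.get(page, 0.0) for page, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(session, profiles, window):
    """Score pages outside the session window by sqrt(similarity * weight),
    keeping the best score over all profiles (assumed combination rule)."""
    active = {page: 1.0 for page in session[-window:]}  # unit weights
    scores = {}
    for profile in profiles:
        sim = cosine(active, profile)
        for page, weight in profile.items():
            if page not in active:
                score = math.sqrt(sim * weight)
                scores[page] = max(scores.get(page, 0.0), score)
    return scores


P0 = {"D.html": 1.00, "A.html": 0.50, "C.html": 0.50, "E.html": 0.50}
P1 = {"A.html": 1.00, "B.html": 0.50, "C.html": 0.50,
      "D.html": 0.50, "E.html": 0.50, "F.html": 0.50}
P2 = {"B.html": 0.75, "F.html": 0.75, "A.html": 0.50,
      "C.html": 0.50, "D.html": 0.25}
SESSION = ["A.html", "B.html", "C.html", "E.html"]

scores = recommend(SESSION, [P0, P1, P2], window=3)
```

  Note that A.html is recommended even though the user already visited it, because it has dropped out of the three-page session window.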
  46. 46. Integration of Content Profiles
  (Mobasher, et al., 2000 [44])
  • Cluster features over the n-dimensional space of pageviews
  • For each feature cluster, derive a content profile by collecting pageviews in which these features appear as significant (represented as overlapping collections of pageview-weight pairs)

  Weight | Pageview ID | Significant Features (stems)
  1.00 | CFP: One World One Market | world challeng busi co manag global
  0.63 | CFP: Intl Conf. on Marketing & Development | challeng co contact develop intern
  0.35 | CFP: Journal of Global Marketing | busi global
  0.32 | CFP: Journal of Consumer Psychology | busi manag global

  Weight | Pageview ID | Significant Features (stems)
  1.00 | CFP: Journal of Psych. & Marketing | psychologi consum special market
  1.00 | CFP: Journal of Consumer Psychology I | psychologi journal consum special market
  0.72 | CFP: Journal of Global Marketing | journal special market
  0.61 | CFP: Journal of Consumer Psychology II | psychologi journal consum special
  0.50 | CFP: Society for Consumer Psychology | psychologi consum special
  0.50 | CFP: Conf. on Gender, Market., Consumer Behavior | journal consum market
  [PD-36]

  Integration of Content Profiles
  • Integration with the Recommendation Engine
    – Usage and content profiles have a similar representation, so they can be used by the recommendation engine in the same way
    – Item weights in profiles must be normalized, so content and usage profiles can be compared on the same scale
    – One approach: match the active user session with all profiles (both content and usage); then use the maximal recommendation score for candidate recommendations
    – Another approach: use content profiles for generating recommendations only if no matching usage profile (with sufficient confidence) is found
  [PD-37]
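  The second integration approach (fall back to content profiles only when no usage profile matches well enough) might look like the sketch below. Normalizing by the largest weight, the `min_sim` threshold and all names are assumptions made for illustration; the tutorial does not prescribe them.

```python
import math


def normalize(profile):
    """Rescale weights so the largest is 1.0 (assumes positive weights)."""
    top = max(profile.values())
    return {page: w / top for page, w in profile.items()}


def cosine(u, v):
    """Cosine similarity between two sparse page-weight vectors (dicts)."""
    dot = sum(w * v.get(page, 0.0) for page, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matching_profiles(active, usage_profiles, content_profiles, min_sim):
    """Prefer usage profiles; use content profiles only if none of the
    usage profiles reaches the similarity threshold."""
    usage = [p for p in map(normalize, usage_profiles)
             if cosine(active, p) >= min_sim]
    if usage:
        return "usage", usage
    content = [p for p in map(normalize, content_profiles)
               if cosine(active, p) >= min_sim]
    return "content", content


# Hypothetical profiles on different raw scales.
USAGE = [{"A": 2.0, "B": 1.0}]
CONTENT = [{"X": 4.0, "Y": 2.0}]
```

  A session containing only page X matches no usage profile here, so the content profile is returned, rescaled to the same 0..1 range as the usage profiles.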
  47. 47. Evaluating Personalization
  [E-1]

  Evaluating usability: goals / tasks?
  Recall the operational definition: A Web site’s usability is high if users
  - achieve their goals / perform their tasks in little time,
  - do so with a low error rate,
  - experience high subjective satisfaction.
  Depending on the site, relevant goals / tasks may be to:
  - stay in the site, return to the site, buy... => E-metrics
  - locate content (search),
  - learn,
  - ...
  [E-2]
  48. 48. Evaluating usability: methodological caveats
  Questionnaire data: self-reports are often biased; observation of behavior in experiments is advisable
  Comparisons of sites with/without personalization, or before/after personalization was introduced, with respect to "normal user behavior" (server logs): usually a quasi-experiment
  - many uncontrolled variables (e.g., user intentions)
  - possibly several differences between sites/site versions
  => causal attribution of success to personalization becomes difficult
  [E-3]

  Evaluating usability: results I
  CyberBehavior Research Center 1999 survey
  - 81% of 694 respondents have visited a personalized site
  - 64% of those found it useful: helpful, time saving
  - perceived usefulness changes with product (books > music > information technology > news/articles > other)
  - main problems: privacy, ineffectiveness when behavior did not reflect the user "personally" (e.g., buying a gift)
  - concern that possible choices may be limited
  - little difference of opinion between personalization occurring in response to behavior or to solicited input
  [E-4]
  49. 49. Evaluating usability: results II
  Belkin [3], reviewing studies of recommendations in IR systems carried out at Rutgers Univ. since 1995:
  - measures of performance and subjective satisfaction
  - relevance feedback worked well, but better with both increased knowledge of how it worked and increased control by the user of its suggestions:
    - relevance feedback + term suggestion performed better than, and was preferred to, pure relevance feedback
  - users preferred to save effort: they were willing to hand over the subsidiary task of term selection to a system they trusted
  [E-5]

  Evaluating usability: results III
  Nielsen Net Ratings 1999
  registered visitors of portal sites, i.e., those who can customize,
  - spend > 3 times longer at their home portal than others
  - view 3-4 times more pages
  [E-6]
  50. 50. Why are results scarce? Possible reasons
  "In essence, web design is a problem in user interface design. However, ... few web designers can afford to subject their web sites to formal usability testing in special labs."
  Perkowitz & Etzioni [52]: Adaptive web sites: an AI challenge.
  "Web personalization is much over-rated and mainly used as a poor excuse for not designing a navigable website."
  Nielsen [47]: Personalization is over-rated.
  "Personalization costs. ... You’re more likely to get a good return on your efforts ... by fixing other problems, such as difficulty in locating content."
  Lighthouse on the Web [36], quoting from Mainspring and User Interface Engineering
  [E-7]

  Can other results be transferred?
  Research on adaptive educational software since ~ 1970
  - usually, user control is helpful for learning; adaptive interfaces are particularly helpful for novices
  - interfaces changing over time: difficult to learn
  - adaptive presentation (more info depending on user knowledge) improves comprehension and reduces reading time
  - adaptive link annotation
    - can reduce the no. of visited pages + learning time
    - encourages novices to navigate non-sequentially
    - enables users to rate the difficulty of a page better
  [E-8]
  51. 51. Can other results be transferred? (contd.)
  - adaptive link ordering improves user performance in information search tasks
    - but an unstable order of options is confusing for novices, so hiding is better for novices
  - for novices, direct guidance is useful ("next" link is the most popular choice)
  - the more users agree with the system’s suggestions, the better their test results
  (surveys in [11,12])
  [E-9]

  Further factors affecting subjective satisfaction
  - user control (general guideline for software development)
  - must match the user’s interests at the moment
  - users don’t want extra work: "paradox of the active user"
  - users don’t like to be recognized too soon
  - users want to be anonymous, at least at certain times
  - users want openness / disclosure
  - people don’t want relationships with corporations, but with other people
  - be specific without being exclusive
  - consider the information structure on the Web (non-monetary rewards better than differential pricing)
  respect the user!
  [E-10]
  52. 52. Pattern Evaluation from the Business Perspective
  Myra Spiliopoulou, HHL
  [E-11]

  User Satisfaction & Business Success
  A company operating a Web site should care to create value for its (prospective) customers
  => If there is no value for the users, they will not buy and they will not come again.
  => If the users/customers are not satisfied, they will not buy and/or they will not come again.
  => User/customer satisfaction is a prerequisite for winning them to the company.
  Winning means:
  • Conversion: The user becomes a customer.
  • Retention: The customer stays loyal.
  [E-12]
  53. 53. User Satisfaction: Modelling
  Indicators that require interaction with the user
  • Interactivity
  • Ease of use
  • Pleasing environment, entertaining environment
  • Multiple navigation metaphors
  • ...
  • Value creation, as perceived by the user
  Indicators that can be measured/approximated without user interaction
  • Pages per visitor
  • Duration of stay
  • Visitors per page […0]
  • Response time […0]
  [E-13]

  User Satisfaction: Computation
  • Identification of a set of satisfaction indicators
  • Design of an appropriate questionnaire
  • Presentation of the questionnaire to a representative user sample
  • Analysis of the responses
  • Conclusions on the impact of and the correlations among the satisfaction indicators
  [E-14]
  54. 54. User Satisfaction: An Experiment
  The study of Eighmey [21]
  • Factors creating user satisfaction
    – Ease of use
    – Information utility of the presented content
    – Attractiveness of the presentation metaphor
    – ...
  • Experimental settings for the evaluation of … commercial sites
    – Mapping of the factors on a questionnaire
    – Establishment of a group of representative users
    – Experimentation on a local computer pool (in vitro)
  • Statistical analysis of the user responses
  • Ranking of the factors by importance
  [E-15]

  The findings of [21] are
  • Quality of the presentation metaphor: Entertainment with finesse and taste plays the most important role.
  • Information utility: The amount of information made available is the second most important factor.
  Further findings: The web sites tested did not make a strong and useful connection with the interests of the study participants and did not succeed in creating a context and a sense of community needed to build a continuing relationship with web site users.
  [E-16]
  55. 55. M. Sp… captures five years of anti-customer-satisfaction reports into the question:
  Is Customer Satisfaction Irrelevant?
  Answer: Customer measurement systems should be revisited.
  [E-17]

  User Satisfaction & Business Success
  • User/customer satisfaction is a prerequisite for a web-site's success.
  • User/customer satisfaction does not imply a web-site's success.