Major Seminar On
Knowledge Discovery from Web Logs
Guided By: Saurabh Anand, Lecturer, Department Of IT
Presented By: Avtar Kishore Gaur (IT/09/53), VIII Sem, IT
Poornima College Of Engineering, Sitapura, Jaipur
Introduction
• A vast amount of Web site traversal information is available in the form of Web logs.
• By analyzing these logs, it is possible to discover various kinds of knowledge that can be applied to improve the performance of Web services.
• Analyzing these logs also makes it possible to learn the behavior of Web users.
Introduction
• A particular kind of knowledge that can be applied immediately to the operation of a Web site is called actionable knowledge.
• Mining such knowledge is known as Knowledge Discovery from Web Logs.
How big is the Web?
• More than 4 billion websites are on the Internet (according to alexa.com).
• At least 7.92 billion pages as of Thursday, 23 February 2012 (according to worldwidewebsize.com).
History
• Previous approaches aimed only to mine Web-log knowledge for human consumption.
• These days, mining actionable knowledge from Web logs is being used to improve the performance of Web services.
Fields in Web Log File
• Reference Website: www.hdwally.com, Web Server: Apache
1. 188.8.131.52 - - [23/Feb/2012:06:23:46 -0600] "GET /robots.txt HTTP/1.1" 500 7370 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
2. 184.108.40.206 - - [23/Feb/2012:06:11:04 -0600] "GET / HTTP/1.1" 500 7370 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
• IP address: 188.8.131.52 and 184.108.40.206
• User name: - and - (a hyphen means no user name was recorded)
• Timestamp: [23/Feb/2012:06:23:46 -0600] and [23/Feb/2012:06:11:04 -0600] (time at which the Web server received the request)
Fields in Web Log File
• Access request: "GET /robots.txt HTTP/1.1" and "GET / HTTP/1.1"
• Result status code: 500 and 500 (Internal Server Error)
• Bytes transferred: 7370 and 7370
• Referrer URL: "-" and "-" (no referrer was sent with either request)
• User agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) and Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
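The fields listed above can be pulled out of a log line with a regular expression. The following is a minimal sketch, assuming the entries follow Apache's "combined" log format; the names LOG_PATTERN and parse_line are illustrative, and the IP address in the sample is a documentation placeholder rather than a value from the real log.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Placeholder entry modelled on the first sample line (192.0.2.1 is a
# reserved documentation address, not the real visitor address).
sample = (
    '192.0.2.1 - - [23/Feb/2012:06:23:46 -0600] '
    '"GET /robots.txt HTTP/1.1" 500 7370 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_line(sample)
```

Here entry["referrer"] holds the hyphen and entry["agent"] holds the full Googlebot string, which is how the two quoted fields at the end of each line are told apart.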
Mining Web Logs for Path Profiles
• Data Cleaning on Web Log Data
• Mining Web Logs for Path Profiles
• Web Object Prediction
• Learning to Prefetch Web Documents
Data Cleaning on Web Log Data
• Identify each user by an individual IP address.
• Break apart each user's long sequence of visits into user sessions.
• Thus, data cleaning means separating the sequence of visited pages into visiting sessions.
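The session-splitting step can be sketched as follows. The slides do not say how a session boundary is detected, so this sketch assumes a 30-minute inactivity gap; SESSION_TIMEOUT, split_sessions, and the (ip, timestamp, url) record layout are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Assumption: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = 30 * 60  # seconds


def split_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp_seconds, url) records into per-IP sessions.

    Returns a list of (ip, [url, ...]) tuples, one per session.
    """
    by_ip = defaultdict(list)
    for ip, ts, url in records:
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, visits in by_ip.items():
        visits.sort()  # order each user's visits by time
        current = []
        last_ts = None
        for ts, url in visits:
            # A long pause closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((ip, current))
                current = []
            current.append(url)
            last_ts = ts
        if current:
            sessions.append((ip, current))
    return sessions
```

Treating one IP address as one user is the simplification stated on the slide; visitors behind a shared proxy would be merged under this scheme.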
Web Log Mining for Prefetching
• After data cleaning, we have separate visiting sessions.
• We can develop path profiles from these sessions, because a user visiting a sequence of Web pages leaves a trail of the pages' URLs in the Web log.
• A path profile consists of the frequent subsequences found in the frequently occurring paths.
• A path profile helps us predict the pages that are most likely to be requested next.
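One simple way to build such a profile is to count contiguous sub-paths across all sessions and keep those seen often enough. This is a rough sketch, not the method of the cited papers; path_profile, max_len, and min_support are hypothetical names and thresholds.

```python
from collections import Counter


def path_profile(sessions, max_len=3, min_support=2):
    """Count contiguous sub-paths of 2..max_len pages and keep the frequent ones.

    sessions: list of page lists, e.g. [["/a", "/b"], ...]
    Returns {sub_path_tuple: count} for counts >= min_support.
    """
    counts = Counter()
    for pages in sessions:
        for n in range(2, max_len + 1):
            for i in range(len(pages) - n + 1):
                counts[tuple(pages[i:i + n])] += 1
    return {path: c for path, c in counts.items() if c >= min_support}
```

For example, the sessions [["/a", "/b", "/c"], ["/a", "/b", "/d"], ["/b", "/c"]] yield a profile with two entries, ("/a", "/b") and ("/b", "/c"), each seen twice.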
Web Object Prediction
• It is possible to train a path-based model that predicts future URLs from a sequence of current URL accesses.
• This can be done on a per-user basis or on a per-server basis.
• The former requires that user sessions be recognized and separated cleanly by a filtering system; the latter takes the simplistic view that all accesses on a server form a single long thread.
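A path-based predictor can be sketched as a table that maps the last few pages visited to the pages that followed them in the training sessions. The class below is an illustrative n-gram style model with fallback to shorter contexts; PathPredictor and its order parameter are assumptions, not an implementation taken from the referenced papers.

```python
from collections import Counter, defaultdict


class PathPredictor:
    """Predict the next URL from up to `order` most recent URLs."""

    def __init__(self, order=2):
        self.order = order
        self.table = defaultdict(Counter)  # context tuple -> next-URL counts

    def train(self, sessions):
        for pages in sessions:
            for i in range(1, len(pages)):
                # Record pages[i] as a follower of each context ending at i.
                for k in range(1, self.order + 1):
                    if i - k < 0:
                        break
                    self.table[tuple(pages[i - k:i])][pages[i]] += 1

    def predict(self, recent, top=1):
        # Try the longest known context first, then shorter ones.
        for k in range(min(self.order, len(recent)), 0, -1):
            context = tuple(recent[-k:])
            if context in self.table:
                return [url for url, _ in self.table[context].most_common(top)]
        return []
```

Training on separated user sessions gives the per-user variant; passing the whole server log as a single list, as in train([all_urls_in_order]), corresponds to the per-server view.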
Learning to Prefetch Web Documents
• The original cache memory is partitioned into two parts: a cache-buffer and a prefetching-buffer.
• A prefetching agent (a script) keeps pre-loading the prefetching-buffer with the documents predicted to be accessed next.
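The two-buffer arrangement can be sketched as below. The slides do not specify an eviction policy, so least-recently-used eviction is assumed here; PrefetchingCache, fetch, and predict are hypothetical names, with fetch standing in for whatever retrieves a document and predict for any next-page predictor.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Cache split into a cache-buffer and a prefetching-buffer (LRU assumed)."""

    def __init__(self, cache_size, prefetch_size, fetch, predict):
        self.cache = OrderedDict()     # documents the user actually requested
        self.prefetch = OrderedDict()  # documents loaded ahead of time
        self.cache_size = cache_size
        self.prefetch_size = prefetch_size
        self.fetch = fetch             # callable: url -> document
        self.predict = predict         # callable: url -> list of likely next urls

    @staticmethod
    def _store(buf, limit, url, doc):
        buf[url] = doc
        buf.move_to_end(url)
        while len(buf) > limit:
            buf.popitem(last=False)  # evict the least recently used entry

    def get(self, url):
        if url in self.cache:
            self.cache.move_to_end(url)
            doc = self.cache[url]
        elif url in self.prefetch:
            # Prefetch hit: promote the document to the cache-buffer.
            doc = self.prefetch.pop(url)
            self._store(self.cache, self.cache_size, url, doc)
        else:
            doc = self.fetch(url)
            self._store(self.cache, self.cache_size, url, doc)
        # Prefetching agent: pre-load the pages predicted to come next.
        for nxt in self.predict(url):
            if nxt not in self.cache and nxt not in self.prefetch:
                self._store(self.prefetch, self.prefetch_size, nxt, self.fetch(nxt))
        return doc
```

Keeping the prefetching-buffer separate means a wrong prediction can only evict other predictions, never a document the user has really requested.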
Web Page Clustering for Intelligent User Interfaces
• Web logs can be used to build server-side customization and transformation that make a Web site more convenient for users to visit and help them find what they are looking for.
• These rely on path prediction algorithms that guess where the user wants to go next in a browsing session, such as WebWatcher and the PageGather algorithm.
Advantages
• It is easy to implement.
• Companies can build better customer relationships by giving customers exactly what they need.
• Personalized search engines can be created that interpret a person's search queries individually by analyzing and profiling that user's search behavior.
• Caching and prefetching of Web objects can be improved.
• The mined knowledge can be used to build better, adaptive user interfaces.
• Web query log knowledge can be applied to improve Web search in a search engine application.
References
• Web logs from www.hdwally.com and www.hdwallpaper4u.com
• www.jafsoft.com/searchengines/log_sample.html
• Research paper on Knowledge Discovery From Weblogs by S Chandra and Dr B Kalpana.
• Research paper on Mining Web Logs for Actionable Knowledge by Qiang Yang, Charles X. Ling and Jianfeng Gao.
• http://www.galeas.de/webmining.html