Avtar's ppt

  1. Major Seminar on Knowledge Discovery from Web Logs
     Guided By: Saurabh Anand, Lecturer, Department of IT
     Presented By: Avtar Kishore Gaur (IT/09/53), VIII Sem, IT
     Poornima College of Engineering, Sitapura, Jaipur
  2. Introduction
     • Vast amounts of Web site traversal information are recorded in the form of Web logs.
     • By analyzing these logs, it is possible to discover various kinds of knowledge that can be applied to improve the performance of Web services.
     • Analyzing these logs also reveals how Web users behave.
  3. Introduction
     • A particular kind of knowledge that can be applied immediately to the operation of a Web site is called actionable knowledge.
     • Mining such knowledge is known as Knowledge Discovery from Web Logs.
  4. How Big Is the Web?
     • More than 4 billion websites are on the Internet (according to alexa.com).
     • The indexed Web contains at least 7.92 billion pages (Thursday, 23 February 2012, according to worldwidewebsize.com).
  5. History
     • Earlier approaches aimed only to mine Web-log knowledge for human consumption.
     • Today, actionable knowledge mined from Web logs is also used to improve the performance of Web services.
  6. Fields in Web Log File
     • Reference website: www.hdwally.com (Web server: Apache)
       1. 66.249.71.6 - - [23/Feb/2012:06:23:46 -0600] "GET /robots.txt HTTP/1.1" 500 7370 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
       2. 180.76.5.92 - - [23/Feb/2012:06:11:04 -0600] "GET / HTTP/1.1" 500 7370 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
     • IP address: 66.249.71.6 and 180.76.5.92
     • User name: - and - (a dash means no value was recorded)
     • Timestamp: [23/Feb/2012:06:23:46 -0600] and [23/Feb/2012:06:11:04 -0600] (the time the Web server received the request)
  7. Fields in Web Log File
     • Access request: "GET /robots.txt HTTP/1.1" and "GET / HTTP/1.1"
     • Result status code: 500 and 500 (Internal Server Error)
     • Bytes transferred: 7370 and 7370
     • Referrer URL: "-" and "-" (no referring page)
     • User agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" and "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
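The fields listed above can be extracted from each log line mechanically. The following Python sketch assumes the Apache "combined" log format used by the sample entries; the regular expression, the `LogEntry` class and the `parse_line` function are illustrative names introduced here, not part of any library. Lines that do not match return `None`, which a real pipeline would count or report.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Apache "combined" log format:
# host ident authuser [timestamp] "request" status bytes "referrer" "user-agent"
# The last two quoted fields are optional so plain Common Log Format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


@dataclass
class LogEntry:
    host: str        # client IP address or resolved host name
    user: str        # authenticated user name, "-" if none
    time: datetime   # when the server received the request
    method: str      # e.g. GET
    path: str        # requested URL path
    status: int      # HTTP result status code
    size: int        # bytes sent to the client ("-" is stored as 0)
    referrer: str    # page that linked here, "-" if none
    agent: str       # browser or crawler identification string


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the format."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None
    # The request is normally "METHOD PATH PROTOCOL"; pad in case it is malformed.
    method, path, _protocol = (m["request"].split() + ["", "", ""])[:3]
    return LogEntry(
        host=m["host"],
        user=m["user"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=method,
        path=path,
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
        referrer=m["referrer"] or "-",
        agent=m["agent"] or "-",
    )


if __name__ == "__main__":
    sample = ('66.249.71.6 - - [23/Feb/2012:06:23:46 -0600] "GET /robots.txt HTTP/1.1" '
              '500 7370 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
              '+http://www.google.com/bot.html)"')
    print(parse_line(sample))
```

Applied to the first sample line, the referrer comes back as "-" and the crawler string lands in the user-agent field.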
  8. Example of a Web Log File
     • fcrawler.looksmart.com - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"
     • fcrawler.looksmart.com - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"
     • 123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC )"
  9. Mining Web Logs for Path Profiles
     • Data Cleaning on Web Log Data
     • Mining Web Logs for Path Profiles
     • Web Object Prediction
     • Learning to Prefetch Web Documents
  10. Data Cleaning on Web Log Data
     • Break each user's long sequence of visits apart into user sessions.
     • Identify each user by an individual IP address.
     • Data cleaning thus means separating the sequence of visited pages into visiting sessions.
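A minimal sketch of this cleaning step, assuming the log has already been parsed into (IP address, timestamp, URL) tuples. The slides do not say when one session ends and the next begins; the 30-minute inactivity gap used here (`SESSION_TIMEOUT`) is a common heuristic and an assumption, as is the function name `sessionize`.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumption: a pause longer than this ends a session (the slides give no value).
SESSION_TIMEOUT = timedelta(minutes=30)

# One parsed log record: (client IP address, request time, requested URL).
Request = Tuple[str, datetime, str]


def sessionize(requests: List[Request],
               timeout: timedelta = SESSION_TIMEOUT) -> List[List[str]]:
    """Split each IP address's page requests into visiting sessions."""
    # Treat every distinct IP address as one user, as the slide suggests.
    by_ip: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for ip, when, url in requests:
        by_ip[ip].append((when, url))

    sessions: List[List[str]] = []
    for visits in by_ip.values():
        visits.sort()  # put this user's requests in time order
        current = [visits[0][1]]
        for (prev_when, _), (when, url) in zip(visits, visits[1:]):
            if when - prev_when > timeout:
                sessions.append(current)  # long pause: close the session
                current = []
            current.append(url)
        sessions.append(current)
    return sessions
```

For example, if one address requests "/" and "/a" two minutes apart and then "/b" two hours later, it yields two sessions, ["/", "/a"] and ["/b"]. Identifying users by IP address alone merges users behind a shared proxy, which is one reason the timeout matters.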
  11. Web Log Mining for Prefetching
     • Once the log has been split into separate visiting sessions, path profiles can be built from them, because a user visiting a sequence of Web pages leaves a trail of those pages' URLs in the Web log.
     • A path profile consists of the frequent subsequences of the frequently occurring paths.
     • Path profiles help predict the pages a user is most likely to request next.
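One way to build path profiles, sketched under two assumptions not stated on the slide: a "subsequence" is taken to be a contiguous run of 2 to `max_len` pages, and a path counts as frequent when it appears in at least `min_support` sessions. Both thresholds and the function name `path_profiles` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def path_profiles(sessions: List[List[str]], min_support: int = 2,
                  max_len: int = 3) -> Dict[Path, int]:
    """Return contiguous page sequences found in at least min_support sessions."""
    support: Counter = Counter()
    for session in sessions:
        seen = set()
        # Collect every contiguous run of 2..max_len pages in this session.
        for n in range(2, max_len + 1):
            for i in range(len(session) - n + 1):
                seen.add(tuple(session[i:i + n]))
        support.update(seen)  # count each subpath at most once per session
    return {p: c for p, c in support.items() if c >= min_support}
```

With the sessions ["/", "/a", "/b"], ["/", "/a", "/c"] and ["/x", "/a", "/b"], only ("/", "/a") and ("/a", "/b") reach a support of 2; counting once per session keeps a single user who reloads the same path from inflating its support.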
  12. Web Object Prediction
     • It is possible to train a path-based model that predicts future URL requests from the sequence of current URL accesses.
     • This can be done on a per-user basis or on a per-server basis.
     • The per-user approach requires that user sessions be recognized and cleanly separated by a filtering system; the per-server approach takes the simpler view that all accesses to a server form a single long thread.
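A path-based predictor can be sketched as a simple n-gram model: count which URL followed each run of the last one or two URLs, then predict the most frequent follower, backing off to a shorter context when a longer one was never seen. The `NGramPredictor` class and its `max_context` parameter are illustrative assumptions, not the model from the cited paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NGramPredictor:
    """Predict the next URL from the last few URLs (a simple n-gram model)."""

    def __init__(self, max_context: int = 2):
        self.max_context = max_context
        # context (tuple of recent URLs) -> counts of the URL that followed it
        self.next_counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, sequences: List[List[str]]) -> None:
        for seq in sequences:
            for i in range(1, len(seq)):
                # Record seq[i] as the follower of each context ending at i - 1.
                for k in range(1, self.max_context + 1):
                    if i - k < 0:
                        break
                    context = tuple(seq[i - k:i])
                    self.next_counts[context][seq[i]] += 1

    def predict(self, recent: List[str]) -> Optional[str]:
        # Back off from the longest available context to a single URL.
        for k in range(min(self.max_context, len(recent)), 0, -1):
            counts = self.next_counts.get(tuple(recent[-k:]))
            if counts:
                return counts.most_common(1)[0][0]
        return None
```

Passing a list of sessions to `train` gives the per-user variant; passing a single list holding every access to the server in time order gives the per-server variant, which needs no session filtering but also learns spurious transitions between unrelated users.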
  13. Learning to Prefetch Web Documents
     • The original cache memory is partitioned into two parts: a cache-buffer and a prefetching-buffer.
     • A prefetching agent (a script) keeps loading the prefetching-buffer with the documents predicted to be accessed next.
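The two-buffer cache can be sketched as follows. Both buffers are assumed to use least-recently-used eviction, and `fetch` and `predict` are hypothetical callables standing in for the origin server and for a path-based predictor; the slides specify neither. A request served from the prefetching-buffer is promoted into the cache-buffer, and after every request the agent loads the predicted next document.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class PrefetchingCache:
    """Cache split into a cache-buffer and a prefetching-buffer (both LRU)."""

    def __init__(self, cache_size: int, prefetch_size: int,
                 fetch: Callable[[str], bytes],
                 predict: Callable[[List[str]], Optional[str]]):
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()     # cache-buffer
        self.prefetch: "OrderedDict[str, bytes]" = OrderedDict()  # prefetching-buffer
        self.cache_size, self.prefetch_size = cache_size, prefetch_size
        self.fetch, self.predict = fetch, predict
        self.history: List[str] = []
        self.hits = self.misses = 0

    @staticmethod
    def _put(buf: "OrderedDict[str, bytes]", key: str, value: bytes, limit: int) -> None:
        buf[key] = value
        buf.move_to_end(key)
        if len(buf) > limit:
            buf.popitem(last=False)  # evict the least recently used entry

    def get(self, url: str) -> bytes:
        if url in self.cache:
            self.hits += 1
            self.cache.move_to_end(url)
            doc = self.cache[url]
        elif url in self.prefetch:
            self.hits += 1  # the prefetch paid off; promote the document
            doc = self.prefetch.pop(url)
            self._put(self.cache, url, doc, self.cache_size)
        else:
            self.misses += 1
            doc = self.fetch(url)
            self._put(self.cache, url, doc, self.cache_size)
        self.history.append(url)
        self._prefetch_next()
        return doc

    def _prefetch_next(self) -> None:
        # The prefetching agent: load the predicted next document ahead of time.
        guess = self.predict(self.history)
        if guess and guess not in self.cache and guess not in self.prefetch:
            self._put(self.prefetch, guess, self.fetch(guess), self.prefetch_size)
```

Keeping prefetched documents in their own buffer means a wrong guess can only evict other guesses, never documents the user actually requested.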
  14. Web Page Clustering for Intelligent User Interfaces
     • Web logs can be used to build server-side customizations and transformations that make a website more convenient for users to visit and to find what they are looking for.
     • Examples include WebWatcher, which guesses where the user wants to go next in a browsing session, and PageGather, which clusters pages that are often visited together.
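As a rough illustration of clustering pages from logs, the sketch below links two pages when they appear together in at least `min_together` sessions and reports each connected group as a cluster. This is a simplified co-occurrence heuristic chosen for illustration; it is not claimed to be the exact PageGather or WebWatcher procedure, and the function name `cluster_pages` and its threshold are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def cluster_pages(sessions: List[List[str]], min_together: int = 2) -> List[Set[str]]:
    """Group pages that are often visited in the same session."""
    # Count, for every pair of pages, the number of sessions containing both.
    pair_counts: Counter = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    # Build an undirected graph linking pairs that co-occur often enough.
    graph: Dict[str, Set[str]] = {}
    for (a, b), count in pair_counts.items():
        if count >= min_together:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    # Each connected component of that graph is one cluster of related pages.
    clusters: List[Set[str]] = []
    seen: Set[str] = set()
    for start in graph:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            page = stack.pop()
            if page not in component:
                component.add(page)
                stack.extend(graph[page] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Each resulting cluster is a candidate for a new index page linking related content; connected components are the loosest grouping, and a stricter variant could require every pair in a cluster to co-occur.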
  15. Applications
     • Search engines
     • Similarity measures
     • Ontology
     • Information aggregation
     • Recognition technology
     • Summarization
     • E-commerce
     • Content management
  16. Advantages
     • It is easy to implement.
     • Companies can build better customer relationships by giving customers exactly what they need.
     • It can be used to create personalized search engines that interpret a person's search queries individually by analyzing and profiling that user's search behavior.
     • It improves caching and prefetching of Web objects.
     • The mined knowledge can be used to build better, adaptive user interfaces.
     • Web query log knowledge can be applied to improve Web search in a search engine application.
  17. References
     • Web logs from www.hdwally.com and www.hdwallpaper4u.com
     • www.jafsoft.com/searchengines/log_sample.html
     • S. Chandra and Dr. B. Kalpana, research paper "Knowledge Discovery From Weblogs"
     • Qiang Yang, Charles X. Ling and Jianfeng Gao, research paper "Mining Web Logs for Actionable Knowledge"
     • http://www.galeas.de/webmining.html
  18. Queries?
  19. Thanks