How To Web - Introduction To Data Mining For Web Applications
 

Usage Rights

© All Rights Reserved


Presentation Transcript

  • Introduction to Data Mining for Web Applications
    Paul-Alexandru Chirita, Ph.D.
  • About Me
    Education:
    Ph.D., Information Retrieval & Data Mining, Univ. of Hannover, Germany
    B.Sc., Ecole Polytechnique, Paris, France + “Politehnica” Univ. Bucharest, CS Dept.
    Roughly 8 years in IT, 7 of them in IR & DM
    Now at Adobe Romania (previously L3S, Yahoo!, Schlumberger, and others)
  • Web Mining
    The application of Data Mining algorithms to discover patterns in the Web.
    Three dimensions:
      Usage Mining
        Analyzes access logs to provide input to business decisions
        By far the most widely used, with the highest ROI
      Content Mining
        Analyzes Web page content to extract useful information (e.g., keywords, topic, content type, sentiment)
      Structure Mining
        Also known as “Link Analysis”
        Investigates the hyperlink structure of the Web to improve current algorithms
  • Agenda
    Client side tools
    Google Analytics
    Omniture
    Server side tools
    AWStats
    Webalizer / AWFFull
    Advanced analytics
  • Client side tools
    Purpose:
      Return basic information about traffic on your Web site, including SEO data
      Most are also (partly) integrated with monetization tools (e.g., AdWords)
    Pros:
      Hosted by third-party sites, at zero or minimal cost to you
      Easy to implement and integrate, no maintenance
    Cons:
      The client-side tracking code adds latency to every page (~200-600 ms additional response time)
      If your traffic grows “too much”, you have to pay
  • Client-side tools: Google Analytics
    Free, and well-engineered!
    Shows statistics about:
    Basic stuff: Visits, Pages, etc.
    Visitor profiles: Browser, OS, Language/Locale
    Visitor loyalty: how many times each visitor returned to your site, when they last visited, and for how long
    Trends: Is your traffic & popularity growing or decreasing
    Traffic sources: Entry/Exit pages, Referring sites & search engines
    Some customization planned for the near-term future
    Good for personal or small scale sites
    https://www.google.com/analytics
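
The visitor-loyalty statistics listed on this slide (return count, last visit) can be derived from a plain list of visits. The following is a minimal sketch under stated assumptions: the `visits` records and the `loyalty` helper are hypothetical, and this is not a description of how Google Analytics computes its reports.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical visit records: (visitor_id, ISO timestamp). Made-up data.
visits = [
    ("v1", "2009-09-01T10:00:00"),
    ("v2", "2009-09-01T11:30:00"),
    ("v1", "2009-09-03T09:15:00"),
    ("v1", "2009-09-10T18:45:00"),
]

def loyalty(visits):
    """Return {visitor_id: (visit_count, last_visit_datetime)}."""
    times = defaultdict(list)
    for visitor_id, stamp in visits:
        times[visitor_id].append(datetime.fromisoformat(stamp))
    return {v: (len(ts), max(ts)) for v, ts in times.items()}

stats = loyalty(visits)
for visitor_id, (count, last) in sorted(stats.items()):
    print(visitor_id, count, last.isoformat())
```

Grouping by visitor first keeps the sketch easy to extend, for example with the average gap between visits.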
  • Client-side tools: Google Analytics [2]
  • Omniture: Site Catalyst
    Low price per thousand entries, but it can become costly if you have a lot of traffic (millions of visits per day) or many dozens of sensors
    Same statistics as Google Analytics, but you can drill down very deep:
      Statistics per hour of day, per file type (html, cfm, etc.), per action type (download, view page, etc.)
      Visitor segmentation down to the level of city
      Purchases, promotions, and many e-commerce metrics (e.g., how many products added to the cart were actually checked out)
    Most importantly, you can define ANY metric you want! (e.g., how many people click on my survey link, how many of them fill it in, etc.)
    www.omniture.com
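
The survey example on this slide is a small conversion funnel: how many visitors reach each step. A rough sketch of that idea follows; the event stream, the action names, and the `funnel` helper are all hypothetical and are not part of the SiteCatalyst product.

```python
# Hypothetical event stream: (visitor_id, action). Action names are made up.
events = [
    ("v1", "view_page"),
    ("v1", "click_survey"),
    ("v1", "submit_survey"),
    ("v2", "view_page"),
    ("v2", "click_survey"),
    ("v3", "view_page"),
]

def funnel(events, steps):
    """Count distinct visitors who performed each step and all earlier steps.

    Simplification: the order in which a visitor performed the actions
    is not checked, only whether each action occurred at all.
    """
    seen = {step: set() for step in steps}
    for visitor, action in events:
        if action in seen:
            seen[action].add(visitor)
    counts = []
    reached = None
    for step in steps:
        reached = seen[step] if reached is None else reached & seen[step]
        counts.append((step, len(reached)))
    return counts

survey_funnel = funnel(events, ["view_page", "click_survey", "submit_survey"])
for step, n in survey_funnel:
    print(step, n)
```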
  • Omniture: Site Catalyst [2]
  • Server side tools
    Purpose:
      Return basic information about traffic on your Web site
      Similar to the client-side tools, but currently more focused on reliability and application improvements
    Pros:
      Most importantly, zero bandwidth overhead for your app (every ms counts!)
      Show a lot of developer-specific information (errors, visitor browsers/OS, etc.)
      Very easy to install
    Cons:
      Usually open source, but hard to extend with your own metrics
  • FREE Server side tools
    Similar statistics to the client-side tools, but:
      Less business-specific information (no visitor loyalty, trends, etc.)
      More developer-specific data (errors and error types, HTTP status codes, etc.)
    Good for medium and large scale sites
    http://awstats.sourceforge.net/
    http://www.stedee.id.au/awffull/
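
The developer-specific data mentioned on this slide (HTTP status codes, errors) can be read straight from a Web server access log. The sketch below assumes the Common Log Format and uses made-up log lines; it illustrates the kind of counting these tools perform and is not code taken from AWStats or AWFFull.

```python
import re
from collections import Counter

# Hypothetical access-log lines in Common Log Format (made-up data).
LOG_LINES = [
    '10.0.0.1 - - [28/Sep/2009:06:49:42 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [28/Sep/2009:06:49:43 +0000] "GET /missing.html HTTP/1.1" 404 312',
    '10.0.0.1 - - [28/Sep/2009:06:49:44 +0000] "POST /search HTTP/1.1" 500 128',
    '10.0.0.3 - - [28/Sep/2009:06:49:45 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

# Fields: host ident user [time] "request" status size
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')

def status_counts(lines):
    """Count requests per HTTP status code, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = CLF.match(line)
        if match:
            counts[match.group(4)] += 1
    return counts

counts = status_counts(LOG_LINES)
# Client errors (4xx) and server errors (5xx) only.
errors = {code: n for code, n in counts.items() if code.startswith(("4", "5"))}
print(dict(counts), errors)
```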
  • Server side tools: AWStats
  • Server side tools: Webalizer / AWFFull
  • Paid Server side tools
    Overcome most limitations of the free tools
    Log everything into text files (see next Section)
    Provide some sort of SQL-like query language which helps you define any type of query you want
    Run reports much faster
    The most expensive of them all, meant for professional use
    http://www.splunk.com/
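
To give a feel for what an SQL-like query language over log data buys you, here is a small sketch using Python's built-in `sqlite3` module. This is a stand-in chosen for illustration, not Splunk's query language; the `hits` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical parsed log events: (timestamp, path, HTTP status). Made-up data.
rows = [
    ("2009-09-28 06:49:42", "/index.html", 200),
    ("2009-09-28 06:49:43", "/missing.html", 404),
    ("2009-09-28 06:49:44", "/search", 500),
    ("2009-09-28 06:49:45", "/index.html", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (ts TEXT, path TEXT, status INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Ad hoc question 1: which pages were served successfully most often?
top_pages = conn.execute(
    "SELECT path, COUNT(*) AS n FROM hits "
    "WHERE status < 400 GROUP BY path ORDER BY n DESC"
).fetchall()

# Ad hoc question 2: how many requests ended in an error?
error_count = conn.execute(
    "SELECT COUNT(*) FROM hits WHERE status >= 400"
).fetchone()[0]

print(top_pages, error_count)
```

Once events sit in a queryable store, a new report is one more query rather than a change to the tool.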
  • Advanced analytics
  • How this is done in the heavyweight category ;-)
    Multiple log files, one per tracked feature
    As simple as possible (see next slide for an example)
    The main guideline is to be able to parse any log file and generate statistics using only the command line
    Example: tab-separated fields
  • Sample log
    Date & Time      IP (hashed)    User ID (hashed)   Query          Parameters
    Sep 28 06:49:42  Ea9hjnc4ufTfU  anonymous          spell checker  :0:10:en_US:en_US:0:0
    Sep 28 06:49:42  8NCTsHqR366    anonymous          javascript     :0:10:fr_FR:fr_FR:0:1
    Sep 28 06:49:42  K4nD5xy/R5fw   anonymous          text           :0:10:en_US:en_US:0:1
    Sep 28 06:49:43  lRqBaIaUWxna   yxDkhBEqC6xxR8z=   module         :0:10:en_US:en_US:0:0
    Sep 28 06:49:44  jMjJpy6bHAdb   hPFLKaMNeShD0=     delete spread  :0:10:en_US:en_US:0:0
    Sep 28 06:49:44  r3xgRLagX1cQ6  anonymous          _x             :0:10:ru_RU:ru_RU:0:0
    Sep 28 06:49:45  b2DLBl3VTT67Q  anonymous          anti a         :0:10:de_DE:de_DE:0:0
    Sep 28 06:49:45  KaKiB2ITEdPeM  VcLic9CIy4QxVtJQ=  create a star  :0:10:en_US:en_US:0:0
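
A tab-separated log like this one takes only a few lines to parse. The sketch below uses three rows from the sample with the tabs restored; the field boundaries are inferred from the header row, the field names are my own labels, and reading the fourth colon-separated parameter as a locale is an assumption, since the slides do not document the parameter format.

```python
from collections import Counter

# Field labels are illustrative; they follow the header row of the sample log.
FIELDS = ["timestamp", "ip_hash", "user_hash", "query", "params"]

# Three rows from the sample log, with tab separators restored (assumed boundaries).
SAMPLE = (
    "Sep 28 06:49:42\tEa9hjnc4ufTfU\tanonymous\tspell checker\t:0:10:en_US:en_US:0:0\n"
    "Sep 28 06:49:42\t8NCTsHqR366\tanonymous\tjavascript\t:0:10:fr_FR:fr_FR:0:1\n"
    "Sep 28 06:49:43\tlRqBaIaUWxna\tyxDkhBEqC6xxR8z=\tmodule\t:0:10:en_US:en_US:0:0\n"
)

def parse(text):
    """Turn tab-separated lines into dicts, skipping malformed lines."""
    records = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            continue
        records.append(dict(zip(FIELDS, parts)))
    return records

records = parse(SAMPLE)
anonymous = sum(1 for r in records if r["user_hash"] == "anonymous")
# Assumption: the fourth colon-separated parameter is a locale code.
by_locale = Counter(r["params"].split(":")[3] for r in records)
print(len(records), anonymous, dict(by_locale))
```

Because queries can contain spaces, splitting on tabs rather than on whitespace is what keeps the "spell checker" query in a single field.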
  • What can be done using this data
    You can basically measure everything ;-)
    Plus you can enable loads of new features:
    Personalization for search, sold/promoted products, etc.
    Browsing recommendations
    Improve site organization (make popular pages more accessible, promote some other pages and track their traffic increase, etc.)
    Search suggestions
    Advertising (keyword selection, etc.)
  • Personalized search and promotions
    Show different results/ads to different users
  • Browsing recommendations
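
One simple way to produce browsing recommendations from usage logs is to count which pages are visited together in the same session. The slides do not say which method was used, so the co-visitation counting below is a swapped-in illustration; the sessions and helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical sessions: the pages each visitor viewed in one visit. Made-up data.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/products", "/cart"],
    ["/home", "/blog"],
]

def co_visits(sessions):
    """For each page, count how often every other page shares a session with it."""
    table = defaultdict(Counter)
    for pages in sessions:
        for a, b in permutations(set(pages), 2):
            table[a][b] += 1
    return table

def recommend(table, page, k=2):
    """Return up to k pages most often co-visited with `page` (ties broken by name)."""
    ranked = sorted(table.get(page, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]

table = co_visits(sessions)
print(recommend(table, "/home"))
print(recommend(table, "/products"))
```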
  • Search suggestions
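
Search suggestions can be sketched as prefix matching over previously logged queries, ranked by how often each query was issued. This is an illustrative baseline rather than the method used in the talk; the query list and the `suggest` helper are hypothetical.

```python
from collections import Counter

# Hypothetical queries collected from a search log. Made-up data.
logged_queries = [
    "spell checker", "spell checker", "spreadsheet",
    "javascript", "spell check", "java", "javascript",
]

def suggest(queries, prefix, k=3):
    """Return up to k logged queries starting with `prefix`, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries)
    wanted = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(wanted)]
    matches.sort(key=lambda qn: (-qn[1], qn[0]))  # frequency, then alphabetical
    return [q for q, _ in matches[:k]]

print(suggest(logged_queries, "sp"))
print(suggest(logged_queries, "jav"))
```

A linear scan is fine for a sketch; for a large query log a sorted list or a trie would avoid checking every query on each keystroke.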