Web Log Files
Web servers log the HTTP requests that browsers send. A user's browser generates an HTTP request:
• when the user clicks a hyperlink in a browser window;
• when the page loaded in the browser contains an embedded image;
• when JavaScript code on the page redirects to another page.
The web log file records each of these requests.

Web Log File Sample
ip68-14-105-135.no.no.cox.net - - [01/Jul/2004:00:50:30 -0500] "GET /images/mba_main.gif HTTP/1.1" 200 13694 "http://www.business.uno.edu/mba/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"
199.72.78.18 - - [01/Jul/2004:03:46:54 -0500] "GET /images/mba_main.gif HTTP/1.1" 304 - "http://www.business.uno.edu/mba/index.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

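The sample entries follow what the Apache HTTP Server calls the "combined" log format: host, identity, user, time stamp, request line, status, size, referrer and user agent. A minimal Python sketch of a parser, assuming every line follows that layout exactly; the regular expression and the `parse_line` name are illustrative, not part of any tool mentioned here.

```python
import re
from typing import Optional

# One named group per field of the combined log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log entry as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Split the request line ("GET /path HTTP/1.1") into its three parts.
    parts = entry["request"].split(" ", 2)
    entry["method"], entry["path"], entry["protocol"] = parts if len(parts) == 3 else (None, None, None)
    entry["status"] = int(entry["status"])
    # A "-" size means no body was sent, as in a 304 Not Modified response.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = (
    '199.72.78.18 - - [01/Jul/2004:03:46:54 -0500] '
    '"GET /images/mba_main.gif HTTP/1.1" 304 - '
    '"http://www.business.uno.edu/mba/index.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)
entry = parse_line(sample)
print(entry["host"], entry["status"], entry["size"], entry["path"])
# 199.72.78.18 304 0 /images/mba_main.gif
```

Real logs contain malformed lines (for example, from 400 Bad Request responses), which is why the sketch returns None instead of raising.
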
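Data in the Log File: Client Address
ip68-14-105-135.no.no.cox.net - -
199.72.78.18 - - [01/Jul/2004:03:46:54 -0500] "GET /images/mba_main.gif HTTP/1.1" 304 -
The first field identifies the client, either as a raw IP address (199.72.78.18) or as a host name resolved from it (ip68-14-105-135.no.no.cox.net). The address may be dynamic or static:
• Dynamic addresses are assigned by the ISP, and only the ISP's records show which customer held an address at a given time.
• Static addresses are listed in public databases; a reverse DNS lookup can reveal the location.
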
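Before any lookup, an analysis script has to tell which form the first field takes. A minimal Python sketch using the standard `ipaddress` and `socket` modules; the function names are hypothetical. Note that `socket.gethostbyaddr` performs a live DNS query, so its answer reflects today's address assignments rather than those on the date of the log.

```python
import ipaddress
import socket
from typing import Optional


def is_ip_address(host: str) -> bool:
    """True if the log's host field is a literal IPv4/IPv6 address, False if it is a host name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def reverse_lookup(ip: str) -> Optional[str]:
    """Ask DNS for the host name registered for an IP address (a PTR lookup).

    Returns None when no name is registered or the lookup fails.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


for host in ("ip68-14-105-135.no.no.cox.net", "199.72.78.18"):
    kind = "IP address" if is_ip_address(host) else "host name"
    print(f"{host}: {kind}")
```

Calling `reverse_lookup("199.72.78.18")` today may return a different name, or none at all, compared with what it would have returned in 2004.
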
IP Reverse Lookup
Public databases map assigned IP addresses to locations, for example http://www.ip2location.com/free.asp. A lookup of 199.72.78.18 returned:

IP Address      Country (Short)  Country (Full)  Region     City         ISP
199.72.78.18    US               UNITED STATES   LOUISIANA  NEW ORLEANS  RAMADA PLAZA HOTEL INN

Note: IP addresses get reassigned, so a reverse lookup is valid only around the time the log file is generated, not years later.
The location value can be used to customize the site (personalization, mass customization).

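Geolocation databases of this kind map ranges of addresses to locations. A minimal Python sketch of such a lookup, assuming the ranges are held in memory sorted by start address and compared as integers; the ranges and place names below are hypothetical placeholders taken from the reserved documentation blocks (RFC 5737), not real ip2location data, and `locate` is an illustrative name.

```python
import bisect
import ipaddress
from typing import Optional

# Hypothetical (start, end, location) rows, sorted by start address.
# These are the RFC 5737 documentation ranges with placeholder locations.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "Example location A"),
    ("198.51.100.0", "198.51.100.255", "Example location B"),
    ("203.0.113.0", "203.0.113.255", "Example location C"),
]

# Convert addresses to integers so ranges can be compared numerically.
STARTS = [int(ipaddress.IPv4Address(start)) for start, _, _ in RANGES]
ENDS = [int(ipaddress.IPv4Address(end)) for _, end, _ in RANGES]


def locate(ip: str) -> Optional[str]:
    """Return the location whose range contains ip, or None if no range does."""
    value = int(ipaddress.IPv4Address(ip))
    # Index of the last range that starts at or before this address.
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and value <= ENDS[i]:
        return RANGES[i][2]
    return None


print(locate("198.51.100.7"))  # Example location B
print(locate("10.0.0.1"))      # None
```

Because addresses are reassigned, the range table used must date from the same period as the log being analyzed.
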
Log Time Stamp
ip68-14-105-135.no.no.cox.net - - [01/Jul/2004:00:50:30 -0500]
The time stamp is the time on the server. Many servers use GMT as a common global reference; a log kept in local time typically records the offset from GMT (here -0500, five hours behind GMT). Account for daylight saving time changes in your analysis.

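Converting every time stamp to UTC with its own recorded offset removes the daylight saving ambiguity: a server on US Central time, for example, logs -0500 in summer and -0600 in winter. A minimal Python sketch using the standard `datetime` module; `to_utc` is an illustrative name.

```python
from datetime import datetime, timezone

# Layout of the bracketed time stamp, e.g. 01/Jul/2004:00:50:30 -0500.
# %b matches month abbreviations in the current locale (English in the default C locale).
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def to_utc(stamp: str) -> datetime:
    """Parse a log time stamp and convert it to UTC using its own offset."""
    local = datetime.strptime(stamp, LOG_TIME_FORMAT)
    return local.astimezone(timezone.utc)


print(to_utc("01/Jul/2004:00:50:30 -0500"))  # 2004-07-01 05:50:30+00:00
```
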
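Download Information
"GET /images/mba_main.gif HTTP/1.1" 200 13694 "http://www.business.uno.edu/mba/index.html"
• The request line gives the method (GET), the file requested and the protocol version.
• The HTTP status (200) and the size of the file sent, in bytes (13694), follow.
• The last field is the page from which the file was requested.
Note: each image is a separate download.
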
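Because each image is a separate download, the bytes behind one page view are spread over several log entries that share the same referring page. A minimal Python sketch that totals bytes sent per referring page, assuming the entries have already been split into (path, status, size, referrer) tuples; `bytes_by_referrer` is a hypothetical name.

```python
from collections import defaultdict


def bytes_by_referrer(entries):
    """Sum bytes sent, grouped by the page each file was requested from.

    Each entry is a (path, status, size, referrer) tuple taken from a log line;
    size is the raw size field, where "-" means no body was sent.
    """
    totals = defaultdict(int)
    for _path, _status, size, referrer in entries:
        totals[referrer] += 0 if size == "-" else int(size)
    return dict(totals)


entries = [
    ("/images/mba_main.gif", 200, "13694", "http://www.business.uno.edu/mba/index.html"),
    ("/images/mba_main.gif", 304, "-", "http://www.business.uno.edu/mba/index.html"),
]
print(bytes_by_referrer(entries))
# {'http://www.business.uno.edu/mba/index.html': 13694}
```
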
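HTTP Status Codes
Defined in http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html (RFC 2616 has since been superseded; the current definitions are in RFC 9110).
200 OK: The request has succeeded.
301 Moved Permanently: The requested resource has been assigned a new permanent URI.
304 Not Modified: The browser's cached copy is still current, so the file is not sent again; this is why the second sample entry shows "-" instead of a size.
400 Bad Request: The request could not be understood by the server.
403 Forbidden: The server understood the request, but is refusing to fulfill it.
404 Not Found: The server has not found anything matching the Request-URI.
500 Internal Server Error: The server encountered an unexpected condition.
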
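The first digit of a status code gives its class, which is usually enough for log reports. A minimal Python sketch using the class names from RFC 2616, section 10; `status_class` is an illustrative name.

```python
def status_class(code: int) -> str:
    """Name the class of an HTTP status code from its first digit (RFC 2616, section 10)."""
    classes = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    return classes.get(code // 100, "Unknown")


for code in (200, 304, 404, 500):
    print(code, status_class(code))
```

Which classes count as a "successful hit" is a choice made by the reporting tool; a tool may, for instance, count 304 responses as successful because the browser displayed the file from its cache.
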
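Browser Info
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
The user-agent string identifies:
• the user's browser and version (MSIE 6.0, i.e. Internet Explorer 6);
• the user's operating system and version (Windows NT 5.1, i.e. Windows XP);
• installed components such as the .NET CLR versions.
Browser-detection tools can report much more, such as the Java version running in the browser and other installed plug-ins and their versions: http://www.cyscape.com/products/bhawk/new.aspx?bhcp=1
Use this information to customize the site for the best user experience.
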
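The fields in this particular user-agent string can be pulled out with regular expressions. A minimal Python sketch that handles only the MSIE / Windows NT pattern shown; `describe_agent` is a hypothetical name, and real user-agent parsing needs a maintained rule set because browsers format the string very differently.

```python
import re


def describe_agent(agent: str) -> dict:
    """Extract browser, Windows and .NET CLR versions from an old IE user-agent string."""
    browser = re.search(r"MSIE (\d+\.\d+)", agent)
    windows = re.search(r"Windows NT (\d+\.\d+)", agent)
    return {
        "browser": f"Internet Explorer {browser.group(1)}" if browser else None,
        "os": f"Windows NT {windows.group(1)}" if windows else None,
        # Every installed .NET runtime is listed separately.
        "dotnet_clr": re.findall(r"\.NET CLR ([\d.]+)", agent),
    }


ua = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
      ".NET CLR 1.0.3705; .NET CLR 1.1.4322)")
print(describe_agent(ua))
```
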
Analytics
• Hit: one HTTP request.
• Visit: a group of hits from one visitor. A visit (session) is commonly defined as a collection of hits from one IP address with no break longer than a set idle time, often 20 or 30 minutes.
Grouping hits into visits turns them into more meaningful data. Hits by themselves mean little from a management perspective, other than for managing the load on the server.

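The idle-time rule for grouping hits into visits can be sketched in Python as follows. This is a minimal illustration, assuming the hits have already been parsed into (host, timestamp) pairs; `group_visits` is a hypothetical name, and `idle` can be set to 30 minutes to match tools that use that default.

```python
from datetime import datetime, timedelta


def group_visits(hits, idle=timedelta(minutes=20)):
    """Group hits into visits: same host, no gap longer than `idle` between hits.

    `hits` is a list of (host, timestamp) pairs; returns a list of visits,
    each a time-ordered list of the hits it contains.
    """
    visits = []
    open_visit = {}  # host -> the visit currently being extended
    for host, ts in sorted(hits, key=lambda hit: hit[1]):
        current = open_visit.get(host)
        # Start a new visit on the first hit or after too long a break.
        if current is None or ts - current[-1][1] > idle:
            current = []
            visits.append(current)
            open_visit[host] = current
        current.append((host, ts))
    return visits


t0 = datetime(2004, 7, 1, 0, 50, 30)
hits = [
    ("ip68-14-105-135.no.no.cox.net", t0),
    ("ip68-14-105-135.no.no.cox.net", t0 + timedelta(minutes=5)),
    ("ip68-14-105-135.no.no.cox.net", t0 + timedelta(minutes=40)),  # 35-minute gap: new visit
    ("199.72.78.18", t0 + timedelta(minutes=1)),
]
print(len(hits), "hits,", len(group_visits(hits)), "visits")  # 4 hits, 3 visits
```
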
General Statistics
Hits
  Successful Hits For Entire Site: 103,685
  Average Hits Per Day: 7,975
  Home Page Hits: 3,590
Pages
  Page Views (Impressions): 70,960
  Average Per Day: 5,458
  Document Views: 60,919
  Dynamic Pages and Forms Views: 10,041
Visits
  Visits: 15,899
  Average Per Day: 1,223
  Average Visit Length: 00:09:54
  International Visits: 1.88%
  Visits of Unknown Origin: 0.01%
  Visits From Your Country (United States, US): 98.11%
Visitors
  Unique Visitors: 8,433
  Visitors Who Visited Once: 4,497
  Visitors Who Visited More Than Once: 3,936

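The figures in this report are internally consistent, which shows how the metrics relate: page views are document views plus dynamic page and form views, unique visitors split into one-time and repeat visitors, and each per-day figure is a total divided by the number of days in the log. A short check in Python; the 13-day span is inferred from the ratios, not stated in the report, and the use of integer division to reproduce the reported averages is an assumption about the tool's rounding.

```python
# Figures copied from the General Statistics report.
successful_hits = 103_685
page_views = 70_960
document_views = 60_919
dynamic_views = 10_041
visits = 15_899
unique_visitors = 8_433
visited_once = 4_497
visited_more_than_once = 3_936

# Page views are split between documents and dynamic pages/forms.
assert document_views + dynamic_views == page_views
# Every unique visitor came either once or more than once.
assert visited_once + visited_more_than_once == unique_visitors

# Assumed span of the log, inferred from the ratios (15,899 / 1,223 = 13).
days = 13
print(successful_hits // days, page_views // days, visits // days)  # 7975 5458 1223
```
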
Page View (Impression): A hit to any file classified as a page. Contrast the value for "Page Views" with the value for "Successful Hits For Entire Site", which includes hits to files of every type.
Page: Any document, dynamic page, or form. Different types of profiles have different default settings for which file extensions qualify a file as a document; the Reporting Center system administrator can change these settings. Any URL containing a question mark is considered a dynamic page, and any request made with the POST method is considered a form.
Hit: Each file requested by a visitor registers as a hit, so a single page can generate several hits. The volume of hits reflects the amount of server traffic, but it is not an accurate measure of the number of pages viewed.
Dynamic Pages and Forms Views: Number of hits to pages considered dynamic pages (URLs containing a question mark) or forms (POST requests).
Document Views: Number of hits to pages considered documents (not dynamic pages or forms), as defined by the system administrator.
Average Visits Per Day: Number of visits divided by the total number of days in the log file.
Average Visit Length: Average of all non-zero-length visits in the reporting period. A zero-length visit occurs when all hits in that visit are logged with exactly the same time stamp.
Average Hits Per Day: Number of successful hits divided by the total number of days in the log file.

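The classification rules above can be sketched in Python. This is a minimal illustration, not Reporting Center's implementation: the set of document extensions is a hypothetical stand-in for the administrator's profile settings, checking POST before the question mark is an arbitrary ordering choice, and treating a bare directory URL as a document assumes the server returns an index page for it.

```python
from urllib.parse import urlsplit

# Hypothetical set of extensions counted as documents; in the tool described
# here, this list is configured per profile by the system administrator.
DOCUMENT_EXTENSIONS = {".html", ".htm", ".asp", ".pdf"}


def classify_request(method: str, url: str) -> str:
    """Classify one hit as 'form', 'dynamic page', 'document' or 'other file'."""
    if method.upper() == "POST":
        return "form"
    if "?" in url:
        return "dynamic page"
    path = urlsplit(url).path.lower()
    # Assumption: a bare directory URL serves an index document.
    if path.endswith("/") or any(path.endswith(ext) for ext in DOCUMENT_EXTENSIONS):
        return "document"
    return "other file"  # images, style sheets, scripts: hits but not page views


for method, url in [("GET", "/mba/index.html"), ("GET", "/images/mba_main.gif"),
                    ("GET", "/search.asp?q=mba"), ("POST", "/apply.asp")]:
    print(method, url, "->", classify_request(method, url))
```

The URLs `/search.asp?q=mba` and `/apply.asp` are hypothetical examples.
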
The General Statistics page provides an overview of your Web site's performance and visitor behavior, and can help you decide which detailed reports will be most valuable to you.
Visitors Who Visited Once: Number of visitors who visited the site exactly once during the reporting period.
Visitors Who Visited More Than Once: Number of visitors who visited the site more than once during the reporting period.
Visits of Unknown Origin: Percentage of visits where the visitor's domain name, or the country associated with it, could not be determined.
Visits From Your Country: Percentage of visits from your country. The name of your country and the country code are shown; your system administrator configures which country is selected.
Visits: Number of visits to your site. A visit is a series of actions that begins when a visitor views their first page from the server, and ends when the visitor leaves the site or remains idle beyond the idle-time limit. The default idle-time limit is thirty minutes; the system administrator can change it.
Unique Visitors: The total number of unique visitors during the reporting period. A unique visitor is identified by their IP address, domain name, or cookie.
Successful Hits For Entire Site: Number of successful hits, including HTML pages, images, forms, scripts, and downloaded files.

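Once hits are grouped into visits, the visitor counts follow from how often each visitor identifier appears. A minimal Python sketch, assuming each visit has already been reduced to one identifier (a cookie if the browser sent one, otherwise the host field); `visitor_id` and `visitor_counts` are hypothetical names, and the cookie value `abc123` is made up.

```python
from collections import Counter
from typing import Optional


def visitor_id(cookie: Optional[str], host: str) -> str:
    """Prefer the cookie; fall back to the IP address or domain name from the log."""
    return cookie if cookie else host


def visitor_counts(visits):
    """Summarize a list of visits, each given as the identifier of the visitor who made it."""
    per_visitor = Counter(visits)
    once = sum(1 for n in per_visitor.values() if n == 1)
    return {
        "visits": len(visits),
        "unique_visitors": len(per_visitor),
        "visited_once": once,
        "visited_more_than_once": len(per_visitor) - once,
    }


visits = [
    visitor_id("abc123", "199.72.78.18"),
    visitor_id("abc123", "ip68-14-105-135.no.no.cox.net"),  # same cookie, different address
    visitor_id(None, "199.72.78.18"),
]
print(visitor_counts(visits))
```

The cookie lets the first two visits count as one visitor even though they came from different addresses; without it they would be counted as two.
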
Referring Sites and Entry/Exit Pages
Referring Site: A Web site that refers a visitor to your site by linking to it. Where do you advertise? What do you pay for that advertising?
Entry Page: The first page viewed during a visit.
Exit Page: The page from which users leave the site.
Study entry patterns (the entrances to your store) and exit patterns to prepare your site for maximum sales and the best interaction.

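Given visits as ordered lists of pages, entry and exit pages are simply the first and last page of each visit, and referring sites come from the host part of the referrer field. A minimal Python sketch; the function names, the page `/mba/admissions.html` and the site `www.example.com` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def entry_exit_counts(visits):
    """Count entry pages (first page of each visit) and exit pages (last page).

    Each visit is a list of page URLs in the order they were viewed.
    """
    entries = Counter(visit[0] for visit in visits if visit)
    exits = Counter(visit[-1] for visit in visits if visit)
    return entries, exits


def referring_sites(referrers, own_host):
    """Count external sites in the referrer field, ignoring links within your own site."""
    sites = Counter()
    for ref in referrers:
        host = urlsplit(ref).hostname  # None for "-" (no referrer)
        if host and host != own_host:
            sites[host] += 1
    return sites


entries, exits = entry_exit_counts([["/mba/index.html", "/mba/admissions.html"], ["/mba/index.html"]])
print(entries.most_common(1), exits.most_common(2))
print(referring_sites(["http://www.business.uno.edu/mba/index.html",
                       "http://www.example.com/rankings", "-"],
                      "www.business.uno.edu"))
```
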
Paths Through Site
Path Through Site: The sequence of pages a visitor views, from the entry page to the exit page.
Paths from Start: The pages on the top paths taken through your site, not counting the starting page.
Use this information to evaluate the design of your Web site. Where do your visitors go once they reach your site? Which pages are visited first? Do your visitors appear to be looking for pages that should be more accessible?

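Counting whole paths, and the pages that immediately follow a given page, answers the questions above directly. A minimal Python sketch, assuming visits are lists of pages in viewing order; `top_paths`, `next_pages` and the page names are hypothetical.

```python
from collections import Counter


def top_paths(visits, n=3):
    """Return the n most common complete paths (entry page to exit page)."""
    return Counter(tuple(visit) for visit in visits if visit).most_common(n)


def next_pages(visits, start):
    """Count which page visitors open immediately after `start`."""
    following = Counter()
    for visit in visits:
        for here, nxt in zip(visit, visit[1:]):
            if here == start:
                following[nxt] += 1
    return following


visits = [
    ["/mba/index.html", "/mba/courses.html"],
    ["/mba/index.html", "/mba/courses.html"],
    ["/mba/index.html", "/mba/apply.html"],
]
print(top_paths(visits, 1))
print(next_pages(visits, "/mba/index.html"))
```
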
Visitor Analysis
New Visitors: Visitors who did not have a cookie from your site on their first hit, but had one on later hits.
Returning Visitors: Visitors who already had a cookie from your site when they visited.
Visitors Without Cookies: Visitors who came to your site with cookies disabled. There is no way to determine whether these visitors are new or returning.
Authenticated Username: A unique visitor tracked by user name and password rather than by IP address. Authentication identifies visitors much more accurately, since many ISPs use dynamic IP addressing.

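These three cookie-based categories depend only on which of a visitor's hits carried a cookie. A minimal Python sketch, assuming the cookie seen on each hit is listed in time order with None for "no cookie"; `cookie_status` is a hypothetical name.

```python
from typing import List, Optional


def cookie_status(cookies: List[Optional[str]]) -> str:
    """Classify a visitor from the cookie seen on each of their hits, in time order.

    None means the browser sent no cookie from this site on that hit.
    """
    if not cookies or all(c is None for c in cookies):
        return "without cookies"
    if cookies[0] is not None:
        return "returning"
    return "new"  # no cookie on the first hit, but one on a later hit


print(cookie_status([None, "abc123", "abc123"]))  # new
print(cookie_status(["abc123"]))                  # returning
print(cookie_status([None, None]))                # without cookies
```

One limitation of inferring this from logs: a new visitor who makes only a single request never sends the cookie back, so they look the same as a visitor with cookies disabled.
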
Summary
• Web log files contain a wealth of information about a site's visitors.
• E-businesses need to track visits and improve their sites to meet customer requirements.
• In a global, dynamic environment, live analysis of site performance and customer interaction is needed to face new competition.
• Analysis tools: http://www.webtrends.com
