A clickstream is the recording of the parts of the
screen a computer user clicks on while web browsing
or using another software application.
As the user clicks anywhere in the webpage or
application, the action is logged on the client or on
the web server, and possibly also by the web browser,
a router, or a proxy server.
Clickstream analysis is useful for web activity analysis,
software testing, market research, and for analyzing
Clickstream, as defined by the Internet Advertising Bureau:
“The electronic path a user takes while navigating from site
to site, and from page to page within a site.
It is a comprehensive body of data describing the sequence of
activity between a user’s browser and any other Internet
resource, such as a Web site or third party ad server.”
The clickstream data is analyzed to identify the different
paths taken by visitors and the sequences of pages that
lead to payment of the membership fee.
Based on this analysis, specific strategies are
recommended to maximize the revenue for the website.
The main point of clickstream tracking is to give
webmasters insight into what visitors on their site are doing.
Data is obtained from the site in the form of clickstream
records. Each record describes one click by a visitor and
contains the following details:
Time stamp with Date
Status: HTTP Status code
URL requested: has three subfields, namely the request
method, the resource requested, and the protocol used
Number of bytes transferred
The country of origin for a specific request is identified
using the IP address.
The URL is used to identify the information/web page browsed by
the visitor.
The time stamp of each click is used to sequence the movement of
the visitors across different pages in the website.
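The record fields listed above can be pulled out of a raw log line with a regular expression. The following is a minimal sketch, assuming the server writes the NCSA Common Log Format; the pattern name, the `parse_record` helper, and the sample line are illustrative and do not come from the site being analyzed.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# ip ident user [time] "method resource protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_record(line):
    """Return one clickstream record as a dict, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "timestamp": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "resource": m.group("resource"),
        "protocol": m.group("protocol"),
        "status": int(m.group("status")),
        # Servers write "-" when no body was sent; treat that as 0 bytes.
        "bytes": 0 if m.group("bytes") == "-" else int(m.group("bytes")),
    }

# Invented sample line, using a documentation-range IP address.
sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /pricing.html HTTP/1.1" 200 2326'
record = parse_record(sample)
print(record["timestamp"], record["method"], record["resource"], record["status"])
```

Looking up the country for `record["ip"]` would need a separate IP-to-country database, which is not shown here.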
Identifying a unique user session is an important step in the
analysis of clickstream data. Inactivity for more than 30
minutes is treated as the end of a session.
Because no further data is available, we treat hits from each
unique IP as belonging to a single user within a session.
This is an approximation, since there could be multiple users
accessing from the same IP, or the same user accessing from
several IPs.
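The session rule described above (one user per IP, with a new session after more than 30 minutes of inactivity) can be sketched as follows. The `sessionize` function, the tuple layout of a hit, and the sample data are assumptions made for illustration.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(hits):
    """Group (ip, timestamp, url) hits into sessions.

    Each IP is treated as one user; a gap of more than 30 minutes
    between two hits from the same IP starts a new session.
    """
    sessions = []
    current_by_ip = {}  # ip -> that IP's most recent session
    for ip, ts, url in sorted(hits, key=lambda h: h[1]):
        current = current_by_ip.get(ip)
        if current is None or ts - current["last_seen"] > SESSION_TIMEOUT:
            current = {"ip": ip, "pages": [], "last_seen": ts}
            sessions.append(current)
            current_by_ip[ip] = current
        current["pages"].append(url)
        current["last_seen"] = ts
    return sessions

# Invented sample hits.
hits = [
    ("203.0.113.7", datetime(2023, 10, 10, 9, 0), "/home"),
    ("203.0.113.7", datetime(2023, 10, 10, 9, 10), "/plans"),
    ("198.51.100.4", datetime(2023, 10, 10, 9, 5), "/home"),
    ("203.0.113.7", datetime(2023, 10, 10, 10, 0), "/home"),  # 50-minute gap
]
sessions = sessionize(hits)
for s in sessions:
    print(s["ip"], s["pages"])
```

With this sample, the first IP produces two sessions because its last hit comes 50 minutes after the previous one.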
The Web provides marketers with huge amounts of
information about users
⇒This data is collected automatically
Server-side data collection
Log file analysis - historical data
Real-time profiling (tracking user Clickstream analysis)
Client-side data collection (cookies)
These techniques did not exist prior to the Internet.
⇒They allow marketers to make quick and responsive changes in
Web pages, promotions, and pricing.
⇒The main challenge is analysis and interpretation
Web server log files
• All web servers automatically log (record) each HTTP request
• A server log is a log file (or several files) automatically
created and maintained by a server of activity performed by
• A typical example is a web server log which maintains a
history of page requests.
• Most log file formats can be extended to include “cookie”
– This allows you to identify a user at the “visitor” level
Web Server Logging – How Does It Work?
Web servers such as Apache or Microsoft IIS record
activity as they receive and fulfill requests.
Web servers provide general-purpose logging at a
very detailed level.
To prepare the data for analysis, the web team must
clean and organize log records – a big job!
What log files can record:
Number of requests to the server (hits)
Number of page views
Total unique visitors (using “cookies”)
The referring web site
Number of repeat visits
Time spent on a page
Route through the site (click path)
Search terms used
Most/least popular pages
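A few of these measures can be computed directly from parsed log records. The sketch below covers hits, page views, and most/least popular pages. The record layout, the sample data, and the rule that image, stylesheet, and script requests are hits but not page views are assumptions for illustration; analytics tools define page views in their own ways.

```python
from collections import Counter

# Invented sample records (already parsed from a log file).
records = [
    {"ip": "203.0.113.7", "resource": "/index.html", "status": 200},
    {"ip": "203.0.113.7", "resource": "/logo.png", "status": 200},
    {"ip": "203.0.113.7", "resource": "/plans.html", "status": 200},
    {"ip": "198.51.100.4", "resource": "/index.html", "status": 200},
    {"ip": "198.51.100.4", "resource": "/missing.html", "status": 404},
]

# Assumption: requests for these file types do not count as page views.
ASSET_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")

def is_page_view(record):
    """A successful request for something other than a static asset."""
    return record["status"] == 200 and not record["resource"].endswith(ASSET_SUFFIXES)

hits = len(records)  # every request counts as a hit
page_views = [r for r in records if is_page_view(r)]
popularity = Counter(r["resource"] for r in page_views)

print("hits:", hits)
print("page views:", len(page_views))
print("most popular:", popularity.most_common(1)[0])
print("least popular:", popularity.most_common()[-1])
```

The gap between hits and page views in this sample shows why the two numbers should not be compared directly.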
Software for log file analysis (web analytics)
• Market leader is Webtrends
How do you use log files effectively?
1. Identify leading indicators of business success
2. Identify the key performance metrics with which
to measure them
3. Establish benchmarks to track changes over time
4. Configure software and use settings consistently
Shortcomings of log files
Cannot identify individual people. The log file records
the computer IP address and/or the “cookie”, not the person.
Information may be incomplete because of caching.
Assumptions made in defining “user sessions” may be
This is why benchmarking is so important: look at
trends rather than absolute numbers
Log file analysis is a useful tool to:
identify what visitors are looking for
what content they find most interesting
which search and navigation tools they find most useful
whether promotions are being successful
identify normal volatility in usage levels
measure growth in site usage as compared to overall
Enhancing marketing tactics using web analytics - some examples
Identify point of drop-off in registration or purchasing
Pinpoint problem and concentrate efforts on the apparent
trouble spot to improve conversion rates.
Maximize cross-selling opportunities in an on-line
Identify the top non-purchased products that customers also
looked at before completing the purchasing process.
Add these products in as suggestions
Refine search engine placements by implementing
Use referrer files to identify commonly used search terms and
the search engine or directory that sent the customer.
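The referrer analysis in the last example can be sketched with the standard library. This assumes the referring site passes the search phrase in a `q` query parameter, which varies between search engines and is often withheld altogether; the `search_referral` helper and the referrer URLs are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_referral(referrer, param="q"):
    """Return (referring host, search phrase), or None if no phrase is present."""
    parts = urlparse(referrer)
    terms = parse_qs(parts.query).get(param)
    if not terms:
        return None
    return parts.netloc, terms[0].lower()

# Invented referrer values as they might appear in a log file.
referrers = [
    "https://search.example/results?q=running+shoes",
    "https://search.example/results?q=Running+Shoes",
    "https://directory.example/find?q=trail+shoes",
    "https://news.example/article/42",  # not a search referral
]

counts = Counter(r for r in map(search_referral, referrers) if r is not None)
for (host, phrase), n in counts.most_common():
    print(host, repr(phrase), n)
```

Lower-casing the phrase merges variants that differ only in capitalization, so they are counted as one search term.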
Improve web site structure using web analytics
- some examples
Analysis of search logs to improve findability on the
Do people search by “category” rather than “uniquely
identifying” search terms?
Redesign home page to enhance visibility of most
commonly used links and therefore promote usability.
Demote least used items to “below the fold”
Analyze “click paths”, entry and exit points to trace
most common routes around the site.
Identify areas where navigation seems unclear or confusing
Improve navigation to match demonstrated user preferences.
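Click-path analysis of this kind reduces to counting over sessions. The sketch below assumes each session is already available as an ordered list of pages; the sample sessions are invented.

```python
from collections import Counter

# Invented sessions: each is the ordered list of pages one visitor viewed.
sessions = [
    ["/home", "/plans", "/signup"],
    ["/home", "/plans"],
    ["/blog", "/home", "/plans", "/signup"],
]

entry_pages = Counter(s[0] for s in sessions)   # where visits start
exit_pages = Counter(s[-1] for s in sessions)   # where visits end
# Page-to-page steps, counted across all sessions.
transitions = Counter(
    (a, b) for s in sessions for a, b in zip(s, s[1:])
)

print("entry:", entry_pages.most_common())
print("exit:", exit_pages.most_common())
print("common steps:", transitions.most_common(2))
```

A page that appears often as an exit point but rarely as a goal page is a candidate for the unclear-navigation review described above.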
Clickstream monitoring and
How does Amazon.com do that?
This type of personalization is very complex and
expensive to achieve
Existing customers and order databases must be mined for
People who bought a Norah Jones CD also bought a John Grisham
Called collaborative filtering
Real-time monitoring of customers on your site is needed, so you
can make recommendations or special offers at the right time
Becomes even more complex when combined with
information actually provided by the customer
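The "people who bought X also bought Y" idea can be illustrated with a simple co-purchase count. This is a heavily reduced form of item-based collaborative filtering, not a description of how Amazon.com implements it; the order data, product names, and `recommend` function are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Invented order history: customer -> set of products bought.
orders = {
    "c1": {"jazz_cd", "legal_thriller", "headphones"},
    "c2": {"jazz_cd", "legal_thriller"},
    "c3": {"jazz_cd", "cookbook"},
    "c4": {"cookbook", "headphones"},
}

# also_bought[a][b] = number of customers who bought both a and b.
also_bought = defaultdict(Counter)
for basket in orders.values():
    for a, b in permutations(basket, 2):
        also_bought[a][b] += 1

def recommend(product, n=1):
    """Return the n products most often bought together with `product`."""
    return [item for item, _ in also_bought[product].most_common(n)]

print(recommend("jazz_cd"))
```

A production system would also need to handle popularity bias, sparse data, and real-time updates, which is part of why this kind of personalization is expensive.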
Data Analysis and Distribution
Data collected from all customer touch points are:
Stored in the data warehouse,
Available for analysis and distribution to marketing
Analysis for marketing decision making:
RFM analysis (recency, frequency, monetary)
Data mining = extraction of hidden predictive
information in large databases through statistical
Marketers are looking for patterns in the data such as:
Do more people buy in particular months?
Are there any purchases that tend to be made after a
particular life event?
Refine marketing mix strategies,
Identify new product opportunities,
Predict consumer behavior.
Real-space primary data collection occurs at offline
points of purchase.
Smart card and credit card readers, interactive point
of sale machines (iPOS), and bar code scanners are
mechanisms for collecting real-space consumer data.
Offline data, when combined with online data, paint a
complete picture of consumer behavior for individual
Customer profiling = uses data warehouse information to help
marketers understand the characteristics and behavior of specific
Understand who buys particular products,
How customers react to promotional offers and pricing changes,
Select target groups for promotional appeals,
Find and keep customers with a higher lifetime value to the firm,
Understand the important characteristics of heavy product users,
Direct cross-selling activities to appropriate customers;
Reduce direct mailing costs by targeting high-response customers.
RFM analysis (recency, frequency, monetary) = scans
the database for three criteria.
When did the customer last purchase (recency)?
How often has the customer purchased products (frequency)?
How much has the customer spent on product
purchases (monetary value)?
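The three criteria can be computed per customer from a purchase table. The following is a minimal sketch; the `rfm` function, the tuple layout, the sample purchases, and the reference date are assumptions for illustration.

```python
from datetime import date

# Invented purchase history: (customer, purchase date, amount spent).
purchases = [
    ("ann", date(2024, 1, 5), 40.0),
    ("ann", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 15.0),
]

def rfm(purchases, today):
    """Return recency (days since last purchase), frequency, and monetary value per customer."""
    summary = {}
    for customer, day, amount in purchases:
        s = summary.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        s["last"] = max(s["last"], day)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        customer: {
            "recency_days": (today - s["last"]).days,
            "frequency": s["frequency"],
            "monetary": s["monetary"],
        }
        for customer, s in summary.items()
    }

scores = rfm(purchases, today=date(2024, 3, 11))
for customer, s in scores.items():
    print(customer, s)
```

In this sample, the first customer bought more recently, more often, and for a larger total than the second, so would rank higher on all three criteria.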
=> Allows firms to target offers to the customers who are
most responsive, saving promotional costs and increasing