Data Collection Mechanism
• There are four core groups of data:
• Clickstream
• A clickstream is the sequence of clicks or pages requested as a visitor explores a website. It
helps in understanding the time visitors spend on your site, how often they return, and the
most frequently viewed pages.
• Outcomes
• Outcome data helps in understanding what the result was, for the customer or the company,
once visitors came to the website and spent time on it.
• Research (Qualitative)
• Qualitative research allows us to get really close to our customers and get a real-world feel for
their needs, wants, and perceptions of interactions with our websites.
• Competitive Data
• Competitive Data helps in understanding market trends, role of search engines, campaigns,
visitor demographics, and more stats about competitors' sites
• Four main ways of capturing clickstream data
• Web Logs
• Web Beacons
• JavaScript Tags
• Packet Sniffing
• Let's look at each of them in detail.
• With web logs, the data capture process is as follows:
• A user types a URL in a browser.
• The request for the web page comes to one of the web servers.
• The web server accepts the request and creates an entry in the web log (including page
name, IP address, browser details, date/time stamp, etc.).
• The web server sends the web page to the customer.
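A minimal sketch of what such a web log entry holds and how it might be read back, assuming the server writes the common "combined" log format (the slides do not name a format). The sample line, the field names, and the rule for what counts as a page view are illustrative assumptions, not a definitive parser.

```python
import re
from datetime import datetime

# Combined log format: IP, identity, user, [timestamp], "request",
# status, size, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of static file types that should not count as page views.
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".ico")


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def is_page_view(entry):
    """Drop errors, static files, and robots (a crude, assumed heuristic)."""
    page = entry["page"].split("?")[0].lower()
    if entry["status"] != 200 or page.endswith(STATIC_SUFFIXES):
        return False
    return "bot" not in entry["agent"].lower()


# Hypothetical sample entry (documentation IP address, example.com URLs).
sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products.html HTTP/1.1" 200 2326 '
    '"http://www.example.com/" "Mozilla/5.0"'
)
entry = parse_log_line(sample)
print(entry["page"], entry["ip"], entry["agent"], is_page_view(entry))
```

The filter step matters because raw logs record every image, stylesheet, error, and robot request alongside real page views.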
• Advantages
• Easily accessible.
• Only mechanism to capture search engine visits.
• Many log parsers are available.
• You own the data in the web log.
• Disadvantages
• Captures errors, server usage, etc., but is not optimally suited for business information.
• Need to filter image, page error, CSS, and robot traffic to get an accurate traffic trend.
• Page caching by ISPs and proxy servers means that some traffic is invisible.
• Inaccuracy in identifying visitors without setting cookies.
• Web beacons are 1 × 1 pixel transparent images that are placed in web pages, within an img src
HTML tag. The transparent images are usually hosted on a third-party server.
• Data capturing mechanism
• The customer types a URL in a browser.
• The request comes to one of the web servers, which sends back the page along with a
GET request for a 1 × 1 pixel image from a third-party server.
• As the page loads, it executes the call for the 1 × 1 pixel image, thus sending data
about the page view back to the third-party server.
• The third-party server sends the image back to the browser along with code that can
read cookies and capture anonymous visitor data such as Page view, IP address, time
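The beacon flow above can be sketched as follows: the page embeds an image tag whose URL carries the page-view data, and the third-party server reads that data back from the request. The host name beacon.example.com, the file name pixel.gif, and the parameter names page and ref are hypothetical, chosen only for illustration.

```python
import html
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical third-party collection endpoint.
BEACON_ENDPOINT = "https://beacon.example.com/pixel.gif"


def beacon_img_tag(page, referrer):
    """Build the 1 x 1 transparent image tag that a page would embed."""
    query = urlencode({"page": page, "ref": referrer})
    src = html.escape(f"{BEACON_ENDPOINT}?{query}")
    return f'<img src="{src}" width="1" height="1" alt="">'


def read_beacon_request(url):
    """Recover the page-view data from the image request URL (server side)."""
    params = parse_qs(urlsplit(url).query)
    return {name: values[0] for name, values in params.items()}


tag = beacon_img_tag("/checkout", "https://www.example.com/cart")
# The browser requests the URL found in the src attribute.
requested_url = html.unescape(tag.split('"')[1])
print(tag)
print(read_beacon_request(requested_url))
```

Because the image request itself carries the data, nothing is collected when image loading is turned off.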
• Advantages
• Easy to implement.
• You can optimize exactly what data the beacon collects (for example, just the page viewed, or
time, or cookie values, or referrers), and because robots do not execute image requests, you
won’t collect unwanted robot data.
• Useful when it comes to collecting data across multiple websites or domains for comparison.
• Disadvantages
• Targeted for narrow purposes like banners, emails, etc.
• Antispyware programs automatically remove third-party cookies, which makes it difficult
to track visitors.
• Beacons are not as expansive and customizable as JavaScript tags.
• If image requests are turned off in email programs or some browsers, you can’t collect the data.
• With JavaScript tags, data serving is separated from data capture, which reduces the reliance
on corporate IT departments for various data capture requests.
• The data capture process is as follows
• A user types a URL in browser
• The request for the webpage comes to one of the web servers
details about the visitor session, and cookies, and sends it back to the data collection
• In some cases, upon receipt of the first set of data, the server sends back additional
code to the browser to set additional cookies or collect more data.
• Advantages
• Easier implementation effort with benefit of massive amount of data capture
• If you don’t have access to your web servers, tagging is your only choice
• and the analytics tools will be able to collect data.
• Greater control over exactly what data is collected and ability to add custom tags on
special pages
• data serving thereby allowing you to set cookies.
• Tracking users across multiple domains becomes easier, because your third-party cookie and
its identifying elements stay consistent as visitors go
• Disadvantages
• often for privacy or other reasons. (usually 2-6 % of
• Data collected is divorced from other metadata and hence requires thought & planning in
creating the tags that capture site taxonomy & hierarchy
• Capturing data about downloads (for example, PDFs or EXEs) and redirects is harder than
with web logs
• Inaccuracy in identifying visitor without setting
• Some websites, rather than storing some data in cookies or URL parameters, will store data
on the servers during the visitor session. In this case, the tags will not capture essential data.
• Packet sniffing is one of the most sophisticated ways of collecting web data.
• The data capture process is as follows
• A user types a URL in browser
• The request to the web server passes through a software or hardware-based packet
sniffer that collects attributes of the request that provide more data about the
• The packet sniffer sends the request on to the web server.
• The request is sent back to the customer but is first passed to the packet sniffer. The
packet sniffer captures information about the page going back and stores that data.
the packet sniffer more data about the visitor.
• The packet sniffer sends the page on to the visitor browser.
• Advantages
• all data passed through packet sniffer.
• lesser than other methods. (reliance on IT team)
• Provides ability to collect more data. For example, you can get server errors, bandwidth
usage, and all the technical data as well as the page-related business
• Provides ability to always use first party cookies
• Disadvantages
• Difficult to convince IT department about adding additional layer of software/hardware in
their data center to route all traffic through it.
• Privacy of data is biggest concern as it captures all raw packets, which includes data such as
passwords, names, addresses & credit card information.
• Cached pages, Adobe Flash Files, AJAX or Rich Internet Applications. Inability to capture
core structure & metadata about pages with packet sniffers
• Expensive for web farm architecture
Listed below are outcome data capture strategies for different businesses.
order confirmation page.
• Data captured may include the following metrics:
• Order’s unique identifier
• Product or service ordered
• Quantity and price of each item
• Discounts and promotions applied
• Metadata about the customer session: A/B or multivariate test IDs, cookie values, etc.
• Metadata about the products/services: product hierarchy, campaign hierarchy, product
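One way to picture the order data listed above is as a single record per confirmed order. This is a sketch under assumptions: the class names, field names, and sample values are hypothetical and not taken from any particular analytics tool.

```python
from dataclasses import dataclass, field


@dataclass
class OrderItem:
    product: str
    quantity: int
    unit_price: float


@dataclass
class OrderOutcome:
    order_id: str                      # order's unique identifier
    items: list                        # products ordered, with quantity and price
    discount: float = 0.0              # discounts and promotions applied
    session_metadata: dict = field(default_factory=dict)  # test IDs, cookie values

    def total(self):
        """Order value after discounts, rounded to cents."""
        gross = sum(item.quantity * item.unit_price for item in self.items)
        return round(gross - self.discount, 2)


# Hypothetical order captured on the confirmation page.
order = OrderOutcome(
    order_id="ORD-1001",
    items=[OrderItem("notebook", 2, 19.99), OrderItem("pen", 1, 5.00)],
    discount=4.50,
    session_metadata={"ab_test_id": "checkout-B"},
)
print(order.order_id, order.total())
```

Keeping the session metadata on the same record is what later allows outcomes to be segmented by test or campaign.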
• Lead Generation
• Lead generation data may be collected on the “thank you” page (which is seen by the
customer after successfully submitting a lead).
• Partner with other websites that might be collecting and storing the leads on your
behalf to get the Lead Generation data.
• Plan on identifying where the data is being captured and how you can have access to
• Brand/Advocacy and Support.
• Outcomes in this case are harder to figure out because we do not know whether the
page view resulted in solving customer’s problem.
• For the longest time, if the user sees a certain page, we can call it mission
• However, a great way to start is by surveying a statistically significant sample of site
visitors to get their ratings on success.
• Having internal Data Warehouse gives the flexibility to capture more data (for
example, event logs from your Flash or rich Internet applications, Google search
data, metadata from other parts of the company, and CRM or phone channel data).
• This allows you to truly create an end-to-end view of customer behavior and
outcomes that can scale effectively over time
• The goal of qualitative analysis is to understand the rationale behind the metrics and trends
that we see and to actively incorporate the voice of the customer (VOC) into our decision
• The following user-centric design (UCD) and human-computer interaction (HCI)
methodologies are commonly used to understand the customer perspective:
• Surveys
• Surveys are the optimal method for collecting feedback (based on questionnaires) from
a very large number of customers relatively inexpensively and quickly.
• Conclusions based on survey data, if done right, will be more accurate and reliable
and provide insights and conclusions that help us better understand customer
• Heuristic evaluations
• Heuristic evaluations follow a set of well-established rules (best practices) in web
design and in how website visitors experience websites and interact with them.
• Here, a user researcher acts as a website customer and attempts to complete a set of
predetermined tasks (tasks related to the website’s reason for existence—for
example, trying to place an order)
• In addition to the best practices, the user researcher will draw from their own
experience of running usability studies and their general knowledge of standard
• Usability testing (lab and remote)
• Lab usability tests measure a user’s ability to complete tasks.
• Usability tests are best for optimizing User Interface (UI) designs and work flows,
understanding the customer’s voice, and understanding what customers really do.
• In a typical usability test, a user attempts to complete a task or set of tasks by using a
website (or software or a product).
• Each of these tasks has a specified goal with effectiveness, efficiency, and satisfaction
identified in a specified usage context.
• Site visits (or follow-me-homes)
• In a site visit, user researchers, and often other key stakeholders, go to the home or office
of the customer to observe them completing tasks in a real-world environment.
• You can observe customers interacting with websites in the midst of all the other
distractions of their environment—for example, ringing phones, weird pop-up blockers,
• This experience is very different from a lab because the complicating environmental
• You would use the best-fit methodology based on the following:
• Scope (both size and complexity) of the problem you are trying to solve (entire website,
core segments of experience, particular pages, and so forth)
• Timing (whether you need it overnight or over the next few weeks)
• Number of participants (how many customers you would like feedback from)
• This competitive intelligence is key to helping you understand your performance in the
context of the greater web ecosystem and allows you to better understand whether a certain
result is caused by ecosystem trends or your actions (or lack thereof).
• Having a focused competitive intelligence program can help you exploit market trends, build
off the success of your competitors, or help optimize your search engine marketing program.
• There are three main methodologies used to collect competitive data:
• Panel-based measurement
• Here, the participant agrees to have their Web browsing behavior tracked in exchange for
• A company called ComScore Networks uses panel-based measurement to compile data
that is used by many companies for competitive analysis
• ISP-based measurement
• Here, the data is collected from various Internet Service Providers (ISPs), through which
we are connected to internet while surfing the web.
• Companies such as Hitwise have agreements with ISPs worldwide whereby the ISPs share
the anonymous web log data collected on the ISP network with Hitwise
• Search engine data
• They also often know information about their users (ex: Search Keywords, Regions,
• Google (Google Trends) and MSN (adLabs) have recently opened up lab/beta
environments where they enable users to run queries against their databases to glean
insights into competitive data.
Process of Data Analysis
• Web analytics methodology has following steps:
• Defining Business Metrics (KPIs)
• To get real business metrics, you need to look at your website in the context of your
overall business strategy & desired user behavior
• They include such things as the paths you want users to take, the marketing
initiatives you want them to come into contact with, and the products you want them
• The second step is to monetize these desired behaviors. In other words, you should
figure out the value of each behavior to your business.
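Monetizing desired behaviors comes down to simple arithmetic: multiply how often each behavior occurs by what one occurrence is worth. The behavior names, counts, and values below are made-up figures for illustration only.

```python
# Hypothetical monthly counts and assumed value per occurrence.
behaviors = {
    "newsletter_signup": {"count": 400, "value_each": 2.50},
    "whitepaper_download": {"count": 150, "value_each": 8.00},
    "purchase": {"count": 60, "value_each": 45.00},
}


def behavior_values(behaviors):
    """Total value contributed by each desired behavior."""
    return {
        name: data["count"] * data["value_each"]
        for name, data in behaviors.items()
    }


values = behavior_values(behaviors)
total_value = sum(values.values())
for name, value in sorted(values.items(), key=lambda pair: -pair[1]):
    print(f"{name}: {value:.2f}")
print(f"total: {total_value:.2f}")
```

Ranking behaviors by value shows which ones deserve to become KPIs.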
• A report is the representation of the metrics (or KPIs) you’ve identified and the other
contributing metrics that can help you better understand the details behind your
performance (for example, which pages are visited by users after successfully submitting
• The data from other sources may include data from call centers, retail stores,
attitudinal surveys, or information about your competitors.
• Analysis involves looking at the factors driving your performance so you can identify
opportunities for improvement
• Optimization and Action
• There are a large number of ways you can take action and optimize a site, with the
most common being A/B and multivariate testing.
• You can also make changes to the design, information architecture, the structure of
your promotions, and much more.
How much did Visitors benefit my Business?
• Here, you try to find out how well your site helps visitors accomplish the
things you hoped they’d do like purchase, clicking on advertisement &
• Conversion & Abandonment
• The percentage of visitors that your site converts to contributors, buyers, or users is
the most important metric you can track
• For a site that relies on third-party ads, click-through data is the metric that directly
relates to revenue.
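As a sketch, the conversion, abandonment, and click-through metrics can each be computed as a simple percentage. The function names and the sample numbers are assumptions for illustration; real tools differ in how they count visitors and started processes.

```python
def conversion_rate(conversions, visitors):
    """Percentage of visitors who completed the desired action."""
    return 100.0 * conversions / visitors if visitors else 0.0


def abandonment_rate(started, completed):
    """Percentage of started processes (for example, carts) never completed."""
    return 100.0 * (started - completed) / started if started else 0.0


def click_through_rate(clicks, impressions):
    """Percentage of ad impressions that resulted in a click."""
    return 100.0 * clicks / impressions if impressions else 0.0


# Hypothetical figures for one week.
print(conversion_rate(conversions=150, visitors=5000))        # 3.0
print(abandonment_rate(started=400, completed=150))           # 62.5
print(click_through_rate(clicks=25, impressions=10000))       # 0.25
```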
• Offline Activity
• Many actions that start on the Web end elsewhere, for example a purchase that starts
online and ends in a call center.
• It is important to associate such call center requests with the online support
information (by providing a unique code to visitors) to understand the effectiveness
of a website.
• User-Generated Content
• If your site thrives on user-generated content (UGC), contribution is key.
• You need to know how many people are adding to the site, either as editors or
commenters, and whether your contributors are creating the content that your
• Some media sites offer premium subscriptions that give paying customers more
storage, downloadable content, better bandwidth, and so on.
• This can be the main revenue source for analyst firms, writers, and large media
content sites such as independent video producers.
• Additional bandwidth costs money, so subscriptions need to be monitored for cost to
ensure that the premium service contributes to the business as a whole.
• Billing and Account Use
• If you’re running a subscription website—such as a SaaS application—then your
subscribers pay a recurring fee to use the application.
• This is commonly billed per month, and may be paid for by the individual user or as
a part of a wider subscription from an employer.
• It’s essential to track billing and account use, not only because it shows your
revenues, but also because it can pinpoint users who are unlikely to renew.
• You’ll need to define what constitutes an “active” user, and watch how many of your
users are no longer active. You’ll also want to watch the rate of nonrenewal to
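A minimal sketch of both measurements, assuming an "active" user is one who has logged in within the last 30 days. That threshold, the user names, and the figures are assumptions; each business has to choose its own definition.

```python
from datetime import date, timedelta


def inactive_users(last_login, today, max_idle_days=30):
    """Users whose most recent login is older than the idle window."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(user for user, seen in last_login.items() if seen < cutoff)


def nonrenewal_rate(up_for_renewal, renewed):
    """Percentage of subscriptions due for renewal that were not renewed."""
    if up_for_renewal == 0:
        return 0.0
    return 100.0 * (up_for_renewal - renewed) / up_for_renewal


# Hypothetical last-login dates per subscriber.
last_login = {
    "ana": date(2024, 3, 28),
    "ben": date(2024, 1, 15),
    "cy": date(2024, 2, 20),
}
idle = inactive_users(last_login, today=date(2024, 4, 1))
print(idle, nonrenewal_rate(up_for_renewal=200, renewed=170))
```

The inactive list is a useful early warning, since those subscribers are the likeliest not to renew.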
Where is my Traffic coming from?
• Getting the right visitors to your site is a combination of Affiliate Marketing, Search Engine
Marketing, and Search Engine Optimization
• Referring Websites
• If the user linked to that page from elsewhere, the browser includes a referring URL. This lets
you know who’s sending you traffic.
• If you know the page that referred visitors, you can track those visits back to the site that sent
them and see what’s driving them to you. Remember, however, that you need to look not only at
who’s sending you visitors, but also at who’s sending you the ones that convert.
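To see which referrers send visitors that convert, visits can be grouped by the referring site and a conversion rate computed per group. This is a sketch; the referrer URLs and the visit records are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def referrer_report(visits):
    """Visits, conversions, and conversion rate per referring site."""
    stats = defaultdict(lambda: {"visits": 0, "conversions": 0})
    for referrer_url, converted in visits:
        # An empty referrer means the visitor arrived directly.
        site = urlsplit(referrer_url).netloc or "(direct)"
        stats[site]["visits"] += 1
        stats[site]["conversions"] += int(converted)
    return {
        site: {**s, "rate": s["conversions"] / s["visits"]}
        for site, s in stats.items()
    }


# Hypothetical visits: (referring URL, did the visit convert?).
visits = [
    ("https://search.example.com/?q=shoes", False),
    ("https://search.example.com/?q=boots", True),
    ("https://search.example.com/?q=socks", False),
    ("https://search.example.com/?q=hats", False),
    ("https://blog.example.org/review", True),
    ("https://blog.example.org/review", True),
    ("", False),
]
report = referrer_report(visits)
for site, s in report.items():
    print(site, s["visits"], s["conversions"], f"{s['rate']:.0%}")
```

In this made-up data the search site sends the most visitors, but the blog sends the ones that convert.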
• Inbound Links from Social Networks
• An increasing number of visitors come to you from social networks.
• If your media site breaks a news story or offers popular content, social communities will often
link to it. This includes not only social news aggregators like reddit or Digg, but also bloggers,
comment threads, and sites like Twitter, Facebook etc.
• You need to identify the source of this traffic so you can engage the people who brought
you the attention.
• Visitor Motivation
• Sometimes the only way to get inside a visitor’s head is to ask her, using surveys and questions
on the site. This approach is called the voice of the customer (VOC).
• It involves finding out what customers are trying to accomplish, whether they planned to
purchase, which products and services they are considering, and so on.
What is Working Best (and Worst)?
• Site Effectiveness
• A site that convinces visitors to purchase more than what they initially intended is an
• Many e-commerce sites suggest related purchases or offer package deals (up-selling).
Similarly, collaborative sites track how many visitors subscribe to mailing lists or
RSS feeds. These are important metrics to track.
• Ad and Campaign Effectiveness
• With the exception of organic traffic, most visitors arrive because of a campaign.
• This may be an online campaign—banner ads, sponsorship, or paid content—or an
offline campaign such as a movie trailer or radio spot, or simply good word of mouth
and an informal community.
• Analytics applications can segment incoming traffic by campaign to measure how
much they helped the bottom line
• Search Effectiveness
• Users prefer to search for what they’re looking for and choose from the results rather
than browsing through several hierarchies of a directory.
• If this search data is tied into analytics, we can measure search effectiveness and then
we can better label and index the site.
• Trouble Ticketing and Escalation
• An increase in call center activity and support email messages are sure signs of a
• Site operators need to track the volume of trouble tickets related to the website, and
ideally relate those trouble tickets to the user visits that cause them in order to speed
up problem diagnosis.
• Content Popularity
• Media sites are about content. The successful ones put popular content on the home
page, alongside ad space for which they charge a premium.
• Who you attract with your content, and what they do afterward, is an important part
of what works best. In other words, content popularity has to tie back to site goals
rather than just page views.
• No site will succeed if it’s hard to use.
• Focus groups and prerelease testing can identify egregious errors before you
launch, but there’s no substitute for watching a site in production.
• User Productivity
• User productivity looks at whether visitors could accomplish their tasks quickly and
• Every website operator should care whether visitors can accomplish goals, but for
SaaS sites this is particularly important, as users may spend their entire workday
interacting with the application.
• Community Rankings and Rewards
• For sites like Wikipedia, it is important to track top contributors and the rewards given
for contributions.
How good is my relationship with my Visitors?
• Once you’ve got your site in order, traffic is flowing in, and you’re making the most
of all of your visitors, it’s time to be sure your relationship with them is long and
• The best visitors are those who keep coming back. Thanks to browser cookies, most web
analytics applications show the ratio of new to returning visitors.
• Strike a healthy balance here: get new blood so you can grow, but encourage existing
visitors to return so they become regular buyers or contributors
• The average time between visits and the number of users who no longer engage with the site
are the metrics to watch for user engagement.
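The loyalty metrics above can be sketched from a per-visitor list of visit dates, assuming each visitor is identified by a cookie ID. The IDs and dates are hypothetical.

```python
from datetime import date


def new_vs_returning(visits_by_visitor):
    """Count visitors seen once (new) versus more than once (returning)."""
    new = sum(1 for dates in visits_by_visitor.values() if len(dates) == 1)
    return new, len(visits_by_visitor) - new


def average_days_between_visits(visit_dates):
    """Mean gap in days between consecutive visits, or None for a single visit."""
    ordered = sorted(visit_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical visit history keyed by cookie ID.
visits_by_visitor = {
    "cookie-a": [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 22)],
    "cookie-b": [date(2024, 1, 5)],
}
print(new_vs_returning(visits_by_visitor))
print(average_days_between_visits(visits_by_visitor["cookie-a"]))
```

A growing average gap for a visitor is an early hint that they are disengaging.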
• Enrollment is valuable because consumers are increasingly skeptical of web marketing.
• Enrollment also provides better targeting. You can ask subscribers for demographic
information such as gender, interests, and income, then tailor your messages—and those
of your advertisers—to your audience.
• Whether through email subscriptions, alerts, or RSS feeds, Reach is the measurement of
how many enrolled visitors actually see your messages.
• Reach is a far more meaningful measure of subscription, since it discounts “stale”
enrollments and shows how well your outbound messages, blogs, and alerts result in
How healthy is my Infrastructure?
• Slow page loads or excessive downtime can undermine even the best-
designed, most effective, easiest-to-use website. Hence, End-user
monitoring is needed.
• Availability and Performance
• The most basic metrics for web health are availability (is it working?) and
performance (how fast is it?).
• These can be measured on a broad, site-wide basis by running synthetic tests at
regular intervals; or they can be measured for every visit to every page with real
user monitoring (RUM).
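A minimal sketch of how results from synthetic tests run at regular intervals might be summarized. Each result is assumed to be a pair of (test succeeded, response time in milliseconds); the sample values are made up.

```python
from statistics import median


def availability_pct(results):
    """Percentage of synthetic tests that succeeded."""
    if not results:
        return 0.0
    succeeded = sum(1 for ok, _ in results if ok)
    return 100.0 * succeeded / len(results)


def median_response_ms(results):
    """Median response time across successful tests only."""
    return median(ms for ok, ms in results if ok)


# Hypothetical results from five test runs; the third one failed.
results = [(True, 220), (True, 310), (False, None), (True, 250), (True, 400)]
print(availability_pct(results), median_response_ms(results))
```

The same two summaries apply to real user monitoring data, just computed over every visit instead of scheduled tests.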
• Service Level Agreement Compliance
• If people pay to use your site, you have an implied contract or Service Level
Agreement (SLA) that you’ll be available and usable.
• A properly crafted SLA includes not only acceptable performance and
availability, but also time windows, stakeholders, and which functions of the
website are covered.
• Measure and report the metrics that comprise an SLA in a regular fashion to
both your colleagues and your customers.
• Content Delivery
• Content delivery is important for media companies. For example, a Flash ad may be
measured for its delivery.
• Users may interact with the content by rolling over the ad, clicking a sound button, or
playing the ad, and then either click on the offer or ignore it.
• Hence it is important to track content engagement, attention, completion of the
media, pauses, etc.
• Capacity and Flash Traffic
• When a community (like blogs, social news aggregators) suddenly discovers content
that it likes, the result is a flash crowd (thousands of visitors).
• While flash crowds create dramatic bursts of traffic, a gradual, sustained increase in
traffic can sneak up on you and consume all available capacity.
• You need to monitor long-term increases in page latency or server processing or
decreases in availability that may be linked to increased demand for your website.
• Impact of Performance on Outcomes
• Poor performance has a direct impact on outcomes like conversion rate, as well as on
• Responsive websites lead to increased productivity, while slow sites encourage
• The relationship between performance and conversion can be measured on an
individual basis, by making performance a metric that’s tracked by analytics and by
segmenting conversion rates for visitors who had different levels of page latency.
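Segmenting conversion by page latency can be sketched as follows. The bucket boundaries (2 and 5 seconds) and the visit records are assumptions chosen for illustration.

```python
def latency_bucket(seconds):
    """Assign a visit to an assumed latency band."""
    if seconds < 2:
        return "under 2s"
    if seconds <= 5:
        return "2-5s"
    return "over 5s"


def conversion_by_latency(visits):
    """Conversion rate per latency band from (latency_seconds, converted) pairs."""
    totals = {}
    for latency, converted in visits:
        bucket = latency_bucket(latency)
        seen, won = totals.get(bucket, (0, 0))
        totals[bucket] = (seen + 1, won + int(converted))
    return {bucket: won / seen for bucket, (seen, won) in totals.items()}


# Hypothetical visits: (average page latency in seconds, converted?).
visits = [
    (0.8, True), (1.5, True), (1.9, False), (1.2, True),
    (3.0, True), (4.1, False),
    (6.5, False), (7.2, False),
]
rates = conversion_by_latency(visits)
print(rates)
```

In this made-up sample, conversion falls as latency rises, which is the pattern to look for in real data.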
• Traffic Spikes from Marketing Efforts
• Marketing campaigns should drive site traffic.
• You need to identify the additional volume of visitors to your site not only for
marketing reasons, but also to understand the impact that marketing promotions
have on your infrastructure and capacity.
• Seasonal Usage Patterns
• If your business is highly seasonal, you need to understand historical usage patterns
• It helps to understand usage trends so you can plan for capacity changes & to meet
How Am I Doing Against the Competition?
• In addition to monitoring your own website and the communities that affect your
business, you also need to watch your competition.
• Site Popularity and Ranking
• Most startups and media outlets are judged by their monthly unique visitor count
• Keep a watch on Google PageRank; Google Trends; Google Insights; Compete.com &
• If you’re a media site or portal that has to report traffic estimates as part of your
business, ComScore and Nielsen dominate traffic measurement, with Quantcast and
Hitwise as smaller alternatives.
• How People Are Finding My Competitors
• Knowing which organic terms are leading visitors to your competitors helps you
understand what customers are looking for and how they’re thinking about your
products or services.
• On the other hand, using a competitor’s web domain, you can find out what search
terms the market thinks apply to your product category and change your marketing
• Relative Site Performance
• Compare competitors’ performance and availability to your own and to industry
• You may want to set up synthetic testing of competitors’ sites, or even set up a
transactional benchmark that can show the difference in performance across similar
• Competitor Activity
• Use alerts to monitor competitor activity, such as changes to competitors’ pages that
have business impact: pricing information, financing, media materials, screenshots, and
executive teams.
Where Are My Risks?
• You need to monitor your site for abusive content, legal liabilities & anonymous
detractors attacking you publicly
• Trolling and Spamming
• Any website that offers comment fields, collaboration, and content sharing will become a
target for two main groups of mischief-makers: spammers and trolls.
• Hence, monitor the number of users who exhibit unwanted behaviors, the percentage of spammy
comments, the traffic sources that generate spam, and the volume of community flags.
• Copyright and Legal Liability
• If your site lets users post content, you may have to take steps to ensure that this content isn’t
subject to copyright from other organizations.
• Best practices today are to ask users to confirm that they are legally permitted to post the
content, and to provide links for someone to report illegal content.
• Fraud, Privacy, and Account Sharing
• Safeguarding your visitors’ personally identifiable information is a legal obligation.
• Make sure you’ve got plenty of detailed log files to monitor breaches in privacy or cases of
• Watch for fraud related to account sharing by keeping track of number of concurrent-user
logins per account; number of states and different user agent from which a user has logged
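One way to count concurrent-user logins per account is to sweep through login and logout events in time order. This is a sketch under assumptions: sessions are (account, login time, logout time) with times in minutes, and the account names and threshold are hypothetical.

```python
def max_concurrent_logins(sessions):
    """Peak number of simultaneous sessions seen for each account."""
    events = {}
    for account, start, end in sessions:
        events.setdefault(account, []).extend([(start, 1), (end, -1)])
    peaks = {}
    for account, account_events in events.items():
        current = peak = 0
        # At equal times the logout (-1) sorts before the login (+1),
        # so back-to-back sessions are not counted as overlapping.
        for _, change in sorted(account_events):
            current += change
            peak = max(peak, current)
        peaks[account] = peak
    return peaks


# Hypothetical sessions: (account, login minute, logout minute).
sessions = [
    ("acct-1", 0, 60), ("acct-1", 30, 90), ("acct-1", 45, 50),
    ("acct-2", 0, 30), ("acct-2", 30, 60),
]
peaks = max_concurrent_logins(sessions)
suspects = sorted(a for a, peak in peaks.items() if peak > 2)  # assumed threshold
print(peaks, suspects)
```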
What Are People Saying About Me?
• You should subscribe to a keyword across various types of sites (blogs, mailing lists, news
aggregators, etc.), then review the results wherever someone is talking about things that
matter to your organization.
• Site Reputation
• Keep track of site reputation by watching Google PageRank, Technorati ranking,
StumbleUpon rating, and other Internet ranking tools.
• Use Google Trends, Yahoo! Buzz & Google Insights to understand the relative
popularity of content on the Internet in order to optimize the wording of your site or
to downplay aging themes.
• Social Network Activity
• Track search results for your company name, URL, product names, executives, and relevant
keywords across social sites like Digg, Summize, and Twitter, as well as any that are
relevant to your particular industry or domain.
How Are My Site and Content Being Used Elsewhere?
• Track and monitor other people who are using your site. They may be doing so as part of
a mashup, running search engine crawlers or they may be competitors checking up on
• API Access and Usage
• Your site may offer formal web services or APIs to let your users access your application
programmatically through automated scripts.
• Keep track of traffic volume and number of requests for each API you offer; number of failed
authentications to the API; number of API requests by developer; top URLs by traffic and
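The API metrics listed above can be sketched as simple counts over a request log. The record fields (api, developer, status) are hypothetical, and treating HTTP status 401 as a failed authentication is an assumption about how the API reports it.

```python
from collections import Counter


def api_usage(requests):
    """Summarize request volume per API and per developer, plus failed logins."""
    return {
        "requests_per_api": Counter(r["api"] for r in requests),
        "requests_per_developer": Counter(r["developer"] for r in requests),
        # Assumption: the API answers failed authentication with HTTP 401.
        "failed_authentications": sum(1 for r in requests if r["status"] == 401),
    }


# Hypothetical request log entries.
requests = [
    {"api": "/v1/orders", "developer": "dev-a", "status": 200},
    {"api": "/v1/orders", "developer": "dev-b", "status": 401},
    {"api": "/v1/search", "developer": "dev-a", "status": 200},
    {"api": "/v1/orders", "developer": "dev-a", "status": 200},
]
usage = api_usage(requests)
print(usage["requests_per_api"].most_common())
print(usage["requests_per_developer"].most_common(1))
print(usage["failed_authentications"])
```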
• Mashups, Stolen Content, and Illegal Syndication
• Your site’s data can easily appear online in a mashup. By combining several sites and services,
web users can create a new application, often without the original sites knowing it.
• If this is happening to you, you’ll see referring URLs belonging to the mashup page, and you can
track back to that URL to determine where the traffic is coming from and take action if needed.
• Try to treat mashups as business opportunities, not threats. If you have interesting content,
find a way to deliver it that benefits both you and the mashup site.
• Integration with Legacy Systems
• Some SaaS applications may connect to their subscribers’ enterprise software through dedicated
links in order to exchange customer, employee, and financial data
• Such calls may degrade the performance of the website, so keep track of the volume and
performance of API calls between the application and enterprise customers or data partners.
• Web Analytics: An Hour a Day, by Avinash Kaushik
• Complete Web Monitoring, by Alistair Croll & Sean Power
• Groundswell: Winning in a World Transformed by Social Technologies, by Charlene Li & Josh Bernoff