Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University
Who is Jim Jansen?
  Associate professor at the College of Information Sciences and Technology, The Pennsylvania State University, USA
  Senior Fellow at the Pew Research Center (Pew Internet and American Life Project): http://www.pewinternet.org
  Active research and teaching efforts: http://ist.psu.edu/faculty_pages/jjansen/
  Several funded and non-funded research projects
  Teach several courses, including keyword advertising
  2011 book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising
  Editor of the journal Internet Research (Emerald)
  Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool): basics of web analytics
Let's talk web analytics! We'll discuss: context, theory, application. Begin by setting the stage … what are we facing?
Moving to 'everything' recorded and indexed
  A lot will be global, but much will remain local
  Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is the foundational technology
  Raises issues, including:
    Infrastructure requirements. How and who pays?
    Changes the nature of privacy and anonymity
  As publishers or providers, how do we make sense of how people are using this data? Web analytics
  Explosion of Information: the Zettabytes are coming. There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015 (Cisco's fifth annual Visual Networking Index (VNI) Forecast)
How much is a Zettabyte?
Four trends we are facing:
  The volume of data is exploding (information growth)
  The complexity of data is growing (information architecture)
  The users have less time (attention economy)
  The user expects improved features (technological sophistication)
Web analytics can help us …
  Deal with the volume of data (information growth)
  Understand the growing complexity of data (information architecture)
  Address users' limited time (attention economy)
  Lead to the improved features (technological sophistication) expected by users
  How does web analytics do this?
A thousand years ago: science was mainly naturalistic, describing natural phenomena
Last few hundred years: a theoretical branch, using models and generalizations
Last few decades: a computational branch, simulating complex phenomena
Today: data exploration (eScience), unifying theory, experiment, and simulation
  Data captured by sensors, instruments, or generated by simulator
  Processed by humans and software
  Information / knowledge stored in computer
  Analysis of database / collection content using data management and statistics
  Network and Web Science
  Data → Information → Knowledge
  This is the realm of Web analytics!
What is web analytics? The Web Analytics Association (WAA) defines Web analytics as: the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage (http://www.webanalyticsassociation.org/). Web analytics shares common theoretical and methodological characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.)
Let's break that definition down …
  Collection: accumulate and store over a period of time
  Internet data: internet facts and statistics collected together for reference or analysis
  Measurement: ascertain the size, amount, or degree of something by using an instrument or device
  Analysis: examine methodically the structure of information for purposes of explanation and interpretation
  Reporting: giving a spoken or written account of something that one has investigated
  Understanding: perceive the significance, explanation, or cause of something
  Optimizing: make the best or most effective use of a resource
  Web usage: employ or deploy something as a means of accomplishing a purpose or achieving a result
  Data, Information, Knowledge
How is the data collected?
W3C Extended Log Format: a variety of fields for examining visitors to Web sites
  The other common format is the NCSA Separate Log, which is composed of three logs:
    Common log: actions on the server
    Referral log: where visitors came from
    Agent log: information about the client computer
  Besides server-side logging, there are other methods such as page tagging, image cookies, Flash cookies, etc., but the data is still stored in a log.
  [Figure: W3C Extended Log Format]
Okay, that's collection. What about analysis and reporting?
A variety of tools help make sense of this log data
With that context, let's look at the foundational aspects …
Theoretical Foundations
  Web analytics is based on the behaviorism paradigm
  Behaviorism: an approach focused on the outward behavioral aspects of thought that emphasizes observed behaviors
  Behaviorism: Pavlov, Watson, and Skinner
  [Photos: Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov]
Behaviorism Characteristics
  Inductive, data-driven, and characterized by empirical observation of measurable behavior
  Grounded on somebody doing something in a situation (all the environmental and situational features are embedded behaviors)
  Critics of behaviorism as a psychological theory have issues with its rejection of mental processes. I agree: people are more than "mediators between behavior and the environment" (Skinner, 1993, p 428) (cf. social learning theory) … however, don't throw out the baby with the bath water
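To make the log format concrete, here is a minimal Python sketch of reading a W3C Extended Log Format file. It relies only on the format's convention that lines beginning with `#` are directives and that the `#Fields` directive names the columns. The sample log, the field selection, and the function name `parse_w3c_extended` are illustrative assumptions, not taken from the slides, and the whitespace split is a simplification that does not handle quoted strings containing spaces.

```python
from typing import Dict, Iterable, Iterator


def parse_w3c_extended(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives (#Version, #Date, #Fields, ...) start with '#'.
            if line.lower().startswith("#fields:"):
                fields = line.split(":", 1)[1].split()
            continue
        values = line.split()  # simplification: assumes no spaces inside values
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample log, invented for illustration.
SAMPLE_LOG = """\
#Version: 1.0
#Date: 2011-05-12 00:00:00
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)
2011-05-12 10:15:02 192.0.2.10 GET /index.html 200 http://www.example.com/
2011-05-12 10:15:40 192.0.2.10 GET /products.html 200 http://www.example.org/index.html
2011-05-12 10:17:05 192.0.2.44 GET /index.html 200 -
"""

entries = list(parse_w3c_extended(SAMPLE_LOG.splitlines()))
for entry in entries:
    print(entry["c-ip"], entry["cs-uri-stem"], entry["cs(Referer)"])
```

Each parsed entry is one recorded behavior (who requested what, when, and from where), which is the raw material for the counts and ratios discussed later.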
What is a Behavior? … an observable activity of a person, animal, team, organization, or system. One can classify behaviors into three general categories. Behaviors are:
  something that one can detect and record
  actions or specific goal-driven events with some purpose other than the specific action that is observable
  reactive responses to environmental stimuli
What is a Behavior?
  Behavior is the essential construct of behaviorism and of web analytics
  Logs record behaviors of users and systems (a log records behavior but can't tell affective, cognitive, or situational aspects … yet, but we're working on it!)
  A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value)
Data Collection: Trace Data
  One can view the data collected in log files as trace data
  People conducting the activities of their daily lives many times create things, create marks, induce wear, or reduce some existing material
  Within the confines of research, these things, marks, and wear become data
  Classically, trace data are the physical remains of people's interaction
  Examples: wear on a carpet, trash heap, surfing the web
Trace Data
  In the past, trace data was often time consuming to gather and process, making such data costly
  Logging software makes collecting trace data on the Internet easy and cheap
  Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data
  With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective
  What is cool about trace data for researchers?
Data Collection
  Log data/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including:
  Scale: not a limiting factor as in lab user studies
  Power: large sample size for inference testing; in fact, so large that one must account for the size effect
  Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context
  Location: can collect in distributed environments
  Duration: can collect log data over an extended period
Methodological Foundations
  Use of logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint)
  Unobtrusive methods … allow data collection without directly interfering in the context and do not require a direct response from participants
  Examples: customer behavior (video), chemistry (surface marking)
Methodological Foundations
  Three justifications for unobtrusive methods:
    Uncertainty principle: researchers interjected into an environment become part of the system
    Observer effect: difference that is made to an activity or a person's behaviors by being observed
    Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect
  Trace data helps in overcoming the uncertainty principle, observer effect, and observer bias in the data collection. Note: this addresses observer bias in data collection but not in data analysis
  Example: ethnography studies (where the researcher "bird dogs" a study participant)
  Example: no one searches for porn in a lab study of Web searching
  Example: this is why medical trials are double blind rather than single blind
Methodological Foundations
  Inherent characteristics in the method of log data collection; Web analytics has issues to address as a result:
    Abstraction: how does one relate low-level data to higher-level concepts?
    Selection: how does one separate the necessary from unnecessary data?
    Reduction: how does one reduce the complexity and size of the data set?
    Context: how does one interpret the significance of events?
    Evolution: how can one collect data without impacting application deployment or use?
Okay, nice, but how do we apply it …
Web analytics process: Every consulting firm has a web analytics process … (which is fine). However, the effective ones all boil down to four essential steps
Essential steps to any effective web analytics process (the steps are linked by "Drives" arrows)
  1. Collection of data: typically counts; basically, data collection. Examples: time stamp, referral URL, query term
  2. Processing of data into information: typically ratios; data becomes metrics. Examples: time on page, bounce rate, unique visitors
  3. Developing key performance indicators: counts and ratios infused with business strategy. Examples: conversion rate, average order value, task completion rate
  4. Formulating online strategy: online goals, objectives, or standards for the organization. Examples: save money, make money, market share
Three types (plus 1) of Web analytics metrics
  Count: the most basic unit of measure; a single number.
  Ratio: typically, a count divided by a count, although a ratio can use either a count or a ratio in the numerator or denominator.
  KPI (Key Performance Indicator): can be either a count or a ratio; it is frequently a ratio. A KPI is infused with business strategy, and therefore the set of appropriate KPIs typically differs between site and process types.
  Dimension: data that can be used to define various types of segments and represents a fundamental dimension of visitor behavior or site dynamics. Typically, not associated with a number.
  Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
Can be applied to three levels of granularity
  Aggregate: total site traffic for a defined period of time (typically used for market comparisons)
  Segmented: a subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight (by developing personas and profiles in Google Analytics)
  Individual: activity of a single Web visitor for a defined period of time (excellent for persona development and outlier analysis)
  (Burby, Brown, and WAA Standard Committee, 2007)
Classifications of Metrics
  Building Block: foundational metrics
  Visit Characterization: metrics aimed at understanding visits, either single or aggregate
  Content Characterization: metrics aimed at understanding content or its use
  Conversion: metrics aimed at linking visits and content
  (Burby, Brown, and WAA Standard Committee, 2007)
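As a rough illustration of the three levels of granularity, the sketch below computes the same kind of count (page views) at the aggregate, segmented, and individual level. The records, the `traffic_source` dimension, and the visitor ids are hypothetical; this is not how any particular analytics tool stores its data.

```python
# Hypothetical page-view records: (visitor_id, traffic_source, page).
PAGE_VIEWS = [
    ("v1", "search", "/home"),
    ("v1", "search", "/pricing"),
    ("v2", "email", "/home"),
    ("v3", "search", "/home"),
    ("v3", "search", "/signup"),
    ("v3", "search", "/thanks"),
]

# Aggregate: total site traffic for the period.
aggregate_views = len(PAGE_VIEWS)

# Segmented: a subset of traffic, filtered here on the traffic-source dimension.
search_views = sum(1 for _, source, _ in PAGE_VIEWS if source == "search")

# Individual: the activity of a single visitor.
v3_path = [page for visitor, _, page in PAGE_VIEWS if visitor == "v3"]

print(aggregate_views, search_views, v3_path)
```

The segment filter is where a dimension comes in: it carries no number itself, but it decides which records a count or ratio is computed over.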
Building Block
  Page: A page is an analyst-definable unit of content.
  Page Views: The number of times a page was viewed.
  Visits/Sessions: A visit is an interaction by an individual with a website, consisting of one or more requests for a page.
  Unique Visitors: The number of inferred individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site.
  New Visitor: The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period.
  Repeat Visitor: The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period.
  Return Visitor: The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period.
  (Burby, Brown, and WAA Standard Committee, 2007)
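The visitor definitions above can be sketched in a few lines of Python. This is an illustrative reading of the definitions, not the WAA's reference implementation: the visit log, the visitor ids, and the function name `visitor_counts` are made up, and "first-ever visit" is approximated as "no earlier visit appears in this log".

```python
from datetime import date

# Hypothetical visit log: (visitor_id, visit_date). In practice the visitor id
# is inferred (e.g., from a cookie), which is why unique visitors are "inferred".
VISITS = [
    ("a", date(2011, 4, 20)),
    ("a", date(2011, 5, 2)),
    ("b", date(2011, 5, 3)),
    ("b", date(2011, 5, 5)),
    ("c", date(2011, 5, 4)),
    ("d", date(2011, 4, 28)),
]


def visitor_counts(visits, start, end):
    """Count unique, new, repeat, and return visitors for [start, end]."""
    visits_in_period = {}   # visitor_id -> number of visits in the period
    seen_before = set()     # visitors with a visit before the period
    for visitor, day in visits:
        if start <= day <= end:
            visits_in_period[visitor] = visits_in_period.get(visitor, 0) + 1
        elif day < start:
            seen_before.add(visitor)
    unique = set(visits_in_period)
    return {
        "unique": len(unique),
        "new": len(unique - seen_before),      # no visit before the period
        "repeat": sum(1 for n in visits_in_period.values() if n >= 2),
        "return": len(unique & seen_before),   # also visited before the period
    }


counts = visitor_counts(VISITS, date(2011, 5, 1), date(2011, 5, 7))
print(counts)
```

Note that every number depends on the reporting period passed in; changing `start` and `end` changes which visitors count as new or return.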
Visit Characteristics
  Entry Page: The first page of a visit.
  Landing Page: A page intended to identify the beginning of the user experience.
  Exit Page: The last page on a site accessed during a visit, signifying the end of a visit/session.
  Visit Duration: The length of time in a session.
  Referrer: The referrer is the page URL that originally generated the request for the current page view or object.
  Click-through: Number of times a link was clicked by a visitor.
  Click-through Rate: The number of click-throughs for a specific link divided by the number of times that link was viewed.
  Page Views per Visit: The number of page views in a reporting period divided by the number of visits in the same reporting period.
  (Burby, Brown, and WAA Standard Committee, 2007)
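A short sketch of how these visit metrics fall out of log data. The visit, the timestamps, and the click and view totals are hypothetical numbers chosen for illustration. Visit duration is measured here as the time between the first and last recorded page views, which is an assumption: a log typically has no record of when the visitor left the last page.

```python
from datetime import datetime

# Hypothetical single visit: ordered (timestamp, page) page views.
VISIT = [
    (datetime(2011, 5, 12, 10, 15, 2), "/index.html"),
    (datetime(2011, 5, 12, 10, 15, 40), "/products.html"),
    (datetime(2011, 5, 12, 10, 18, 10), "/checkout.html"),
]

entry_page = VISIT[0][1]    # first page of the visit
exit_page = VISIT[-1][1]    # last page of the visit
duration_seconds = (VISIT[-1][0] - VISIT[0][0]).total_seconds()


def ratio(numerator, denominator):
    """A count divided by a count; 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical period totals.
click_through_rate = ratio(30, 1200)       # clicks on a link / views of that link
page_views_per_visit = ratio(4500, 1500)   # page views / visits, same period

print(entry_page, exit_page, duration_seconds)
print(click_through_rate, page_views_per_visit)
```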
Content Characterization
  Page Exit Ratio: Number of exits from a page divided by the total number of page views of that page.
  Single Page Visits: Visits that consist of one page, regardless of the number of times the page was viewed.
  Single Page View Visits (Bounces): Visits that consist of one page view.
  Bounce Rate: Single page view visits divided by entry pages.
  (Burby, Brown, and WAA Standard Committee, 2007)
Conversion Metrics
  Event: Any logged or recorded action that has a specific date and time assigned to it by either the browser or server.
  Conversion: A visitor completing a target action.
  (Burby, Brown, and WAA Standard Committee, 2007)
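The difference between a single page visit and a bounce is easy to miss, so here is a small sketch under stated assumptions: the visits are invented, each visit is an ordered list of pages viewed, and bounce rate is computed site-wide (every visit contributes exactly one entry page).

```python
# Hypothetical visits: each is the ordered list of pages viewed.
VISITS = [
    ["/home"],                          # one page view: a bounce
    ["/home", "/home"],                 # one page, two views: single page visit
    ["/home", "/pricing"],
    ["/pricing", "/home", "/signup"],
]

bounces = sum(1 for pages in VISITS if len(pages) == 1)
single_page_visits = sum(1 for pages in VISITS if len(set(pages)) == 1)

# Site-wide bounce rate: single page view visits divided by entry pages.
entry_pages = len(VISITS)
bounce_rate = bounces / entry_pages


def page_exit_ratio(page, visits):
    """Exits from `page` divided by total page views of `page`."""
    exits = sum(1 for pages in visits if pages[-1] == page)
    views = sum(pages.count(page) for pages in visits)
    return exits / views if views else 0.0


print(bounces, single_page_visits, bounce_rate)
print(page_exit_ratio("/home", VISITS), page_exit_ratio("/pricing", VISITS))
```

The second visit is a single page visit but not a bounce, because the same page was viewed twice.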
Translating these metrics: Translating these metrics into meaningful and accurate knowledge is not always easy. Real-world example: the hotel problem (an excellent illustration of the importance of proper period selection)
The hotel
  Day       1     2      3     4     5     6     7
  Room 1    Sam   Sam    Sam   Sam   Sam   Sam   Sam
  Room 2    Ted   Scott  Ara   Chi   Tom   Yen   Tim
  Room 3    Jane  Jane   Jane  Jane  Jane  Jane  Jane
  Use Daily Uniques: 3 + 3 + 3 + 3 + 3 + 3 + 3, Total Daily Uniques = 21
  Use Weekly Uniques: Room 1 count = 1, Room 2 count = 7, Room 3 count = 1, Total Weekly Uniques = 9
Bottom line: the time qualifier matters! So, you can't just add daily uniques to get weekly uniques. You have to scrub the data. This is just one example of the many issues that one can face when digging into the data in order to get meaningful web analytics data!
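The hotel problem can be checked in a few lines of Python. The guest names come from the hotel slide (Sam and Jane stay all week; a different guest takes the remaining room each day); the variable names are illustrative.

```python
# Guests in the room that changes hands each day, days 1 through 7.
ROOM_2_GUESTS = ["Ted", "Scott", "Ara", "Chi", "Tom", "Yen", "Tim"]

# Set of guests in the hotel on each day (Sam and Jane stay all week).
GUESTS_BY_DAY = [{"Sam", guest, "Jane"} for guest in ROOM_2_GUESTS]

daily_uniques = [len(guests) for guests in GUESTS_BY_DAY]
total_daily_uniques = sum(daily_uniques)

# Weekly uniques: de-duplicate across the whole week before counting.
weekly_uniques = len(set().union(*GUESTS_BY_DAY))

print(daily_uniques)         # [3, 3, 3, 3, 3, 3, 3]
print(total_daily_uniques)   # 21
print(weekly_uniques)        # 9
```

Summing the daily figures counts Sam and Jane seven times each; the set union counts each person once, which is what "unique" means for the weekly period.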
50 minutes = Can’t Cover Everything …  some starting points for further reading
Research Work (mine)
  Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis, Hershey, PA: Idea Group Publishing. The first chapter, on the theory of log analysis, is free!
  Lecture: Jansen, B. J. (2009) Understanding User-Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA. A manuscript about Web analytics, soup to nuts. Companion website (free): http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html
Research Work (mine) Article: Jansen, B. J. 2006.  Search log analysis: What is it; what's been done; how to do it .  Library and Information Science Research, 28(3), 407-432 . http://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_search_log_analysis.pdf
Great 'how to' books for web analytics
  Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007)
  Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009)
  Advanced Web Metrics with Google Analytics, 2nd Edition by Brian Clifton (Mar 15, 2010)
  Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004)
Thanks! (welcome questions / discussion!)
Before we end …
Follow-on Discussion: Happy to chat with anyone (get with me either today or contact me via email). Email: [email_address] LinkedIn: http://www.linkedin.com/in/jjansen Twitter: jimjansen
Again, thanks!

Web analytics presentation

  • 1.
    Web Analytics JimJansen Associate Professor, The Pennsylvania State University
  • 2.
    Who is JimJansen? Associate professor at College of Information Sciences and Technology, The Pennsylvania State University , USA Senior Fellow at the Pew Research Center (Pew Internet and American Life Project) - http://www.pewinternet.org Active research and teaching efforts - http://ist.psu.edu/faculty_pages/jjansen/ Several funded and non-funded research project Teach several courses, including keyword advertising 2011 book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising Editor of journal, Internet Research (Emerald) Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool) - basics of web analytics
  • 3.
    Let talk web analytics ! We’ll discuss: context theory application Begin by setting the stage … what are we facing ?
  • 4.
    Moving too ‘everything ’ recorded and indexed A lot global but much will remain local Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is foundational technology Raises issues, including: Infrastructure requirements. How and who pays? Changes the nature of privacy and anonymity As publishers or providers, how do we make sense of how people are using this data? --- Web analytics Explosion of Information - the Zettabytes are coming There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, Cisco's fifth annual Visual Networking Index (VNI) Forecast
  • 5.
    How much isa Zettabyte?
  • 6.
    The volumeof data is exploding ( information growth ) The complexity of data is growing ( information architecture ) The users have less time ( attention economy ) The user expects improved features ( technological sophistication ) Explosion of Information - the Zettabytes are coming There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, Cisco's fifth annual Visual Networking Index (VNI) Forecast
  • 7.
    Web analytics canhelp us … Deal with the volume of data ( information growth ) Understand the growing complexity of data ( information architecture ) Address users’ less time ( attention economy ) Lead to improved features ( technological sophistication ) expected by users How does web analytics do this?
  • 8.
    Thousand years ago: science was mainly naturalistic describing natural phenomena Last few hundred years: theoretical branch using models, generalizations Last few decades: a computational branch simulating complex phenomena Today: data exploration (eScience) unifying theory, experiment, and simulation Data captured by sensors, instruments, or generated by simulator Processed by humans and software Information / knowledge stored in computer Analyzes database / collection content using data management and statistics Network and Web Science Data  Information  Knowledge This is the realm of Web analytics!
  • 9.
    What is webanalytics? The Web Analytics Association (WAA) defines Web analytics as: the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage ( http://www.webanalyticsassociation.org/ ) Shares common theoretical and methodology characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.)
  • 10.
    Let’s break thatdefinition down … Collection - accumulate and store over a period of time Internet data - internet facts and statistics collected together for reference or analysis Measurement – ascertain the size, amount, or degree of something by using an instrument or device Analysis - examine methodically the structure of information for purposes of explanation and interpretation. Reporting - giving a spoken or written account of something that one has investigated. Understanding - perceive the significance, explanation, or cause of something Optimizing - make the best or most effective use of a resource Web usage – employ or deploy something as a means of accomplishing a purpose or achieving a result Data Information Knowledge
  • 11.
    How is thedata collected?
  • 12.
    W3C Extended LogFormat -Variety of fields for examining visitors to Web sites. Other common format is NCSA Separate Log that is composed of three logs Common log – actions on the server, Referral log – where they came from, and Agent log – stuff about the client computer Rather than service-side logging, other methods such as page tagging, image cookies, Flash cookies, etc. but the data is still stored in a log. W3C Extended Log Format
  • 13.
    Okay, that’s collection ? What about analysis and reporting ?
  • 14.
    Variety of toolshelp make sense of this log data
  • 15.
    With that context , let’s look at the foundations aspects …
  • 16.
    Theoretical Foundations Webanalytics is based on the behaviorism paradigm Behaviorism – an approach focused on the outward behavioral aspects of thought and emphases the observed behaviors Behaviorism – Pavlov, Watson, and Skinner Burrhus Frederic Skinner John B. Watson Ivan Petrovich Pavlov
  • 17.
    Behaviorism Characteristics Inductive, data-driven and characterized by empirical observation of measurable behavior Grounded on somebody doing something in a situation ( all the environmental and situational features are embedded behaviors) Critics of behaviorism as a psychological theory have issues with rejection of mental processes . I agree - people are more than “ mediators between behavior and the environment ” (Skinner, 1993, p 428) (c.f.c., social learning theory) … however, don’t throw out the baby with the bath water
  • 18.
    What is aBehavior? … an observable activity of a person, animal, team, organization, or system. One can classify behaviors into three general categories. Behaviors are something that one can detect and record actions or specific goal-driven events with some purpose other than the specific action that is observable reactive responses to environmental stimuli
  • 19.
    What is aBehavior? Behavior is the essential construct of the behaviorism and of web analytics Logs record behaviors of users and systems (records behavior but can’t tell affective , cognitive , or situational aspects .. yet, but we’re working on it! ) A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value )
  • 20.
    can view thedata collected in log files as trace data people conducting the activities of their daily lives many times create things, create marks, induce wear, or reduce some existing material within the confines of research, these things, marks, and wear become data classically, trace data are the physical remains of people ’ s interaction Data Collection: Trace Data Wear on a carpet Trash heap Surfing web
  • 21.
    Trace Data Inthe past, trace data was often time consuming to gather and process, making such data costly. logging software makes collecting trace data on the Internet easy and cheap Log data is controlled accretion data , where the researcher or some other entity alters the environment in order to create the accretion data With the user of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective What is cool about trace data for researchers?
  • 22.
    Data Collection Logdata/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including: Scale : not a limiting factor as in lab user studies Power : large sample size for inference testing; in fact, so large must account for the size effect Scope : naturalistic; researchers can investigate range of interactions in a multi-variable context Location : can collect in distributed environments Duration : collect log data over an extended period
  • 23.
    Methodological Foundations Useof logs to collect trace data is an unobtrusive methods (a.k.a., non-reactive or low-constraint). Unobtrusive methods … allows data collection without directly interfering into the context and, does not require a direct response from participants Customer Behavior (video) Chemistry (surface marking)
  • 24.
    Methodological Foundations Three justifications for unobtrusive methods: Uncertainty principle : researchers interjected into an environment become part of the system Observer effect : difference that is made to an activity or a person ’ s behaviors by being observed Observer bias : observers overemphasize behavior they expect to find and fail to notice behavior they do not expect Trace data helps in overcoming the Uncertainty principle , Observer effect , and Observer bias in the data collection. Note: Observer bias for data collection but not data analysis Example: ethnography studies (where the researcher “bird dogs” a study participant Example: no one searches for porn in a lab study of Web searching Example: is why medical trials are double blind rather than single blind
  • 25.
    Methodological Foundations Inherent characteristics in the method of log data collection; Web analytics has issues to address as a result: Abstraction – how does one relate low-level data to higher-level concepts? Selection – how does one separate the necessary from unnecessary data? Reduction – how does one reduce the complexity and size of the data set? Context – how does one interpret the significance of events? Evolution – how can one collect data without impacting application deployment or use?
  • 26.
    Okay, nice buthow to we apply it …
  • 27.
    Web analytics process Every consulting firm has a web analytics process … (which is fine) However, the effective ones all boil down to four essential steps
  • 28.
    Essential steps toany effective web analytics process Typically counts. Basically, data collection Examples: time stamp referral URL query term Typically ratios. Data becomes metrics. Counts and ratios infused with business strategy. Online goals, objectives, or standards for organization. Examples: time on page bounce rate unique visitors Examples: conversion rate average order value task completion rate Examples: save money make money marketshare Collection of data Processing of data into information Developing key performance indicators Formulating online strategy Drives Drives Drives Drives Drives Drives
  • 29.
    Three types (plus 1 ) of Web analytics metrics Implementation Count — the most basic unit of measure; a single number. Ratio — typically, a count divided by a count , although a ratio can use either a count or a ratio in the numerator or denominator. KPI ( Key Performance Indicator ) — can be either a count or a ratio , it is frequently a ratio. A KPI is infused with business strategy , and therefore the set of appropriate KPIs typically differs between site and process types. Dimension - data that can be used to define various types of segments and represents a fundamental dimension of visitor behavior or site dynamics. Typically, not associated with a number . Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  • 30.
    Can be appliedto three levels of granularity Aggregate — Total site traffic for a defined period of time. ( typically used for market comparisons ) Segmented — A subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight. ( by developing personas and profiles in Google Analytics ). Individual — Activity of a single Web visitor for a defined period of time. ( excellent for persona developing and outlier analysis ) Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  • 31.
    Classifications of MetricsBuilding Block – foundational metrics Visit Characterization – metrics aimed at understanding visits, either single or aggregate Content Characterization – metrics aimed at understanding content or its use Conversion – metrics aimed at linking visits and content Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  • 32.
    Building Block Page: A page is an analyst definable unit of content . Page Views : The number of times a page was viewed . Visits/Sessions : A visit is an interaction by an individual, with a website consisting of one or more requests for a page . Unique Visitors : The number of inferred individual people , within a designated reporting timeframe, with activity consisting of one or more visits to a site. New Visitor : The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period Repeat Visitor : The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period. Return Visitor : The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  • 33.
    Visit Characteristics. Entry Page: The first page of a visit. Landing Page: A page intended to identify the beginning of the user experience. Exit Page: The last page on a site accessed during a visit, signifying the end of a visit/session. Visit Duration: The length of time in a session. Referrer: The referrer is the page URL that originally generated the request for the current page view or object. Click-through: Number of times a link was clicked by a visitor. Click-through Rate: The number of click-throughs for a specific link divided by the number of times that link was viewed. Page Views per Visit: The number of page views in a reporting period divided by the number of visits in the same reporting period. Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
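The two ratio metrics on this slide reduce to simple divisions. A minimal sketch follows; the function names and the sample counts are hypothetical, and returning 0.0 for an empty denominator is an assumption made here, not part of the WAA definitions.

```python
def click_through_rate(click_throughs: int, link_views: int) -> float:
    """Click-throughs for a link divided by the times that link was viewed."""
    # Assumption: report 0.0 when the link was never viewed.
    return click_throughs / link_views if link_views else 0.0


def page_views_per_visit(page_views: int, visits: int) -> float:
    """Page views in a period divided by visits in the same period."""
    # Assumption: report 0.0 when there were no visits.
    return page_views / visits if visits else 0.0


# Hypothetical counts for one reporting period.
ctr = click_through_rate(click_throughs=25, link_views=1000)
depth = page_views_per_visit(page_views=1200, visits=400)
print(f"CTR = {ctr:.1%}, page views per visit = {depth:.1f}")
```

Both inputs to each ratio must come from the same reporting period; mixing periods gives a number that looks plausible but measures nothing.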
  • 34.
    Content Characterization. Page Exit Ratio: Number of exits from a page divided by total number of page views of that page. Single Page Visits: Visits that consist of one page regardless of the number of times the page was viewed. Single Page View Visits (Bounces): Visits that consist of one page view. Bounce Rate: Single page view visits divided by entry pages. Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
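The difference between a single page visit and a bounce is easy to miss, so here is a rough sketch over a hypothetical set of visits, each recorded as its ordered list of page views. Treating the number of entry pages as the number of visits is an assumption of this sketch (every visit has exactly one entry page).

```python
from collections import Counter

# Hypothetical visits: each is the ordered list of pages viewed.
visits = [
    ["/home"],                     # one page, one page view (a bounce)
    ["/home", "/rooms", "/book"],
    ["/rooms", "/rooms"],          # one page, viewed twice (not a bounce)
    ["/home", "/rooms"],
]

views = Counter(page for visit in visits for page in visit)
exits = Counter(visit[-1] for visit in visits)

# Page Exit Ratio: exits from a page / page views of that page.
page_exit_ratio = {page: exits[page] / views[page] for page in views}

# Single Page Visits: one distinct page, however many times it was viewed.
single_page_visits = sum(1 for visit in visits if len(set(visit)) == 1)

# Bounces: exactly one page view.
bounces = sum(1 for visit in visits if len(visit) == 1)

# Bounce Rate: bounces / entry pages (one entry page per visit).
bounce_rate = bounces / len(visits)

print(page_exit_ratio, single_page_visits, bounces, bounce_rate)
```

The visit that reloads "/rooms" counts as a single page visit but not as a bounce, which is exactly the distinction the two definitions draw.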
  • 35.
    Conversion Metrics. Event: Any logged or recorded action that has a specific date and time assigned to it by either the browser or server. Conversion: A visitor completing a target action. Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  • 36.
    Translating these metrics into meaningful and accurate knowledge is not always easy. Real-world example: the hotel problem (an excellent illustration of the importance of proper period selection).
  • 37.
    The hotel: three rooms over seven days
    Days:    1      2      3      4      5      6      7
    Room 1:  Sam    Sam    Sam    Sam    Sam    Sam    Sam
    Room 2:  Ted    Scott  Ara    Chi    Tom    Yen    Tim
    Room 3:  Jane   Jane   Jane   Jane   Jane   Jane   Jane
    Use Daily Uniques: 3 guests on each of the 7 days. Total Daily Uniques = 21
    Use Weekly Uniques: Sam counts once, Jane counts once, and the other guests count 7. Total Weekly Uniques = 9
  • 38.
    Bottom line: the time qualifier matters! So, you can’t just add daily uniques to get weekly uniques. You have to scrub the data. This is just one example of the many issues one can face when digging into the data to get meaningful web analytics!
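The hotel problem can be reproduced in a few lines using the guest names from the hotel slide. The day-by-day grouping below is a reconstruction from the slide's name list, so treat the exact room assignments as an assumption; the totals of 21 and 9 are the ones the slide reports.

```python
# Guests staying at the hotel on each day of the week (three rooms).
stays = {
    1: ["Sam", "Ted", "Jane"],
    2: ["Sam", "Scott", "Jane"],
    3: ["Sam", "Ara", "Jane"],
    4: ["Sam", "Chi", "Jane"],
    5: ["Sam", "Tom", "Jane"],
    6: ["Sam", "Yen", "Jane"],
    7: ["Sam", "Tim", "Jane"],
}

# Daily uniques: distinct guests on each day.
daily_uniques = {day: len(set(guests)) for day, guests in stays.items()}

# Naive approach: add up the daily uniques.
sum_of_daily_uniques = sum(daily_uniques.values())

# Correct approach: deduplicate across the whole week first.
weekly_uniques = len(set().union(*stays.values()))

print(sum_of_daily_uniques, weekly_uniques)
```

Summing the daily figures counts Sam and Jane seven times each; deduplicating over the full reporting period counts them once, which is the "scrubbing" the slide refers to.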
  • 39.
    50 minutes = Can’t Cover Everything … some starting points for further reading
  • 40.
    Research Work (mine). Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis. Hershey, PA: Idea Group Publishing. First chapter on theory of log analysis is free! Lecture: Jansen, B. J. (2009) Understanding User-Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA. A manuscript about web analytics, soup to nuts. Companion website (free): http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html
  • 41.
    Research Work (mine). Article: Jansen, B. J. (2006) Search log analysis: What is it; what's been done; how to do it. Library and Information Science Research, 28(3), 407-432. http://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_search_log_analysis.pdf
  • 42.
    Great ‘how to’ books for web analytics. Web Analytics: An Hour a Day, by Avinash Kaushik (Jun 5, 2007). Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity, by Avinash Kaushik (Oct 2009). Advanced Web Metrics with Google Analytics, 2nd Edition, by Brian Clifton (Mar 15, 2010). Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business, by Eric Peterson (Mar 2004).
  • 43.
    Thanks! (welcome questions/discussion!) Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University
  • 44.
  • 45.
    Follow-on Discussion. Happy to chat with anyone (catch me today or contact me via email). Email: [email_address] LinkedIn: http://www.linkedin.com/in/jjansen Twitter: jimjansen
  • 46.
    Again, thanks! Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University