Web analytics presentation

Web analytics presentation given to Penn State ITS office on 19 Oct 2011

Published in: Technology, Education

  1. 1. Web Analytics. Jim Jansen, Associate Professor, The Pennsylvania State University
  2. 2. Who is Jim Jansen? <ul><li>Associate professor at the College of Information Sciences and Technology, The Pennsylvania State University, USA </li></ul><ul><li>Senior Fellow at the Pew Research Center (Pew Internet and American Life Project) - http://www.pewinternet.org </li></ul><ul><li>Active research and teaching efforts - http://ist.psu.edu/faculty_pages/jjansen/ </li></ul><ul><li>Several funded and non-funded research projects </li></ul><ul><li>Teach several courses, including keyword advertising </li></ul><ul><li>2011 book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising </li></ul><ul><li>Editor of the journal Internet Research (Emerald) </li></ul><ul><li>Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool) - basics of web analytics </li></ul>
  3. 3. <ul><li>Let’s talk web analytics! </li></ul><ul><li>We’ll discuss: </li></ul><ul><ul><li>context </li></ul></ul><ul><ul><li>theory </li></ul></ul><ul><ul><li>application </li></ul></ul><ul><li>Begin by setting the stage … what are we facing? </li></ul>
  4. 4. Explosion of Information - the Zettabytes are coming <ul><li>Moving to ‘everything’ recorded and indexed </li></ul><ul><li>A lot will be global, but much will remain local </li></ul><ul><li>Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is the foundational technology </li></ul><ul><li>Raises issues, including: </li></ul><ul><li>Infrastructure requirements. How is it paid for, and who pays? </li></ul><ul><li>Changes the nature of privacy and anonymity </li></ul><ul><li>As publishers or providers, how do we make sense of how people are using this data? --- Web analytics </li></ul>There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015 (Cisco's fifth annual Visual Networking Index (VNI) Forecast).
  5. 5. How much is a Zettabyte?
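One way to answer the slide's question is plain unit arithmetic. The sketch below assumes decimal (SI) prefixes, under which a zettabyte is 10^21 bytes (the "one sextillion bytes" quoted from the Cisco forecast); the 1 TB drive used for comparison is only an illustrative assumption, not something from the talk.

```python
# Decimal (SI) byte units; a zettabyte is 10**21 bytes (one sextillion).
UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

exabytes_per_zettabyte = convert(1, "ZB", "EB")
# Hypothetical comparison: how many 1 TB drives would hold one zettabyte?
one_tb_drives = convert(1, "ZB", "TB")

print(f"1 ZB = {exabytes_per_zettabyte:,.0f} EB")
print(f"1 ZB = {one_tb_drives:,.0f} drives of 1 TB each")
```

Under these assumptions, one zettabyte is a thousand exabytes, or a billion 1 TB drives.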
  6. 6. Explosion of Information - the Zettabytes are coming <ul><li>The volume of data is exploding ( information growth ) </li></ul><ul><li>The complexity of data is growing ( information architecture ) </li></ul><ul><li>The users have less time ( attention economy ) </li></ul><ul><li>The user expects improved features ( technological sophistication ) </li></ul>
  7. 7. Web analytics can help us … <ul><li>Deal with the volume of data ( information growth ) </li></ul><ul><li>Understand the growing complexity of data ( information architecture ) </li></ul><ul><li>Address users’ limited time ( attention economy ) </li></ul><ul><li>Lead to the improved features ( technological sophistication ) that users expect </li></ul>How does web analytics do this?
  8. 8. <ul><li>A thousand years ago: science was mainly naturalistic </li></ul><ul><ul><li>describing natural phenomena </li></ul></ul><ul><li>Last few hundred years: a theoretical branch </li></ul><ul><ul><li>using models, generalizations </li></ul></ul><ul><li>Last few decades: a computational branch </li></ul><ul><ul><li>simulating complex phenomena </li></ul></ul><ul><li>Today: data exploration (eScience) </li></ul><ul><ul><li>unifying theory, experiment, and simulation </li></ul></ul><ul><ul><li>Data captured by sensors or instruments, or generated by simulators </li></ul></ul><ul><ul><li>Processed by humans and software </li></ul></ul><ul><ul><li>Information / knowledge stored in computers </li></ul></ul><ul><ul><li>Database / collection content analyzed using data management and statistics </li></ul></ul><ul><ul><li>Network and Web Science </li></ul></ul>Data → Information → Knowledge. This is the realm of Web analytics!
  9. 9. What is web analytics? <ul><li>The Web Analytics Association (WAA) defines Web analytics as: </li></ul><ul><ul><li>the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage </li></ul></ul><ul><ul><li>( http://www.webanalyticsassociation.org/ ) </li></ul></ul><ul><li>Shares common theoretical and methodological characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.) </li></ul>
  10. 10. Let’s break that definition down … <ul><li>Collection - accumulate and store over a period of time </li></ul><ul><li>Internet data - internet facts and statistics collected together for reference or analysis </li></ul><ul><li>Measurement – ascertain the size, amount, or degree of something by using an instrument or device </li></ul><ul><li>Analysis - examine methodically the structure of information for purposes of explanation and interpretation. </li></ul><ul><li>Reporting - giving a spoken or written account of something that one has investigated. </li></ul><ul><li>Understanding - perceive the significance, explanation, or cause of something </li></ul><ul><li>Optimizing - make the best or most effective use of a resource </li></ul><ul><li>Web usage – employ or deploy something as a means of accomplishing a purpose or achieving a result </li></ul>Data Information Knowledge
  11. 11. <ul><li>How is the data collected? </li></ul>
  12. 12. W3C Extended Log Format <ul><li>Offers a variety of fields for examining visitors to Web sites. </li></ul><ul><li>The other common format is the NCSA Separate Log, which is composed of three logs: the Common log (actions on the server), the Referral log (where visitors came from), and the Agent log (details about the client computer). </li></ul><ul><li>Besides server-side logging, there are other collection methods such as page tagging, image cookies, and Flash cookies, but the data is still stored in a log. </li></ul>
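To make the log format concrete, here is a minimal parsing sketch. It relies on two features of the W3C Extended Log File Format: lines beginning with `#` are directives, and the `#Fields:` directive names the columns of the entries that follow. The sample log lines, IP addresses, and URLs are invented for illustration, and the parser assumes field values contain no spaces, which will not hold for every server configuration.

```python
# Invented sample in W3C Extended Log File Format (not real traffic).
SAMPLE_LOG = """\
#Version: 1.0
#Date: 2011-10-19 00:00:00
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)
2011-10-19 09:15:02 192.0.2.10 GET /index.html 200 -
2011-10-19 09:15:40 192.0.2.10 GET /courses.html 200 http://example.org/index.html
2011-10-19 09:17:11 192.0.2.44 GET /index.html 200 http://search.example.com/?q=web+analytics
"""

def parse_extended_log(text):
    """Return one dict per log entry, keyed by the names in #Fields."""
    fields = []
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives: only #Fields is needed to interpret the entries.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()  # assumes no spaces inside field values
        # A lone "-" marks a field with no value for this entry.
        records.append({
            name: (None if value == "-" else value)
            for name, value in zip(fields, values)
        })
    return records

records = parse_extended_log(SAMPLE_LOG)
for record in records:
    print(record["c-ip"], record["cs-uri-stem"], record["cs(Referer)"])
```

Each parsed record already carries the raw material mentioned later in the talk: a time stamp, a referral URL, and (inside the referrer) a query term.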
  13. 13. <ul><li>Okay, that’s collection. </li></ul><ul><li>What about analysis and reporting? </li></ul>
  14. 14. A variety of tools help make sense of this log data
  15. 15. <ul><li>With that context, let’s look at the foundational aspects … </li></ul>
  16. 16. Theoretical Foundations <ul><li>Web analytics is based on the behaviorism paradigm </li></ul><ul><li>Behaviorism – an approach that focuses on the outward behavioral aspects of thought and emphasizes observed behaviors </li></ul><ul><li>Behaviorism – Pavlov, Watson, and Skinner </li></ul>Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov
  17. 17. Behaviorism Characteristics <ul><li>Inductive, data-driven, and characterized by empirical observation of measurable behavior </li></ul><ul><li>Grounded on somebody doing something in a situation ( all the environmental and situational features are embedded behaviors) </li></ul><ul><li>Critics of behaviorism as a psychological theory have issues with its rejection of mental processes. </li></ul><ul><li>I agree - people are more than “mediators between behavior and the environment” (Skinner, 1993, p. 428) (cf. social learning theory) … however, don’t throw out the baby with the bath water </li></ul>
  18. 18. What is a Behavior? <ul><li>… an observable activity of a person, animal, team, organization, or system. </li></ul><ul><li>One can classify behaviors into three general categories. Behaviors are </li></ul><ul><li>something that one can detect and record </li></ul><ul><li>actions or specific goal-driven events with some purpose other than the specific action that is observable </li></ul><ul><li>reactive responses to environmental stimuli </li></ul>
  19. 19. What is a Behavior? <ul><li>Behavior is the essential construct of behaviorism and of web analytics </li></ul><ul><li>Logs record behaviors of users and systems (they record behavior but can’t tell affective , cognitive , or situational aspects … yet, but we’re working on it! ) </li></ul><ul><li>A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value ) </li></ul>
  20. 20. Data Collection: Trace Data <ul><li>One can view the data collected in log files as trace data </li></ul><ul><li>People conducting the activities of their daily lives often create things, create marks, induce wear, or reduce some existing material </li></ul><ul><li>Within the confines of research, these things, marks, and wear become data </li></ul><ul><li>Classically, trace data are the physical remains of people’s interactions </li></ul>Examples: wear on a carpet, a trash heap, surfing the web
  21. 21. Trace Data <ul><li>In the past, trace data was often time-consuming to gather and process, making such data costly. </li></ul><ul><li>Logging software makes collecting trace data on the Internet easy and cheap </li></ul><ul><li>Log data is controlled accretion data , where the researcher or some other entity alters the environment in order to create the accretion data </li></ul><ul><li>With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective </li></ul>What is cool about trace data for researchers?
  22. 22. Data Collection <ul><li>Log data/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including: </li></ul><ul><li>Scale : not a limiting factor as in lab user studies </li></ul><ul><li>Power : large sample size for inference testing; in fact, so large that one must account for the size effect </li></ul><ul><li>Scope : naturalistic; researchers can investigate a range of interactions in a multi-variable context </li></ul><ul><li>Location : can collect in distributed environments </li></ul><ul><li>Duration : can collect log data over an extended period </li></ul>
  23. 23. Methodological Foundations <ul><li>Use of logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint). Unobtrusive methods … </li></ul><ul><li>allow data collection without directly interfering in the context, and </li></ul><ul><li>do not require a direct response from participants </li></ul>Examples: customer behavior (video), chemistry (surface marking)
  24. 24. Methodological Foundations <ul><li>Three justifications for unobtrusive methods: </li></ul><ul><li>Uncertainty principle : researchers interjected into an environment become part of the system. Example: ethnography studies, where the researcher “bird dogs” a study participant </li></ul><ul><li>Observer effect : the difference that is made to an activity or a person’s behaviors by being observed. Example: no one searches for porn in a lab study of Web searching </li></ul><ul><li>Observer bias : observers overemphasize behavior they expect to find and fail to notice behavior they do not expect. Example: why medical trials are double blind rather than single blind </li></ul><ul><li>Trace data helps in overcoming the Uncertainty principle , Observer effect , and Observer bias in data collection. Note: trace data addresses observer bias in data collection but not in data analysis </li></ul>
  25. 25. Methodological Foundations <ul><li>Inherent characteristics in the method of log data collection; Web analytics has issues to address as a result: </li></ul><ul><li>Abstraction – how does one relate low-level data to higher-level concepts? </li></ul><ul><li>Selection – how does one separate the necessary from unnecessary data? </li></ul><ul><li>Reduction – how does one reduce the complexity and size of the data set? </li></ul><ul><li>Context – how does one interpret the significance of events? </li></ul><ul><li>Evolution – how can one collect data without impacting application deployment or use? </li></ul>
  26. 26. <ul><li>Okay, nice, but how do we apply it … </li></ul>
  27. 27. Web analytics process <ul><li>Every consulting firm has a web analytics process … (which is fine) </li></ul><ul><li>However, the effective ones all boil down to four essential steps </li></ul>
  28. 28. Essential steps to any effective web analytics process <ul><li>Collection of data: typically counts; basically, data collection. Examples: time stamp, referral URL, query term. </li></ul><ul><li>Processing of data into information: typically ratios; data becomes metrics. Examples: time on page, bounce rate, unique visitors. </li></ul><ul><li>Developing key performance indicators: counts and ratios infused with business strategy. Examples: conversion rate, average order value, task completion rate. </li></ul><ul><li>Formulating online strategy: online goals, objectives, or standards for the organization. Examples: save money, make money, market share. </li></ul><ul><li>The diagram links adjacent steps with arrows labeled “Drives”. </li></ul>
  29. 29. Three types ( plus 1 ) of Web analytics metrics Implementation <ul><li>Count — the most basic unit of measure; a single number. </li></ul><ul><li>Ratio — typically, a count divided by a count , although a ratio can use either a count or a ratio in the numerator or denominator. </li></ul><ul><li>KPI ( Key Performance Indicator ) — can be either a count or a ratio , it is frequently a ratio. A KPI is infused with business strategy , and therefore the set of appropriate KPIs typically differs between site and process types. </li></ul><ul><li>Dimension - data that can be used to define various types of segments and represents a fundamental dimension of visitor behavior or site dynamics. Typically, not associated with a number . </li></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
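The four metric types can be illustrated with a few lines of code. This is only a sketch: the visit records, the `source` dimension, and the 25% target are invented, and treating conversion rate as the KPI is an assumption about one hypothetical site's strategy rather than part of the WAA definitions.

```python
from collections import Counter

# Invented visit records; "source" plays the role of a dimension.
visits = [
    {"source": "search", "page_views": 5, "converted": True},
    {"source": "search", "page_views": 1, "converted": False},
    {"source": "email", "page_views": 3, "converted": True},
    {"source": "direct", "page_views": 1, "converted": False},
]

# Counts: single numbers.
visit_count = len(visits)
total_page_views = sum(v["page_views"] for v in visits)
conversions = sum(1 for v in visits if v["converted"])

# Ratio: a count divided by a count.
page_views_per_visit = total_page_views / visit_count

# KPI: a ratio tied to a business goal (the target is a made-up example).
TARGET_CONVERSION_RATE = 0.25
conversion_rate = conversions / visit_count
kpi_met = conversion_rate >= TARGET_CONVERSION_RATE

# Dimension: a non-numeric attribute used to segment the traffic.
visits_by_source = Counter(v["source"] for v in visits)

print(visit_count, page_views_per_visit, conversion_rate, kpi_met)
print(dict(visits_by_source))
```

The same conversion rate could be a KPI for a store and irrelevant for a documentation site, which is why the set of KPIs differs between site types.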
  30. 30. Can be applied to three levels of granularity <ul><li>Aggregate — Total site traffic for a defined period of time. ( typically used for market comparisons ) </li></ul><ul><li>Segmented — A subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight. ( by developing personas and profiles in Google Analytics ). </li></ul><ul><li>Individual — Activity of a single Web visitor for a defined period of time. ( excellent for persona developing and outlier analysis ) </li></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  31. 31. Classifications of Metrics <ul><li>Building Block – foundational metrics </li></ul><ul><li>Visit Characterization – metrics aimed at understanding visits, either single or aggregate </li></ul><ul><li>Content Characterization – metrics aimed at understanding content or its use </li></ul><ul><li>Conversion – metrics aimed at linking visits and content </li></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  32. 32. Building Block <ul><li>Page : A page is an analyst definable unit of content . </li></ul><ul><li>Page Views : The number of times a page was viewed . </li></ul><ul><li>Visits/Sessions : A visit is an interaction by an individual, with a website consisting of one or more requests for a page . </li></ul><ul><li>Unique Visitors : The number of inferred individual people , within a designated reporting timeframe, with activity consisting of one or more visits to a site. </li></ul><ul><ul><li>New Visitor : The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period </li></ul></ul><ul><ul><li>Repeat Visitor : The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period. </li></ul></ul><ul><ul><li>Return Visitor : The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period </li></ul></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
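The visitor definitions above differ only in where a visitor's visits fall relative to the reporting period. The sketch below classifies visitors for an October 2011 reporting period; the visitor names and dates are invented, and it assumes the log covers the site's full history so that "first-ever" and "prior to the reporting period" can be judged from the log alone.

```python
from collections import Counter
from datetime import date

# Invented (visitor, visit date) pairs.
visit_log = [
    ("ann", date(2011, 9, 28)),
    ("ann", date(2011, 10, 3)),
    ("bob", date(2011, 10, 4)),
    ("bob", date(2011, 10, 20)),
    ("cat", date(2011, 10, 11)),
    ("dan", date(2011, 9, 15)),
]

def classify_visitors(visit_log, start, end):
    """Classify visitors for the reporting period [start, end]."""
    visits_in_period = Counter(v for v, d in visit_log if start <= d <= end)
    seen_before = {v for v, d in visit_log if d < start}
    unique = set(visits_in_period)
    return {
        "unique": unique,
        # First-ever visit falls inside the period.
        "new": unique - seen_before,
        # Two or more visits inside the period.
        "repeat": {v for v, n in visits_in_period.items() if n >= 2},
        # Visited in the period and also before it.
        "return": unique & seen_before,
    }

result = classify_visitors(visit_log, date(2011, 10, 1), date(2011, 10, 31))
for label, visitors in result.items():
    print(label, sorted(visitors))
```

Note that the categories overlap: a new visitor can also be a repeat visitor, as "bob" is here.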
  33. 33. Visit Characteristics <ul><li>Entry Page : The first page of a visit. </li></ul><ul><li>Landing Page : A page intended to identify the beginning of the user experience . </li></ul><ul><li>Exit Page : The last page on a site accessed during a visit, signifying the end of a visit/session. </li></ul><ul><li>Visit Duration : The length of time in a session. </li></ul><ul><li>Referrer : The referrer is the page URL that originally generated the request for the current page view or object. </li></ul><ul><li>Click-through : Number of times a link was clicked by a visitor. </li></ul><ul><li>Click-through Rate : The number of click-throughs for a specific link divided by the number of times that link was viewed. </li></ul><ul><li>Page Views per Visit : The number of page views in a reporting period divided by number of visits in the same reporting period. </li></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
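Several of these visit metrics fall out of a page-view log once the views are grouped by visit. In this sketch the visit identifiers, timestamps, and pages are invented, visits are assumed to be already identified, and visit duration is approximated as the time between the first and last recorded page view (a log cannot see how long the visitor stayed on the last page).

```python
from datetime import datetime

# Invented (visit id, timestamp, page) records.
page_views = [
    ("v1", datetime(2011, 10, 19, 9, 0, 0), "/home"),
    ("v1", datetime(2011, 10, 19, 9, 2, 30), "/courses"),
    ("v1", datetime(2011, 10, 19, 9, 5, 0), "/contact"),
    ("v2", datetime(2011, 10, 19, 10, 0, 0), "/courses"),
]

def summarize_visits(page_views):
    """Compute entry page, exit page, duration, and page views per visit."""
    visits = {}
    for visit_id, timestamp, page in sorted(page_views, key=lambda r: (r[0], r[1])):
        visits.setdefault(visit_id, []).append((timestamp, page))
    summary = {}
    for visit_id, views in visits.items():
        first_time, entry_page = views[0]
        last_time, exit_page = views[-1]
        summary[visit_id] = {
            "entry_page": entry_page,
            "exit_page": exit_page,
            # Approximation: last recorded view minus first recorded view.
            "duration_seconds": (last_time - first_time).total_seconds(),
            "page_views": len(views),
        }
    return summary

summary = summarize_visits(page_views)
page_views_per_visit = len(page_views) / len(summary)
print(summary)
print(page_views_per_visit)
```

A single-page visit gets a duration of zero under this approximation, which is one reason duration figures from logs should be read with care.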
  34. 34. Content Characterization <ul><li>Page Exit Ratio : Number of exits from a page divided by total number of page views of that page </li></ul><ul><li>Single Page Visits : Visits that consist of one page regardless of the number of times the page was viewed. </li></ul><ul><li>Single Page View Visits (Bounces) : Visits that consist of one page-view . </li></ul><ul><li>Bounce Rate : Single page view visits divided by entry pages. </li></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
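The content metrics above are ratios of simple counts, which the following sketch computes for a handful of invented visits. Each visit is represented as the ordered list of pages viewed; the per-page bounce rate follows the definition on the slide (single page view visits divided by entries at that page).

```python
from collections import Counter

# Invented visits, each an ordered list of pages viewed.
visits = [
    ["/home"],
    ["/home", "/courses", "/contact"],
    ["/courses"],
    ["/home", "/courses"],
]

views = Counter(page for visit in visits for page in visit)
exits = Counter(visit[-1] for visit in visits)
entries = Counter(visit[0] for visit in visits)
bounces = Counter(visit[0] for visit in visits if len(visit) == 1)

# Page exit ratio: exits from a page divided by page views of that page.
page_exit_ratio = {page: exits[page] / views[page] for page in views}

# Bounce rate per entry page: single page view visits divided by entries.
bounce_rate = {page: bounces[page] / entries[page] for page in entries}

# Site-wide bounce rate: every visit has exactly one entry page.
site_bounce_rate = sum(bounces.values()) / len(visits)

print(page_exit_ratio)
print(bounce_rate)
print(site_bounce_rate)
```

A page with a high exit ratio is not necessarily a problem (a contact page is a natural place to leave), so these ratios need the business context discussed earlier.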
  35. 35. Conversion Metrics <ul><li>Event : Any logged or recorded action that has a specific date and time assigned to it by either the browser or server </li></ul><ul><li>Conversion : A visitor completing a target action </li></ul>Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
  36. 36. Translating these metrics <ul><li>Translating these metrics into meaningful and accurate knowledge is not always easy. </li></ul><ul><li>Real world example – the hotel problem ( excellent illustration of the importance of proper period selection ) </li></ul>
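  37. 37. The hotel <ul><li>A hotel with 3 rooms, observed over 7 days </li></ul><ul><li>Day 1: Sam, Ted, Jane </li></ul><ul><li>Day 2: Sam, Scott, Jane </li></ul><ul><li>Day 3: Sam, Ara, Jane </li></ul><ul><li>Day 4: Sam, Chi, Jane </li></ul><ul><li>Day 5: Sam, Tom, Jane </li></ul><ul><li>Day 6: Sam, Yen, Jane </li></ul><ul><li>Day 7: Sam, Tim, Jane </li></ul><ul><li>Use Daily Uniques: 3 guests on each of the 7 days; Total Daily Uniques = 21 </li></ul><ul><li>Use Weekly Uniques: Sam counts once, Jane counts once, and the 7 other guests count once each; Total Weekly Uniques = 9 </li></ul>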
  38. 38. Bottom line: the time qualifier matters! <ul><li>So, you can’t just add daily uniques to get weekly uniques </li></ul><ul><li>Have to scrub the data </li></ul><ul><li>This is just one example of the many issues one can face when digging into the data in order to get meaningful web analytics data! </li></ul>
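The hotel problem is easy to reproduce in code. The sketch below uses the guest names and totals from the hotel slide; the assignment of guests to specific days is read off the slide's flattened layout and should be treated as an assumption, although the totals do not depend on it.

```python
# Guests staying at the three-room hotel on each of seven days.
stays = {
    1: ["Sam", "Ted", "Jane"],
    2: ["Sam", "Scott", "Jane"],
    3: ["Sam", "Ara", "Jane"],
    4: ["Sam", "Chi", "Jane"],
    5: ["Sam", "Tom", "Jane"],
    6: ["Sam", "Yen", "Jane"],
    7: ["Sam", "Tim", "Jane"],
}

# Daily uniques: distinct guests within each day.
daily_uniques = {day: len(set(guests)) for day, guests in stays.items()}
sum_of_daily_uniques = sum(daily_uniques.values())

# Weekly uniques: distinct guests across the whole week.
weekly_uniques = len(set().union(*stays.values()))

print("Sum of daily uniques:", sum_of_daily_uniques)
print("Weekly uniques:", weekly_uniques)
```

Summing the daily figures counts Sam and Jane seven times each, so deduplication has to happen over the same period the metric is reported for.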
  39. 39. 50 minutes = Can’t Cover Everything <ul><li>… some starting points for further reading </li></ul>
  40. 40. Research Work (mine) <ul><li>Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis , Hershey, PA: Idea Group Publishing </li></ul><ul><ul><li>First chapter on theory of log analysis is free! </li></ul></ul><ul><li>Lecture: Jansen, B. J. (2009) Understanding User – Web Interactions via Web Analytics . Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA. </li></ul><ul><ul><li>manuscript about Web Analytics, soup to nuts </li></ul></ul><ul><ul><li>companion website (free): http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html </li></ul></ul>
  41. 41. Research Work (mine) <ul><li>Article: Jansen, B. J. 2006. Search log analysis: What is it; what's been done; how to do it . Library and Information Science Research, 28(3), 407-432 . </li></ul><ul><li>http://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_search_log_analysis.pdf </li></ul>
  42. 42. Great ‘how-to’ books for web analytics <ul><li>Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007) </li></ul><ul><li>Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009) </li></ul><ul><li>Advanced Web Metrics with Google Analytics , 2nd Edition by Brian Clifton (Mar 15, 2010) </li></ul><ul><li>Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004) </li></ul>
  43. 43. Thanks! (welcome questions / discussion!) Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University
  44. 44. <ul><li>Before we end … </li></ul>
  45. 45. Follow-on Discussion <ul><li>Happy to chat with anyone (get with me either today or contact me via email) </li></ul><ul><li>Email [email_address] </li></ul><ul><li>LinkedIn http://www.linkedin.com/in/jjansen </li></ul><ul><li>Twitter jimjansen </li></ul>
  46. 46. Again, thanks! Web Analytics Jim Jansen Associate Professor, The Pennsylvania State University
