Web analytics presentation

Web analytics presentation given to Penn State ITS office on 19 Oct 2011


Published in: Technology, Education

Transcript

  • 1. Web Analytics. Jim Jansen, Associate Professor, The Pennsylvania State University
  • 2. Who is Jim Jansen?
    • Associate professor at the College of Information Sciences and Technology, The Pennsylvania State University, USA
    • Senior Fellow at the Pew Research Center (Pew Internet and American Life Project) - http://www.pewinternet.org
    • Active research and teaching efforts - http://ist.psu.edu/faculty_pages/jjansen/
    • Several funded and non-funded research projects
    • Teach several courses, including keyword advertising
    • 2011 book, Understanding Sponsored Search (Cambridge) … theory of keyword advertising
    • Editor of journal, Internet Research (Emerald)
    • Book, Understanding User-Web Interactions via Web Analytics (Morgan & Claypool) - basics of web analytics
  • 3.
    • Let’s talk web analytics!
    • We’ll discuss:
      • context
      • theory
      • application
    • Begin by setting the stage … what are we facing?
  • 4.
    • Moving to ‘everything’ recorded and indexed
    • A lot will be global, but much will remain local
    • Search (along with data summarization, trend detection, information and knowledge extraction and discovery) is foundational technology
    • Raises issues, including:
    • Infrastructure requirements. How and who pays?
    • Changes the nature of privacy and anonymity
    • As publishers or providers, how do we make sense of how people are using this data? Web analytics.
    Explosion of Information: the Zettabytes are coming. There will be nearly 15 billion devices connected to the Internet, generating nearly a Zettabyte (one sextillion bytes) of global IP traffic by 2015, according to Cisco's fifth annual Visual Networking Index (VNI) Forecast.
  • 5. How much is a Zettabyte?
  • 6.
    • The volume of data is exploding (information growth)
    • The complexity of data is growing (information architecture)
    • Users have less time (attention economy)
    • Users expect improved features (technological sophistication)
  • 7. Web analytics can help us …
    • Deal with the volume of data (information growth)
    • Understand the growing complexity of data (information architecture)
    • Address users’ limited time (attention economy)
    • Lead to the improved features (technological sophistication) expected by users
    How does web analytics do this?
  • 8.
    • A thousand years ago: science was mainly naturalistic
      • describing natural phenomena
    • Last few hundred years: theoretical branch
      • using models, generalizations
    • Last few decades: a computational branch
      • simulating complex phenomena
    • Today: data exploration (eScience)
      • unifying theory, experiment, and simulation
      • Data captured by sensors, instruments, or generated by simulator
      • Processed by humans and software
      • Information / knowledge stored in computer
      • Analyzes database / collection content using data management and statistics
      • Network and Web Science
    Data → Information → Knowledge. This is the realm of Web analytics!
  • 9. What is web analytics?
    • The Web Analytics Association (WAA) defines Web analytics as:
      • the measurement, collection, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web usage
      • ( http://www.webanalyticsassociation.org/ )
    • Shares common theoretical and methodology characteristics with all forms of log analysis (e.g., Intranet logs, systems logs, OPAC logs, search logs, etc.)
  • 10. Let’s break that definition down …
    • Collection - accumulate and store over a period of time
    • Internet data - internet facts and statistics collected together for reference or analysis
    • Measurement – ascertain the size, amount, or degree of something by using an instrument or device
    • Analysis - examine methodically the structure of information for purposes of explanation and interpretation.
    • Reporting - giving a spoken or written account of something that one has investigated.
    • Understanding - perceive the significance, explanation, or cause of something
    • Optimizing - make the best or most effective use of a resource
    • Web usage – employ or deploy something as a means of accomplishing a purpose or achieving a result
    Data → Information → Knowledge
  • 11.
    • How is the data collected?
  • 12. W3C Extended Log Format
    • Provides a variety of fields for examining visitors to Web sites.
    • The other common format is the NCSA Separate Log, which is composed of three logs:
      • Common log: actions on the server
      • Referral log: where visitors came from
      • Agent log: information about the client computer
    • Rather than server-side logging, one can use other methods such as page tagging, image cookies, Flash cookies, etc., but the data is still stored in a log.
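To make the log format concrete, here is a minimal sketch of reading a W3C Extended Log Format file, in which a `#Fields:` directive names the whitespace-separated columns of the entries that follow. The sample entries, IP addresses, and the helper name `parse_w3c_extended` are made up for illustration; the sketch also assumes no field contains embedded spaces, which real logs with quoted strings may violate.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_extended(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives (#Version, #Date, #Fields, ...) start with '#'.
            if line.lower().startswith("#fields:"):
                fields = line.split(":", 1)[1].split()
            continue
        values = line.split()
        # Skip entries that do not match the declared fields (simplifying assumption).
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample log, invented for this example.
sample_log = """\
#Version: 1.0
#Date: 2011-10-19 00:00:00
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)
2011-10-19 09:15:02 192.0.2.10 GET /index.html 200 -
2011-10-19 09:15:40 192.0.2.10 GET /courses.html 200 http://example.org/index.html
""".splitlines()

entries = list(parse_w3c_extended(sample_log))
for entry in entries:
    print(entry["time"], entry["cs-uri-stem"], entry["cs(Referer)"])
```

Each parsed entry carries the raw material named later in the talk: a time stamp, a requested page, and a referrer.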
  • 13.
    • Okay, that’s collection.
    • What about analysis and reporting?
  • 14. Variety of tools help make sense of this log data
  • 15.
    • With that context, let’s look at the foundational aspects …
  • 16. Theoretical Foundations
    • Web analytics is based on the behaviorism paradigm
    • Behaviorism – an approach focused on the outward behavioral aspects of thought that emphasizes observed behaviors
    • Behaviorism – Pavlov, Watson, and Skinner
    Pictured: Burrhus Frederic Skinner, John B. Watson, Ivan Petrovich Pavlov
  • 17. Behaviorism Characteristics
    • Inductive, data-driven, and characterized by empirical observation of measurable behavior
    • Grounded on somebody doing something in a situation (all the environmental and situational features are embedded behaviors)
    • Critics of behaviorism as a psychological theory have issues with its rejection of mental processes.
    • I agree: people are more than “mediators between behavior and the environment” (Skinner, 1993, p. 428) (cf. social learning theory) … however, don’t throw out the baby with the bath water
  • 18. What is a Behavior?
    • … an observable activity of a person, animal, team, organization, or system.
    • One can classify behaviors into three general categories. Behaviors are
    • something that one can detect and record
    • actions or specific goal-driven events with some purpose other than the specific action that is observable
    • reactive responses to environmental stimuli
  • 19. What is a Behavior?
    • Behavior is the essential construct of behaviorism and of web analytics
    • Logs record behaviors of users and systems (they record behavior but can’t tell affective, cognitive, or situational aspects … yet, but we’re working on it!)
    • A behavior is the key variable (i.e., an entity representing a set of events where each event may have a different value)
  • 20.
    • One can view the data collected in log files as trace data
    • People conducting the activities of their daily lives often create things, create marks, induce wear, or reduce some existing material
    • Within the confines of research, these things, marks, and wear become data
    • Classically, trace data are the physical remains of people’s interaction
    Data Collection: Trace Data. Examples pictured: wear on a carpet, a trash heap, surfing the web
  • 21. Trace Data
    • In the past, trace data was often time consuming to gather and process, making such data costly.
    • Logging software makes collecting trace data on the Internet easy and cheap
    • Log data is controlled accretion data, where the researcher or some other entity alters the environment in order to create the accretion data
    • With the use of client apps (such as desktop search bars), the collection of data is nearly unlimited from a technology perspective
    What is cool about trace data for researchers?
  • 22. Data Collection
    • Log data/trace data has significant advantages as a data collection approach for the study and investigation of behaviors, including:
    • Scale: not a limiting factor as in lab user studies
    • Power: large sample size for inference testing; in fact, so large that one must account for the size effect
    • Scope: naturalistic; researchers can investigate a range of interactions in a multi-variable context
    • Location: can collect in distributed environments
    • Duration: collect log data over an extended period
  • 23. Methodological Foundations
    • Use of logs to collect trace data is an unobtrusive method (a.k.a. non-reactive or low-constraint). Unobtrusive methods …
    • allow data collection without directly interfering in the context, and
    • do not require a direct response from participants
    Examples pictured: customer behavior (video), chemistry (surface marking)
  • 24. Methodological Foundations
    • Three justifications for unobtrusive methods:
    • Uncertainty principle: researchers interjected into an environment become part of the system
      • Example: ethnography studies (where the researcher “bird dogs” a study participant)
    • Observer effect: difference that is made to an activity or a person’s behaviors by being observed
      • Example: no one searches for porn in a lab study of Web searching
    • Observer bias: observers overemphasize behavior they expect to find and fail to notice behavior they do not expect
      • Example: this is why medical trials are double blind rather than single blind
    • Trace data helps in overcoming the uncertainty principle, observer effect, and observer bias in data collection. Note: this covers observer bias in data collection but not in data analysis
  • 25. Methodological Foundations
    • The method of log data collection has inherent characteristics; as a result, Web analytics has issues to address:
    • Abstraction – how does one relate low-level data to higher-level concepts?
    • Selection – how does one separate the necessary from unnecessary data?
    • Reduction – how does one reduce the complexity and size of the data set?
    • Context – how does one interpret the significance of events?
    • Evolution – how can one collect data without impacting application deployment or use?
  • 26.
    • Okay, nice, but how do we apply it …
  • 27. Web analytics process
    • Every consulting firm has a web analytics process … (which is fine)
    • However, the effective ones all boil down to four essential steps
  • 28. Essential steps to any effective web analytics process
    • Collection of data: typically counts. Basically, data collection.
      • Examples: time stamp, referral URL, query term
    • Processing of data into information: typically ratios. Data becomes metrics.
      • Examples: time on page, bounce rate, unique visitors
    • Developing key performance indicators: counts and ratios infused with business strategy.
      • Examples: conversion rate, average order value, task completion rate
    • Formulating online strategy: online goals, objectives, or standards for the organization.
      • Examples: save money, make money, market share
    • In the diagram, adjacent steps are linked by “Drives” arrows.
  • 29. Three types (plus 1) of Web analytics metrics Implementation
    • Count: the most basic unit of measure; a single number.
    • Ratio: typically a count divided by a count, although a ratio can use either a count or a ratio in the numerator or denominator.
    • KPI (Key Performance Indicator): can be either a count or a ratio, but it is frequently a ratio. A KPI is infused with business strategy, and therefore the set of appropriate KPIs typically differs between site and process types.
    • Dimension: data that can be used to define various types of segments and represents a fundamental dimension of visitor behavior or site dynamics. Typically not associated with a number.
    Burby, J., Brown, A., and WAA Standard Committee (2007) Web Analytics Definitions. Web Analytics Association. Available at: http://www.webanalyticsassociation.org/resource/resmgr/PDF_standards/WebAnalyticsDefinitionsVol1.pdf
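The four metric types can be sketched on a handful of visit records. Everything in this example is an assumption made for illustration: the record fields, the values, and the choice of a purchase as the business goal behind the KPI.

```python
from collections import Counter

# Hypothetical visit records; field names and values are invented.
visits = [
    {"visitor": "a", "source": "search", "pages": 1, "purchased": False},
    {"visitor": "b", "source": "search", "pages": 4, "purchased": True},
    {"visitor": "c", "source": "email", "pages": 2, "purchased": False},
    {"visitor": "a", "source": "email", "pages": 3, "purchased": True},
]

# Count: a single number.
visit_count = len(visits)
page_view_count = sum(v["pages"] for v in visits)

# Ratio: a count divided by a count.
page_views_per_visit = page_view_count / visit_count

# KPI: a count or ratio tied to a business goal (assumed here: selling something).
conversion_rate = sum(1 for v in visits if v["purchased"]) / visit_count

# Dimension: a non-numeric attribute used to segment the counts.
visits_by_source = Counter(v["source"] for v in visits)

print(visit_count, page_view_count, page_views_per_visit, conversion_rate)
print(dict(visits_by_source))
```

The same ratio could be a KPI for one site and a mere statistic for another; what makes it a KPI is the strategy behind it, not the arithmetic.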
  • 30. Can be applied to three levels of granularity
    • Aggregate: Total site traffic for a defined period of time (typically used for market comparisons).
    • Segmented: A subset of the site traffic for a defined period of time, filtered in some way to gain greater analytical insight (by developing personas and profiles in Google Analytics).
    • Individual: Activity of a single Web visitor for a defined period of time (excellent for persona development and outlier analysis).
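As a rough illustration of the three levels of granularity, the sketch below applies each one to the same small set of page-view records. The records and the `source` and `visitor` field names are hypothetical.

```python
# Hypothetical page-view records for one reporting period.
page_views = [
    {"visitor": "v1", "source": "search", "page": "/home"},
    {"visitor": "v1", "source": "search", "page": "/courses"},
    {"visitor": "v2", "source": "email", "page": "/home"},
    {"visitor": "v3", "source": "search", "page": "/home"},
]

# Aggregate: total site traffic for the period.
aggregate_page_views = len(page_views)

# Segmented: a filtered subset of the traffic (here, arrivals from search).
search_segment = [pv for pv in page_views if pv["source"] == "search"]

# Individual: the activity of a single visitor.
v1_path = [pv["page"] for pv in page_views if pv["visitor"] == "v1"]

print(aggregate_page_views, len(search_segment), v1_path)
```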
  • 31. Classifications of Metrics
    • Building Block – foundational metrics
    • Visit Characterization – metrics aimed at understanding visits, either single or aggregate
    • Content Characterization – metrics aimed at understanding content or its use
    • Conversion – metrics aimed at linking visits and content
  • 32. Building Block
    • Page: A page is an analyst-definable unit of content.
    • Page Views: The number of times a page was viewed.
    • Visits/Sessions: A visit is an interaction by an individual with a website, consisting of one or more requests for a page.
    • Unique Visitors: The number of inferred individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site.
      • New Visitor: The number of Unique Visitors with activity including a first-ever Visit to a site during a reporting period.
      • Repeat Visitor: The number of Unique Visitors with activity consisting of two or more Visits to a site during a reporting period.
      • Return Visitor: The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period.
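The building-block definitions can be sketched as follows. Two things here are assumptions rather than part of the definitions: a visit is taken to end after 30 minutes of inactivity (a common convention that the slides do not specify), and visitors are identified by a ready-made ID, whereas real logs have to infer them. The function names and sample hits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity ends a visit; the slides give no cutoff.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_visits(hits):
    """hits: list of (visitor_id, timestamp). Returns {visitor_id: number of visits}."""
    by_visitor = defaultdict(list)
    for visitor, stamp in hits:
        by_visitor[visitor].append(stamp)
    visits = {}
    for visitor, stamps in by_visitor.items():
        stamps.sort()
        n = 1
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > SESSION_TIMEOUT:
                n += 1  # a long gap starts a new visit
        visits[visitor] = n
    return visits


def classify_visitors(visits_in_period, seen_before):
    """Apply the Unique / New / Repeat / Return definitions to one reporting period."""
    unique = set(visits_in_period)
    return {
        "unique": len(unique),
        "new": len(unique - seen_before),
        "repeat": sum(1 for v in unique if visits_in_period[v] >= 2),
        "return": len(unique & seen_before),
    }


# Hypothetical page requests for one day; each hit is treated as one page view.
day = datetime(2011, 10, 19)
hits = [
    ("a", day.replace(hour=9, minute=0)),
    ("a", day.replace(hour=9, minute=10)),
    ("a", day.replace(hour=14, minute=0)),
    ("b", day.replace(hour=10, minute=0)),
    ("c", day.replace(hour=11, minute=0)),
    ("c", day.replace(hour=11, minute=5)),
]

page_view_total = len(hits)
visits = count_visits(hits)
summary = classify_visitors(visits, seen_before={"b"})
print(page_view_total, visits, summary)
```

Note that a single visitor can be both New and Repeat in the same period, so the three sub-counts need not sum to the number of Unique Visitors.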
  • 33. Visit Characteristics
    • Entry Page: The first page of a visit.
    • Landing Page: A page intended to identify the beginning of the user experience.
    • Exit Page: The last page on a site accessed during a visit, signifying the end of a visit/session.
    • Visit Duration: The length of time in a session.
    • Referrer: The referrer is the page URL that originally generated the request for the current page view or object.
    • Click-through: Number of times a link was clicked by a visitor.
    • Click-through Rate: The number of click-throughs for a specific link divided by the number of times that link was viewed.
    • Page Views per Visit: The number of page views in a reporting period divided by number of visits in the same reporting period.
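A short sketch of several visit-characterization metrics, computed for one hypothetical visit. It assumes visit duration is measured from the first to the last recorded page view, since a server log cannot see how long the final page stayed open; the sample data and function names are invented.

```python
from datetime import datetime, timedelta

# One hypothetical visit: an ordered list of (timestamp, page) page views.
visit = [
    (datetime(2011, 10, 19, 9, 0, 0), "/home"),
    (datetime(2011, 10, 19, 9, 2, 30), "/courses"),
    (datetime(2011, 10, 19, 9, 7, 0), "/contact"),
]

entry_page = visit[0][1]   # first page of the visit
exit_page = visit[-1][1]   # last page of the visit

# Assumption: duration runs from the first to the last recorded page view.
visit_duration = visit[-1][0] - visit[0][0]


def click_through_rate(click_throughs: int, link_views: int) -> float:
    """Click-throughs for a link divided by the number of times the link was viewed."""
    return click_throughs / link_views if link_views else 0.0


def page_views_per_visit(page_views: int, visits: int) -> float:
    """Page views in a reporting period divided by visits in the same period."""
    return page_views / visits if visits else 0.0


print(entry_page, exit_page, visit_duration)
print(click_through_rate(5, 200), page_views_per_visit(len(visit), 1))
```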
  • 34. Content Characterization
    • Page Exit Ratio: Number of exits from a page divided by total number of page views of that page.
    • Single Page Visits: Visits that consist of one page regardless of the number of times the page was viewed.
    • Single Page View Visits (Bounces): Visits that consist of one page view.
    • Bounce Rate: Single page view visits divided by entry pages.
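The two ratios above can be worked through on a few hypothetical visits. This sketch reads "entry pages" as the number of visits that entered on the page in question, which is one reasonable interpretation rather than the only one; the visit data is invented.

```python
from collections import Counter

# Hypothetical visits, each an ordered list of pages viewed.
visits = [
    ["/home"],
    ["/home", "/courses"],
    ["/courses"],
    ["/home", "/courses", "/home"],
]

page_views = Counter(page for v in visits for page in v)
exits = Counter(v[-1] for v in visits)               # last page of each visit
entries = Counter(v[0] for v in visits)              # first page of each visit
bounces = Counter(v[0] for v in visits if len(v) == 1)  # single page view visits


def page_exit_ratio(page: str) -> float:
    """Exits from a page divided by total page views of that page."""
    return exits[page] / page_views[page]


def bounce_rate(page: str) -> float:
    """Single page view visits that entered on a page divided by entries on that page."""
    return bounces[page] / entries[page]


site_bounce_rate = sum(bounces.values()) / sum(entries.values())

print(page_exit_ratio("/home"), bounce_rate("/home"), site_bounce_rate)
```

The last visit returns to `/home` after viewing another page, so it has three page views and does not count as a bounce.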
  • 35. Conversion Metrics
    • Event: Any logged or recorded action that has a specific date and time assigned to it by either the browser or server.
    • Conversion: A visitor completing a target action.
  • 36. Translating these metrics
    • Translating these metrics into meaningful and accurate knowledge is not always easy.
    • Real-world example: the hotel problem (an excellent illustration of the importance of proper period selection)
  • 37. The hotel
    • Use Daily Uniques
    Days:          1     2      3     4     5     6     7
    Room 1:        Sam   Sam    Sam   Sam   Sam   Sam   Sam
    Room 2:        Ted   Scott  Ara   Chi   Tom   Yen   Tim
    Room 3:        Jane  Jane   Jane  Jane  Jane  Jane  Jane
    Daily uniques: 3     3      3     3     3     3     3
    • Total Daily Uniques = 21
    • Use Weekly Uniques
    Sam counts once, Jane counts once, and the seven Room 2 guests count once each (1 + 1 + 7)
    • Total Weekly Uniques = 9
  • 38. Bottom line: the time qualifier matters!
    • So, one can’t just add daily uniques to get weekly uniques
    • Have to scrub the data
    • This is just one example of the many issues that one can face when digging into the data in order to get meaningful web analytics data!
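The hotel problem can be checked in a few lines. The guest lists below are read off the hotel slide (three rooms, seven days); the variable names are illustrative. Summing the daily unique counts double-counts anyone who stays more than one day, whereas the weekly unique count deduplicates across the whole week.

```python
# Guests present on each day of the week, as read from the hotel slide.
daily_guests = {
    1: {"Sam", "Ted", "Jane"},
    2: {"Sam", "Scott", "Jane"},
    3: {"Sam", "Ara", "Jane"},
    4: {"Sam", "Chi", "Jane"},
    5: {"Sam", "Tom", "Jane"},
    6: {"Sam", "Yen", "Jane"},
    7: {"Sam", "Tim", "Jane"},
}

# Adding up daily uniques counts Sam and Jane seven times each.
sum_of_daily_uniques = sum(len(guests) for guests in daily_guests.values())

# Weekly uniques: deduplicate across all seven days before counting.
weekly_uniques = len(set().union(*daily_guests.values()))

print(sum_of_daily_uniques)  # 21
print(weekly_uniques)        # 9
```

The same caution applies to rolling weekly uniques up to monthly figures: the deduplication has to be redone on the raw data for each reporting period.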
  • 39. 50 minutes = Can’t Cover Everything
    • … some starting points for further reading
  • 40. Research Work (mine)
    • Book: Jansen, B. J., Spink, A., and Taksa, I. (2009) Handbook of Research on Web Log Analysis. Hershey, PA: Idea Group Publishing
      • First chapter on theory of log analysis is free!
    • Lecture: Jansen, B. J. (2009) Understanding User-Web Interactions via Web Analytics. Morgan-Claypool Lecture Series. Gary Marchionini (Ed.). Morgan-Claypool: San Rafael, CA.
      • manuscript about Web Analytics, soup to nuts
      • companion website (free): http://faculty.ist.psu.edu/jjansen/webanalytics/understanding_web_analytics.html
  • 41. Research Work (mine)
    • Article: Jansen, B. J. (2006) Search log analysis: What is it; what's been done; how to do it. Library and Information Science Research, 28(3), 407-432.
    • http://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_search_log_analysis.pdf
  • 42. Great ‘how-to’ books for web analytics
    • Web Analytics: An Hour a Day by Avinash Kaushik (Jun 5, 2007)
    • Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity by Avinash Kaushik (Oct 2009)
    • Advanced Web Metrics with Google Analytics, 2nd Edition by Brian Clifton (Mar 15, 2010)
    • Web Analytics Demystified: A Marketer's Guide to Understanding How Your Web Site Affects Your Business by Eric Peterson (Mar 2004)
  • 43. Thanks! (welcome questions / discussion!) Web Analytics. Jim Jansen, Associate Professor, The Pennsylvania State University
  • 44.
    • Before we end …
  • 45. Follow-on Discussion
    • Happy to chat with anyone (get with me either today or contact me via email)
    • Email [email_address]
    • LinkedIn http://www.linkedin.com/in/jjansen
    • Twitter jimjansen
  • 46. Again, thanks! Web Analytics. Jim Jansen, Associate Professor, The Pennsylvania State University