Web Analytics with R

The aim of this presentation is to:
1. Encourage web data analysts to use R instead of spreadsheets.
2. Help those wanting to learn R to get started.
3. Build interest amongst the R community in developing packages for web analytics.

The presentation briefly discusses what Web Analytics is, why R should be used instead of spreadsheets for web data analysis, ways to learn R, and how to put R into practice in web analytics.

In this presentation Johann shares his experience of creating his first open-source R package, ganalytics, which is used for accessing Google Analytics data. Reflecting on his journey to date in learning R, Johann gives newcomers tips to help them succeed in using R for their day-to-day work and in creating their own packages. A demonstration of ganalytics is included, along with an invitation to the community to get involved in its future development.

The R script used in the demonstration is available in the following gist: https://gist.github.com/jdeboer/6569077

Published in: Technology, Business
  • Digital Analytics Manager at Open Universities Australia
    Focused on web usability and analytics
    Background in computer systems
    Learning and using R for around 2 years
  • I'll briefly explain:
    what Web Analytics is
    why R should be used instead of Excel for web data analysis
    ways to learn R, and
    how to put R into practice in web analytics
  • Eugene Dubossarsky
    Jeff Leek and Roger Peng of Johns Hopkins School of Public Health
    Hadley Wickham
    R In A Nutshell by Joseph Adler
    Quick R by Rob Kabacoff
    Try R at Code School
    R Podcast by Eric Nantz
    #rstats
  • Web Analytics with R

    1. Web Analytics with R, by Johann de Boer
    2. Purpose of this presentation
       1. Encourage web data analysts to move to R and away from Excel!
       2. Help those wanting to learn R to get started.
       3. Build interest amongst the R community in developing packages for web analytics.
    3. What is web analytics? Measurement and analysis of (aggregated) internet data for purposes of optimising website usage.
    4. Web analytics data
    5. Dimensions and metrics
    6. Web analytics data
       ● User
         ○ Dimensions: Custom dimensions, e.g. existing customer flag
         ○ Metrics: Unique users, avg revenue per user
       ● Device
         ○ Dimensions: Browser and OS, isTablet, isMobile
         ○ Metrics: Unique devices, total visits
       ● Session
         ○ Dimensions: Traffic source, date and time of visit, landing page
         ○ Metrics: Time on site, pageviews per visit, goal completions
       ● Hit
         ○ Dimensions: Page URL, page title, event type, product name
         ○ Metrics: Time on page, page loading time, transaction amount
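To make the levels above concrete, here is a minimal base R sketch. The rows are made up purely for illustration (they are not from the presentation): hit-level records are rolled up into session-level metrics and then into a device-level metric, with dimensions acting as grouping columns and metrics as the aggregated numbers.

```r
# Made-up hit-level rows: each row is one pageview (illustrative data only).
hits <- data.frame(
  session_id   = c("s1", "s1", "s2", "s3", "s3", "s3"),
  device       = c("desktop", "desktop", "mobile", "tablet", "tablet", "tablet"),
  time_on_page = c(30, 45, 10, 20, 60, 15),
  stringsAsFactors = FALSE
)

# Session level: time on site is the sum of time on page within a session.
sessions <- aggregate(time_on_page ~ session_id + device, data = hits, FUN = sum)
names(sessions)[names(sessions) == "time_on_page"] <- "time_on_site"

# Session level: pageviews per visit is the number of hits in each session.
sessions$pageviews <- as.vector(table(hits$session_id)[sessions$session_id])

# Device level: average pageviews per visit for each device type.
by_device <- aggregate(pageviews ~ device, data = sessions, FUN = mean)

print(sessions)
print(by_device)
```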
    7. Metrics: Unique Visitors, New Visits, % New Visits, Visits, Bounces, Bounce Rate, Bounce Rate, Visit Duration, Avg. Visit Duration, Organic Searches, Impressions, Clicks, Cost, CPM, CPC, CTR, Cost per Transaction, Cost per Goal Conversion, Cost per Conversion, RPC, ROI, Margin, Goal 1 Starts, Goal Starts, Goal 1 Completions, Goal Completions, Goal 1 Value, Goal Value, Per Visit Goal Value, Goal 1 Conversion Rate, Goal Conversion Rate, Goal 1 Abandoned Funnels, Abandoned Funnels, Goal 1 Abandonment Rate, Total Abandonment Rate, Data Hub Activities, Page Value, Entrances, Entrances / Pageviews, Pageviews, Pages / Visit, Unique Pageviews, Time on Page, Avg. Time on Page, Exits, % Exit, Results Pageviews, Total Unique Searches, Results Pageviews / Search, Visits with Search, % Visits with Search, Search Depth, Search Depth, Search Refinements, % Search Refinements, Time after Search, Time after Search, Search Exits, % Search Exits, Goal 1 Conversion Rate, Goal Conversion Rate, Per Search Goal Value, Page Load Time (ms), Page Load Sample, Avg. Page Load Time (sec), Domain Lookup Time (ms), Avg. Domain Lookup Time (sec), Page Download Time (ms), Avg. Page Download Time (sec), Redirection Time (ms), Avg. Redirection Time (sec), Server Connection Time (ms), Avg. Server Connection Time (sec), Server Response Time (ms), Avg. Server Response Time (sec), Speed Metrics Sample, Document Interactive Time (ms), Avg. Document Interactive Time (sec), Document Content Loaded Time (ms), Avg. Document Content Loaded Time (sec), DOM Latency Metrics Sample, Screen Views, Screen Views, Unique Screen Views, Unique Screen Views, Screens / Session, Screens / Session, Time on Screen, Avg. Time on Screen, Total Events, Unique Events, Event Value, Avg. Value, Visits with Event, Events / Visit, Transactions, Ecommerce Conversion Rate, Revenue, Average Value, Per Visit Value, Shipping, Tax, Total Value, Quantity, Unique Purchases, Average Price, Product Revenue, Average QTY, Local Revenue, Local Shipping, Local Tax, Local Product Revenue, Social Actions, Unique Social Actions, Actions Per Social Visit, User Timing (ms), User Timing Sample, Avg. User Timing (sec), Exceptions, Exceptions / Screen, Crashes, Crashes / Screen, Custom Metric Value
       Dimensions: Visitor Type, Count of Visits, Days Since Last Visit, User Defined Value, Visit Duration, Referral Path, Full Referrer, Campaign, Source, Medium, Source / Medium, Keyword, Ad Content, Social Network, Social Source Referral, Ad Group, Ad Slot, Ad Slot Position, Ad Distribution Network, Query Match Type, Matched Search Query, Placement Domain, Placement URL, Ad Format, Targeting Type, Placement Type, Display URL, Destination URL, AdWords Customer ID, AdWords Campaign ID, AdWords Ad Group ID, AdWords Creative ID, AdWords Criteria ID, Goal Completion Location, Goal Previous Step - 1, Goal Previous Step - 2, Goal Previous Step - 3, Browser, Browser Version, Operating System, Operating System Version, Mobile (Including Tablet), Tablet, Mobile Device Branding, Mobile Device Model, Mobile Input Selector, Mobile Device Info, Mobile Device Marketing Name, Device Category, Continent, Sub Continent Region, Country / Territory, Region, Metro, City, Latitude, Longitude, Network Domain, Service Provider, Flash Version, Java Support, Language, Screen Colors, Screen Resolution, Endorsing URL, Display Name, Social Activity Post, Social Activity Timestamp, Social User Handle, User Photo URL, User Profile URL, Shared URL, Social Tags Summary, Originating Social Action, Social Network and Action, Hostname, Page, Page path level 1, Page path level 2, Page path level 3, Page path level 4, Page Title, Landing Page, Second Page, Exit Page, Previous Page Path, Next Page Path, Page Depth, Site Search Status, Search Term, Refined Keyword, Site Search Category, Start Page, Destination Page, App Installer ID, App Version, App Name, App ID, Screen Name, Screen Depth, Landing Screen, Exit Screen, Event Category, Event Action, Event Label, Transaction, Affiliation, Visits to Transaction, Days to Transaction, Product SKU, Product, Product Category, Currency Code, Social Source, Social Action, Social Source and Action, Social Entity, Social Type, Timing Category, Timing Label, Timing Variable, Exception Description, Experiment ID, Variation, Custom Dimension, Custom Variable (Key 1), Custom Variable (Value 01), Date, Year, Month of the year, Week of the year, Day of the month, Hour, Month, Week, Day, Day of week, Day of week name, Hour of Day, Month of Year, Week of Year, ISO week of the year
       265 dimensions and metrics in Google Analytics and growing!
    8. Google Analytics
    9. The Web Analytics market (Source: Charles Farina, e-nor.com blog, published 9 July 2012)
    10. Google Analytics - now Universal
    11. Google Analytics APIs
        ● Data collection
          ○ Collection APIs and SDKs
        ● Data extraction
          ○ Core Reporting API
            ■ Metadata API
          ○ Real-time Reporting API
          ○ Multi-Channel Funnels Reporting API
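As a rough illustration of what a data extraction request to the Core Reporting API looks like, the base R sketch below only assembles a version 3 request URL; it does not send it. The helper name `build_ga_url` and the view ID `ga:12345678` are hypothetical, and a real request would also need an OAuth 2.0 access token, which is omitted here.

```r
# Build (but do not send) a Core Reporting API v3 request URL.
# build_ga_url is a hypothetical helper written for this sketch.
build_ga_url <- function(ids, start_date, end_date, metrics,
                         dimensions = NULL, filters = NULL) {
  params <- list(
    ids          = ids,
    `start-date` = start_date,
    `end-date`   = end_date,
    metrics      = paste(metrics, collapse = ","),
    dimensions   = if (!is.null(dimensions)) paste(dimensions, collapse = ","),
    filters      = filters
  )
  # Drop optional parameters that were not supplied.
  params <- Filter(Negate(is.null), params)
  # Percent-encode each value, including reserved characters such as ':' and ','.
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste(names(encoded), encoded, sep = "=", collapse = "&")
  paste0("https://www.googleapis.com/analytics/v3/data/ga?", query)
}

# "ga:12345678" is a placeholder view (profile) ID, not a real one.
url <- build_ga_url(
  ids        = "ga:12345678",
  start_date = "2013-01-01",
  end_date   = "2013-01-31",
  metrics    = c("ga:visits", "ga:pageviews"),
  dimensions = "ga:date"
)
print(url)
```

Packages such as rga and ganalytics exist so that this kind of URL assembly, authentication and response parsing does not have to be written by hand.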
    12. Why use R for web analytics?
    13. R is free, open source and popular!
    14. Spreadsheets are rigid and fragile
    15. R is agile and robust
    16. 7 reasons to use R instead of Excel
        1. Captures each step in your analysis
        2. Makes it easier to review and fix your mistakes
        3. Easy to reproduce and reuse analyses
        4. Join datasets from multiple sources
        5. Powerful ways to analyse and visualise your data
        6. Automate retrieval of your data via the Core Reporting API
        7. Saves time!
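Reason 4 in the list above, joining datasets from multiple sources, can be sketched in a few lines of base R. Both data frames below are made up for illustration: one stands in for daily visits from a web analytics tool, the other for daily enrolments from a separate back-office system.

```r
# Made-up daily visits, standing in for a web analytics export.
visits <- data.frame(
  date   = as.Date(c("2013-09-01", "2013-09-02", "2013-09-03")),
  visits = c(120, 95, 140)
)

# Made-up daily enrolments from a second, separate source.
enrolments <- data.frame(
  date       = as.Date(c("2013-09-01", "2013-09-03")),
  enrolments = c(6, 7)
)

# Left join on date, keeping days that had visits but no enrolments.
combined <- merge(visits, enrolments, by = "date", all.x = TRUE)
combined$enrolments[is.na(combined$enrolments)] <- 0

# A metric that neither source could provide on its own.
combined$conversion_rate <- combined$enrolments / combined$visits
print(combined)
```

Because every step is written down, the same join can be rerun on next month's data without repeating any manual spreadsheet work.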
    17. Learning and using R
    18. In the beginning...
    19. Recommended tools and packages: plyr, ggplot2, lubridate, reshape2, devtools, httr, roxygen2, markdown, git (version control)
    20. Google Analytics packages for R
        ● r-google-analytics
          ○ By Google, but it stopped working for a long time
        ● rga
          ○ By Bror Skardhamar, popular and active
        ● ganalytics
          ○ Written by me to create an abstraction from the Core Reporting API protocol
    21. ganalytics: Automate extraction of Google Analytics data
    22. Make querying GA data from R an easy and interactive experience
        ● Queries are manipulated on the fly
        ● Defining filter and segmentation expressions is easy
        ● Checks queries for errors before sending
          ○ corrects them automatically in some cases too!
        ● Creates a level of abstraction from the Core Reporting API, making it easier to extend functionality
    23. Query expressions
        ga:keyword=@buy
        (search traffic keywords containing "buy")
        A single expression comprises:
        ● a variable - a dimension or metric
        ● an operator - e.g. equals, contains, regular expression, greater than, does not equal, ...
        ● an operand - a number (metric) or a character string (dimension)
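The three parts of an expression can be sketched as a small base R helper. `ga_expr` is a hypothetical function written for this illustration, not part of ganalytics; it uses the operator spellings from the Core Reporting API filter syntax, where "contains" is written `=@`, and backslash-escapes commas and semicolons in the operand because the API reserves those characters for combining expressions.

```r
# Hypothetical helper: assemble one Core Reporting API filter expression
# from a variable, an operator and an operand.
ga_expr <- function(variable, operator, operand) {
  # Operators accepted by the API filter syntax for metrics and dimensions.
  operators <- c("==", "!=", ">", "<", ">=", "<=", "=@", "!@", "=~", "!~")
  stopifnot(operator %in% operators)
  # Commas and semicolons are reserved, so escape them with a backslash.
  operand <- gsub("([,;])", "\\\\\\1", as.character(operand))
  paste0("ga:", variable, operator, operand)
}

keyword_expr <- ga_expr("keyword", "=@", "buy")    # dimension: contains "buy"
visits_expr  <- ga_expr("visits", ">", 10)         # metric: greater than 10
escaped_expr <- ga_expr("keyword", "=@", "buy,now")

print(keyword_expr)
print(visits_expr)
print(escaped_expr)
```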
    24. Combining expressions
        ● Expressions can be joined using OR (a comma) and AND (a semicolon).
        ● OR always takes precedence over AND, and expressions cannot be grouped.
        ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
        (search traffic keywords containing "buy" AND [city is [Sydney OR Melbourne] OR via a tablet])
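The precedence rule can be sketched in base R as follows. `ga_or` and `ga_and` are hypothetical helpers for this illustration (they are not the ganalytics functions), and the expressions use the API's `=@` spelling for "contains". Because OR binds more tightly than AND and there is no grouping, an AND of ORs can be written directly, but an OR of ANDs cannot, so `ga_or` refuses input that already contains an AND.

```r
# Hypothetical helpers: "," joins with OR, ";" joins with AND.
ga_or <- function(...) {
  parts <- c(...)
  # An unescaped ";" means the part already contains an AND, which cannot
  # be nested inside an OR because the API has no grouping syntax.
  if (any(grepl("(?<!\\\\);", parts, perl = TRUE))) {
    stop("cannot OR together expressions that contain AND")
  }
  paste(parts, collapse = ",")
}

ga_and <- function(...) paste(c(...), collapse = ";")

filter <- ga_and(
  "ga:keyword=@buy",
  ga_or("ga:city=~^(Sydney|Melbourne)$", "ga:isTablet==Yes")
)
print(filter)

# Reading a filter back: split on AND first, then on OR within each part.
and_parts <- strsplit(filter, ";", fixed = TRUE)[[1]]
or_parts  <- strsplit(and_parts, ",", fixed = TRUE)
```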
    25. Writing expressions with ganalytics
        ● Filter to pass to the Core Reporting API:
          ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
        ● Using ganalytics to write this:
          GaAnd(
            GaExpr('keyword', '@', 'buy'),
            GaOr(
              GaExpr('city', '~', '^(Sydney|Melbourne)$'),
              GaExpr('isTablet', '=', 'Yes')
            )
          )
    26. ganalytics demo: gist.github.com/jdeboer/6569077
    27. How does traffic from desktop, mobile and tablet users change throughout the day and over the week? Average number of visits per hour and day - split by desktop, mobile and tablet
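The demo script itself is in the gist linked on slide 26. As a separate, minimal base R sketch of the same kind of calculation, the example below averages hourly visits by day of week and device category. The rows are made up for illustration, and the column names are assumptions rather than the ones used in the demo.

```r
# Made-up hourly visit counts (illustrative only, not real GA data).
ga_data <- data.frame(
  date = as.Date(c("2013-09-02", "2013-09-02", "2013-09-09",
                   "2013-09-09", "2013-09-03", "2013-09-03")),
  hour = c(9, 9, 9, 9, 20, 20),
  deviceCategory = c("desktop", "mobile", "desktop",
                     "mobile", "desktop", "tablet"),
  visits = c(100, 20, 120, 30, 40, 25),
  stringsAsFactors = FALSE
)

# Derive the day of week without depending on the system locale.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
ga_data$day <- day_names[as.POSIXlt(ga_data$date)$wday + 1]

# Average visits for each day of week, hour and device category.
avg_visits <- aggregate(visits ~ day + hour + deviceCategory,
                        data = ga_data, FUN = mean)
print(avg_visits)

# With ggplot2 installed, a chart along these lines could follow:
# library(ggplot2)
# ggplot(avg_visits, aes(hour, visits, colour = deviceCategory)) +
#   geom_line() + facet_wrap(~ day)
```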
    28. R + ggplot2 + plyr + ganalytics =
    29. Get involved! Open source R package development is fun!
    30. Package development
        ● Use RStudio with Git version control
          ○ Open a free GitHub account
          ○ Use roxygen2 for generating your documentation and NAMESPACE file
          ○ RStudio integrates with Git, roxygen2 and Rtools to make the package build process easy
        ● Hadley Wickham is a great help
          ○ devtools package - great for installing straight from a GitHub repository
          ○ read his online book "Advanced R Programming" - easy to follow package development steps
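For readers who have not seen roxygen2 before, the sketch below shows the general shape of a documented function. `bounce_rate` is a hypothetical example written for this illustration and is not part of ganalytics; the `#'` comments are what roxygen2 turns into a help page and a NAMESPACE export entry.

```r
#' Calculate bounce rate
#'
#' @param bounces Number of single-page visits.
#' @param visits Total number of visits; must be greater than zero.
#' @return The bounce rate as a proportion between 0 and 1.
#' @export
#' @examples
#' bounce_rate(25, 100)
bounce_rate <- function(bounces, visits) {
  stopifnot(is.numeric(bounces), is.numeric(visits), all(visits > 0))
  bounces / visits
}

rate <- bounce_rate(25, 100)
print(rate)

# Inside a package project, devtools can regenerate the documentation,
# and it can install a package straight from GitHub:
# devtools::document()
# devtools::install_github("jdeboer/ganalytics")
```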
    31. Learn more...
        ● Google Analytics: #ganalytics
          ○ Video lessons: google.com.au/analytics/iq.html
          ○ Reference: developers.google.com/analytics
        ● Learn R: #rstats
          ○ Presciient: presciient.com/courses
          ○ Code School: tryr.codeschool.com
          ○ Coursera: coursera.org/course/compdata
          ○ Intro to R videos by Google: t.co/FQ8DvZEdRW
        ● Package development: adv-r.had.co.nz
        ● ganalytics: github.com/jdeboer/ganalytics
        ● Follow me on Twitter: @johannux
