John Tukey: “If we need a short suggestion of what exploratory data analysis is, I would suggest that 1. It is an attitude AND 2. A flexibility AND 3. Some graph paper (or transparencies, or both).”
85 dimensions and 92 metrics: 177 different values in 14 different groups.
The main challenges are three: (1) making the transition from passive data collection to active and exploratory data analysis; (2) defining a collection-level page; and (3) dealing with data exports from 5 very different websites.
8 example URLs
Reduces to 5 different EAD ID values. The last one, since it is labelled a test, I eventually removed from the data set. There were only a handful of these, and they only occurred at ECU and Maryland.
Just shy of 100k URLs exported. Most were analyzed, since they had EAD IDs in them, but a large section of the Duke URLs were easily separated post-export, after the initial data analysis revealed that all of their “search string” URLs only contained queries, not EAD IDs.
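The reduction described above (many exported URLs collapsing to a smaller set of EAD IDs, with query-only search URLs set aside) can be sketched roughly as follows. This is a minimal illustration, not the method actually used in the study: the example.org URLs, the assumption that the EAD ID is the last path segment, and the assumption that search URLs carry a "q" parameter are all hypothetical, and each repository's real URL patterns would need their own rules.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def extract_ead_id(url):
    """Return a normalized EAD ID for a finding-aid URL, or None if none is found."""
    parts = urlsplit(url)
    # Assumption: search-string URLs carry a free-text "q" parameter and no EAD ID.
    if "q" in parse_qs(parts.query):
        return None
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments:
        return None
    # Assumption: the EAD ID is the last path segment, minus any file extension.
    stem = segments[-1].rsplit(".", 1)[0]
    return stem.lower() or None


def group_urls(urls):
    """Group URLs by EAD ID; URLs without an ID are returned separately."""
    by_id = defaultdict(list)
    without_id = []
    for url in urls:
        ead_id = extract_ead_id(url)
        if ead_id is None:
            without_id.append(url)
        else:
            by_id[ead_id].append(url)
    return dict(by_id), without_id


# Hypothetical URLs, for illustration only.
urls = [
    "http://example.org/findingaids/ABC123.xml",
    "http://example.org/findingaids/abc123/",
    "http://example.org/findingaids/xyz789.html?view=full",
    "http://example.org/search?q=civil+war",
]
by_id, without_id = group_urls(urls)
print(sorted(by_id), len(without_id))
```

Grouping by a normalized ID is what lets pageview counts for several URL variants be summed into one collection-level figure; IDs labelled as tests could then be dropped from the resulting dictionary.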
An admittedly poor visualization, but I couldn’t resist a reference to the Olympic games.
Explain UPVs here
This type of distribution was purely a coincidence, but it was nice that the repositories represented a diverse range of yearly PVs / UPVs.
AAA is clearly the odd one out in this grouping. The reason: their most-viewed collections are digitized collections, which contain far more potential collection-page views, each of which receives a significant amount of use. And that’s interesting (if not completely unexpected) news: these digitized collections are seeing high amounts of online ‘use’.
Explain EPVHs here. Only about 8760 hours in an entire year, mind you. Ideally, I would like to add a third variable to the mix: “reading room hours” per collection. But, for now, we can correlate EPVHs with UPVs.
All 5 datasets combined. I don’t think that too much should be read into this particular visual, however (especially due to the heterogeneous nature of EAD delivery frameworks).
Duke context: The graph shows the increase in the percentage of visits from mobile devices to library web resources at Duke since Dec. 2009. The percentage of visits from mobile devices is higher for finding aids than for any other category of library resource except blogs. The percentage of mobile visits has grown more than 5x since Spring 2010. Over the last two months (Jun-Aug 2012), mobile visits have accounted for over 9% of all visits to finding aids. That is not a huge percentage, but the growth rate is significant. We can’t afford to ignore this trend; we should make our finding aid interfaces more mobile-friendly (more on that later).
Understanding mobile visitors: The slide shows the average distribution of mobile visits to finding aids by device across 5 repositories: AAA, Duke, ECU, UMD, and Miami (July 2011-July 2012). More users visit finding aids from iPads than from any other device. Not surprising.
What do mobile visitors do once they discover finding aids? It’s hard to tell, but they don’t stick around long, perhaps because viewing most finding aids on a mobile device is a painful experience, or at least more painful than viewing them on a larger screen. In both the depth and the duration of a visit, it seems obvious that users interact less when on a mobile device. On average, mobile visitors visit 39% fewer pages per visit and spend 55% less time on the site.
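As a rough check on those two percentages, the relative drop can be recomputed from the averages on the Mobile Visit Behavior slide (1.83 vs. 3.04 pages per visit; 1:07 vs. 2:31 time on site). The helper names below are illustrative only. Because the slide values are already rounded, the results land close to, but not exactly on, the 39% and 55% quoted in the notes.

```python
def percent_decrease(baseline, value):
    """Percentage by which value falls below baseline."""
    return (baseline - value) / baseline * 100


def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '2:31' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Averages as printed on the slide (all visits are the baseline).
pages_drop = percent_decrease(3.04, 1.83)
time_drop = percent_decrease(to_seconds("2:31"), to_seconds("1:07"))

print(f"{pages_drop:.1f}% fewer pages per visit")  # 39.8%
print(f"{time_drop:.1f}% less time on site")       # 55.6%
```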
How do mobile visitors get to our finding aids? The charts show the distribution of traffic sources for mobile visits (left) vs. all visits (right). You can see that mobile visitors are: (1) more likely to discover finding aids through web search (Google, Bing, Yahoo, etc.) [this seems a bit counter-intuitive – I would expect a higher percentage of click-throughs from Facebook, Twitter, Wikipedia, and other popular mobile-enhanced sites. Searching on a mobile device seems more cumbersome, but maybe not as difficult as browsing to finding aids from a library website…]; and (2) half as likely to come from referring sites like library webpages, the library catalog, Wikipedia, Facebook, etc.
Although referral traffic makes up a relatively small portion of mobile visits to finding aids (~15% across the 4 repositories), there are some noticeable differences in referral traffic when comparing mobile visits to all visits. These charts show the top referring sites for mobile visits (left) vs. all visits (right). Some highlights: compared to all visitors, mobile visitors are (1) 1.4 times less likely to come from university webpages (e.g. duke.edu, ecu.edu); (2) 1.7 times more likely to come from Wikipedia (both mobile and full versions of the site), where AAA’s stats are the only ones that are close to even between mobile and all visits, while the rest have a higher percentage of traffic driven via Wikipedia on mobile devices when compared to the whole; (3) 2.5 times more likely to come from Facebook (both mobile and full versions); and (4) over 3.1 times as likely to come from other Google services (Google Reader, Google Groups, and other Google-related sites), or, if you remove AAA’s data, 6 times more likely.* *[Not sure exactly what these services are. See: http://support.google.com/googleanalytics/bin/answer.py?hl=en&answer=55587 ]
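The "times more likely" figures are ratios of referral shares. The sketch below recomputes them from the rounded percentages on the Referring Sites slide. Note the assumption: the slide text is extraction-damaged, so the assignment of each percentage to mobile vs. all visits is a reading of that slide (chosen so that each column sums to 100%), not a confirmed table. With rounded inputs the ratios only approximate the figures quoted above, most visibly for Facebook.

```python
# Referral shares (%) as read from the slide; the column assignment is an assumption.
mobile = {"University website": 32, "Wikipedia": 23, "Facebook": 6,
          "Google services": 6, "All other": 33}
all_visits = {"University website": 46, "Wikipedia": 13, "Facebook": 3,
              "Google services": 2, "All other": 36}


def relative_likelihood(mobile_share, all_share):
    """Ratio of the mobile referral share to the all-visits referral share."""
    return mobile_share / all_share


for source, share in mobile.items():
    ratio = relative_likelihood(share, all_visits[source])
    if ratio >= 1:
        print(f"{source}: {ratio:.1f} times more likely on mobile")
    else:
        print(f"{source}: {1 / ratio:.1f} times less likely on mobile")
```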
Average percentage of Wikipedia referrals amongst all other referrals.
Broken down for each of the 5 repositories.
Total sum of Wikipedia referrals from all 5 repositories. It’s certainly going up, so it’s more instructive to look at the absolute numbers in this case!
Most of the yearly increase is due to AAA’s influence, although Maryland and Miami also show clear increases for each year.
Move to an “archival dashboard” (for both researchers and archivists), somewhat like the real-time tennis stats pictured here. Additionally, were it not for exploratory data analysis practices, additional metrics in tennis (such as the aggressive ratio) would never have been defined. The more we explore the potential of quantitative archival metrics, the more likely it is that we will create similar sorts of combined metrics that will enable us to make a whole host of new data-informed decisions (for accessioning, processing, re-describing, digitizing, etc.).
Five repositories, one dataset
Five Repositories, One Dataset: Using Exploratory Data Analysis Techniques to Track Patterns of Use
Mark Custer, Noah Huffman, Jennie Levine Knies, Kyle Rimkus, Sara Snyder
Outline of Today’s Talk
1. Introduction: Exploratory and preliminary nature of the study
2. Overview of website / EAD-portal metrics for three years
3. The path to an aggregate data set and difficulties
4. Collection-level metrics: one year, in depth
5. Visits from mobile devices over the years
6. Wikipedia referrals over the years
7. Conclusion: Next Steps
Mobile visits to finding aids by device: iPad 37%, iPhone 25%, Android 22%, Other 16%
Mobile Visit Behavior
Avg. Pages/Visit
• Mobile visits: 1.83
• All visits: 3.04
Avg. Time on Site
• Mobile visits: 1:07
• All visits: 2:31
From Google Analytics Data (July 1, 2011-June 30, 2012) from: AAA, Duke University, East Carolina University, University of Maryland, and University of Miami
Traffic Sources: Mobile Visits vs. All Visits
                   Mobile Visits   All Visits
Search Traffic     72%             59%
Referral Traffic   15%             29%
Direct Traffic     13%             12%
From Google Analytics Data (July 1, 2011-June 30, 2012) from: AAA, Duke University, East Carolina University, University of Maryland, and University of Miami
Referring Sites: Mobile Visits vs. All Visits (Top Referrers)
                     Mobile Visits   All Visits
University Website   32%             46%
Wikipedia            23%             13%
Facebook             6%              3%
Google Services      6%              2%
All Other            33%             36%
From Google Analytics Data (July 1, 2011-June 30, 2012) from: AAA, Duke University, East Carolina University, University of Maryland, and University of Miami
8. Conclusion: Next Steps
• How best to define a collection-level page? Should we?
• Which metrics are most useful for archivists, researchers, etc.? My hunch: UPVs, EPVHs, reading room hours, reference consultations. And?
• Beyond the collection, how can we analyze these data sets by subject / topic?
• How best to share this data?
• How else can it be analyzed?
Questions? Mark Custer Noah Huffman Jennie Levine Knies Kyle Rimkus Sara Snyder