Five repositories, one dataset
  • John Tukey: “If we need a short suggestion of what exploratory data analysis is, I would suggest that 1. It is an attitude AND 2. A flexibility AND 3. Some graph paper (or transparencies, or both).”
  • 85 dimensions and 92 metrics: 177 different values in 14 different groups.
  • The main challenges are three: (1) making the transition from passive data collection to active and exploratory data analysis; (2) defining a collection-level page; (3) dealing with data exports from 5 very different websites.
  • 8 example URLs
  • Reduces to 5 different EAD ID values. The last one, since it is labelled a test, I eventually removed from the data set. There were only a handful of these, and they only occurred at ECU and Maryland.
  • Just shy of 100k URLs exported. Most were analyzed, since they had EAD IDs in them, but a large section of the Duke URLs was easily separated post-export, after the initial data analysis revealed that all of their “search string” URLs contained only queries, not EAD IDs.
  • An admittedly poor visualization, but I couldn’t resist a reference to the Olympic games.
  • Explain UPVs here
  • This type of distribution was purely a coincidence, but it was nice that the repositories represented a diverse range of yearly PVs / UPVs.
  • AAA is clearly the odd one out in this grouping. The reason: their most-viewed collections are digitized collections, which contain far more potential collection-page views, each of which receives a significant amount of use. And that’s interesting (if not completely expected) news: these digitized collections are seeing high amounts of online ‘use’.
  • Explain EPVHs here. There are only about 8,760 hours in an entire year, mind you. Ideally, I would like to add a third variable to the mix: “reading room hours” per collection. But, for now, we can correlate EPVHs with UPVs.
  • All 5 datasets combined. I don’t think that too much should be read into this particular visual, however (especially due to the heterogeneous nature of EAD delivery frameworks).
  • Duke context: The graph shows the increase in the percentage of visits from mobile devices to library web resources at Duke since Dec. 2009. The percentage of visits from mobile devices is higher for finding aids than for any other category of library resource, except blogs. The percentage of mobile visits has grown more than 5x since Spring 2010. Over the last two months (Jun-Aug 2012), mobile visits have accounted for over 9% of all visits to finding aids. That is not a huge percentage, but the growth rate is significant. We can’t afford to ignore this trend; we should make our finding aid interfaces more mobile-friendly (more on that later).
  • Understanding mobile visitors: The slide shows the average distribution of mobile visits to finding aids by device across 5 repositories: AAA, Duke, ECU, UMD, and Miami (July 2011-July 2012). More users visit finding aids from iPads than from any other device. Not surprising.
  • What do mobile visitors do once they discover finding aids? It’s hard to tell, but they don’t stick around long, perhaps because viewing most finding aids on a mobile device is a painful experience, or at least more painful than viewing them on a larger screen. Both in the depth of visit and in the duration of visit, it seems obvious that there is less interaction from a user on a mobile device. On average, mobile visitors visit 39% fewer pages per visit and spend 55% less time on the site.
  • How do mobile visitors get to our finding aids? The charts show the distribution of traffic sources for mobile visits (left) vs. all visits (right). You can see that mobile visitors are: (1) more likely to discover finding aids through web search (Google, Bing, Yahoo, etc.) [this seems a bit counter-intuitive: I would expect a higher percentage of click-throughs from Facebook, Twitter, Wikipedia, and other popular mobile-enhanced sites. Searching on a mobile device seems more cumbersome, but maybe not as difficult as browsing to finding aids from a library website…]; (2) half as likely to come from referring sites like library webpages, the library catalog, Wikipedia, Facebook, etc.
  • Although referral traffic makes up a relatively small portion of mobile visits to finding aids (~15% across the 4 repositories), there are some noticeable differences in referral traffic when comparing mobile visits to all visits. These charts show the top referring sites for mobile visits (left) vs. all visits (right). Some highlights: compared to all visitors, mobile visitors are (1) 1.4 times less likely to come from university webpages (e.g. duke.edu, ecu.edu); (2) 1.7 times more likely to come from Wikipedia (both mobile and full versions of the site), where AAA’s stats are the only ones that are close to even between mobile and all visits, and the rest have a higher percentage of traffic driven via Wikipedia on mobile devices when compared to the whole; (3) 2.5 times more likely to come from Facebook (both mobile and full versions); (4) over 3.1 times as likely to come from other Google services (Google Reader, Google Groups, and other Google-related sites) or, if you remove AAA’s data, 6 times more likely.* *[Not sure exactly what these services are. See: http://support.google.com/googleanalytics/bin/answer.py?hl=en&answer=55587 ]
  • Average percentage of Wikipedia referrals amongst all other referrals.
  • Broken down for each of the 5 repositories.
  • Total sum of Wikipedia referrals from all 5 repositories. It’s certainly going up, so it’s more instructive to look at the absolute numbers in this case!
  • Most of the yearly increase is due to AAA’s influence, although Maryland and Miami also show clear increases for each year.
  • Move to an “archival dashboard” (for both researchers and archivists), somewhat like the real-time tennis stats pictured here. Additionally, were it not for Exploratory Data Analysis practices, additional metrics in tennis (such as the aggressive ratio) would never have been defined. The more we explore the potential of quantitative archival metrics, the more likely it is that we will create similar sorts of combined metrics that will enable us to make a whole host of new data-informed decisions (for accessioning, processing, re-describing, digitizing, etc.).
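
The notes above describe reducing messy exported URLs to EAD ID values. A minimal Python sketch of that kind of clean-up follows. It is not the presenters' own method: the `MdU.ead.<series>.<number>` pattern is an assumption inferred from the Maryland example URLs only, and `find_ead_id` is a hypothetical helper name. The sketch URL-decodes each logged path, looks for the ID directly, and then looks inside any base64-encoded `http` address (base64 text for a string starting with "http" begins with `aHR0c`).

```python
import base64
import re
from urllib.parse import unquote

# Assumed shape of a Maryland-style EAD ID, e.g. "MdU.ead.histms.0008".
EAD_ID = re.compile(r"MdU\.ead\.[a-z]+\.\d+")
# Base64 text of anything starting with "http" begins with "aHR0c".
B64_URL = re.compile(r"aHR0c[A-Za-z0-9+/]*={0,2}")


def find_ead_id(url):
    """Return the first Maryland-style EAD ID found in a logged URL, or None."""
    decoded = unquote(url)  # turns %3D into "=", %26 into "&", and so on
    match = EAD_ID.search(decoded)
    if match:
        return match.group(0)
    # Some proxy or translation URLs carry the target address base64-encoded.
    for run in B64_URL.findall(decoded):
        try:
            hidden = base64.b64decode(run).decode("utf-8")
        except ValueError:  # bad padding or non-text bytes: skip this run
            continue
        match = EAD_ID.search(hidden)
        if match:
            return match.group(0)
    return None


examples = [
    "/archivesum/actions.DisplayEADDoc.do?source=/MdU.ead.scpa.0078.test.xml&style=ead",
    "/search?q=cache:wkJ778Y-NEgJ:test.lib.umd.edu/archivesum/actions.DisplayEADDoc.do"
    "%3Fsource%3DMdU.ead.histms.0008.xml%26style%3Dead+historical+Davis+family+Texas",
    "/url_result?ctw_=sT,eCR-EJ,bT,hT,uaHR0cDovL3d3dy5saWIudW1kLmVkdS9hcmNoaXZlc3Vt"
    "L2h0bWwvTWRVLmVhZC5saXRtcy4wMDA3Lmh0bWw=,qlang=ja|for=0|sp=-5|fs=100%|fb=0",
]
for url in examples:
    print(find_ead_id(url))
# MdU.ead.scpa.0078
# MdU.ead.histms.0008
# MdU.ead.litms.0007
```

Each repository would need its own pattern, and URLs that return `None` (such as pure search queries) could be set aside as "separated rows" rather than analyzed.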

Presentation Transcript

  • Five Repositories, One Dataset: USING EXPLORATORY DATA ANALYSIS TECHNIQUES TO TRACK PATTERNS OF USE
  • Mark Custer, Noah Huffman, Jennie Levine Knies, Kyle Rimkus, Sara Snyder
  • Outline of Today’s talk
    1. Introduction: Exploratory and preliminary nature of the study
    2. Overview of website / EAD-portal metrics for three years
    3. The path to an aggregate data set and difficulties
    4. Collection-level metrics: one year, in depth
    5. Visits from Mobile devices over the years
    6. Wikipedia referrals over the years
    7. Conclusion: Next Steps
  • 1: Introduction
  • 2: Website Metrics, FY 2009 - FY 2011
  • 3: The Path and its Difficulties
  • Example URLs:
    /collections/findingaids/downgall.htm%20and%20http:/www.aaa.si.edu/collectionsonline/downgall/overview.htm
    /collections/oralhistories%20/tranSCRIPTs/levine02.htm
    /search?q=cache:zqG_DxtU1AIJ:proust.library.miami.edu/findingaids/?p=collections/controlcard&id=480+orestes+miami&cd=13&hl=en&ct=clnk&gl=us
    /translate_c?hl=ar&sl=en&u=http://proust.library.miami.edu/findingaids/%3Fp=collections/controlcard&id=247&prev=/search%3Fq=batista%2Bcollection&hl=ar&client=firefox-a&channel=s&rls=org.mozilla:ar:official&sa=N&rurl=translate.google.com.eg&usg=ALkJrhiuq78PNcimpnEph3V5gEnNNUZuNw
    /search?q=cache:wkJ778Y-NEgJ:test.lib.umd.edu/archivesum/actions.DisplayEADDoc.do%3Fsource%3DMdU.ead.histms.0008.xml%26style%3Dead+historical+Davis+family+Texas&cd=6&hl=en&ct=clnk&gl=us
    /digitalcollections/rbmscl/inv/results?q=testimonial+advertising&fq=duke.collection%3Ainv&start=0&rows=20&f=keyword&t=testimonial+advertising&btnG.x=0&btnG.y=0
    /url_result?ctw_=sT,eCR-EJ,bT,hT,uaHR0cDovL3d3dy5saWIudW1kLmVkdS9hcmNoaXZlc3VtL2h0bWwvTWRVLmVhZC5saXRtcy4wMDA3Lmh0bWw=,qlang=ja|for=0|sp=-5|fs=100%|fb=0|fi=0|fc=FF0000|db=T|eid=CR-EJ,
    /archivesum/actions.DisplayEADDoc.do?source=/MdU.ead.scpa.0078.test.xml&style=ead
  • Total Rows of Data Analyzed, FY 2009 [Bar chart by repository: AAA, Duke, ECU, Maryland, Miami; series: analyzed rows, separated rows; y-axis 0 to 50,000; data labels as extracted: 1012, 43422, 12472, 71, 15441, 1335, 12421, 601, 7325, 3815]
  • 4: Collection-level Data
  • [Chart: PVs and UPVs by repository (AAA, Duke, ECU, Maryland, Miami); y-axis 0 to 700,000]
  • The uneven distributions, as pictured in 5 sets of quintiles
    Quintile   AAA      Duke     ECU      Maryland   Miami
    1st        87.27%   70.96%   71.57%   70.47%     65.93%
    2nd         9.22%   16.71%   16.10%   16.28%     16.22%
    3rd         2.51%    7.55%    7.26%    8.08%     10.61%
    4th         0.76%    3.56%    3.56%    3.75%      5.24%
    5th         0.24%    1.22%    1.51%    1.42%      2.00%
  • Estimated page view hours per year (EPVHs) in FY 2009 [Chart by repository: AAA, Duke, ECU, Maryland, Miami; data labels as extracted: 21,945.59; 9,494.46; 4,103.40; 3,702.08; 882.39]
  • AAA: EPVHs vs UPVs [Plot; axis ranges 0 to 40,000 and 0 to 1,200]
  • Duke: EPVHs vs UPVs [Plot; axis ranges 0 to 6,000 and 0 to 250]
  • ECU: EPVHs vs UPVs [Plot; axis ranges 0 to 2,000 and 0 to 120]
  • Maryland: EPVHs vs UPVs [Plot; axis ranges 0 to 3,000 and 0 to 250]
  • Miami: EPVHs vs UPVs [Plot; axis ranges 0 to 800 and 0 to 40]
  • All: EPVHs vs. UPVs [Plot; axis ranges 0 to 40,000 and 0 to 1,200]
  • [Chart: quintiles; y-axis 0% to 100%] 1st 83.33%, 2nd 10.50%, 3rd 3.94%, 4th 1.66%, 5th 0.57%
  • 5: Mobile
  • [Pie chart: mobile visits by device] iPad 37%, iPhone 25%, Android 22%, Other 16%
  • Mobile Visit Behavior
    Avg. Pages/Visit: mobile visits 1.83; all visits 3.04
    Avg. Time on Site: mobile visits 1:07; all visits 2:31
    From Google Analytics Data (July 1, 2011-June 30, 2012) from: AAA, Duke University, East Carolina University, University of Maryland, and University of Miami
  • Traffic Sources: Mobile Visits vs. All Visits
    Source             Mobile Visits   All Visits
    Direct Traffic     13%             12%
    Referral Traffic   15%             29%
    Search Traffic     72%             59%
    From Google Analytics Data (July 1, 2011-June 30, 2012) from: AAA, Duke University, East Carolina University, University of Maryland, and University of Miami
  • Referring Sites: Mobile Visits vs. All Visits (top referrers)
    Referrer             Mobile Visits   All Visits
    University Website   32%             46%
    All Other            33%             36%
    Wikipedia            23%             13%
    Facebook             6%              3%
    Google Services      6%              2%
    From Google Analytics Data (July 1, 2011-June 30, 2012) from: AAA, Duke University, East Carolina University, University of Maryland, and University of Miami
  • 6: Wikipedia
  • [Chart: 2009, 2010, 2011; y-axis 11.00% to 13.50%; data labels as extracted: 13.35%, 13.25%, 11.98%]
  • [Chart by repository: AAA, Duke, ECU, Maryland, Miami; data labels as extracted: 28.28%, 27.63%, 23.05%, 16.27%, 14.87%, 14.23%, 13.50%, 11.22%, 9.51%, 7.40%, 6.94%, 5.44%, 5.51%, 5.64%, 3.41%]
  • [Chart; y-axis 35,000 to 42,000] 2009: 37,018; 2010: 38,483; 2011: 41,031
  • [Chart by repository and year]
    Repository   2009     2010     2011
    AAA          31,400   33,210   35,426
    Duke          3,974    3,388    3,212
    ECU             536      546      548
    Maryland        930      887    1,356
    Miami           178      452      489
  • 8: Conclusion / Next Steps
    • How best to define a collection-level page? Should we?
    • Which metrics are most useful for archivists, researchers, etc.? My hunch: UPVs, EPVHs, Reading Room Hours, Reference Consultations. And?
    • Beyond the collection, how can we analyze these data sets by subject / topic?
    • How best to share this data?
    • How else can it be analyzed?
  • Questions? Mark Custer, Noah Huffman, Jennie Levine Knies, Kyle Rimkus, Sara Snyder
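
The mobile-versus-all comparison in section 5 can be re-derived from the figures on the "Mobile Visit Behavior" slide. The Python sketch below is only an illustration: the helper names `to_seconds` and `relative_drop` are made up for this example, and the calculation assumes the quoted percentages are simple relative differences between the mobile and all-visits averages.

```python
def to_seconds(mmss: str) -> int:
    """Convert a "minutes:seconds" string such as "1:07" to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_drop(mobile: float, overall: float) -> float:
    """Fraction by which the mobile figure falls below the all-visits figure."""
    return (overall - mobile) / overall


# Averages taken from the "Mobile Visit Behavior" slide.
pages_drop = relative_drop(1.83, 3.04)
time_drop = relative_drop(to_seconds("1:07"), to_seconds("2:31"))

print(f"Pages per visit: {pages_drop:.1%} fewer on mobile")  # 39.8%
print(f"Time on site: {time_drop:.1%} less on mobile")       # 55.6%
```

Under that assumption the results, about 39.8% and 55.6%, line up with the "39% fewer pages" and "55% less time" quoted in the speaker notes.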