# CARLI Usage Stats Keynote 20130325

20130325 mcdonald price carli usage statistics final McDonald and Price

Published in: Education

### Speaker notes

• I could tell you about all the useful, interesting things that either Jason or I have done, or that we've worked on together. But here is the most important thing for you to know today about us!
• Give the agenda for the talk
• My thoughts on what usage statistics could have done and what they aren't currently doing. Story about usage-based pricing from AAAS: it was never known what people were doing, and bibliometric research was mostly based on WoK/ISI data, etc. How statistics can be wielded for misuse: drawing causation from correlation, or using raw count data that is not statistically significant. Story about a professor asking a speaker to put up his slides: "Which one?" "Any one of them; I have a critique on every one of them." A lot of time and effort has been invested, and it's not paying off … yet.
• In the example given earlier, a list of 23 people, comparing the birthday of the first person on the list to the others allows 22 chances for a matching birthday, comparing the second person to the others allows 21 chances, the third person has 20 chances, and so on. Hence the total number of chances is 22 + 21 + 20 + ... + 1 = 253, so comparing every person to all of the others allows 253 distinct chances (combinations): in a group of 23 people there are 23 × 22 / 2 = 253 pairs. Presuming all birthdays are equally probable,[2][3][4] the probability of a given birthday for a person chosen at random from the entire population is 1/365 (ignoring Leap Day, February 29). Although the pairings in a group of 23 people are not statistically equivalent to 253 pairs chosen independently, the birthday paradox becomes less surprising if a group is thought of in terms of the number of possible pairs rather than the number of individuals.
• The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B is more effective when considering both sizes at the same time. In this example the "lurking" variable (or confounding variable) of stone size was not previously known to be important until its effects were included. Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together: (1) The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give the severe cases (large stones) the better treatment (A), and the milder cases (small stones) the inferior treatment (B). Therefore, the totals are dominated by groups 3 and 2, and not by the two much smaller groups 1 and 4. (2) The lurking variable has a large effect on the ratios, i.e. the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones using treatment A (group 3) does worse than the group with small stones, even if the latter used the inferior treatment B (group 2).
• So, who cares? Well, given this meeting and a variety of others over the years, obviously we’re still seeking that concept of ‘The Promise’ for usage statistics. And in fact, we’re making progress – at Charleston 2012, we saw the following sessions that were partially or primarily about usage statistics in some form or other.
• See if I can find a slideshare of this article to show something more akin to statistics.
• Add reference for this
• Sustainable Collections Services, Maine Shared Collections Strategy Planning Meeting, http://www.slideshare.net/Maine_SharedCollections/mscs-scs-planning-meeting-rick-lugg-andy-breeding
• Find better example of this.
• This is a graph of the number of articles covered by source as of last month. We only started tracking Twitter in June of this year, and it's expected that the graph will change as social media sites accrue more mentions of PLOS articles.
• Even though there are good reasons not to expect CPU to be the same, cite Blecic talk (3rd Breakout Presentation)… Size of the discipline is always an issue in PPU: scale by the relative size of the user base.
• Sorting of subscribed titles by price per use, for use of the package cancellation allowance. Acknowledge that these prices don't tell the whole story, since they subsidize the "unsubscribed" titles. Comparison of price per use of subscribed titles from other packages would be apples to oranges.
• Variables in green are influenced by the purchase trigger (# of loans before purchase). Variables in blue could be affected by subject profile or max price. % of visible list with STL…
• In all subsequent slides, books from user-selected collections are in blue, and those from preselected collections are in green. Overall, the average number of uses per year is in general quite high (≈ 6 per year). The average number of post-purchase uses per year is significantly greater for user-selected ebooks (2x as high). Even though the total number of books (n) in the user-selected set is greater, this has no effect on the result: these are PER BOOK averages, so each book in the user-selected collection is used an average of 8.6x per year, and each book in the preselected collection is used an average of 4.3x per year. This result rejects the hypothesis that ebooks selected by users will be used less than preselected ebooks.
• The pattern of greater use for user-selected books is consistent across all 5 libraries: 4 of 5 are significantly different based on non-overlapping 95% confidence intervals. The degree of difference varies from 1.75x to 4.5x.
• This figure shows the number of unique users per ebook per year for the overall user-selected and preselected collections. The average user-selected ebook was used by a significantly greater number of different users per year (about 2x as many). These data allow us to reject the hypothesis that users select books that are only of interest to themselves.
• Here we see that the pattern of wider use of user-selected ebooks is also consistent across the 5 libraries, with the same 4 libraries showing significantly wider use. The degree of this effect varies from 1.75 to 3.3 times more unique users per book per year in user-selected collections.
• Out of the research, an idea of what metrics could contribute to a Usage Factor measure began to emerge. Similar to Impact Factor, it was Total Usage over a Specified Time Period of the Articles Published during a Time Period, divided by the Total articles published during the Time Period
• Adding in journals attending PPM that are not ISI ranked (Green bars = no IF rank)
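
The pair-counting argument in the birthday note can be checked numerically. This is a minimal Python sketch, assuming (as the note does) 365 equally likely birthdays and no leap day; the function name `shared_birthday_probability` is illustrative, not something from the talk. It computes the exact probability of at least one shared birthday through the complement, the probability that all birthdays are distinct.

```python
from math import comb, prod


def shared_birthday_probability(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), all days equally likely."""
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    # Complement: probability that all n birthdays are distinct.
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_all_distinct


print(comb(23, 2))                                # 253 pairs, i.e. 22 + 21 + ... + 1
print(round(shared_birthday_probability(23), 3))  # 0.507
```

With only 23 people the probability already exceeds one half, which is why thinking in pairs rather than individuals makes the result less surprising.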
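
The kidney-stone figures on slide 6 make Simpson's paradox easy to reproduce. This Python sketch uses exact fractions for the four groups named in the note (group 1 = A/small, group 2 = B/small, group 3 = A/large, group 4 = B/large); the dictionary layout and function names are illustrative assumptions. It shows treatment A winning within each stone size while treatment B wins once the sizes are pooled.

```python
from fractions import Fraction

# (treatment, stone size) -> (successes, patients), from slide 6.
outcomes = {
    ("A", "small"): (81, 87),    # group 1
    ("B", "small"): (234, 270),  # group 2
    ("A", "large"): (192, 263),  # group 3
    ("B", "large"): (55, 80),    # group 4
}


def rate(treatment: str, size: str) -> Fraction:
    """Success rate within one stone-size stratum."""
    successes, patients = outcomes[(treatment, size)]
    return Fraction(successes, patients)


def pooled_rate(treatment: str) -> Fraction:
    """Success rate when the lurking variable (stone size) is ignored."""
    rows = [counts for (t, _size), counts in outcomes.items() if t == treatment]
    return Fraction(sum(s for s, _ in rows), sum(n for _, n in rows))


for size in ("small", "large"):
    print(f"{size} stones: A {float(rate('A', size)):.0%} vs B {float(rate('B', size)):.0%}")
print(f"both sizes: A {float(pooled_rate('A')):.0%} vs B {float(pooled_rate('B')):.0%}")
```

The reversal comes entirely from the unequal group sizes: most of A's patients are in the hard large-stone stratum (263 of 350), most of B's in the easy small-stone stratum (270 of 350).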
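
The notes comparing user-selected and preselected ebooks call a difference significant when the two 95% confidence intervals do not overlap. A minimal Python sketch of that check follows, assuming a normal approximation (mean ± 1.96 standard errors). The per-book usage values are HYPOTHETICAL placeholders, not data from the study, and the helper names `ci95` and `intervals_overlap` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def ci95(values):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return (m - half_width, m + half_width)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# HYPOTHETICAL uses per book per year at one library (not data from the talk).
user_selected = [12, 7, 9, 15, 4, 8, 11, 6, 10, 9]
pre_selected = [3, 5, 2, 6, 4, 3, 5, 4, 2, 6]

ci_user = ci95(user_selected)
ci_pre = ci95(pre_selected)
verdict = "different" if not intervals_overlap(ci_user, ci_pre) else "not shown to differ"
print(f"user-selected {ci_user[0]:.2f}-{ci_user[1]:.2f}, "
      f"preselected {ci_pre[0]:.2f}-{ci_pre[1]:.2f}: {verdict}")
```

Non-overlap is a conservative criterion: two means can differ significantly even when their 95% intervals overlap somewhat, and for small samples a t-based interval would be wider than the 1.96 approximation used here.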

### Slides

1. What's the Use: A Symposium on Usage Statistics. John McDonald & Jason Price, PhD; CIO & AVP; Interim Library Director; Claremont Colleges Library. March 25, 2013. CARLI Electronic Resources and Collections Working Groups.
2. Overview: a keynote in three parts: (1) Broad perspective: where are we now? (2) Detailed perspective: addressing the challenges of usage statistics. (3) The latest: our present & future projects.
3. Where are we now?
4. The Promise & the Peril
5. How many people do you need in a room before it is highly likely that two share a birthday? a) Less than 30 b) 30–60 c) More than 60 d) 367
6. Which treatment for kidney stones is more successful?

| Success rate | Treatment A | Treatment B |
|---|---|---|
| Small stones | Group 1: 93% (81/87) | Group 2: 87% (234/270) |
| Large stones | Group 3: 73% (192/263) | Group 4: 69% (55/80) |
| Both sizes | 78% (273/350) | 83% (289/350) |

7. So what? What will the data tell us…
8. Harvesting the Crop: Implementing a Usage Statistics Management System at Georgia State; Social Media, ROI and Cookie Day; How Do E-Resources Contribute to Teaching and Learning? Findings from the Lib-Value Project; Using Data Visualization Tools for Collection Analysis; To Keep, or Not to Keep: The Effect of Discovery Tools on Licensed Resources; Everything That's Wrong with E-book Statistics: A Comparison of E-book Packages; Discovery & Usage: The Foundation of a Powerful Collection
9. Standards
10. COUNTER at 10
11. Progress: commonly agreed-upon measures; routine methods of transmission; regular formatting of files; standard dates for delivery; audits of reports; certification of compliant vendors; established process for refinement.
12. Still evolving: comprehensive coverage of publishers; sophistication on ebooks & databases; automation; further granularity; measures for non-text usage (article parts); article-level metrics.
13. Usage in Practice
14. Usage statistics informing decisions about acquisitions
15. Cost Projections - GVSU

| Purchase trigger | # of Ebooks Purchased | Total \$ of Ebooks Not Purchased | Additional STL Costs | Total Savings over Existing Plan |
|---|---|---|---|---|
| Purchase on 4th loan | 89 | \$17,382.31 | \$3,327.20 | \$14,055.11 |
| Purchase on 5th loan | 58 | \$24,512.55 | \$4,621.09 | \$19,891.46 |
| Purchase on 6th loan | 34 | \$25,722.11 | \$5,041.64 | \$20,680.47 |
| Purchase on 7th loan | 22 | \$26,899.83 | \$5,324.84 | \$21,579.99 |

Source: Doug Way and Julie Garrison, "Financial Implications of Demand-Driven Acquisition," in David Swords (ed.), Patron-Driven Acquisitions: History and Best Practices (Berlin: De Gruyter Saur, 2011), p. 148.

16. Usage statistics informing decisions about print collection management
17. DU storage study. Levine-Clark, Michael, "Analyzing and Describing Collections Use: Strategies for Managing a Library Move," LYRASIS Ideas and Insights webinar, May 4, 2012. http://www.slideshare.net/MichaelLevineClark/
18. Usage statistics informing decisions about shared print projects
19. Each "title-holding" has different characteristics

| | Dominguez Hills | Fullerton | Long Beach | Los Angeles | Northridge | Pomona |
|---|---|---|---|---|---|---|
| Total circulations | 0 | 19 | 16 | 12 | 13 | 8 |
| Last circulation date | none | 11/30/11 | 12/16/08 | 5/30/07 | 4/27/07 | 3/11/08 |
| Date added to collection | 6/27/02 | 4/23/02 | 9/21/01 | 5/03/00 | 11/11/02 | 8/11/00 |

Source: Sustainable Collections Services, Maine Shared Collections Strategy Planning Meeting, http://www.slideshare.net/Maine_SharedCollections/mscs-scs-planning-meeting-rick-lugg-andy-breeding

20. Sample Pilot Group: title-holdings by holdings level. (Chart: title-holdings, 0 to 2,000,000, split into 0 circs, 1–3 circs, and 4+ circs, grouped by the number of pilot-group libraries holding the title: 1, 2, 3–6.)
21. Resource sharing: Camino collections. (Chart: for each member library (CUC, LMU, Oxy, Pep, UOP, CST, Wstmt, CalArts, CBU, Dom, WJU, WUHS, AJU, HNU), on a 0 to 1,200,000 scale: books held only by that library; books held by both that library and the rest of Camino; books held only by the rest of Camino.)
22. Usage statistics informing decisions about print & online resources
23. Holy grail: understanding user behavior
26. Tracking Impact Beyond Articles
27. http://www.zazzle.com/statistics_means_never_having_to_say_youre_certai_tshirt-235669028746970031
28. Part 2: Addressing the Challenges of Usage Stats. (1) Comparability: package price per use; defining the appropriate range(s) of cost per use; practical applications. (2) Reliability: impact of mobile, discovery & harvesters. (3) Prediction: demand-driven acquisition; number of books available <> size of budget. (4) Context: data about our data.
29. Apples and oranges are both round(ish)…
30. Challenge 1: Comparing package price per view (cross-package comparison)

| pkgID | Total Use | SubsCost | UnSubsCost | Overall PPV |
|---|---|---|---|---|
| 1 | 140,048 | \$1,652,000 | \$182,000 | \$13.10 |
| 2 | 20,341 | \$333,000 | \$10,000 | \$16.86 |
| 3 | 13,572 | \$282,000 | \$21,000 | \$22.33 |

So Pkg 1 is a better value than Pkg 3? It might not be…
31. html-to-pdf ratios vary widely for these packages. (Chart: html views vs. pdf downloads, # of views by package 1–3; ratios of roughly 1:1.3, 1:12, and 1:23.) How many pdfs in Pkg 1 are duplicates of html views? (For more information, see Davis & Price, 2006, JASIST 57(9).)
32. Getting a pdf from Package 1: 'Get article' links directly to the html version… then the user downloads the pdf… so 2 uses are recorded for 1 pdf.
33. Total full-text views suffer from duplication issues
34. Package value revisited

| pkgID | Total Use | SubsCost | UnSubsCost | Overall PPV |
|---|---|---|---|---|
| 1 | 140,048 | \$1,652,000 | \$182,000 | \$13.10 |
| 2 | 20,341 | \$333,000 | \$10,000 | \$16.86 |
| 3 | 13,572 | \$282,000 | \$21,000 | \$22.33 |

vs. pdf requests only, which tell a different story!

| pkgID | Est. pdf Use | SubsCost | UnSubsCost | Overall PPP |
|---|---|---|---|---|
| 1 | 83,469 | \$1,652,000 | \$182,000 | \$21.97 |
| 2 | 18,734 | \$333,000 | \$10,000 | \$18.31 |
| 3 | 13,287 | \$282,000 | \$21,000 | \$22.80 |

35. Addressing Challenge 1: Comparing package price per view. When comparing packages, both total views and PDF downloads should be compared. Extension of principle: Journal Report 1B (JR 1a, JR 1b; archive, frontfile).
36. Challenge 2: Defining acceptable range(s) of cost per use: among packages; within packages.
37. Reality check: Should we expect cost per use to be equivalent among packages? Assumptions: content quality; business model (for-profit vs. cost recovery); exposure in discovery tools; title list accuracy; backfile access.
38. Reality! Acceptable CPU range? a) 0–\$6 b) 0–\$12 c) 0–\$24 d) 0–\$50 e) It depends on _________ f) Can't say / Don't know
39. Consortial benchmarks: SCELC Package W. (Chart: overall price per use, shown as price per full-text article view from \$0.00 to \$50.00, for consortium members 1–24 sorted by decreasing spend; some bars labelled "use data not available".)
43. Subscribed titles within a package: apples to apples? (Chart: price per use (PPU), \$0 to \$1,200, by subscribed title # from 0 to 50, ordered by PPU; labelled points: \$4,300/year, 11 uses; \$8,800/year, 33 uses; \$3,400/year, 17 uses.)
44. Strict title-level cost per view is misleading. (Chart: cost per view by access type: all titles [n=537], \$13.41; subscribed titles [n=192], \$30.51; unsubscribed (leased) titles [n=345], \$0.81.)
45. Best practices for usage comparison tasks: (1) Goal: identify pricing inequity. (a) Best accomplished by consortial benchmarking. (b) Requires readily available package-level cost per use across consortial participants. (c) Leverage COUNTER consortial reports and the economy of scale of a consortial specialist. (2) Goal: identify lower-value packages. (a) Use both total-view and pdf-download comparisons. (3) Goal: identify lower-value titles. (a) Only after targeting specific lower-value packages. (b) Recognize that title-by-title price-per-use comparison is only valid within a package.
46. Challenge 2: The convenience/reliability trade-off

COUNTER R4: "Search activity generated by federated search engines and other automated search agents should be included in separate 'Searches_federated and automated' counts … and are NOT to be included in the 'Regular Searches' counts."

| Case | Search inflation impact | Full-text inflation impact |
|---|---|---|
| Direct from Google IP access [cost is granularity of usage stats] | Low? | Low? |
| Mobile devices | Unlikely to be significant | Low to none |
| Federated search engines (built into some discovery tools) automate searches | Significant, but COUNTER rules require separate reporting, and number of searches has always had dubious meaning | None |
| Harvesters (e.g. Quosa) automate article downloads from search results | Same as federated search | Potentially very high |

47. Harvesters (like Quosa): the real threat?
48. Usage Factor may address the harvester challenge
49. Challenge 3: Prediction (coming soon!). Observations: • Libraries prefer predictability over savings! • Title-level journal usage is remarkably predictable year on year. • Usage-driven purchasing is ripe for modelling based on this predictability.
50. Example: Demand-driven ebook forecasting. Estimate either the list size or the annual expenditure from: Estimated Annual Expenditure = List Size × [(% of visible list purchased × mean book price) + (% of visible list with STL × mean cost per STL × mean STLs per title)]
51. Challenge 4: Context = metadata! • We do need good data about our data. • Data quality is more than just accuracy. • Retrospective studies require history! (circulation statistics; dates of profile changes; cross-library comparisons) • In an ideal world we'd share datasets with rich metadata. • Library science is far from this ideal world. • An example of the power of good retrospective data…
52. Total Books & Usage

| Library | Model | User-Selected | Pre-Selected | Usage by Download | Usage Read Online |
|---|---|---|---|---|---|
| A | MIX | 1131 | 552 | 6773 | 9888 |
| B | MIX | 5246 | 2612 | 42880 | 38329 |
| C | USER | 2198 | 102 | 0 | 11801 |
| D | USER | 3010 | 48 | 697 | 15126 |
| E | MIX | 4159 | 909 | 17396 | 25604 |
| F | PRE | 0 | 1451 | 4905 | 3082 |
| G | PRE | 31 | 2154 | 7001 | 4459 |
| H | USER | 801 | 0 | 556 | 415 |
| I | MIX | 305 | 336 | 3334 | 2568 |
| J | USER | 2799 | 53 | 5 | 13349 |
| K | MIX | 147 | 276 | 2436 | 2283 |
| TOTAL | | 19,831 | 8,496 | 85,983 | 126,904 |

56. Librarian Acquired
57. Data required: • book purchase date • book purchase type • many years of use • different types of use • library purchasing profile • library list profile (what content was excluded) • individual user IDs (anonymized) • came from 4 files per library, with a total of 69 data elements… • We found one vendor that had invested in library-facing reports at the level of data needed; there are few others… • Addressing the challenge: a consortial solution?
58. Part 3: Our present & future. (1) Improving usage stats collection: (a) (external) consortial PaperStats; (b) (internal) Dublin Six AUDITOR. (2) Improving usage stats visualization: (a) Excel conditional formatting; (b) Splunk for dashboard creation… (3) Better database metrics. (4) Improving on journal number comparisons. (5) Usage Factor for journal evaluation.
59. Consortia: enhancements. Track stats for each member; automatic import of consortia stats.
60. SCELC PaperStats by the numbers: total number of full-text downloads tracked for SCELC: 312,908,657; total COUNTER reports downloaded: 2000+; total number of logins: 387; number of month records: 20.3M; earliest year covered: 2003; total number of reports being harvested: 15; total number of institutions covered: 95; total number of participants: 14.
61. Better technology
62. Click through to article- and user-level detail!!!
63. Visualization (Excel conditional formatting)
64. Visualization: Splunk
65. Splunk for dashboard visualization
66. Better database metrics (beyond searches & sessions)
67. # of online journal subscriptions: meaningful? (Chart: number of online journal subscriptions, 0 to 50,000, for 2004–2009; series: Claremont Colleges, 2nd quartile, median, 1st quartile.)
68. Beyond numbers of journals & total usage: • Knowledge base & usage statistics comparisons • Selected group of peers with the same knowledge base & stats consolidation vendor • Run comparisons in Access & Excel
69. Usage Factor formula: Usage Factor = Total usage over period 'x' of articles published during period 'y' ÷ Total articles published during period 'y'
70. Impact and usage factor ranks are not related
71. (Chart: journal ranks, lower to higher, on a 0–140 axis; series: IF_rank, num art?, UF_rank(All), and UF_rank(All) for journals not ISI rated.)
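
The savings column of the GVSU cost projection on slide 15 can be reproduced with simple arithmetic. This Python sketch assumes savings = (\$ of ebooks not purchased) minus (additional STL costs), which is a reading of the garbled column headers rather than a formula stated on the slide. Under that reading the 4th-, 5th- and 6th-loan rows match the slide to the cent, while the 7th-loan row computes to \$21,574.99 rather than the \$21,579.99 shown.

```python
# Rows from the GVSU table on slide 15 (Way & Garrison, 2011).
# The column reading is an assumption: trigger loan -> (ebooks purchased,
# $ of ebooks not purchased, additional STL costs).
projections = {
    4: (89, 17_382.31, 3_327.20),
    5: (58, 24_512.55, 4_621.09),
    6: (34, 25_722.11, 5_041.64),
    7: (22, 26_899.83, 5_324.84),
}


def savings(avoided_purchase_cost: float, extra_stl_cost: float) -> float:
    """Savings over the existing plan: purchases avoided minus extra short-term loans paid."""
    return round(avoided_purchase_cost - extra_stl_cost, 2)


for trigger, (n_purchased, avoided, extra_stl) in projections.items():
    print(f"Purchase on loan {trigger}: {n_purchased} ebooks bought, "
          f"savings ${savings(avoided, extra_stl):,.2f}")
```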
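
The package comparison on slides 30 and 34 can be sketched in a few lines. The formula (subscribed cost + unsubscribed cost) ÷ uses is inferred from the slide figures, which it reproduces to the cent; the dictionary keys and helper names are illustrative. Ranking by total use puts Package 1 first, while ranking by estimated pdf use puts Package 2 first, which is the point made on slide 35.

```python
# Figures from slides 30 and 34 (costs in dollars).
packages = {
    1: {"total_use": 140_048, "est_pdf_use": 83_469, "subs_cost": 1_652_000, "unsubs_cost": 182_000},
    2: {"total_use": 20_341, "est_pdf_use": 18_734, "subs_cost": 333_000, "unsubs_cost": 10_000},
    3: {"total_use": 13_572, "est_pdf_use": 13_287, "subs_cost": 282_000, "unsubs_cost": 21_000},
}


def price_per_use(pkg: dict, use_field: str) -> float:
    """Overall price per use: (subscribed + unsubscribed cost) / chosen use count."""
    return (pkg["subs_cost"] + pkg["unsubs_cost"]) / pkg[use_field]


def rank_packages(use_field: str) -> list:
    """Package IDs from best (lowest) to worst price per use."""
    return sorted(packages, key=lambda pid: price_per_use(packages[pid], use_field))


for field in ("total_use", "est_pdf_use"):
    summary = ", ".join(f"pkg {pid} ${price_per_use(packages[pid], field):.2f}"
                        for pid in rank_packages(field))
    print(f"{field}: {summary}")
```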
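
Slide 43 and its speaker note sort the subscribed titles within one package by price per use. This minimal sketch uses the three labelled points from the slide; the title names are hypothetical placeholders, and treating a zero-use title as having infinite price per use is an assumption of the sketch.

```python
import math

# Three labelled points from slide 43: (annual price in $, annual uses).
# Title names are hypothetical placeholders.
titles = {
    "title X": (4_300, 11),
    "title Y": (8_800, 33),
    "title Z": (3_400, 17),
}


def price_per_use(price: float, uses: int) -> float:
    """Annual price divided by annual uses; an unused title gets infinite PPU."""
    return price / uses if uses else math.inf


# Highest PPU first: the first candidates to review for cancellation.
ranked = sorted(titles, key=lambda t: price_per_use(*titles[t]), reverse=True)
for name in ranked:
    print(f"{name}: ${price_per_use(*titles[name]):,.2f} per use")
```

As slide 45 stresses, a ranking like this is only meaningful within a single package.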
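
The forecasting relation on slide 50 can be run in either direction: fix the visible list size to estimate spend, or fix the budget to estimate how large a list it supports. This sketch assumes both terms are per-visible-title costs multiplied by list size (the slide's bracketing is ambiguous), and every parameter value in the example profile is HYPOTHETICAL; the function names are illustrative.

```python
def expected_cost_per_visible_title(pct_purchased: float, mean_book_price: float,
                                    pct_with_stl: float, mean_stl_cost: float,
                                    mean_stls_per_title: float) -> float:
    """Expected spend per title on the visible (discoverable) list."""
    purchase_cost = pct_purchased * mean_book_price
    loan_cost = pct_with_stl * mean_stl_cost * mean_stls_per_title
    return purchase_cost + loan_cost


def estimated_annual_expenditure(list_size: int, **params: float) -> float:
    """Spend implied by a visible list of the given size."""
    return list_size * expected_cost_per_visible_title(**params)


def estimated_list_size(annual_budget: float, **params: float) -> float:
    """Visible list size that a given annual budget can support."""
    return annual_budget / expected_cost_per_visible_title(**params)


# HYPOTHETICAL profile: 2% of visible titles bought at $100 on average,
# 10% of titles trigger STLs averaging $12 each, 1.5 STLs per such title.
profile = dict(pct_purchased=0.02, mean_book_price=100.0,
               pct_with_stl=0.10, mean_stl_cost=12.0, mean_stls_per_title=1.5)
print(f"{estimated_annual_expenditure(10_000, **profile):,.0f}")  # spend for a 10,000-title list
print(f"{estimated_list_size(38_000, **profile):,.0f}")          # list size supported by $38,000
```

Keeping the per-title cost in one function means the two directions of the forecast can never drift apart.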
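
The slide 52 table can be cross-checked and regrouped with a few lines of Python. This sketch sums the usage columns (they match the TOTAL row exactly; the two book-count columns sum to 19,827 and 8,493, slightly below the printed 19,831 and 8,496) and aggregates books and uses by each library's acquisition model. These are raw cumulative totals over whatever years each library contributed, not the per-book, per-year figures in the speaker notes, and a library's model label is not each book's selection type, so treat the output only as an illustration.

```python
from collections import defaultdict

# Rows transcribed from the slide 52 table:
# (library, model, user-selected books, pre-selected books, downloads, online reads)
rows = [
    ("A", "MIX", 1131, 552, 6773, 9888),
    ("B", "MIX", 5246, 2612, 42880, 38329),
    ("C", "USER", 2198, 102, 0, 11801),
    ("D", "USER", 3010, 48, 697, 15126),
    ("E", "MIX", 4159, 909, 17396, 25604),
    ("F", "PRE", 0, 1451, 4905, 3082),
    ("G", "PRE", 31, 2154, 7001, 4459),
    ("H", "USER", 801, 0, 556, 415),
    ("I", "MIX", 305, 336, 3334, 2568),
    ("J", "USER", 2799, 53, 5, 13349),
    ("K", "MIX", 147, 276, 2436, 2283),
]

# Cross-check the usage columns against the slide's TOTAL row.
total_downloads = sum(r[4] for r in rows)
total_online = sum(r[5] for r in rows)

# Aggregate books and all uses (download + online) by library acquisition model.
by_model = defaultdict(lambda: [0, 0])  # model -> [books, uses]
for _library, model, user_sel, pre_sel, downloads, online in rows:
    by_model[model][0] += user_sel + pre_sel
    by_model[model][1] += downloads + online

for model, (books, uses) in sorted(by_model.items()):
    print(f"{model:4} {books:>6,} books {uses:>8,} uses {uses / books:5.2f} uses/book")
```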
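
The Usage Factor formula on slide 69 translates directly into code. In this sketch the periods 'x' and 'y' are represented as collections of years, and the article and usage records are HYPOTHETICAL. COUNTER's published Usage Factor specification may define periods, usage types and averaging differently, so treat this only as an illustration of the ratio shown on the slide.

```python
def usage_factor(articles, usage, publication_period, usage_period):
    """Usage Factor as written on slide 69.

    articles: {article_id: publication_year}
    usage: iterable of (article_id, usage_year, count) records
    publication_period / usage_period: collections of years ('y' and 'x').
    """
    cohort = {aid for aid, year in articles.items() if year in publication_period}
    if not cohort:
        raise ValueError("no articles were published during the publication period")
    total_usage = sum(count for aid, year, count in usage
                      if aid in cohort and year in usage_period)
    return total_usage / len(cohort)


# HYPOTHETICAL journal: four articles and some yearly usage counts.
articles = {"a1": 2010, "a2": 2010, "a3": 2011, "a4": 2009}
usage = [("a1", 2011, 120), ("a2", 2011, 30), ("a3", 2011, 60),
         ("a3", 2012, 90), ("a4", 2011, 500)]

# Period 'y' = articles published 2010-2011; period 'x' = usage in 2011-2012.
uf = usage_factor(articles, usage,
                  publication_period=range(2010, 2012), usage_period=range(2011, 2013))
print(uf)  # a4 is excluded because it was published outside period 'y'
```

Because the denominator counts only articles in the publication window, heavily used older content (like `a4` here) does not inflate the figure, which is one way a Usage Factor differs from raw total usage.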