A 20-minute presentation to the 2015 Code4Lib Northeast event about a project to build a reporting service for authors. This service reports on downloads of author papers from the MIT Open Access Articles Collection.
3. Background
March 18, 2009 - Open Access Policy adopted
“...The policy is to take effect immediately; it will be reviewed after five years by
the Faculty Policy Committee, with a report presented to the Faculty.”
4. Background
2009 – 2013
MIT Libraries assemble a collection within DSpace@MIT for Open Access Articles.
~15,000 articles, ~1 million downloads
6. Background
March 18, 2009 - Open Access Policy adopted
“[P]olicy … will be reviewed after five years…”
August 2013 - Project begins
“Implement author-level, article-level, and aggregated article download usage
statistics for articles in the Open Access Articles Collection in DSpace@MIT to
incentivize deposits and provide useful assessment information for the MIT
Faculty Open Access Policy.”
9. Prior Work
MyDASH provided a solid model…
• Map
• Timeline
• Summary table
… but couldn’t be directly implemented.
• Repository versus One Collection
• Multiple department affiliations
10. Project Goals
• Make available download statistics at three levels:
author, article, and aggregate
• Incentivize deposits to collection
• Provide useful information for policy evaluation
• Evaluate new technologies within the Libraries (e.g., MongoDB)
14. Pipeline
Start from Apache server logs
● Filter the qualifying downloads
● Look up the downloaded paper
● Augment with additional information
● Store in MongoDB
● Use Solr to build summary collection
UI queries summary collection
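The first two pipeline steps can be sketched in Python. This is a minimal illustration, not the project's actual code: it assumes the logs use the Apache "combined" format, that a qualifying download is a GET with status 200 under the /openaccess-disseminate/ path seen in the sample record, and that the handle can be derived from that path. The names LOG_PATTERN, parse_download, and SAMPLE_LINE are hypothetical.

```python
import re
from datetime import datetime

# Assumed Apache "combined" log format:
# ip identity user [time] "method request protocol" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip_address>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

DOWNLOAD_PREFIX = "/openaccess-disseminate/"  # path taken from the sample record
HANDLE_BASE = "http://hdl.handle.net/"


def parse_download(line):
    """Return a dict for a qualifying download, or None if the line does not qualify."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # not a combined-format line
    record = match.groupdict()
    if record.pop("method") != "GET" or record["status"] != "200":
        return None  # only successful GET requests count (assumption)
    if not record["request"].startswith(DOWNLOAD_PREFIX):
        return None  # not a request for an Open Access article
    # Look up the downloaded paper: derive its handle from the request path.
    record["handle"] = HANDLE_BASE + record["request"][len(DOWNLOAD_PREFIX):]
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Illustrative log line, constructed for this sketch.
SAMPLE_LINE = (
    '128.218.64.242 - - [10/Aug/2010:17:14:03 +0000] '
    '"GET /openaccess-disseminate/1721.1/52491 HTTP/1.1" 200 12345 '
    '"http://www.google.com/search?q=zebra+finch" "Mozilla/5.0"'
)
```

Calling parse_download(SAMPLE_LINE) yields a dict with the same field names as the stored record (status, request, referer, user_agent, time, ip_address, handle), ready to be augmented and stored.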
20. Pipeline challenges - departments
Department names
● Inconsistent program / department affiliations
o “Media Laboratory”
o “Center for Bits and Atoms”
● Spelling Variations
o “MIT Department of Physics”
o “Massachusetts Institute of Technology, Department of Physics”
o “Dept. of Physics”
o “Physics”
21. Pipeline challenges - departments
Standardized department names
Whitelist of recognized names
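A whitelist lookup of this kind might look like the following sketch. The canonical form shown for Physics is an assumption modeled on the "Massachusetts Institute of Technology. Department of ..." pattern in the sample record, and WHITELIST, LOOKUP, and canonical_department are hypothetical names, not the project's own.

```python
import re

# Hypothetical whitelist: canonical name -> spelling variants seen in the metadata.
PHYSICS = "Massachusetts Institute of Technology. Department of Physics"
WHITELIST = {
    PHYSICS: [
        "MIT Department of Physics",
        "Massachusetts Institute of Technology, Department of Physics",
        "Dept. of Physics",
        "Physics",
    ],
}


def _key(name):
    """Normalize a name: lowercase, punctuation to spaces, whitespace collapsed."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


# Build a lookup from every normalized spelling to its canonical name.
LOOKUP = {}
for canonical, variants in WHITELIST.items():
    LOOKUP[_key(canonical)] = canonical
    for variant in variants:
        LOOKUP[_key(variant)] = canonical


def canonical_department(name):
    """Return the canonical department name, or None if it is not whitelisted."""
    return LOOKUP.get(_key(name))
```

Returning None for unrecognized names, rather than guessing, leaves affiliations such as "Center for Bits and Atoms" to be reviewed by hand before they are added to the whitelist.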
22. {
"_id" : ObjectId("5449127895b0c25083f29352"),
"status" : "200",
"handle" : "http://hdl.handle.net/1721.1/52491",
"title" : "A basal ganglia-forebrain circuit in the songbird biases motor output to avoid vocal errors",
"country" : "USA",
"authors" : [
{
"mitid" : "3.1415926537",
"name" : "Fee, Michale S."
},
{
"mitid" : "6.02x10^23",
"name" : "Andalman, Aaron S."
}
],
"request" : "/openaccess-disseminate/1721.1/52491",
"referer" : "http://www.google.com/search?q=head+mounted+microphone+zebra+finch&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a",
"user_agent" : "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8",
"time" : ISODate("2010-08-10T17:14:03Z"),
"ip_address" : "128.218.64.242",
"dlcs" : [
{
"display" : "McGovern Institute for Brain Research at MIT",
"canonical" : "McGovern Institute for Brain Research at MIT"
},
{
"display" : "Brain and Cognitive Sciences",
"canonical" : "Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
32. Email to authors
Dear {name},
Thank you for sharing your scholarly articles through the open repository DSpace@MIT <https://dspace.mit.edu/handle/1721.1/49433/>, in association with the MIT Faculty Open Access Policy <https://libraries.mit.edu/oapolicy>.
Our newly implemented OA Stats Service provides data about the use and reach of our open access collection. Since August 2010, 15,184 articles have been downloaded from 227 different countries.
This service also provides information at the author and article level:
Your 3 articles have been downloaded 168 times since they were deposited, from 28 different countries.
You can access more detailed download information about your articles, including per-article and per-country downloads at <https://oastats.mit.edu>.
Initially, we plan to provide this information to all authors via email in the Fall and Spring semesters. As we seek to improve the service, we'll consider expanding options to interact with it and the underlying data.
We are anxious to hear your feedback on how this service can be most useful to you, so please send your suggestions to oastats@mit.edu.
--From the MIT Libraries
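The per-author figures in this email could be derived from download records shaped like the one on slide 22. The sketch below is an illustration under that assumption; it is not the project's summary code, and author_summary and summary_sentence are hypothetical names. It relies only on the handle, country, and authors fields of each record.

```python
def author_summary(records, mitid):
    """Count distinct articles, total downloads, and distinct countries for one author."""
    handles, countries, downloads = set(), set(), 0
    for record in records:
        # A record counts for this author if they appear in its author list.
        if any(author["mitid"] == mitid for author in record["authors"]):
            downloads += 1
            handles.add(record["handle"])
            countries.add(record["country"])
    return {
        "articles": len(handles),
        "downloads": downloads,
        "countries": len(countries),
    }


def summary_sentence(summary):
    """Fill the per-author sentence of the email from a summary dict."""
    return (
        "Your {articles} articles have been downloaded {downloads} times "
        "since they were deposited, from {countries} different countries."
    ).format(**summary)
```

A production version would also need to handle singular wording for authors with one article or one download.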
35. Faculty reception
Excitement
● “Thank you for the update, this is a fantastic tool!!”
● “Thanks so much for doing this - it's really cool and awesome!”
Why not more?
● “Hi, I like your feedback. But I am puzzled that only one of my articles is in your database.”
● Department heads using this as leverage to encourage further contributions
36. Project goals revisited
• Make available download statistics at three levels:
author, article, and aggregate
• Incentivize deposits to collection
• Provide useful information for policy evaluation
• Evaluate new technologies within the Libraries (e.g., MongoDB)
37. Future work
● Automate the pipeline
● Run pipeline more frequently
● Ditch Mongo for something relational
● Talk to faculty about making more detailed information public
● Add functionality to UI (additional format exports, move to SPA)
● Improve cataloging in DSpace@MIT with lookup services
Timeline of open access at MIT
2009 faculty vote
growth of the collection over time
rough figures on current size
Five year anniversary
Policy review called for
Libraries wanted to contribute information to faculty to help inform the debate
Harvard Libraries had unveiled MyDASH, which served as an inspiration to our early work
Need to generate a pipeline diagram
Start from Apache logs
Filter for OA downloads
Filter out bots
Augment with author identities
Augment with geo-referenced IP addresses
Store in raw Mongo collection
Generate summary collection via Solr
Visualize in UI
https://github.com/MITLibraries/oastats-backend
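As an illustration of the bot-filtering step, one simple approach is to match the user agent string against known crawler keywords. This sketch is not taken from the repository above; the keyword list and the names BOT_PATTERN, is_bot, and drop_bots are assumptions, and a real deployment would likely rely on a maintained list of crawler signatures.

```python
import re

# Assumed keyword list; real crawler detection needs a maintained signature list.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_bot(user_agent):
    """Treat missing user agents and those matching BOT_PATTERN as bots."""
    if not user_agent or user_agent == "-":
        return True
    return BOT_PATTERN.search(user_agent) is not None


def drop_bots(records):
    """Keep only the download records that do not look like bot traffic."""
    return [r for r in records if not is_bot(r.get("user_agent", ""))]
```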
Maybe repeated diagrams, adding sections as the pipeline got more complicated?
Screenshots of OpenRefine, or of DSpace@MIT showing wrong data?
Need to generate a UI graphic
The latest action has been to send an email to all authors represented in the collection, inviting them to view the information about the downloads of their papers.
This is a sample email, providing basic information about how many papers the author has in the collection, and some summary statistics about their downloads.
This messaging was successful in driving a lot of traffic to the platform.
There are also some kinks to be worked out about what email addresses we use.
The feedback we’ve received from faculty and administrators about this project has been almost entirely positive.
Project goals were met