1. Data Set Journeys
New Directions and Challenges in
Acquiring Data for Library Collections
Karen Hogenboom, Numeric and Spatial Data Librarian
Lynn Wiley, Head of Acquisitions
2. Questions for everyone
• Who is buying small data sets on your
campus?
• Where are data sets stored on your campus?
• How do researchers on campus know what
has been purchased and where it is stored?
3. What We Are Not Talking About
• Data management plans required for federal
grants
• Research data generated on campus
• Subscription databases of downloadable data
hosted on vendor’s servers
4. What We Are Talking About
• Building a collection of downloadable small
data sets
• User-driven collection development for data
• Our experiences acquiring and managing data
sets
• Your experiences acquiring and managing data
sets!
5. Library Literature Review: sampling
• Mark P. Newton, C. C. Miller & Marianne Stowell
Bracke (2010). Librarian Roles in Institutional
Repository Data Set Collecting. Collection
Management, 36(1), 53-67
• Florance, Patrick. 2006. GIS collection development
within an academic library. Library Trends 55(2): 222–
235.
6. Literature Review sampling
• Davis and Vickery (2007). Datasets, a Shift in
the Currency of Scholarly Communication:
Implications for Library Collections and
Acquisitions. Serials Review, 33(1), 26–32
7. Why take this journey?
• Increase campus access to data sets
• Embed librarians in research process
• Create data “collection” with confidence in
usefulness to campus researchers
• Develop skills in buying, storing and providing
access to data sets
• Develop relationships with vendors for this
market
8. First Step: Pilot Project
• Call for proposals
• Communication with potential applicants
• Data Services Committee review
• Brainstorm buying process internally
– License terms?
– Delivery format and then storage?
– Payment method?
• Ordering the data and its receipt
9. Now How Do We Get There?
BUYING
• Checking in with the vendor: relationships
– Where are they, and who are they?
– Have they sold to libraries before?
– Are they in our database?
– Do they understand our requirements?
– Are they open to using our agreement language?
– Do they require an agreement be signed?
10. BUYING cont’d
• What to Buy
– Exact set: dates, geographic regions, subsets
– Format: Excel, ASCII, raw data vs. categorized,
conversion issues
– Options for add-ons or updating with new data
– Date created or updated, and by whom; copyright
– Lineage/origin
– Unique identifier for description and ordering
11. Buying continued
• HOW to buy
– Outright Purchase
– Subscription (how updated, frequency)
– Privacy issues
– Agreement enforcement or authentication of
users
– Costs (# of users, maintenance fees)
– Payment: invoiced or credit card; one-time vs.
maintenance
– Have a standard data use agreement for access
12. Work around
• Communication problems
• Payment problems
• Delivery problems
• Sales to individuals
• Follow up, follow up, follow up
13. So We Know Where We’re Going…
What we identified from pilot project
• Relationships critical
• Start small
• Need time to negotiate it all
• Revenue issue for vendors
– Noncommercial, yet need the income
• Data needs are very specific
14. And How Do We Make It Available?
• One researcher is no problem; public use needs
better access:
• Load locally
• Web links to Datasets
• Catalog record
– Small data sets
– Subject
– Title
– Corporate entry
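The catalog record elements above (title, subject, corporate entry, web link) could be captured in a small record structure. The following is a minimal sketch, not the presenters' actual workflow: the class name, the function name, and the sample values in the usage note are hypothetical, and the tags are the standard MARC 21 fields 245 (title), 650 (topical subject), 710 (corporate name added entry), and 856 (electronic location).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSetRecord:
    """Hypothetical minimal catalog record for a purchased data set."""

    title: str
    corporate_entry: str  # vendor or producing organization
    subjects: list[str] = field(default_factory=list)
    location: str = ""  # assumed field: URL or path where the files are stored

    def to_marc_like(self) -> list[tuple[str, str]]:
        """Return (tag, value) pairs using common MARC 21 field tags."""
        fields = [("245", self.title)]
        fields += [("650", subject) for subject in self.subjects]
        fields.append(("710", self.corporate_entry))
        if self.location:
            fields.append(("856", self.location))
        return fields


def find_by_subject(records: list[DataSetRecord], term: str) -> list[DataSetRecord]:
    """Return records with at least one subject containing term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if any(term in s.lower() for s in r.subjects)]
```

For example, a record created as DataSetRecord(title="Example Parcel Data", corporate_entry="Example Vendor, Inc.", subjects=["Land use"]) would be returned by find_by_subject(records, "land"), which is one simple way researchers could learn what has been purchased.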
16. Next Steps on the Journey: Metadata
• Metadata
– NISO Standards
– FGDC metadata / ISO 19115
– Data Documentation Initiative (DDI)
– MARC records
17. Next Steps:
Use and Storage of Datasets
• Deposit issues: Technical, IRB
• Cleaning data prior to deposit
• Consultation on field values
• Set up of data
• Indexing or guides
18. More Next Steps
• Specific ways to involve librarians in research
• Rolling application period
• Increase the funds available!
• Spread the word
19. Discussion
• Who is buying small data sets on your
campus?
• Where are data sets stored on your campus?
• How do researchers on campus know what
has been purchased and where it is stored?
20. Contact us:
• Karen Hogenboom, Numeric and Spatial Data
Librarian: hogenboo@illinois.edu
• Lynn Wiley, Head of Acquisitions:
lnwiley@illinois.edu
23. • Morris, Steven P. 2006. Geospatial Web services and
geoarchiving: New opportunities and challenges in
geographic information service. Library Trends
55(2):285–303.
Available at: http://muse.jhu.edu/journals/library_trends/v055/55.2morris.html.
Editor's Notes
Want to know about your lessons learned. Especially want to know about your vendors and their questions. Can you put your data up for public access? How do you catalog them or make them accessible? What do researchers want on your campus?
Different ballgame: federal grants have specific requirements, PI privacy, tracking not public. Again proprietary, confidential, ownership. Those are seen as lease/purchase and, as long as straightforward, OK; vendors know IP ranges etc. License issues st
Making them useful to others. Meeting user needs. Finding out about vendors. Finding out about research. Better connections to campus.
While we are not talking about IRs and curating local data, the article from Purdue covers the new roles librarians have in identifying, collecting, curating, preserving, and administering data sets. It describes the unique role librarians have in describing material (i.e., metadata) and preserving it, but also how well suited they are to broker acquisitions and to translate to researchers the value of consistent access, the provenance and context of the data, standards, and long-term preservation of data. The authors argue for broad access to share data and promote interdisciplinary research. The same applies when approaching vendors: broad access is needed to share data for the best research.
researchers continue to produce and request access to data for their work. While data are being produced at exponential rates, it is not a trivial matter for researchers to discover, access, and repurpose data sets. Lyman and Varian report that in 2002 alone more than five exabytes of information were created. That equates to 37,000 times the size of the Library of Congress. With advances in computing power and technology, it is easier than ever to mine and manipulate large data sets as long as they are mounted online. The problem for researchers and librarians, however, is that many of these data sets are scattered amongst individual researchers’ desktops; there exists no truly organized system for discovering, accessing, and repurposing data sets.
MODELS
• IR
• Serial continuations
• One-time payment
• Transitional or the ad hoc model where anything may go
LOTS of research going on. Grants to buy data. What is wanted? By whom? How much does it cost? Is it purchasable: by vendor, by campus? Can we license?
Translator of library terms. No business staff, credit card. Who sends it and how. Vs. organization.
url links
We can also say on whole great experiments?
GIS data purchases: a nice overview of the issues as they affect purchases and maintenance of this material.