Text & Data Mining Licensing Issues
1. Text & Data Mining
Licensing Issues
Daniel Dollar
Yale University Library
November 5, 2015
2. What is TDM?
“Text and data mining (TDM) is the process of deriving
information from machine-read material. It works by
copying large quantities of material, extracting the data, and
recombining it to identify patterns.” – UK Government
Taken from the LIBER/Association of European Research Libraries’ Text and Data
Mining page, http://libereurope.eu/text-data-mining/
3. Scholarship
• TDM is a means to take full advantage of a large corpus
of digital content.
• It will become the everyday tool used for the discovery
of knowledge, and a mainstream methodology in the
humanities.
5. Legal/Licensing
• Separate TDM licenses are unnecessary. TDM output is
subject to the same terms and conditions as any other
research that uses licensed resources.
• Employing TDM is a lawful use of the content,
subject to the negotiated license agreement.
• “Right to read is the right to mine”
• This is an area where US law is ahead of EU regulations.
6. Pricing
• Research libraries are paying a premium for the
content. TDM needs to be part of the cost of doing
business.
• TDM embargo. Inability to mine is a type of embargo
(restriction) on using the content that will increasingly
undermine the value of the library’s investment in that
content.
7. Access
• Raw data
• Put it on a drive or secure location in the cloud
• Machine read the data with pointers back to human
readable data (publications) on publisher/vendor
platforms
• API – Good and Bad
• Good: Not mediated by publisher/third parties
• Examples: JSTOR’s Data for Research and HathiTrust
Research Center
• Bad: Mediated by publisher/third parties
• Accessibility
• Raw data needed for TDM can also be used to meet the
federally protected accessibility needs of students.
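The raw-data bullet on this slide describes machine reading with pointers back to the human-readable publication. One way to picture that is the Python sketch below. The `MinedPassage` record, the `find_passages` helper, and the `publisher.example` URL are all hypothetical and do not describe any vendor's actual data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MinedPassage:
    doc_id: str      # identifier of the raw-text file
    offset: int      # character offset of the match in the raw text
    text: str        # short snippet around the match
    source_url: str  # pointer back to the publication on the vendor platform

def find_passages(raw_texts, source_urls, term, width=30):
    """Find each occurrence of term and keep a link to its publication."""
    hits = []
    needle = term.lower()
    for doc_id, text in raw_texts.items():
        haystack = text.lower()
        start = haystack.find(needle)
        while start != -1:
            lo = max(0, start - width)
            hi = min(len(text), start + len(term) + width)
            hits.append(
                MinedPassage(doc_id, start, text[lo:hi], source_urls[doc_id])
            )
            start = haystack.find(needle, start + 1)
    return hits

# Hypothetical raw text held locally, with a made-up platform URL.
raw_texts = {"a1": "Text mining helps. Mining is useful."}
source_urls = {"a1": "https://publisher.example/articles/a1"}
hits = find_passages(raw_texts, source_urls, "mining")
```

Each hit keeps its `source_url`, so a researcher can go from a machine-found passage to the formatted article on the publisher or vendor platform.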
8. Library Support
• The humanities typically need more support with TDM
than the sciences.
• Digital Humanities Centers in libraries can help bridge
that gap and make raw data interpretable by
humanists.
• For libraries, such support comes with significant financial
implications.
9. Questions
Thank you to the following Yale University Library
colleagues for assistance with this presentation:
• Peter Leonard, Director of the Digital Humanities Lab
• Joan Emmet, Licensing and Copyright Librarian
• Julie Linden, Assistant Director of Collection
Development