Sla2009 D Curation Heidorn - Presentation Transcript
Societal Need for Digital Curation Specialists in the Library Setting
June 16, 2009, Special Libraries Association
P. Bryan Heidorn
Introduction
Program Manager, Division of Biological Infrastructure, National Science Foundation
Associate Professor, Graduate School of Library and Information Science, University of Illinois
JRS Biodiversity Foundation, Board of Directors
Why Libraries
Libraries manage the scholarly output of society
Scholars in the humanities and sciences are generating primary and secondary data at unprecedented rates
Social investment is not only in journal publications but all scholarly knowledge
Need for specialists for information organization, access and preservation
Libraries have the institutional structure and many of the skills needed to curate data and other digital resources.
Cyberinfrastructure Vision
“The anticipated growth in both the production and repurposing of digital data raises complex issues not only of scale and heterogeneity, but also of stewardship, curation and long-term access.”
NSF Cyberinfrastructure Vision for 21st Century Discovery (2007), Chapter 3
Recognition of need for data curation
“Recommendation 6: The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.”
Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005), Recommendations
Recognition of the importance of Information
Recognition of the need for education
New work roles within traditional institutions
Interagency Working Group on Digital Data
New Information Disciplines
Digital Curator: an expert knowledgeable of and with responsibility for the content of a digital collection(s)
Digital Archivist: an expert competent to appraise, acquire, authenticate, preserve, and provide access to records in digital form
Data Scientists: the information and computer scientists, database and software engineers and programmers, disciplinary experts, expert annotators, and others, who are crucial to the successful management of a digital data collection
(Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, report of the National Science Board, September 2005)
Library Skills
Where is the data now?
Not in reference collections
Varied mandates for sharing
Unsustainable models
Individual researchers
Boutique databases
Most data is from small projects
Big science and independent science
Economics of the long tail
The Long Tail, by Chris Anderson. Wired Magazine 12.10, 2004. (http://www.wired.com/wired/archive/12.10/tail_pr.html)
Netflix versus Blockbuster
GenBank versus Mary’s Lab
Naive View of Science Data (figure labels: GenBank, PDB)
$f(x) = ax^{k} + o(x^{k})$
Power Law of Science Data
$f(x) = ax^{k} + o(x^{k})$ | $x < .20$
[Figure axes: Data Volume; Science Projects and Initiatives]
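The power-law shape on this slide can be explored with a minimal sketch. It is an illustration under stated assumptions, not the speaker's model: it drops the lower-order o(x^k) term, treats x as a project's rank (1 = largest), assumes a negative exponent k, and uses an arbitrary project count. The function name `head_share` and all parameter values are hypothetical.

```python
def head_share(n_projects, k, a=1.0, head_fraction=0.20):
    """Fraction of total data volume held by the largest projects.

    Assumes the project at rank r (1 = largest) holds a * r**k units,
    i.e. f(x) = a * x**k with the lower-order term ignored.
    """
    volumes = [a * rank ** k for rank in range(1, n_projects + 1)]
    head_count = max(1, round(n_projects * head_fraction))
    return sum(volumes[:head_count]) / sum(volumes)


if __name__ == "__main__":
    # The exponents below are assumed values chosen only to show the trend.
    for k in (-0.5, -1.0, -1.5):
        share = head_share(1000, k)
        print(f"k = {k:+.1f}: top 20% of projects hold {share:.1%} of the volume")
```

The steeper the exponent (more negative k), the more of the total sits in the head; a flatter curve leaves more of it spread across the tail.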
Does NSF’s Data Follow the Power Law? I do not know but if $1 = X bytes…..
20-80 Rule: The small are big!
Total Grants: 9347   Total Dollars: $2,137,636,716

                  20%                    80%
Number Grants     1869                   7478
Total Dollars     $1,199,088,125         $938,548,595
Range             $6,892,810-$350,000    $350,000-$831
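The arithmetic behind "the small are big" can be checked directly from the figures on this slide. The sketch below only divides the slide's numbers; the variable and function names are illustrative. Under the slide's working assumption that dollars stand in for bytes, the 80% of grants at the small end account for roughly 44% of the total, and the largest 20% for roughly 56%.

```python
# Figures copied from the 20-80 Rule slide.
TOTAL_GRANTS = 9347
TOTAL_DOLLARS = 2_137_636_716

GROUPS = {
    "20%": {"grants": 1869, "dollars": 1_199_088_125},
    "80%": {"grants": 7478, "dollars": 938_548_595},
}


def group_shares(groups, total_grants, total_dollars):
    """Return (share of grants, share of dollars) for each group."""
    return {
        name: (g["grants"] / total_grants, g["dollars"] / total_dollars)
        for name, g in groups.items()
    }


if __name__ == "__main__":
    for name, (grant_share, dollar_share) in group_shares(
        GROUPS, TOTAL_GRANTS, TOTAL_DOLLARS
    ).items():
        print(f"{name} group: {grant_share:.1%} of grants, {dollar_share:.1%} of dollars")
```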
Dark data is the data that we know is/was there but we can’t see it.
Hubble Space Telescope composite image: a "ring" of dark matter in the galaxy cluster Cl 0024+17
Related Ideas
John Porter:
Deep versus Wide databases
Swanson:
Undiscovered Public Knowledge
Science Commons:
Big versus Small science
Why is the tail also important?
Valuable science data is in the tail
Many scientists could use the tail data
Unpublished observations of flowering time in Concord by Alfred Hosmer from 1888 to 1902
Photographs of Flowers
Blue Hill Observatory meteorological data
Richard B. Primack, Abraham J. Miller-Rushing, Daniel Primack, and Sharda Mukunda (2007). Using Photographs to Show the Effects of Climate Change on Flowering Times. Arnoldia 65(1), p2-9.
Science innovation occurs in the long tail
Unpublished negative results / aka dark data
We know very little about the tail
Transformative science happens in the tail
Computational thinking needed to free the tail
Current NSF investments in the tail
OECD Principles and Guidelines for Access to Research Data from Public Funding
The Case of Lake Victoria Data
Lake Victoria is the largest freshwater lake in Africa
Nile perch, water hyacinth, deforestation, and human waste are destroying the fishery
Hundreds of data sets have been created over 50 years
There is no access to most of that information
Barriers
Lack of professional reward structure
Lack of education in data curation
Intellectual property rights (IPR)
Lack of technology
Lack of financial reward structure
Undervaluation / lack of investment
Cost of infrastructure creation
Cost of infrastructure maintenance
PDF, Excel, MS Word, ArcView, floppy disks
Technical Solutions: Move the tail to the head (increase k)
Data standards
e.g. Ecological Metadata Language (EML)
e.g. TaxonX - taXMLit
Metadata
Darwin Core (DwC)
Access to Biological Collection Data (ABCD)
Protocols
TAPIR
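To make the role of a metadata standard concrete, here is a minimal sketch of relabeling one row from a small lab's spreadsheet with Darwin Core term names. The term names (`scientificName`, `eventDate`, `locality`, `recordedBy`, `basisOfRecord`) are Darwin Core terms; everything else is an assumption for illustration: the spreadsheet headers, the row values, and the helper `to_darwin_core` are hypothetical and do not come from any real data set.

```python
import json

# Hypothetical spreadsheet row; the values are placeholders, not real observations.
lab_row = {
    "Species": "Lates niloticus",
    "Date": "1998-07-14",
    "Site": "Example landing site",
    "Collector": "A. Researcher",
}

# Local column headers (assumed) mapped to Darwin Core term names.
COLUMN_TO_DWC = {
    "Species": "scientificName",
    "Date": "eventDate",
    "Site": "locality",
    "Collector": "recordedBy",
}


def to_darwin_core(row, mapping, basis_of_record="HumanObservation"):
    """Relabel a local row with Darwin Core terms; unmapped columns are dropped."""
    record = {dwc: row[col] for col, dwc in mapping.items() if col in row}
    record["basisOfRecord"] = basis_of_record
    return record


if __name__ == "__main__":
    print(json.dumps(to_darwin_core(lab_row, COLUMN_TO_DWC), indent=2))
```

Once local headers are mapped to shared terms, records from many small projects can be merged and queried together, which is one reading of moving tail data toward the head.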
Solutions
Controlled Vocabularies
MeSH, ZooBank, IPNI, ITIS
Ontologies
Gene Ontology (GO)
Science Environment for Ecological Knowledge (SEEK)
EcoGrid
Leopold Semi-Automated ontology generation for Amphibian Morphology DBI-0640053
(Semantic) web software
DataNet
Institutional Solutions
Well Paid Librarians
Well-heeled Museums
Professional societies
Generous Publishers
Library director John Hanson told the Associated Press that a couple of dozen people are cited each year for failure to return materials or pay fines. The incident cost Dalibor about $30 for the two overdue paperbacks. It cost her mother $172 to free her.
Organizational Solutions
Phase One of a Lake Victoria Biodiversity Informatics Project
DataNet (DataOne and Data Conservancy)
Dryad
LTER, NEON, GBIF, TDWG
National Center for Ecological Analysis and Synthesis (NCEAS)
National Evolutionary Synthesis Center (NESCent)
European Union Networks of Excellence (NoE)
European Distributed Institute of Taxonomy (EDIT)
Education Programs
Biological Information Specialist
Concentration in Data Curation (MSLIS)
Certificate of Advanced Study in Data Curation
Summer Institutes in Data Curation
Information and professional education in biodiversity informatics
Biological Information Specialists
At present:
Biologists at all degree levels self-trained in information technology
Information technologists at all degree levels self-trained in biology
(both with gaps in knowledge for many months, years)
Differing roles of BIS in large and small science
Master of Science in Biological Informatics
Degree Program began September 2007
Part of campus-wide bioinformatics master's program
NSF/CISE/IIS, Education Research and Curriculum Development, 0534567 (Palmer, PI)
Combines Biology, Bioinformatics, Computer Science core with LIS courses
What does a BIS need to know?
Biological training and interest in solving biological research problems
Information skills
Evaluation and implementation of information systems: user based assessment and continual quality improvement for the development of tools that work and are used.
Information acquisition, management, and dissemination: development of digital libraries, data archives, institutional repositories, and related tools.
Information organization and integration: ontology development, structuring information for optimal use and sharing, and standards development.
UIUC bioinformatics core coursework
Cross-disciplinary course distribution requirement
Bioinformatics: Computing in Molecular Biology Algorithms in Bioinformatics Principles of Systematics
Computer Science: Algorithms Database Systems
Biology: Human Genetics Introductory Biochemistry Macromolecular Modeling
Sample of existing LIS courses
Information Organization and Knowledge Representation
LIS 551 Interfaces to Information Systems
LIS 590DM Document Modeling
LIS 590RO Representing and Organizing Information Resources
LIS 590ON Ontologies in Natural Science
Information Resources, Uses and Users
LIS 503 Use and Users of Information
LIS 522 Information Sources in the Sciences
LIS 590TR Information Transfer and Collaboration in Science
Information Systems
LIS 456 Information Storage and Retrieval
LIS 509 Building Digital Libraries
LIS 566 Architecture of Network Information Systems
LIS 590EP Electronic Publishing
Disciplinary Focus
LIS 530B Health Sciences Information Services and Resources