Publication and dissemination of datasets in taxonomy: ZooKeys working example
Lyubomir Penev, Terry Erwin, Jeremy Miller, Vishwas Chavan, Tom Moritz, Charles Griswold. ZooKeys 11: 1-8 (2009)
doi: 10.3897/zookeys.11.210
Penev, L et al. Publ Dissem Data Zookeys 06 01 09 - Tom Moritz
This document describes a concept for publishing and disseminating datasets in taxonomy that was applied in a paper by Miller et al. published in ZooKeys. Key aspects of the concept include: (1) publishing primary biodiversity data from the paper as a separate dataset with a DOI, (2) making the occurrence dataset available through GBIF simultaneously with publication, and (3) publishing the occurrence dataset as an interactive KML file in Google Earth with a separate DOI. This allows for indexing, aggregation, and reuse of the published data.
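The third element of the concept, an interactive KML file for Google Earth, can be illustrated with a minimal sketch. The records below are invented placeholders, not the actual occurrence data from the Miller et al. paper; a real export would read the published occurrence table.

```python
# Minimal sketch: serialize occurrence records as KML placemarks for Google
# Earth. Species names and coordinates below are hypothetical examples.
records = [
    {"species": "Example species A", "lat": -31.95, "lon": 115.86},
    {"species": "Example species B", "lat": -32.05, "lon": 115.90},
]

def to_kml(records):
    """Build a KML document with one Placemark per occurrence record.

    Note: KML coordinates are ordered longitude,latitude.
    """
    placemarks = "\n".join(
        "  <Placemark>\n"
        f"    <name>{r['species']}</name>\n"
        f"    <Point><coordinates>{r['lon']},{r['lat']}</coordinates></Point>\n"
        "  </Placemark>"
        for r in records
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        f"<Document>\n{placemarks}\n</Document>\n</kml>"
    )

print(to_kml(records))
```

Writing the result to a `.kml` file and opening it in Google Earth would render each occurrence as a clickable point, which is essentially what publishing the dataset "as an interactive KML file" amounts to.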
Jean-Claude Bradley and Andrew Lang present to Columbia University on May 21, 2009 in an effort to explore what role libraries could play in archiving Open Notebook Science projects and other forms of digital scholarship.
Global biodiversity data is critical for conservation, policymaking, and scientific research. However, most data is held by small, isolated publishers and is difficult to access. The Global Biodiversity Information Facility (GBIF) aims to mobilize this "small data" by creating common data standards and tools to publish data through its Integrated Publishing Toolkit. This allows data to be discovered through GBIF's portal and used for applications like predicting climate change impacts and invasive species spread. GBIF calls on all data holders to publish their data openly through its framework to build a comprehensive global resource for biodiversity data.
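The "common data standards" GBIF builds on are the Darwin Core terms: occurrence data is exchanged as simple tabular files whose column names are standard terms. A minimal sketch of such a table, as the Integrated Publishing Toolkit would ingest it (the field names are real Darwin Core terms; the record itself is invented for illustration):

```python
import csv
import io

# Darwin Core-style occurrence table: standard term names as column headers,
# one row per occurrence record. The single record here is a made-up example.
fields = ["occurrenceID", "scientificName", "decimalLatitude",
          "decimalLongitude", "eventDate", "basisOfRecord"]
rows = [{
    "occurrenceID": "urn:example:occ:1",      # hypothetical identifier
    "scientificName": "Puma concolor",
    "decimalLatitude": "9.93",
    "decimalLongitude": "-84.08",
    "eventDate": "2009-06-01",
    "basisOfRecord": "PreservedSpecimen",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Because every publisher uses the same term names, the GBIF portal can aggregate tables like this from thousands of sources into one searchable index.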
This document discusses data publishing and compares it to scholarly publishing. It argues that data publishing brings about cultural change towards open access to biodiversity data. A data publishing framework is proposed that includes technical components like publishing toolkits and a data usage index to track impact. Challenges to the framework include gaining policy and social acceptance. Overall, the document promotes treating data publishing similarly to scholarly publishing to increase data sharing and reuse.
BioVeL (Biodiversity Virtual e-Laboratory) is an e-laboratory that supports research on biodiversity using large amounts of data from cross-disciplinary sources.
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami - GigaScience, BGI Hong Kong
Scott Edmunds talk at the HUPO congress in Geneva, September 6th 2011 on GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami.
The document discusses how bio-ontologies and natural language processing can enable open science by facilitating structured knowledge representation and collaborative curation. It describes services provided by the National Center for Biomedical Ontology (NCBO) that allow use of ontologies for annotation, data aggregation, and accelerating the curation process. Several groups are highlighted that utilize NCBO services for applications such as clinical trial matching, specimen banking, and data summarization.
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION - IJwest
With the growth of data-oriented research in the humanities, a large number of research datasets have been created and published through web services. However, discovering, integrating and reusing these distributed, heterogeneous research datasets is a challenging task. Ontologies connect series of digital humanities resources and give people a good way to discover and understand these datasets. With the release of more and more linked open data and knowledge bases, a large number of ontologies have been produced as well. These ontologies differ in publishing format, consumption pattern and interaction style, which is not conducive to users' understanding of the datasets or to reuse of the ontologies. The Ontology Service Center (OSC) platform consists of an Ontology Query Center and an Ontology Validation Center, built mainly on linked data and ontology-based technologies. The Ontology Query Center provides ontology publishing, querying, data interaction and online browsing, while the Ontology Validation Center verifies how particular ontologies are used in linked datasets. The empirical part of the paper uses the Confucius portrait as an example of how OSC can be used in the semantic annotation of images. In short, the purpose of this paper is to build an applied ecology of ontologies that promotes the development of knowledge graphs and the spread of ontologies.
This document discusses the importance of publishing archaeological data openly on the web. It notes that raw data is not sufficient and requires cleaning, description and organization to be intelligible and reusable. Data publishing involves putting data in a form that is editorially vetted, machine-readable, linked to other resources, and archived for long-term access. This enhances data presentation, search, discovery and allows new types of research across multiple datasets. While challenging, data publishing provides benefits like reproducibility, new research opportunities, and professional advancement when done according to best practices.
IU Data Visualization Class Final Project: Visualizing Missing Species Interactions - James Nelson
The aim of our project is to use the GloBI APIs to visualize understudied organisms and locations with minimal interaction data in the GloBI data repository.
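A query against GloBI's public web API is a natural starting point for such an analysis. The base URL and parameter names below follow GloBI's interaction endpoint as we understand it; treat them as assumptions and check the current API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed GloBI interaction endpoint; verify against the current API docs.
BASE = "https://api.globalbioticinteractions.org/interaction"

def interaction_url(source_taxon, interaction_type="interactsWith"):
    """Build a GloBI interaction query URL for one source taxon."""
    params = {"sourceTaxon": source_taxon, "interactionType": interaction_type}
    return BASE + "?" + urlencode(params)

url = interaction_url("Enhydra lutris", "eats")
print(url)
# Fetching such URLs and counting the records returned per taxon or location
# would reveal which organisms have little or no interaction data recorded,
# which is exactly the gap the visualization project targets.
```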
Knowledge Organization System (KOS) for biodiversity information resources, GBIF - Dag Endresen
Presentation of the Global Biodiversity Information Facility (GBIF) knowledge organization system (KOS) work program for the National Center for Biomedical Ontology (NCBO) Web seminar series in October 2012. Available at http://www.bioontology.org/GBIF-vocabulary-management-for-biodiversity-informatics
The document discusses recommender systems and summarizes:
1) It introduces recommender systems and the different types including knowledge-based, collaborative filtering, and content-based recommendations.
2) It outlines some of the key resources for recommender systems including datasets, conferences, and articles.
3) It provides a high-level overview of common recommender system approaches like collaborative filtering, content-based analysis, and knowledge-based recommendations.
Knowledge Organization System (KOS) for biodiversity information resources, GBIF - Dag Endresen
Slides from a presentation on the Knowledge Organization System (KOS) work program for GBIF. KOS developments for biodiversity information resources and input to the emerging Vocabulary Management Task Group (VoMaG).
Links
GBIF KOS prototype tools, http://kos.gbif.org/
Tool: Semantic Wiki prototype, http://terms.gbif.org/wiki/
Tool: ISOcat prototype demo, http://kos.gbif.org/isocat/
GBIF concept vocabulary term browser, http://kos.gbif.org/termbrowser/
GBIF Resources Repository, http://rs.gbif.org/terms/
GBIF Vocabulary Server, http://vocabularies.gbif.org/
GBIF Resources Browser, http://tools.gbif.org/resource-browser/
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontology - dolleyj
The Evidence & Conclusion Ontology (ECO) has been developed to provide standardized descriptions for types of evidence within the biological domain. Best practices in biocuration require that when a biological assertion is made (e.g. linking a Gene Ontology (GO) term for a molecular function to a protein), the type of evidence supporting it is captured. In recent development efforts, we have been working with other ontology groups to ensure that ECO classes exist for the types of curation they support. These include the Ontology for Microbial Phenotypes and GO. In addition, we continue to support user-level class requests through our GitHub issue tracker. To facilitate the addition and maintenance of new classes, we utilize ROBOT (a command line tool for working with Open Biomedical Ontologies) as part of our standard workflow. ROBOT templates allow us to define classes in a spreadsheet and convert them to Web Ontology Language (OWL) axioms, which can then be merged into ECO. ROBOT is also part of our automated release process. Additionally, we are engaged in ongoing work to map ECO classes to Ontology for Biomedical Investigation classes using logical definitions. ECO is currently in use by dozens of groups engaged in biological curation and the number of ECO users continues to grow. The ontology, in OWL and Open Biomedical Ontology (OBO) formats, and associated resources can be accessed through our GitHub site (https://github.com/evidenceontology/evidenceontology) as well as the ECO web page (http://evidenceontology.org/).
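The ROBOT template workflow the poster describes centers on a spreadsheet whose second row contains ROBOT template strings telling the tool how each column maps to OWL axioms. A small sketch, with invented term IDs and labels (not real ECO classes), and the template strings to be taken as an approximation of ROBOT's syntax rather than verified usage:

```python
import csv
import io

# A minimal ROBOT-style template table: row 1 = human-readable headers,
# row 2 = ROBOT template strings ("ID" = term IRI, "LABEL" = rdfs:label,
# "A <property>" = annotation, "SC %" = subclass-of the cell value).
header = ["ID", "Label", "Definition", "Parent"]
template_row = ["ID", "LABEL", "A IAO:0000115", "SC %"]
terms = [
    # Invented example class, not an actual ECO term.
    ["ECO:9999001", "example evidence", "An invented evidence class.", "ECO:0000000"],
]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerow(header)
writer.writerow(template_row)
writer.writerows(terms)
print(buf.getvalue())
# The resulting TSV would then be converted and merged roughly like:
#   robot template --template new_terms.tsv --output new_terms.owl
#   robot merge --input eco.owl --input new_terms.owl --output eco-merged.owl
```

Defining classes in a spreadsheet this way lets curators who are not OWL experts contribute terms, while the conversion step keeps the released ontology formally consistent.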
Presentation given at a thematic meeting of the Hauptbibliothek Universität Zürich on "New Open Access topics of relevance for academic libraries", 23 July 2012.
Data sharing, archiving, discovery, Bill Michener - Alison Specht
A presentation by Bill Michener (University of New Mexico and DataONE) about data sharing, archiving and discovery. It was an introduction to a session co-hosted by FRB-CESAB and CEFE (CNRS) in Montpellier.
The document discusses knowledge management of experimental data through the ISA ecosystem. It describes the ISA-tab format and software suite that allows annotation and curation of experimental metadata. As a use case, it analyzes a dataset on metabolite profiling from a study of fatty acid amide hydrolase knockout mice. The ISA tools can represent investigations and assays, convert data to standardized formats, and facilitate sharing and analysis of experimental data.
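ISA-Tab, the format at the heart of the ISA ecosystem, stores experimental metadata as tab-separated files in which each row pairs a field name with its value(s), grouped under section headings. A minimal sketch of an investigation-file fragment; the field names follow the general ISA-Tab pattern, but the values are illustrative stand-ins echoing the FAAH-knockout use case, not taken from the actual dataset:

```python
# Sketch of an ISA-Tab style investigation fragment: section heading line,
# then "Field Name<TAB>value" rows. Values below are invented examples.
investigation = {
    "Investigation Identifier": "I-0001",
    "Investigation Title": "Metabolite profiling of FAAH knockout mice",
    "Study File Name": "s_faah_study.txt",
}

lines = ["INVESTIGATION"]
for field, value in investigation.items():
    lines.append(f"{field}\t{value}")
print("\n".join(lines))
```

Keeping metadata in this tabular, field-per-row layout is what lets the ISA software suite validate, convert, and exchange experiment descriptions across repositories.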
The document summarizes the goals and activities of the Biodiversity Heritage Library (BHL) project, which aims to digitize the published literature on biodiversity and make it openly accessible online. It discusses BHL's partnerships with other organizations like the Encyclopedia of Life to aggregate content. It also provides details on BHL's scanning operations and efforts to engage international partners to expand its global coverage of literature.
The repository ecology: an approach to understanding repository and service i...R. John Robertson
An increasing number of university institutions and other organisations are deciding to deploy repositories and a growing number of formal and informal distributed services are supporting or capitalising on the information these repositories provide. Despite reasonably well understood technical architectures, early majority adopters may struggle to articulate their place within the actualities of a wider information environment. The idea of a repository ecology provides developers and administrators with a useful way of articulating and analysing their place in the information environment, and the technical and organisational interactions they have, or are developing, with other parts of such an environment. This presentation will provide an overview of the concept of a repository ecology and examine some examples from the domains of scholarly communications and elearning.
Introduction to Ontologies for Environmental Biology - Barry Smith
1. The document introduces ontologies for environmental biology and discusses several disciplines that could benefit from their use, including GIS, ecology, environmental biology, and various "-omics" fields.
2. It describes what an ontology is and compares ontologies to legends for maps or diagrams, which allow integration and help humans and computers make sense of complex data. Ontologies provide standardized terminology and annotations.
3. The document outlines the Open Biomedical Ontologies (OBO) Foundry, a collection of interoperable reference ontologies for annotating biomedical data. Foundry ontologies include the Gene Ontology and other ontologies for molecules, cells, anatomical structures, and more. They are developed through consensus and share
(1) The document is an annotated bibliography on information extraction and natural language processing written by Jun-ichi Tsujii from the University of Tokyo.
(2) It provides references to key papers that have influenced the development of the field of information extraction over the last 5 years as of 2000, organized by topics such as general introduction, IE systems used in Message Understanding Conferences, and IE systems for biology and biomedical texts.
(3) The references cover techniques such as finite-state processing, pattern matching, and use of full parsers as well as domain-specific resources for biological IE systems.
Conceptualising Framework for Local Biodiversity Heritage Sites (LBHS): A Bio... - Vishwas Chavan
This document proposes a conceptual framework for establishing Local Biodiversity Heritage Sites (LBHS) in Maharashtra, India based on a social-ecological model. It discusses how the Biological Diversity Act of 2002 allows local communities to designate biodiversity-rich areas as heritage sites. The framework identifies potential LBHS in two habitats: sacred groves, which are forest patches traditionally protected for their cultural and ecological values; and rocky plateaus, which support unique biodiversity through indigenous management practices. The document argues LBHS can preserve genetic resources, species, ecosystems, knowledge, culture and traditions as a legacy for future generations.
State Biodiversity Boards: Towards Better Governance - Vishwas Chavan
India's Biological Diversity Act, 2002, and its three-tier implementation mechanism of the National Biodiversity Authority (NBA), the State Biodiversity Boards (SBBs), the Union Territory Biodiversity Councils (UTBCs) and the Biodiversity Management Committees (BMCs) are close to two decades old. However, our collective national progress is much less than satisfactory. One of the major reasons is the lack of empowerment of the SBBs and UTBCs, and the resulting passive functioning of the BMCs. Bottom-up empowerment of BMCs through SBBs and UTBCs is crucial to achieving the National Biodiversity Targets (NBT) and other national biodiversity conservation and sustainable development ambitions. In this article the author proposes a five-pillared work programme that can help empower the SBBs and UTBCs and result in vibrant, well-governing BMCs. Some or all of the activities mentioned in this article may already have been initiated or implemented by a few SBBs and UTBCs; however, the author calls for a coordinated performance-evaluation mechanism, developed and steered by the SBBs and UTBCs, to achieve the national goal of development-inclusive biodiversity conservation.
Exploring the future of scholarly publishing of biodiversity data - Vishwas Chavan
Little more than a decade ago, biodiversity data publishing was an opportunistic, secondary spin-off of the biodiversity research and conservation management chain. Today, the Global Biodiversity Information Facility facilitates free and open access to over 420 million primary biodiversity data records contributed by publishers across the globe. This is the outcome of a growing realization that free and open access to biodiversity data is crucial for informed decisions and actions on the sustainable use of biotic resources and the conservation of biodiversity areas. Recently, the use of biodiversity data in research, conservation and management activities has been on the rise. However, users often complain about the low degree of 'fitness-for-use' of the accessible data. Most of the time, the potential use of data is hampered by the lack of adequate metadata that can demonstrate the fitness-for-use of a given dataset.
To overcome this, an appropriate incentivisation mechanism is essential, one that gives research groups due credit and acknowledgement for their efforts in authoring good metadata. Recently, the concept of 'scholarly data publishing' has been discussed, wherein both data and metadata undergo peer review similar to other scientific publications. Pensoft has launched a fresh data-only journal called the 'Biodiversity Data Journal', and accepts data papers in six of its other journal titles. The European aquatic biodiversity community, through the EU-funded project 'BioFresh', has engaged with the editors of 29 aquatic biodiversity journals to begin accepting data papers. The GBIF nodes in Colombia and South Africa are planning to start journals that will publish data papers. Recently, Nature Publishing Group announced a peer-reviewed data-publishing journal called 'Scientific Data'. These developments announce the arrival of a new era of 'scholarly data publishing'. Biodiversity science and biodiversity informatics stand to gain a great deal by being at the forefront of this tide.
The document discusses GBIF's 2010-2011 work programme highlights related to improving content for science and society. It outlines GBIF's approach to focus on community needs, expand content coverage to include multimedia and observations, and increase relevance through facilitating the flow of data and information to scientific publications and decisions. It also analyzes current coverage and content biases and trends to help guide GBIF's science focus in 2011.
This document discusses data citation mechanisms and services for primary biodiversity data. It outlines the need for data citation to provide recognition for data producers and publishers. An ideal data citation framework would address social, technical, and policy issues to incentivize all stakeholders. Core technical components would include persistent identifiers, a data citation mechanism, and a data usage index. The document reviews the history calling for data citation standards and proposes requirements for an effective data citation model, including attributing roles across data production and publication. It also examines challenges in developing data citation practices.
The document summarizes the recommendations of the GBIF Governing Board's Global Strategy and Action Plan for Mobilization of Natural History Collections Data task group. The task group recommends that GBIF facilitate discovery of non-digital collection resources, increase efficiency of data capture and quality of digitized specimens, and improve infrastructure for publishing digitized collection data globally.
The document summarizes the state of data publishing through the Global Biodiversity Information Facility (GBIF) network. It finds that while data records are increasing, the rate of increase is declining. Developing regions contribute the most data, with the Avian Knowledge Network as the single largest data publisher. Over 2.4 billion records have been identified by GBIF participants but only around 800 million are accessible digitally and participants have committed to publishing less than 25% of available records by 2010. There remains a need for more strategic and planned approaches to data discovery and publishing with an emphasis on both local and global efforts.
This article discusses Balanophora, a rare and endangered plant found in North East India. It belongs to the family Balanophoraceae. The 15 species in the genus are native to the Old World Tropics. Most species are parasites on tree roots and are found in dense forests in the Himalayan region. They have underground inflorescences that rupture and emerge above ground. The plants are dioecious. Balanophora is listed as an endangered species under Indian law and prohibited from export due to its rarity. The article provides a brief description of the plant's rhizome, scapes, and reproductive structure.
This document provides an overview of bioinformatics education in India. It discusses how bioinformatics education has evolved from short workshops to formal degree programs over time. A key development was the establishment of the Biotechnology Information System network in the 1980s by the Department of Biotechnology, which helped develop bioinformatics infrastructure and training programs in India. The document then describes the current landscape of bioinformatics education in India, including a case study of the master's program in bioinformatics at the University of Pune. It concludes by noting that many universities and institutions now offer bioinformatics education at various levels to train students for careers in this growing field.
The document summarizes recommendations from the GBIF GSAP-NHC Task Group on improving the publishing of natural history collections data. It recommends that GBIF facilitate access to information about non-digital collections, work to increase the efficiency of digitizing specimen data and enhance data quality, and improve the global infrastructure for publishing digitized collections data.
The document discusses a meeting agenda between GBIF (Global Biodiversity Information Facility) and Elsevier to discuss opportunities for collaboration around data publishing and sharing biodiversity data. Some key points discussed in the agenda include GBIF's role in facilitating open access to biodiversity data, its data publishing framework to encourage data mobilization and sharing, and potential areas of collaboration around simultaneous publishing of data and scholarly articles.
The document discusses GBIF's (Global Biodiversity Information Facility) goals of facilitating open access to biodiversity data worldwide to support scientific research. GBIF shares over 200 million biodiversity records through data publishers and resources. The document proposes a Data Publishing Framework to improve data mobilization and cultural acceptance of open data sharing. It describes challenges to the framework and its potential impacts, such as increased data usage and quality through incentives like data papers and a Data Usage Index.
The document summarizes recommendations from the GBIF GSAP-NHC Task Group on improving the digitization and publication of natural history collection data. It recommends that GBIF facilitate discovery of non-digital collection resources, increase efficiency and quality of data capture, and improve global infrastructure for publishing digitized collection data. Specifically, it calls for GBIF to publicize non-digital metadata, assess the scale of undigitized specimens, support technological innovations for digitization, and strengthen hosting and identification of published data.
The document discusses technologies and infrastructure for publishing biodiversity data from environmental impact assessments (EIA). It covers the types and formats of EIA biodiversity data, tools for data capture and digitization, platforms for data discovery and publishing, ensuring data quality, and hosting data centers to facilitate long-term archiving and publishing of EIA biodiversity data.
The document discusses the need for a Global Biodiversity Resources Discovery System (GBRDS) to address challenges in discovering biodiversity data. It proposes that GBRDS would act as a registry and discovery service to facilitate finding biodiversity information resources. GBRDS would provide an integrated 'yellow pages' reference for all biodiversity data by reconciling distributed resources and allowing meaningful discovery of data and services in a distributed manner. The document outlines how GBRDS could empower discovery of biodiversity data resources.
A peer-reviewed open-access journal
ZooKeys 11: 1-8 (2009)
doi: 10.3897/zookeys.11.210 EDITORIAL
www.pensoftonline.net/zookeys Launched to accelerate biodiversity research

Publication and dissemination of datasets in taxonomy: ZooKeys working example
Lyubomir Penev1, Terry Erwin2, Jeremy Miller3,6, Vishwas Chavan4, Tom Moritz5, Charles Griswold6

1 Central Laboratory of General Ecology, Bulgarian Academy of Sciences and Pensoft Publishers, Sofia, Bulgaria 2 Smithsonian Institution, Washington DC, USA 3 Department of Terrestrial Zoology, Nationaal Natuurhistorisch Museum Naturalis, Leiden, The Netherlands 4 Global Biodiversity Information Facility, Copenhagen, Denmark 5 1968 1/2 South Shenandoah Street, Los Angeles, California, USA; formerly: Harold Boechenstein Director, Library Services, American Museum of Natural History, New York, USA 6 Department of Entomology, California Academy of Sciences, San Francisco, USA

Corresponding authors: Lyubomir Penev (info@pensoft.net), Jeremy Miller (miller@naturalis.nl), Vishwas Chavan (vchavan@gbif.org)
Received 27 May 2009 | Accepted 30 May 2009 | Published 1 June 2009
Citation: Penev L, Erwin T, Miller J, Chavan V, Moritz T, Griswold C (2009) Publication and dissemination of datasets in taxonomy: ZooKeys working example. ZooKeys 11: 1-8. doi: 10.3897/zookeys.11.210
Abstract
A concept for data publication and semantic enhancements proposed by ZooKeys and applied in the milestone paper of Miller et al. (2009) is described. For the first time in systematic zoology, a unique combination of data publication and semantic enhancements is applied within the mainstream process of journal publishing, to demonstrate how: (1) all primary biodiversity data underlying a taxonomic monograph are published as a dataset under a separate DOI within the paper; (2) the occurrence dataset is discoverable and accessible through the GBIF data portal (data.gbif.org) simultaneously with the publication; (3) the occurrence dataset is published as a KML (Keyhole Markup Language) file under a distinct DOI to provide an interactive experience in Google Earth; (4) all new taxa (42) are registered at ZooBank during the publication process (mandatory for ZooKeys); (5) all new taxa (42) are provided to the Encyclopedia of Life through XML markup on the day of publication (mandatory for ZooKeys). It is proposed to clearly distinguish between static and dynamic datasets in the way they are published, preserved and cited.
Keywords
Data publication, semantic enhancements, taxonomy
Copyright Lyubomir Penev et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Introduction
Publishing of datasets is one of the most discussed fields in academic publishing. The need for open access to research data, and consequently for ensuring its preservation, dissemination and reuse, is recognized as a strategic goal in several documents at governmental and international levels. For instance, at the European Union (EU) level it is recognized that long-term sustainability will be achieved not only by digital data storage but even more so by increasing the probability of data reuse. Such a perspective is reflected in the statement of the Council of Europe on “the importance of better access to unprocessed data and repository resources for data and material that allows fresh analysis and utilisation beyond what the originator of the data had envisaged” (2832nd COMPETITIVENESS (Internal market, Industry and Research) Council meeting, Brussels, 22 and 23 November 2007). Moreover, in 2008 the 7th Framework Programme of the EU opened a special call entitled “Rehabilitation of data from biodiversity-related projects funded under previous framework programmes”.
Data publication is becoming common practice in other branches of science (e.g., Altman and King 2007). A recently published OECD white paper (Green 2009) describes working examples of a quite simple approach to data publishing. The publishing itself is not a technical problem. The main questions are how to publish data under the open access model and how to motivate data collectors and creators to do so. There are also other unresolved questions related to subsequent data usage, methods of citation, authors’ and/or institutions’ recognition, the challenge of publishing dynamic datasets (databases), and so on.
The benefits of data publishing for authors and society are obvious. Since the data are online and freely available, authors gain broader recognition for their work. But more importantly, published data, when well described and validated, contribute to larger data aggregations through organizations like ZooBank, the Global Biodiversity Information Facility (GBIF), Morphbank, and the Encyclopedia of Life (EOL). Increasingly, analyses and meta-analyses of aggregated data will unlock new potential for academic science, applied science (e.g., conservation), and informed public discourse leading to better public policy. The indexing practices described here would establish authorship of datasets, crediting researchers and institutions for their productivity and initiative in piloting non-traditional forms of publication. Sponsors, funding agencies and society in general would benefit from the full disclosure, aggregation and reuse of the results of publicly funded scientific research.
In systematic zoology we already have excellent examples of semantic enhancements to research papers (e.g., Pyle et al. 2008; Fisher and Smith 2008; Talamas et al. 2009); however, we still lack a comprehensive “full-life-cycle” model for data management, from a manuscript through the peer-reviewed publication process to data aggregation, dissemination, and stable long-term deposition. This particularly concerns primary biodiversity data (mostly, but not exclusively, specimen and observation-by-locality records).
We describe here a vision for data publication and dissemination in taxonomy. The most ambitious illustration of this publishing process (Fig. 1) is the milestone paper of Miller et al. (2009) published in the present issue. Following its proclaimed policy of applying innovative ways of publishing towards a “web-based” taxonomy (Penev et al. 2008), ZooKeys, in close cooperation with the authors and GBIF, provides in this paper for the first time in systematic zoology a unique combination of data publication and semantic enhancements (Shotton et al. 2009), applied within the mainstream process of journal publishing:
1. All primary biodiversity data underlying a taxonomic monograph are published as a dataset under a separate DOI within the paper.
2. The occurrence dataset is uploaded to GBIF simultaneously with the publication.
3. The occurrence dataset is published as a KML (Keyhole Markup Language) file under a distinct DOI to provide an interactive experience in Google Earth. The interactive map features collection occurrence data for all specimens in the monograph and links to collections of images for each species posted on Morphbank. Data can be filtered to display or hide any family, genus, or species.
4. All new taxa (42) are registered at ZooBank during the publication process (mandatory for ZooKeys).
5. All new taxa (42) are provided to the Encyclopedia of Life through XML markup on the day of publication (mandatory for ZooKeys).
Fig. 1. The ZooKeys model for data publication and semantic enhancements workflow in taxonomy.
Description of the concept
Static and dynamic datasets
A dataset is understood here as a logical file – analog or machine-readable – presenting a collection of facts (observations, descriptions or measurements) formally structured in records; each record is structured in fields. Within the domain of biological taxonomy, a dataset can be any set of data underlying a taxonomic paper – e.g., a list of all occurrence data published in the paper, data tables from which a graph or map is produced, an appendix with morphological measurements, etc. Each dataset has its own DOI within a paper.
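The record-and-field structure just described can be illustrated with a minimal, hypothetical occurrence table (all field names and values here are ours, purely for illustration, not a prescribed schema):

```python
# A dataset: a collection of records, each structured in the same fields.
# Field names and values are illustrative only, not a prescribed standard.
occurrence_dataset = [
    {"taxon": "Theridiosomatidae sp.", "locality": "Gaoligongshan, Yunnan",
     "latitude": 25.3, "longitude": 98.8, "collector": "collector A"},
    {"taxon": "Mysmenidae sp.", "locality": "Gaoligongshan, Yunnan",
     "latitude": 25.4, "longitude": 98.7, "collector": "collector B"},
]

# Because every record exposes the same fields, the whole file can be
# treated as a table and validated mechanically:
fields = set(occurrence_dataset[0])
assert all(set(record) == fields for record in occurrence_dataset)
```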
We propose a clear distinction between fixed “data tables” that represent precisely the set or sets of data upon which the analyses and conclusions of a given scientific paper are based, and dynamic “databases” that represent larger and more extensive collections of data that may or may not include the precise data table or tables that are the referent(s) for a given scientific paper. Both data tables and databases are of strong potential scientific interest and application, but publication of data tables is inextricably linked to a scientific paper, and the publisher must assure consistent and secure access, in perpetuity, to referent data tables. In other words, a newer version of a data table cannot be uploaded to the journal’s website; in this way the publisher guarantees its consistency and perpetuity in time, similarly to the content of an electronic publication.
We propose the use of digital object identifiers (DOIs), semantically related to the DOI of the respective paper, to be assigned to each discrete referent data table. This will ensure that there is a fixed and citable entity to which subsequent reviews and revisions can refer. The referent data table can be downloaded in conjunction with a paper and critically analyzed. It can also be freely used for subsequent analyses and publications, given proper citation and attribution.
Publication and citation of dynamic databases is supplementary and discretionary in the context of a given paper but should certainly be encouraged for the greater good of science. Sustained maintenance of databases is not an obligation for a publisher but is – by emerging scientific norms – an obligation for scientists, scientific institutions and agencies, libraries and archives. Ultimately we can envision aggregations – at various chronologic, geographic or taxonomic scales – of hundreds of databases composed of thousands of datasets.

Respecting the citation of databases, we note that conventions of citation must include careful specification of the version, date and time of accession. Database design should at all points support easy “stamping” of data usage.
Identification and location
We also propose to distinguish semantically between static and dynamic datasets, to facilitate harvesting through scripts and other machine-generated methods. Data tables, as the referents of a scientific paper, can be identified with the acronym “dt” (from data table) within their DOI, which is a semantic extension of the paper’s DOI (working examples from the paper of Miller et al. (2009) in the present issue):

If the DOI of a paper is: doi: 10.3897/zookeys.11.160

then the DOI of a static dataset (data table) will have the form:

doi: 10.3897/zookeys.11.160/app.B.dt

which means Appendix B of the paper.

To distinguish between static and dynamic datasets, the DOI of data published as a database would have the form:

doi: 10.3897/zookeys.11.160/app.B.db (“db” marks a dynamic dataset to distinguish it from “dt”, which means a static dataset or data table).

The DOI of another data table, e.g. data underlying a graph, within the same publication would have the form:

doi: 10.3897/zookeys.11.160/fig.2.dt
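The proposed suffix convention lends itself to mechanical construction and checking. A minimal sketch (the helper function and its name are ours; the DOI patterns are the working examples above):

```python
def dataset_doi(paper_doi: str, part: str, kind: str) -> str:
    """Derive a dataset DOI from a paper DOI using the proposed convention:
    'dt' marks a static data table, 'db' a dynamic database."""
    if kind not in ("dt", "db"):
        raise ValueError("kind must be 'dt' (data table) or 'db' (database)")
    return f"{paper_doi}/{part}.{kind}"

# Working examples from Miller et al. (2009):
paper = "10.3897/zookeys.11.160"
print(dataset_doi(paper, "app.B", "dt"))  # 10.3897/zookeys.11.160/app.B.dt
print(dataset_doi(paper, "app.B", "db"))  # 10.3897/zookeys.11.160/app.B.db
print(dataset_doi(paper, "fig.2", "dt"))  # 10.3897/zookeys.11.160/fig.2.dt
```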
Stand-alone publication of a dataset
“Datasets”, as collections of data, can be published separately in the form of a conventional publication containing a textual description of the key features of the dataset (e.g. “metadata”) – i.e., introductory information on the taxon or taxa that are the subject of the dataset, principles of architecture, size, technical description, history of the dataset/database itself, hosting, relations to other datasets, ownership and copyright issues, and so on. Within a journal, such a publication may be classified as “Dataset” analogously to other types of publications, e.g., “Editorial”, “Research article”, or “Correspondence”. Formal protocols for what will constitute complete and acceptable metadata for data (data tables, datasets and databases) will need to be prescribed in subsequent analysis. Significant progress on this problem is being made in other domains (Green 2009).
Peer-review
The publication – either a scientific paper containing referent data tables or a stand-alone publication of a dataset – will be subject to standard peer-review processes in accordance with the editorial requirements of specific journals.
Citation
Citation of data will vary according to the form of publication and preservation.

If published within a paper:

<Author> <(Year)> <Legend of the dataset>. DATA TABLE. <File format> <DOI of the dataset> <URL of the dataset> <Journal, volume, issue, pages> <DOI of the publication>

In the case of the working example of Miller et al. (2009), the citation of the dataset published under Appendix B has the form:

Miller JA, Griswold CE, Yin CM (2009) Appendix B. Locality data (XLS format) for all specimens of the spider families Theridiosomatidae, Mysmenidae, Anapidae, and Symphytognathidae collected during an inventory of the Gaoligongshan, Yunnan, China, 1998-2007. DATA TABLE. File format: Microsoft Excel (1997-2003). doi: 10.3897/zookeys.11.160/app.B.dt. http://dx.doi.org/10.3897/zookeys.11.160/app.B.dt. ZooKeys 11: 9-195. doi: 10.3897/zookeys.11.160

If published as a stand-alone:

<Author> <(Year)> <Title of the publication> <Journal, volume, issue, pages> <DOI of the publication> DATASET. <Dataset format, version> <DOI of the dataset> <URL of the dataset>. Accessed <Day of accession>
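The within-paper template above can be filled in mechanically. A sketch (the function and its parameter names are ours) that reproduces the shape of the Miller et al. (2009) citation:

```python
def cite_data_table(author, year, legend, file_format, dataset_doi,
                    dataset_url, journal_ref, paper_doi):
    """Assemble a data-table citation following the proposed template:
    <Author> <(Year)> <Legend>. DATA TABLE. <File format> <DOI of the
    dataset> <URL of the dataset> <Journal, volume, pages> <DOI of the
    publication>."""
    return (f"{author} ({year}) {legend}. DATA TABLE. "
            f"File format: {file_format}. doi: {dataset_doi}. "
            f"{dataset_url}. {journal_ref}. doi: {paper_doi}")

citation = cite_data_table(
    "Miller JA, Griswold CE, Yin CM", 2009,
    # Legend truncated here for brevity; the full legend appears above.
    "Appendix B. Locality data (XLS format) for all specimens ...",
    "Microsoft Excel (1997-2003)",
    "10.3897/zookeys.11.160/app.B.dt",
    "http://dx.doi.org/10.3897/zookeys.11.160/app.B.dt",
    "ZooKeys 11: 9-195",
    "10.3897/zookeys.11.160")
print(citation)
```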
Dissemination and usage
ZooKeys was the first journal to offer mandatory ZooBank registration and to supply species descriptions of all new taxa to the Encyclopedia of Life on the day of publication. Thanks to that, new taxa described in the pages of the journal quickly become known to the world. What happens, however, with the species-by-occurrence records (i.e., what is normally called “primary biodiversity data”)?

Describing a new species is not an easy task, and behind this process lies a long accumulation of knowledge, effort and investment (funding). Similarly, the collection of specimen records requires serious effort. At ZooKeys we are convinced that each species-by-occurrence record collected on Earth has its own discrete value and deserves proper registration, publication, preservation and dissemination, with proper attribution for authors, data suppliers and publishers.
Still, the job is not complete merely with the publication of data. The larger challenge is to accumulate a dataset in a repository, or repositories, where it will be preserved, integrated and reused. The role of such discovery and mobilisation of primary biodiversity data is a major reason for GBIF’s existence. The Integrated Publishing Toolkit (ipt.gbif.org), recently launched by GBIF, allows data to be published in the form of a database using the DarwinCore protocol. Besides the standard DarwinCore fields describing the taxonomic position and rank of each taxon, locality, collectors, ID of specimens in a collection, etc., the data table published as a downloadable Excel file as Appendix B (doi: 10.3897/zookeys.11.160/app.B.dt) in the paper of Miller et al. (2009) was completed with the following additional fields:

§ LSID (ZooBank) of each taxon
§ DOI of the dataset
§ DOI of the publication
§ Citation
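A single published record, carrying standard DarwinCore-style occurrence fields plus the four additions listed above, might look like the following sketch (the LSID, catalogue number and other values are hypothetical placeholders, not data from the paper):

```python
# One occurrence record as it might be published through the IPT.
# All values are hypothetical placeholders for illustration.
record = {
    # Standard DarwinCore-style occurrence fields (illustrative subset)
    "scientificName": "Symphytognathidae sp.",
    "locality": "Gaoligongshan, Yunnan, China",
    "recordedBy": "collector A",
    "catalogNumber": "SPEC-0000",  # hypothetical specimen ID
    # The four additional fields linking the record to its publication
    "taxonLSID": "urn:lsid:zoobank.org:act:XXXXXXXX",  # hypothetical LSID
    "datasetDOI": "10.3897/zookeys.11.160/app.B.dt",
    "publicationDOI": "10.3897/zookeys.11.160",
    "citation": "Miller JA, Griswold CE, Yin CM (2009) ZooKeys 11: 9-195",
}
```

The two DOI fields let any aggregator that harvests the record point back both to the static data table and to the paper it underlies.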
The dataset was published through GBIF simultaneously with the publication of the paper, allowing its reuse within an unlimited number of cross-sectional larger datasets compiled at larger geographic, taxonomic or chronologic scales, e.g. of all spider species recorded in China, or all plant and animal species recorded in the same localities or region. The LSID link of each species offers unlimited possibilities for the use of each specimen record in any branch of science and nature conservation.
But how can the data be used independently, in the form “as-it-is-published”? Under the terms of the Creative Commons Attribution License anyone can download the dataset and use it, provided that the original author and source are credited. The same paper, however, offers one more innovative way to display and use this particular dataset. In Appendix C, the data are published in the form of KML, a file format which allows interactive mapping in Google Earth, and incorporates display on maps of all localities and species, descriptions of each specimen record, hierarchical filtering and mapping of higher taxa (genus, family), and links to species descriptive morphology in Morphbank. It remains only to include links to EOL species pages, which will be easy to do once EOL starts to provide LSIDs for its species pages prior to the day of publication. Another great feature of our time to dream of!
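A minimal KML placemark of the kind Appendix C aggregates for interactive display in Google Earth can be sketched as follows (the function, names and coordinates are placeholders, not data from the paper; note that KML lists longitude before latitude):

```python
def occurrence_kml(name: str, description: str, lon: float, lat: float) -> str:
    """Build a minimal KML document with a single occurrence placemark.
    KML uses longitude,latitude order inside <coordinates>."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>{name}</name>
      <description>{description}</description>
      <Point><coordinates>{lon},{lat},0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

# Placeholder values, not actual specimen data:
print(occurrence_kml("Example species",
                     "Specimen record; links could point to Morphbank images",
                     98.8, 25.3))
```

A real Appendix C-style file would contain one such placemark per specimen record, grouped in folders by family, genus and species so that higher taxa can be shown or hidden interactively.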
We believe that our proposed approach will motivate authors to publish data and to receive recognition in the form of citations, future co-authorship, and credentialing for career development. We look forward to the time when data discovery and publishing initiatives like GBIF will automatically index such datasets and incorporate them along with the corresponding descriptive metadata.
References
Altman M, King GA (2007) Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine 13 (3/4).

Green T (2009) We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper, OECD Publishing. doi: 10.1787/603233448430. http://dx.doi.org/10.1787/603233448430

Fisher BL, Smith MA (2008) A Revision of Malagasy Species of Anochetus Mayr and Odontomachus Latreille (Hymenoptera: Formicidae). PLoS ONE 3(5): e1787. doi: 10.1371/journal.pone.0001787

Miller JA, Griswold CE, Yin CM (2009) The symphytognathoid spiders of the Gaoligongshan, Yunnan, China (Araneae: Araneoidea): Systematics and diversity of micro-orbweavers. ZooKeys 11: 9-195. doi: 10.3897/zookeys.11.160

Penev L, Erwin T, Thompson FC, Sues H-D, Engel MS, Agosti D, Pyle R, Ivie M, Assmann T, Henry T, Miller J, Ananjeva NB, Casale A, Lourenco W, Golovatch S, Fagerholm H-P, Taiti S, Alonso-Zarazaga M (2008) ZooKeys, unlocking Earth’s incredible biodiversity and building a sustainable bridge into the public domain: From “print-based” to “web-based” taxonomy, systematics, and natural history. ZooKeys Editorial Opening Paper. ZooKeys 1: 1-7. doi: 10.3897/zookeys.1.11

Pyle RL, Earle JL, Greene BD (2008) Five new species of the damselfish genus Chromis (Perciformes: Labroidei: Pomacentridae) from deep coral reefs in the tropical western Pacific. Zootaxa 1671: 3-31.

Shotton D, Portwin K, Klyne G, Miles A (2009) Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol 5(4): e1000361. doi: 10.1371/journal.pcbi.1000361

Talamas EJ, Johnson NF, van Noort S, Masner L, Polaszek A (2009) Revision of world species of the genus Oreiscelio Kieffer (Hymenoptera, Platygastroidea, Platygastridae). ZooKeys 6: 1-68. doi: 10.3897/zookeys.6.67

2832nd COMPETITIVENESS (Internal market, Industry and Research) Council meeting, Brussels, 22 and 23 November 2007. http://ue.eu.int/Newsroom