The document discusses two initiatives, Cafe Variome and ORCID, for improving data sharing and attribution for genetic research data. Cafe Variome is a central clearinghouse that assigns DOIs to genetic variation data submitted by diagnostic laboratories, facilitating sharing and the tracking of data usage. ORCID addresses the challenge of attributing work to contributors by providing a global registry of disambiguated identifiers for researchers. Both initiatives aim to improve data publication, citation and credit for data submitters.
ORCID participant meeting May 2011: The digital scholar, identity on the Web ... (Gudmundur Thorisson)
The document discusses Gudmundur Thorisson's involvement with ORCID and related projects. It describes ongoing and planned genetic research data publication projects that incorporate ORCID to help address challenges around name ambiguity and attribution. Specifically, it outlines projects using ORCID to provide publication credit and unique identifiers for data deposits in Cafe Variome and nanopublications in GWAS Central. It also discusses how ORCID could help aggregate a digital scholar's various online identities and contributions across publications, data, code, and other research objects.
The document discusses incentivizing data sharing by treating data like publications. It proposes a system where researchers can publish datasets online, receive digital object identifiers (DOIs) for datasets, and have their ORCID researcher identifiers linked to the DOIs. This would allow researchers to be unambiguously attributed to the datasets they generate and provide metrics like the number of times their datasets are cited, incentivizing data sharing similar to how the current publication system works.
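The DOI-plus-ORCID linking described above can be sketched in a few lines. This is an illustration only: the record contents and DOI are hypothetical, and the ORCID iD used is the example iD from ORCID's own documentation.

```python
# Minimal sketch of linking a dataset DOI to creator ORCID iDs.
# All record contents are hypothetical illustrations, not real registry entries.

def format_data_citation(record):
    """Render a simple citation string from a dataset record."""
    authors = "; ".join(
        f"{c['name']} (https://orcid.org/{c['orcid']})" for c in record["creators"]
    )
    return f"{authors} ({record['year']}). {record['title']}. https://doi.org/{record['doi']}"

dataset = {
    "doi": "10.1234/example-dataset",  # hypothetical DOI
    "title": "Example variant dataset",
    "year": 2011,
    "creators": [
        # ORCID's documented example iD, used here as a placeholder
        {"name": "A. Researcher", "orcid": "0000-0002-1825-0097"},
    ],
}

print(format_data_citation(dataset))
```

Because the creator is identified by an iD rather than a name string, citation counts for the dataset can be aggregated per researcher without name-matching heuristics.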
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident... (Gudmundur Thorisson)
The document discusses integrating ORCID researcher identifiers with data publication to provide incentives for data sharing. It describes two of the author's data publication projects: a disease genetics data project and a project called Cafe Variome that facilitates the exchange of genetic data between diagnostic laboratories and databases. The author argues that treating data as publications that are cited and attributed to their creators, such as through assigning DOIs and linking to ORCID IDs, can help address challenges around data sharing by incentivizing researchers.
G.A. Thorisson presents on the collaborative project between VIVO and ORCID to address challenges in author identification and attribution. The document discusses problems with name ambiguity and the need for unique researcher identifiers. ORCID aims to assign persistent identifiers to individual researchers to disambiguate names and track author contributions. The collaborative project between VIVO and ORCID involves evaluating how the two systems can interact technically by identifying overlaps in capabilities, reusing software components, and developing extensions to better integrate researcher profiles and publication data.
Identity in research data publication - meeting with SageCite people, March 2011 (Gudmundur Thorisson)
The document discusses the problem of non-unique author names in scholarly literature. Approximately two-thirds of the 6 million authors in MEDLINE have names that are ambiguous. It introduces ORCID as a solution to provide unique identifiers for authors and contributors to automatically disambiguate names and accurately attribute publications. ORCID assigns persistent digital identifiers to individuals and links author names to research works, facilitating credit and recognition of contributions.
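The ambiguity problem described above is easy to reproduce. In this toy sketch (all names and titles hypothetical), a crude name-normalization key collapses publications by distinct people onto a single key, which is why name strings alone cannot attribute work reliably:

```python
# Sketch: why name strings alone cannot attribute work reliably.
# Hypothetical publication list; two distinct people share the key "j smith".

from collections import defaultdict

def name_key(author):
    """Crude normalization: first initial + lowercase surname."""
    parts = author.split()
    return f"{parts[0][0].lower()} {parts[-1].lower()}"

publications = [
    {"title": "Paper A", "author": "John Smith"},
    {"title": "Paper B", "author": "Jane Smith"},   # a different person
    {"title": "Paper C", "author": "J. Smith"},     # which Smith?
]

by_key = defaultdict(list)
for pub in publications:
    by_key[name_key(pub["author"])].append(pub["title"])

# All three papers collapse onto one ambiguous key:
print(dict(by_key))  # {'j smith': ['Paper A', 'Paper B', 'Paper C']}
```

A registry of persistent per-person identifiers sidesteps this entirely: the grouping key becomes the iD, not a lossy transformation of the name.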
Findable, Accessible, Interoperable, Reusable < data | models | SOPs | samples | articles | * >. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a range of Life Science projects and initiatives. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVel), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
High throughput mining of the scholarly literature: journals and theses (petermurrayrust)
The document discusses how machines can be used to analyze and extract information from the huge volume of scholarly literature, which now exceeds 10,000 articles per day. It describes technologies like natural language processing, named entity recognition, and information extraction that can filter and extract structured data, such as numeric values, from text. However, it notes that fully exploiting this data is hindered by resistance from publishers concerned about copyright and content theft.
A Global Commons for Scientific Data: Molecules and Wikidata (petermurrayrust)
This document summarizes Peter Murray-Rust's work on developing software to extract structured data and information from scientific documents. It discusses tools to extract data from text, tables, images, computational logs, and more. It provides examples of extracting chemical information, disease and species data, and phylogenetic trees from figures. The goal is to liberate scientific data locked up in unstructured documents to enable new discoveries.
Research Objects: more than the sum of the parts (Carole Goble)
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
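The packaging-with-relationships idea above can be sketched as a manifest that aggregates a study's parts and annotates the links between them. This is a much-simplified, hypothetical manifest in the spirit of the Research Object model, not the actual specification; all identifiers and relation names are invented for illustration.

```python
import json

# Much-simplified, hypothetical manifest in the spirit of the Research Object
# idea: aggregate the components of a study and describe their relationships.
# This is an illustration, not the actual Research Object specification.

research_object = {
    "id": "ro://example-study",
    "title": "Example study",
    "aggregates": [
        {"id": "data/input.csv",   "type": "Dataset"},
        {"id": "model/fit.json",   "type": "Model"},
        {"id": "workflow/run.cwl", "type": "Workflow"},
    ],
    "annotations": [
        # Provenance-style links between the aggregated parts
        {"subject": "model/fit.json",   "relation": "derivedFrom", "object": "data/input.csv"},
        {"subject": "workflow/run.cwl", "relation": "produced",    "object": "model/fit.json"},
    ],
}

manifest = json.dumps(research_object, indent=2)
print(manifest)
```

The key design point is that the container is more than a zip file: the annotations record which dataset validated which model and which pipeline produced what, so the context travels with the parts.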
The presentation explores the trend towards a scholarly communication system that is friendly to machines. It presents three exhibits illustrating the trend and one exhibit illustrating inertia in the system. It makes the point that machine-actionability is much easier to achieve if content and metadata are available in Open Access and under a permissive Creative Commons license. It also observes that even with content and metadata openly available, new costs will emerge for the advanced tools needed to explore the scholarly record. Finally, it points at significant challenges regarding the persistence of the scholarly record in light of increasingly interconnected and actionable content and advanced tools to interact with it.
The slides were used for a plenary presentation at the LIBER 2011 Conference in Barcelona, Spain, on June 30 2011.
Published on Jan 29, 2016 by PMR
Keynote talk to LEARN (LERU/H2020 project) on research data management. Emphasizes that the problems are cultural, not technical. Promotes modern approaches such as Git and continuous integration, and announces DAT. Asserts that the Right to Read is the Right to Mine. Calls for widespread development of content mining (TDM).
This document summarizes a presentation on open science and open data. It discusses the importance of open research data for reproducibility and innovation. It outlines key policy developments promoting open data, including funder data policies and journal data policies. It also describes CODATA's activities related to data policies, frameworks for developing open data strategies, and components of the international open science ecosystem.
Automatic Extraction of Knowledge from the Literature (TheContentMine)
Published on May 11, 2016 by PMR
ContentMine tools (and the Harvest alliance) can be used to search the literature for knowledge, especially in biomedicine. All tools are Open and shortly we shall be indexing the complete daily scholarly literature
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks (Carole Goble)
Keynote presentation at the iConference 2015, Newport Beach, Los Angeles, 26 March 2015.
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
http://ischools.org/the-iconference/
BEWARE: presentation includes hidden slides AND in situ build animations - best viewed by downloading.
The role of biodiversity informatics in GBIF, 2021-05-18 (Dag Endresen)
The document discusses the role of biodiversity informatics and the Global Biodiversity Information Facility (GBIF) in making biodiversity data available through open access. GBIF provides free and open access to over 1.6 billion species occurrence records from over 1600 data publishers. The document highlights how digitizing natural history collections and integrating diverse biodiversity data sources can support research and policy goals. It emphasizes best practices like using common data standards, publishing datasets on GBIF to make them widely discoverable and reusable, and citing data with DOIs to incentivize open data sharing.
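The occurrence-record and DOI-citation practices described above can be illustrated with a small sketch. The records below follow the Darwin Core field-naming style used by GBIF, but the records themselves and the download DOI are hypothetical:

```python
# Sketch: working with occurrence records in the Darwin Core style used by GBIF.
# The records and the download DOI below are hypothetical illustrations.

occurrences = [
    {"scientificName": "Vulpes vulpes", "country": "NO", "year": 2019},
    {"scientificName": "Vulpes vulpes", "country": "SE", "year": 2020},
    {"scientificName": "Lynx lynx",     "country": "NO", "year": 2020},
]

def count_by_species(records):
    """Tally occurrence records per scientific name."""
    counts = {}
    for rec in records:
        name = rec["scientificName"]
        counts[name] = counts.get(name, 0) + 1
    return counts

def cite_download(doi, accessed):
    """Format a DOI-based citation for a data download, GBIF-style in spirit."""
    return f"GBIF occurrence download https://doi.org/{doi} (accessed {accessed})"

print(count_by_species(occurrences))  # {'Vulpes vulpes': 2, 'Lynx lynx': 1}
print(cite_download("10.15468/dl.example", "2021-05-18"))
```

Because each download carries its own DOI, a paper citing that DOI credits every publisher whose records are in the download, which is the incentive mechanism the document highlights.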
RARE and FAIR Science: Reproducibility and Research Objects (Carole Goble)
Keynote at JISC Digifest 2015 on Reproducibility and Research Objects in Scholarly Communication
Includes hidden slides
All material, except perhaps the IT Crowd screengrab, is reusable.
Liberating facts from the scientific literature - Jisc Digifest 2016 (TheContentMine)
Published on Mar 4, 2016 by PMR
Text and data mining (TDM) techniques can be applied to a wide range of materials, from published research papers, books and theses, to cultural heritage materials, digitised collections, administrative and management reports and documentation, etc. Use cases include academic research, resource discovery and business intelligence.
This workshop will show the value and benefits of TDM techniques, demonstrate how ContentMine aims to liberate 100,000,000 facts from the scientific literature, and provide a hands-on demo on a topical and accessible scientific/medical subject.
Lecture for a course at NTNU, 27th January 2021
CC-BY 4.0 Dag Endresen https://orcid.org/0000-0002-2352-5497
See also http://bit.ly/biodiversityinformatics
https://www.gbif.no/events/2021/lecture-ntnu-gbif.html
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ... (GigaScience, BGI Hong Kong)
This document discusses the growing reproducibility crisis in scientific research and proposes open data and transparent methods as solutions. It notes several studies finding a lack of reproducibility in published research due to inaccessible data and methods. Consequences of this include a large and growing number of retractions as well as perceptions that some regions have higher rates of fraudulent research due to lack of transparency. The document argues that open data, software and peer review can help address these issues by enabling credit for sharing and reusing research objects. Examples of initiatives that aim to reward open practices and improve reproducibility through open data publishing and peer review are also provided.
Published on Feb 29, 2016 by PMR
An overview of Text and Data Mining (ContentMining) including live demonstrations. The fundamentals (discover, scrape, normalize, facet/index, analyze, publish) are exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
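A toy sketch of the normalize, facet/index, and analyze steps named in that pipeline, applied to two invented sentences rather than a real mining run (the pattern and documents are illustrative only):

```python
import re

# Toy sketch of the normalize -> facet/index -> analyze steps of a TDM
# pipeline, applied to two hypothetical sentences, not a real mining run.

docs = [
    "Zika virus was detected in Aedes aegypti mosquitoes.",
    "A new ZIKA outbreak was reported; Aedes albopictus may also transmit it.",
]

species_pattern = re.compile(r"\bAedes\s+[a-z]+")

def normalize(text):
    """Collapse whitespace; real pipelines also fix encodings, hyphenation, etc."""
    return " ".join(text.split())

# Facet/index: map each extracted species term to the documents containing it
index = {}
for doc_id, doc in enumerate(docs):
    for term in species_pattern.findall(normalize(doc)):
        index.setdefault(term, []).append(doc_id)

# Analyze: count documents mentioning Zika, case-insensitively
zika_count = sum(1 for doc in docs if re.search(r"zika", doc, re.IGNORECASE))

print(index)       # {'Aedes aegypti': [0], 'Aedes albopictus': [1]}
print(zika_count)  # 2
```

Real ContentMine runs use curated dictionaries and entity recognizers rather than a single regex, but the shape of the pipeline is the same: normalized text in, a faceted index of extracted terms out.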
GBIF data publishing. GBIF seminar in Bergen. 2016-12-14 (Dag Endresen)
GBIF data publishing seminar at the Department for Biology at the University of Bergen. http://www.gbif.no/events/2016/data-publishing-seminar-in-bergen.html
Use of ContentMine tools on the Open Access subset of Europe PubMed Central to discover new knowledge about the Zika virus.
Three slides have embedded movies; these do not show in SlideShare, and a first pass of this can be seen as a single file at https://vimeo.com/154705161
Being FAIR: Enabling Reproducible Data Science (Carole Goble)
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
This document discusses proteomics repositories and data sharing in proteomics. It describes the types of information stored in MS proteomics repositories, including raw data, identification results, quantification, and metadata. It outlines several main repositories, distinguishing between those that do not reprocess data, like PRIDE and MassIVE, and those that do reprocess data through a standardized pipeline, like PeptideAtlas and GPMDB. It also discusses resources focused on drafts of the human proteome, such as proteomicsDB and the Human Proteome Map. Overall, the document provides an overview of existing proteomics repositories and issues around data sharing in the field.
Automatic Extraction of Knowledge from Biomedical literature (TheContentMine)
Published on Mar 16, 2016 by PMR
A plenary lecture to Cochrane Collaboration in Birmingham, on the value of automatically extracting knowledge. Covers the Why? How? What? Who? and problems and invites collaboration
BRIF workshop Toulouse 2012 ORCID intro and status update (Gudmundur Thorisson)
This document discusses ORCID (Open Researcher and Contributor ID), an organization that aims to solve the problem of name ambiguity in scholarly research by assigning unique identifiers to individual researchers. ORCID has recently launched a live service where researchers can register for a free ORCID iD and begin managing their profile and research contributions. The document outlines several ways ORCID identifiers could be integrated by research institutions, publishers, and other organizations to streamline author attribution and research management processes.
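One concrete detail of the iDs researchers register for: an ORCID iD carries an ISO 7064 MOD 11-2 check digit in its final position, as documented by ORCID. A minimal validator sketch (the second, failing iD is a deliberately corrupted example):

```python
# ORCID iDs carry an ISO 7064 MOD 11-2 check digit in the final position.
# Minimal validator sketch following the algorithm documented by ORCID.

def orcid_checksum_ok(orcid):
    """Validate the check digit of an ORCID iD like '0000-0002-1825-0097'."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    remainder = total % 11
    check = (12 - remainder) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1].upper() == expected

print(orcid_checksum_ok("0000-0002-1825-0097"))  # True  (ORCID's example iD)
print(orcid_checksum_ok("0000-0002-1825-0090"))  # False (corrupted final digit)
```

Integrating systems typically use this check to reject mistyped iDs at entry time, before any lookup against the registry.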
This document discusses open access to scientific research data. It notes that scientific research is increasingly data-driven and large-scale, especially in fields like high-energy physics, astronomy, and biology. However, inadequate access to research data is a problem, limiting opportunities to reuse data and validate or build upon past findings. The document examines some incentive-based approaches and key developments related to improving data sharing. It provides examples of large-scale data generation projects and challenges around managing and analyzing big data. Overall, the document argues that unrestricted sharing of scientific data deposited in the public domain could accelerate research and advance knowledge.
This chapter discusses the complex issue of determining an animal's moral status and how acceptable it is to treat animals as commodities. The author, a biologist, performs research on nematode worms that seems morally acceptable but may not be if done on vertebrates, as concern for animal welfare increases with neurological complexity and potential for suffering. While research on higher animals isn't prohibited, respect for the subject should always be maintained.
Flickr.com: More than Pretty Pictures (updated for GWA2010) (Kim Kruse)
This is an introduction to the benefits of Flickr.com, a low-cost and easy-to-use online photo management system. Flickr provides a great way to store and find photos, as well as to tap into the power of social media.
RDAP13 Lorrie Johnson: Facilitating Access to Scientific Data (ASIS&T)
The document summarizes the role and activities of the Office of Scientific and Technical Information (OSTI) within the U.S. Department of Energy. OSTI ensures public access to research results from DOE programs by collecting, preserving, and providing access to scientific and technical information through multiple online outlets. It also works to facilitate global sharing of research results by integrating databases through search portals like Science.gov and WorldWideScience.org. OSTI assigns digital object identifiers and metadata to datasets through its partnership with DataCite to improve discovery and citation of datasets.
SciDataCon - How to increase accessibility and reuse for clinical and persona...Fiona Nielsen
Presented in session 48 - Sharing of sensitive data - presented by Fiona Nielsen on September 12, 2016 at #SciDataCon http://scidatacon.org
We have addressed the most pressing problem for public genomic data, that of data discoverability, by indexing worldwide resources for genomic research data on an online platform (repositive.io) that provides a single point of entry to find and access available genomic research data.
http://www.scidatacon.org/2016/sessions/48/paper/26/
http://www.scidatacon.org/2016/sessions/48/
International data week - #RDAPlenary #IDW2016
https://www.youtube.com/watch?v=5YqAH3f9LiU
Digital Transformation is a key goal of many large and small companies, as well as of most research institutes today. However, a key prerequisite and enabler of digital transformation is computational accessibility and interoperability of data, as laid out in the FAIR Data principles. The Hyve has been involved in the FAIR Data movement since the start, and for this webinar, our CEO Kees van Bochove will be talking to a very special guest, Ruben Kok, director of DTL. DTL and its predecessor NBIC, as well as ‘spinoff’ GO-FAIR have spent an enormous amount of effort in the past years on outreach, training, tools and community building around the FAIR Data Principles. Where do we stand today? What can we expect to see in the coming years for FAIR and FAIR biomedical data (e.g. Personal Health Train) in particular?
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
Scott Edmunds talk at the HUPO congress in Geneva, September 6th 2011 on GigaScience - a journal or a database? Lessons learned from the Genomics Tsunami.
This document discusses data citation mechanisms and services for primary biodiversity data. It outlines the need for data citation to provide recognition for data producers and publishers. An ideal data citation framework would address social, technical, and policy issues to incentivize all stakeholders. Core technical components would include persistent identifiers, a data citation mechanism, and a data usage index. The document reviews the history of calls for data citation standards and proposes requirements for an effective data citation model, including attributing roles across data production and publication. It also examines challenges in developing data citation practices.
This document summarizes the state of open research data by outlining its evolution over time. It begins with centralized data centers in the 1960s and progresses to more collaborative models of data sharing through community agreements and online supplementary materials. The benefits of open data are discussed, including increased reproducibility and citation advantages for authors who share. While open data is ideal, achieving 3-star open standards according to the 5 star scheme is currently realistic. The future may bring stricter funding and publishing requirements to encourage more widespread data sharing.
How to make your published data findable, accessible, interoperable and reusablePhoenix Bioinformatics
Seminar Presentation for PMB Department, UC Berkeley for Love Data Week. Subject is how to prepare publications and associated data sets for maximum reuse.
This presentation sets out some of the challenges around citing and identifying datasets and introduces DataCite, the international data citation initiative. DataCite was founded on 1 December 2009 to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence. This presentation was given by Adam Farquhar at the STM Publishers Association Innovation Conference on 4 December 2009.
1. Data mining is the process of finding patterns and correlations within large data sets to identify relationships between data. It allows organizations to predict trends and is used to build disease risk models and detect fraud.
2. The main difference between a database and data warehouse is that a database stores application data while a data warehouse stores historical and cumulative data for analysis. Data mining analyzes patterns in data while data warehousing extracts and stores data.
3. Important areas of research in bioinformatics using data mining and warehousing include sequence analysis, genome annotation, gene and protein expression analysis, cancer mutation analysis, protein structure prediction, and comparative genomics.
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Merce Crosas
Presentation for the NFAIS Webinar series: Open Data Fostering Open Science: Meeting Researchers' Needs
http://www.nfais.org/index.php?option=com_mc&view=mc&mcid=72&eventId=508850&orgId=nfais
iODaV Data Workshop - Prof. Wafula - 19.9.17Tom Nyongesa
The document summarizes an iODaV Data Workshop held at JKUAT in Kenya on open data and the JORD policy. It discusses why open data is important for reproducibility, innovation and scientific discovery. It outlines the FAIR principles for open data and metadata to make data findable, accessible, interoperable and reusable. It also discusses opportunities and challenges of open data for universities, including developing skills and infrastructure. Finally, it provides examples of open data initiatives at JKUAT including developing an open data policy, the iODaV program, contributions to national ICT policies, and the digital health applied research centre.
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Natsuko Nicholls
The document discusses data sharing policies and mandates from various organizations including federal funding agencies in the US and internationally, journals, and a paradigm shift toward more transparent and collaborative research that integrates publications and data. Key points include requirements for data management plans from NIH and NSF, expectations of funding agencies in other countries to maximize access to research data, a journal policy requiring data to be made available, and challenges around measuring the impact of shared data given the lack of common practices and standards for citing data.
FAIR data in trustworthy repositories: the basicsOpenAIRE
This video illustrates how certified digital repositories contribute to making and keeping research data findable, accessible, interoperable and reusable (FAIR). Trustworthy repositories support Open Access to data, as well as Restricted Access when necessary, and they offer support for metadata, sustainable and interoperable file formats, and persistent identifiers for future citation. Presented by Marjan Grootveld (DANS, OpenAIRE).
Main references
• Core Trust Seal for trustworthy digital repositories: https://www.coretrustseal.org/
• EUDAT FAIR checklist: https://doi.org/10.5281/zenodo.1065991
• European Commission’s Guidelines on FAIR data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• FAIR data principles: www.force11.org/group/fairgroup/fairprinciples
• Overview of metadata standards and tools: https://rdamsc.dcc.ac.uk/
Similar to DataCite workshop at BL April 2011
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiersGudmundur Thorisson
This document summarizes a presentation about connecting identifiers like ORCID and DOIs to link researchers and their works. It describes prototypes created by the ODIN project, including a DataCite2ORCID tool that allows users to search DataCite metadata, find their works, and add them to their ORCID profile with a click. The presentation discusses challenges in linking heterogeneous metadata and next steps to capture contributor-work relationships and align with community standards.
This document describes a demonstration project linking ORCID identifiers and DataCite identifiers called ODIN. The project aims to connect researchers and datasets via persistent identifiers. It is a 2 year EC-funded project with 7 partners. A proof-of-concept tool was developed that allows researchers to claim datasets in their ORCID profile by searching and linking from CrossRef and DataCite metadata. The tool demonstrates prospective linking of ORCIDs in data workflows as well as retrospective claiming of published datasets.
The document discusses the Open Researcher & Contributor ID (ORCID) initiative. ORCID aims to solve the problem of ambiguous author attribution in scholarly works by assigning unique identifiers to individual researchers. It outlines how ambiguous names and the increasing number of authors per work have broken the current scholarly attribution system. ORCID launched in 2009 with support from research institutions, publishers, and organizations to create a central registry of researcher profiles linked to contributions. The document promotes the benefits of ORCID for reliable author identification and attribution across the scholarly community.
The document discusses the use of digital identifiers to identify bioresources. It provides background on digital identifiers and their importance for tracking use and impact. It discusses use cases for identifying different types of resources like datasets, databases, and projects. Key challenges include getting authors to use appropriate identifiers and a lack of solutions for some resource types like physical samples. Next steps include recommendations for identifier use and exploring identification schemes for clinical studies and trials.
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?Gudmundur Thorisson
This document summarizes the launch and status of the ORCID system for uniquely identifying academic authors. It notes that the ORCID service is now live but still has some bugs and missing features. It encourages researchers to register for an ORCID identifier and integrators like publishers and organizations to begin using the public and members APIs to integrate ORCID into their systems. Finally, it discusses challenges around encouraging broader adoption, including by smaller organizations, and efforts with the ORCID and DataCite Interoperability Network project.
TNC2012 Federated and scholarly identity - match made in heaven?Gudmundur Thorisson
This document discusses federated identity and scholarly identity. It provides an overview of scholarly identity and challenges related to name ambiguity and fragmented online identities. It then describes the Open Researcher & Contributor ID (ORCID) initiative, which aims to provide unique identifiers for researchers and link them unambiguously to their works. ORCID currently has over 300 participating organizations and is working to support the creation of a clear record of scholarly contributions through unique identifiers. Examples of how ORCID could enable knowledge discovery by linking contributors to their works are also provided.
This document proposes collaborating with the BioDBCore initiative to standardize the registration and description of biological databases. It identifies challenges in uniquely identifying databases due to unstable URLs. The proposal suggests adopting the MIRIAM registry's persistent identifiers to decouple identification from location. Benefits include globally identifying life science databases, improved discovery of relevant resources, and potential for BioDBCore to evolve into a database publishing platform. Open questions remain regarding technical details and integrating existing database lists.
GEN2PHEN GAM8 meeting Leiden - Update on ORCID and other ID developmentsGudmundur Thorisson
This document summarizes updates on identity initiatives including ORCID and contributions tracking tools for Drupal websites. ORCID is developing an API to allow unique identification of scholarly authors and tracking of author-publication links. An IRISC workshop discussed challenges around unambiguous author identification and opportunities for ORCID and identity federations to collaborate. The document also describes plans to develop a Drupal module to enhance tracking of content contributions and link local user accounts to ORCID profiles.
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...Gudmundur Thorisson
A major challenge facing VIVO is the retrieval of published works associated with specific authors from participating institutions, and automated disambiguation and identification of authors and scholarly works. VIVO thus shares many of the same goals as the Open Researcher and Contributor ID not-for-profit organization (ORCID: http://www.orcid.org). ORCID is working to solve the long-standing name ambiguity problem in scholarly communication globally, not only for researchers affiliated with academic institutions, but for contributors to scholarly works of all kinds. The aim of this mini-grant collaborative project is to explore how VIVO and ORCID could interact in the scholarly identity ecosystem, by way of small-scale implementation work and technology evaluation & review. The presentation will provide a brief introduction to ORCID and a background to the project, summarize the technical development undertaken thus far, outline the work remaining, and discuss some possibilities for future work beyond this specific short-term project.
sameAs London May 2011: The digital scholar, identity on the Web and ORCIDGudmundur Thorisson
The document discusses the challenges of identity fragmentation for digital scholars and how ORCID aims to address this issue. ORCID seeks to provide a single global registry of researcher identifiers that can be used to attribute contributions across publications, datasets, software, and other research outputs. This would help address problems like a lack of incentives for data sharing by allowing all contributions to be properly attributed and credited. The document outlines several potential use cases for how ORCID could aggregate different aspects of a researcher's identity and online presence.
The document discusses Gudmundur Thorisson's work with ORCID and JISC MRD projects. ORCID is working to create a global registry of researcher identifiers to help disambiguate author names and attribute contributions. This will help link researchers to their work more accurately. The registry will be open, free for researchers to use, and follow open principles. JISC MRD projects could benefit from ORCID's efforts to better attribute researchers and incentivize data sharing.
1. G. A. Thorisson, University of Leicester
DOIs for genetic research data
identifying datasets and data contributors
Gudmundur ‘Mummi’ Thorisson
Twitter: @gthorisson
<gt50@le.ac.uk>
Department of Genetics, University of Leicester
ORCID - http://www.orcid.org
GEN2PHEN - http://www.gen2phen.org
-- Outline --
• Cafe RouGE - central clearinghouse for genetic variation data
• ORCID - tackling the contributor identification challenge
This work can be freely copied, redistributed and
adapted, as long as proper attribution is given
DataCite workshop 4 April 2011 1
Monday, 4 April 2011
3. G. A. Thorisson, University of Leicester
Cafe RouGE - Café for Routine Genetic data Exchange
1. Diagnostic laboratories → 2. Central ‘clearinghouse’ → 3. End-users (e.g. LSDB curators)
Publish data | Retrieve Atom feeds
Submitting mutations from diagnostic labs using “Café RouGE-enabled” software via a simple button click. Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval.
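The automated feed-based retrieval step above can be sketched as a small Atom feed parser. This is a minimal sketch using only the Python standard library; the feed content below is made up for illustration, since the Café RouGE endpoint and entry fields shown on the slides are themselves hypothetical.

```python
# Sketch: parsing a Café RouGE-style Atom feed of mutation submissions.
# The sample feed is invented for illustration; a real client would fetch
# the XML from a feed endpoint instead of using an inline string.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

SAMPLE_FEED = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Mutation submissions</title>
  <entry>
    <id>tag:caferouge.org,2011:mutations/2352354</id>
    <title>4x variants in BRCA2 gene</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
</feed>
"""

def list_submissions(feed_xml: str):
    """Return (id, title) pairs for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        (e.findtext(f"{ATOM_NS}id"), e.findtext(f"{ATOM_NS}title"))
        for e in root.findall(f"{ATOM_NS}entry")
    ]

for entry_id, title in list_submissions(SAMPLE_FEED):
    print(entry_id, "-", title)
```

A monitoring client would poll the feed on a schedule and act on entries whose `<id>` it has not seen before.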
4. G. A. Thorisson, University of Leicester
Aim: publication credit for Cafe RouGE submissions
• Journal publication:
Thorisson, G.A. Accreditation and attribution in data sharing. Nature Biotechnology 27,
984 - 985 (2009) doi:10.1038/nbt1109-984b
=> http://www.nature.com/nbt/journal/v27/n11/full/nbt1109-984b.html
• Data publication:
G. A. Thorisson. 4x variants in BRCA2 gene. Published online via Cafe RouGE. 21
January (2011) doi:10.1255/caferouge.BRCA2-2352354
=> http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
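The data citation above can be assembled from a handful of metadata fields, in the same way a journal citation is. A minimal sketch; the function and field names here are illustrative, not the DataCite metadata schema:

```python
# Sketch: formatting a data citation from minimal metadata.
# Field names are chosen for illustration, not taken from the
# official DataCite metadata schema.
def format_data_citation(creator, title, publisher, date, doi):
    """Build a citation string in the style used on the slide."""
    return f"{creator}. {title}. Published online via {publisher}. {date}. doi:{doi}"

citation = format_data_citation(
    creator="G. A. Thorisson",
    title="4x variants in BRCA2 gene",
    publisher="Cafe RouGE",
    date="21 January (2011)",
    doi="10.1255/caferouge.BRCA2-2352354",
)
print(citation)
```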
5. G. A. Thorisson, University of Leicester
Submission from diagnostic lab ✔ → DOI assigned to incoming data upload → Cafe RouGE Depot.
Data shared with diverse 3rd parties and data usage/citation tracked via DOI.
dbSNP (coding), UniProt, PhenCode: already stable IDs, so no DOI assigned.
Attribution given to data submitters via ORCID unique identifier.
Metadata describing variation data published elsewhere.
7. G. A. Thorisson, University of Leicester
The ORCID initiative - tackling contributor
identification challenges
[Diagram: ambiguous author name variants resolved to unique ORCID identifiers]
G. Thorisson, Univ. Leicester / G. A. Thorisson, Univ. Leicester / G. A. Thorisson, Cold Spring Harbor Lab. → ORCID ID: B-1242-2010
J. Smith, Univ. North Pole → ORCID ID: G-1442-2009
J. Smith, Luthor Corporation → ORCID ID: D-2400-2010
Global registry of disambiguated IDs for contributors:
i) UI for researchers to manage & use their ORCID ID
ii) facilitate linking content creators with their work (attribution links)
iii) interact with other systems (publishers, digital libraries, universities etc)
8. G. A. Thorisson, University of Leicester
Key use case: submitting manuscript to journal
Attribution statement deposited into ORCID:
• A-883-2010 <created> 10.4259/psycho-review.gbilder2010.Foobar
10. G. A. Thorisson, University of Leicester
[Chart: > 200 participating organizations - http://www.orcid.org]
11. G. A. Thorisson, University of Leicester
Attributing other types of ‘published work’
G. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID ID: A-883-2010
• Thorisson, G. (A-883-2010), Bilder, G.W. (C-035-2009) and Fenner, M. (A-101-2010).
Icelandic 9th century viking bowl. Psychoceramics Archive. Sep 2 2010.
doi:10.4259/psycho.5gtpq-thorisson
12. G. A. Thorisson, University of Leicester
• A-883-2010 <created> 10.4259/psycho-image5gtpq-thorisson
• C-035-2009 <created> 10.4259/psycho-image5gtpq-thorisson
• A-101-2010 <created> 10.4259/psycho-image5gtpq-thorisson
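The attribution statements above are essentially (contributor, relation, work) triples. A small sketch of how a registry might index such statements, mapping each contributor ID to the works linked to it; the IDs and DOI are the hypothetical examples from the slide:

```python
# Sketch: indexing contributor-work attribution triples by contributor ID.
# The identifiers and DOI are the hypothetical examples from the slide.
from collections import defaultdict

triples = [
    ("A-883-2010", "created", "10.4259/psycho-image5gtpq-thorisson"),
    ("C-035-2009", "created", "10.4259/psycho-image5gtpq-thorisson"),
    ("A-101-2010", "created", "10.4259/psycho-image5gtpq-thorisson"),
]

def works_by_contributor(statements):
    """Map each contributor ID to the set of works they are linked to."""
    index = defaultdict(set)
    for contributor, relation, work in statements:
        if relation == "created":
            index[contributor].add(work)
    return dict(index)

index = works_by_contributor(triples)
print(index["A-883-2010"])
```

Run in the other direction, the same index answers "who created this DOI?", which is the attribution question the slides are concerned with.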
13. G. A. Thorisson, University of Leicester
• Some DOI issues for discussion
– Granularity & versioning
• citing datasets vs citing collections of datasets (including databases)
• organization - datasets within databases, aggregation
• databases =~ journals, BUT changing resources?
• DOI identifies a work which may be a new version of a previous work
– When is it “appropriate” to assign DOIs?
• original datasets published via the online repository
• derived datasets resulting from analysis of primary data
• republished datasets acquired from elsewhere?
– Identifier structure: desire for branding vs opacity (10.163/caferouge.325fff5)
• Contributor recognition: ORCID IDs for data creators / contributors
– ORCID will support datasets registered via DataCite
• Early adopter outreach - summer/autumn 2011
– DataCite metadata schema: where do contributor IDs fit in? (Creator field)
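One way to answer the last question is to attach the contributor's identifier to the creator entry of the dataset's metadata record. A hedged sketch building such an entry; the element and attribute names approximate the DataCite metadata kernel and should be checked against the current schema, and the ORCID-style ID is the hypothetical one from the slides:

```python
# Sketch: a DataCite-style <creator> element carrying an ORCID-style
# name identifier. Element and attribute names approximate the DataCite
# metadata kernel; verify against the current schema before relying on them.
import xml.etree.ElementTree as ET

creator = ET.Element("creator")
ET.SubElement(creator, "creatorName").text = "Thorisson, Gudmundur A."
name_id = ET.SubElement(creator, "nameIdentifier",
                        nameIdentifierScheme="ORCID")
name_id.text = "A-883-2010"  # hypothetical ID from the slides

xml_out = ET.tostring(creator, encoding="unicode")
print(xml_out)
```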
14. G. A. Thorisson, University of Leicester
Coming autumn 2011, to a venue near you!
Int’l workshop on researcher identity
• Co-organized by CSC (Finland IT Centre for Science)
• Provisional title: “Identity in research infrastructure and
scientific communication" - IRISC
• Location: Helsinki
• Dates: September 12-13
15. G. A. Thorisson, University of Leicester
Acknowledgements
GEN2PHEN Consortium - http://www.gen2phen.org/about-gen2phen/partners
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
Anthony J. Brookes Bioinformatics Group
Contact me!
Gudmundur ‘Mummi’ Thorisson
<gt50@le.ac.uk> | <gthorisson@gmail.com>
http://friendfeed.com/mummit
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson