Metadata as Linked Data for Research Data Repositories - Andrea Huang
“Every man has his own cosmology and who can say that his own is right,” Einstein said. The same holds when we come to data semantics: one dataset may be interpreted differently by different data creators, curators, and re-users. How, then, do we build a better research data repository?
We start from the point made by Willis, Greenberg, and White (2012) that metadata for research data increases access to and reuse of that data. Stanford, Harvard, and Cornell likewise regard linked data technologies as a promising way to gather contextual information about research resources.
Looking for tools that can meet research repositories' urgent need for innovative solutions providing feature-rich data-publishing services, such as visualization, validation, and reuse in different applications (Assante et al., 2016), our first choice is CKAN (the Comprehensive Knowledge Archive Network), a major solution that makes linked metadata available, citable, and validated.
Original file: http://m.odw.tw/u/odw/m/metadata-as-linked-data-for-research-data-repositories/
Giving Credit Where Credit is Due: Author and Funder IDs - Andrea Payant
A process for including standardized funder and author identifiers in institutional repository and ILS records associated with funded research data
Linking Scientific Metadata (presented at DC2010) - Jian Qin
Linked entity data in metadata records builds a foundation for the semantic web. Even though metadata records contain rich entity data, there is no linking between associated entities such as persons, datasets, projects, publications, or organizations. We conducted a small experiment using the dataset collection from the Hubbard Brook Ecosystem Study (HBES), in which we converted the entities and their relationships into RDF triples and linked the URIs contained in the RDF triples to the corresponding entities in the Ecological Metadata Language (EML) records. Through a transformation program written in the Extensible Stylesheet Language (XSL), we turned a plain EML record display into an interlinked semantic web of ecological datasets. The experiment demonstrates the methodological feasibility of incorporating linked entity data into metadata records. The paper also argues for the need to change the scientific as well as the general metadata paradigm.
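To make the conversion concrete, here is a minimal sketch, assuming Python with the rdflib library, of how entities extracted from an EML record (a dataset, its creator, and the publishing organization) might be expressed as interlinked RDF triples. The URIs, names, and the HBES namespace are hypothetical illustrations, not identifiers from the actual HBES collection.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCMITYPE, DCTERMS, FOAF, RDF

# Hypothetical namespace standing in for the HBES dataset collection.
HBES = Namespace("http://example.org/hbes/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

dataset = HBES["dataset/42"]
person = HBES["person/j-doe"]
org = HBES["org/hubbard-brook"]

# Entities and relationships as they might be extracted from one EML record.
g.add((dataset, RDF.type, DCMITYPE.Dataset))
g.add((dataset, DCTERMS.title, Literal("Stream chemistry, watershed 6")))
g.add((dataset, DCTERMS.creator, person))   # dataset-to-person link
g.add((dataset, DCTERMS.publisher, org))    # dataset-to-organization link
g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal("J. Doe")))
g.add((org, RDF.type, FOAF.Organization))
g.add((org, FOAF.name, Literal("Hubbard Brook Ecosystem Study")))

print(g.serialize(format="turtle"))
```

Each URI can then be rendered as a link in the transformed record display, which is what turns a flat EML record into a navigable web of entities.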
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
Slides from Friday 3rd August - Data in the Scholarly Communications Life Cycle course, which is part of the FORCE11 Scholarly Communications Institute.
Presenter - Natasha Simons
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen... - DuraSpace
Hot Topics: The DuraSpace Community Webinar Series, Series Six: “Research Data in Repositories.” Curated by David Minor, Research Data Curation Program, UC San Diego Library. Webinar 2: “Metadata and Repository Services for Research Data Curation.”
Presented by Declan Fleming, Chief Technology Strategist; Arwen Hutt, Metadata Librarian; and Matt Critchlow, Manager of Development and Web Services, UC San Diego Library.
Mitigating the Risk: Identifying Strategic University Partnerships for Compli... - Andrea Payant
Payant, A., Rozum, B., Woolcott, L. (2016). Mitigating the Risk: Identifying Strategic University Partnerships for Compliance Tracking of Research Data and Publications. International Federation of Library Associations (IFLA) Satellite Conference: Data in Libraries: The Big Picture
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe... - Carole Goble
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research, that is, the “assets” of data, models, codes, SOPs and so forth. Don't stop reading. Data management isn't likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying cry for EU and USA Research Infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure, ranging across European programmes (SysMO and EraSysAPP ERANets), national initiatives (de.NBI, the German Virtual Liver Network, UK SynBio centres) and PIs' labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Systems Biology, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in influencing sharing using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
Data Citation Implementation Guidelines by Tim Clark - datascienceiqss
This talk presents a set of detailed technical recommendations for operationalizing the Joint Declaration of Data Citation Principles (JDDCP) - the most widely agreed set of principle-based recommendations for direct scholarly data citation.
We will provide initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
We hope that these recommendations along with the new NISO JATS document schema revision, developed in parallel, will help accelerate the wide adoption of data citation in scholarly literature. We believe their adoption will enable open data transparency for validation, reuse and extension of scientific results; and will significantly counteract the problem of false positives in the literature.
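One widely used mechanism behind machine actionability is content negotiation on the DOI resolver: instead of the landing page, a client asks for a machine-readable metadata record. Below is a minimal sketch, assuming Python with the requests library and the DataCite content-negotiation media type; the DOI is the fictitious example from the identifier slides later on this page, so it will not actually resolve.

```python
import requests

# Fictitious DOI from the slides below; substitute a real DataCite DOI.
doi = "10.9999/FK40K2GTV"

# Ask the resolver for DataCite metadata rather than the landing page.
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.datacite.datacite+json"},
    timeout=30,
)
if resp.ok:
    record = resp.json()
    print(record.get("titles"), record.get("publisher"))
else:
    print("Resolution failed:", resp.status_code)
```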
In this webinar, we gave a general introduction to the dkNET portal and showed how dkNET can be used to address a variety of use cases, including:
1) Find funding sources for your research of interest
2) Determine which study sections have reviewed this type of research
3) Help with new NIH guidelines for rigor and reproducibility
Publishing of Scientific Data - Science Foundation Ireland Summit 2010 - jodischneider
Slides prepared for the Publishing of Scientific Data workshop at the Science Foundation Ireland Summit 2010. I was one of three panelists. We had a lively discussion!
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as different types of packages holding all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (as we release software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
Reproducible and citable data and models: an introduction - FAIRDOM
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS), and Dagmar Waltemath (University of Rostock) at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany, September 14th - 16th 2015.
Dataset Catalogs as a Foundation for FAIR* Data - Tom Plasterer
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at the summary, version and distribution levels. Further, we've described datasets using a limited set of well-vetted public vocabularies, focused on the cross-omics analytes and clinical features of the catalogued datasets.
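As a flavor of the approach, here is a minimal sketch, assuming Python with the rdflib library (whose bundled DCAT namespace is used below), of a catalog entry describing one dataset with a versioned distribution. All URIs, titles, and values are invented for the example and are not from the catalog described in the talk.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Hypothetical catalog namespace, invented for this example.
EX = Namespace("http://example.org/catalog/")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

catalog = EX["catalog"]
dataset = EX["dataset/study-1"]
dist = EX["dataset/study-1/v2/csv"]

g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCAT.dataset, dataset))      # the catalog lists the dataset

g.add((dataset, RDF.type, DCAT.Dataset))     # summary-level description
g.add((dataset, DCTERMS.title, Literal("Omics study 1")))
g.add((dataset, DCTERMS.hasVersion, Literal("2")))   # version level
g.add((dataset, DCAT.distribution, dist))    # distribution level

g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.mediaType, Literal("text/csv")))
g.add((dist, DCAT.downloadURL, URIRef("http://example.org/files/study-1-v2.csv")))

print(g.serialize(format="turtle"))
```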
FAIRy stories: tales from building the FAIR Research Commons - Carole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object is a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down, like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVeL), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
Analysing & Improving Learning Resources Markup on the Web - Stefan Dietze
Talk at WWW2017 on LRMI adoption, quality and usage. Full paper here: http://papers.www2017.com.au.s3-website-ap-southeast-2.amazonaws.com/companion/p283.pdf.
4.2.15 Slides, “Hydra: many heads, many connections. Enriching Fedora Reposit... - DuraSpace
Hot Topics: The DuraSpace Community Webinar Series
Series 11: Integrating ORCID Persistent Identifiers with DSpace, Fedora and VIVO
Webinar 2: “Hydra: many heads, many connections. Enriching Fedora Repositories with ORCID.”
Thursday, April 2, 2015
Curated by Josh Brown, ORCID
Presented by: Laura Paglione, Technical Director, ORCID and Rick Johnson, Head of Digital Library Services, University of Notre Dame
This presentation was provided by Anne Washington of the University of Houston during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear... - dkNET
The NIDDK Information Network (dkNET; http://dknet.org) is an open community resource for basic and clinical investigators in metabolic, digestive and kidney disease. dkNET's portal facilitates access to a collection of diverse research resources (i.e. the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain) that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). This webinar was presented by dkNET principal investigator Dr. Jeffrey Grethe.
An introduction to the FAIR principles and a discussion of key issues that must be addressed to ensure data is findable, accessible, interoperable and re-usable. The session explored the role of the CDISC and DDI standards for addressing these issues.
Presented by Gareth Knight at the ADMIT Network conference, organised by the Association for Data Management in the Tropics, in Antwerp, Belgium on December 1st 2015.
Data Wrangling in SQL & Other Tools :: Data Wranglers DC :: June 4, 2014 - Ryan B Harvey, CSDP, CSM
I gave a talk on the basics of SQL and its utility for data preprocessing and analysis tasks to the Data Wranglers DC meetup group, a member meetup of the Data Community DC (http://datacommunitydc.org).
The talk covered an introduction to relational data, database tools, and the SQL standard, as well as the basics of SQL select statements, common table expressions, and creating views from select statements. In addition, the use of relevant libraries in R and Python to connect to data in relational databases was explained using examples with PostgreSQL, IPython notebooks, and RMarkdown.
Talk information: http://www.meetup.com/Data-Wranglers-DC/events/171768162/
Talk materials: https://github.com/nihonjinrxs/dwdc-june2014
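To give a flavor of the SQL portion of the talk, here is a self-contained sketch of a view and a common table expression. It uses Python's built-in sqlite3 module rather than PostgreSQL so it runs without a database server; the table and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('alice', 30.0), ('bob', 12.5), ('alice', 7.5);

    -- A view created from a select statement, as covered in the talk.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer;
""")

# A common table expression (CTE) selecting from the view.
rows = conn.execute("""
    WITH big_spenders AS (
        SELECT customer, total FROM customer_totals WHERE total > 20
    )
    SELECT customer, total FROM big_spenders ORDER BY total DESC
""").fetchall()

for customer, total in rows:
    print(customer, total)
```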
Data “publication” attempts to appropriate for data the prestige of publication in the scholarly literature. While the scholarly communication community substantially endorses the idea, it hasn’t fully resolved what a data publication should look like or how data peer review should work. To contribute an important and neglected perspective on these issues, we surveyed ~250 researchers across the sciences and social sciences, asking what expectations “data publication” raises and what features would be useful to evaluate the trustworthiness and impact of a data publication and the contribution of its creator(s).
In early 2014, we asked science and social science researchers...
• What expectations do the terms publication and peer review raise in reference to data?
• What features would be useful to evaluate the trustworthiness and impact, and to enhance the prestige, of a data publication?
To facilitate data sharing from within the University of California system and beyond, the University of California Curation Center (UC3) is developing a new ingest and discovery layer for our data curation service, Dash. Dash uses the Merritt repository for preservation and a self-service overlay layer for submission and discovery of research datasets. The new overlay, dubbed Stash (STore And SHare), will feature an enhanced user interface with a simple and intuitive deposit workflow, while still accommodating rich metadata. Stash will enable individual scholars to upload data through local file browse or drag-and-drop operation; describe data in terms of scientifically meaningful metadata, including methods, references, and geospatial information; identify datasets for persistent citation and retrieval; preserve and share data in an appropriate repository; and discover, retrieve, and reuse data through faceted search and browse. Stash can be implemented in conjunction with any standards-compliant repository that supports the SWORD protocol for deposit and the OAI-PMH protocol for metadata harvesting. Stash will feature native support for the DataCite or Dublin Core metadata schemas, but is designed to accommodate other schemas to support discipline-specific applications. By alleviating many of the barriers that have historically precluded wider adoption of open data principles, Stash empowers individual scholars to assert active curation control over their research outputs; encourages more widespread data preservation, publication, sharing, and reuse; and promotes open scholarly inquiry and advancement.
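Because Stash targets any repository exposing OAI-PMH for metadata harvesting, a minimal harvest is easy to sketch. The example below assumes Python with the requests library; the endpoint URL is a placeholder rather than Merritt's actual address, while the verb, metadataPrefix, and XML namespaces are standard OAI-PMH.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder endpoint; substitute a real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# ListRecords with the mandatory Dublin Core metadata format.
resp = requests.get(
    BASE_URL,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
root = ET.fromstring(resp.content)

for record in root.iterfind(".//oai:record", NS):
    identifier = record.findtext(".//oai:identifier", namespaces=NS)
    title = record.findtext(".//dc:title", namespaces=NS)
    print(identifier, "->", title)
```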
EZID: Easy dataset identification & management
Joan Starr, Manager, Strategic and Project Planning and EZID Service Manager, California Digital Library
Data and data curation are assuming a growing role in today's research library. New approaches are needed both to address the resulting challenges and take advantage of the emerging opportunities. Long-term identifiers represent one such tool. In this presentation, Joan Starr will introduce identifiers and an application designed to make them easy to create and manage: EZID. She will provide a closer look at two identifier types, DOIs and ARKs, and discuss what bringing an identifier service to your institution might mean.
DataCite – Bridging the gap and helping to find, access and reuse data – Herb... - OpenAIRE
OpenAIRE Interoperability Workshop (8 Feb. 2013).
DataCite – Bridging the gap and helping to find, access and reuse data – Herbert Gruttemeier, INIST-CNRS
RO-Crate: packaging metadata love notes into FAIR Digital Objects - Carole Goble
Abstract
slides available at: https://zenodo.org/record/7147703#.Y7agoxXP2F4
The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment, and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research entities across a dispersed community of researchers.
Metadata is not just “a love note to the future” [2], it is a love note to today's collaborators and peers. Moreover, a FAIR Commons must cater for the metadata of all the entities of research – data, software, workflows, protocols, instruments, geo-spatial locations, specimens, samples, people (as well as traditional articles) – and their interconnectivity. That is a lot of metadata love notes to manage, bundle up and move around. Notes written in different languages at different times by different folks, produced and hosted by different platforms, yet referring to each other, and building an integrated picture of a multi-part and multi-party investigation. We need a crate!
RO-Crate [3] (http://researchobject.org/) is an open, community-driven, and lightweight approach to packaging research entities along with their metadata in a machine-readable manner. Following key principles of “just enough” and “developer and legacy friendliness”, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility and citability. As a self-describing and unbounded “metadata middleware” framework, RO-Crate shows that a little bit of packaging goes a long way to realise the goals of FAIR Digital Objects (FDO) [4], and to not just overcome platform diversity but celebrate it while retaining investigation contextual integrity.
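To make “a little bit of packaging” concrete, here is a sketch, in Python using only the standard library, of writing the JSON-LD metadata file at the heart of an RO-Crate. The layout (a metadata file descriptor plus a root dataset) follows my reading of the RO-Crate 1.1 conventions; the names and file are invented, and the context URL should be verified against the published specification.

```python
import json

# A minimal RO-Crate metadata file: one root dataset with one attached file.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # Descriptor: states that this file describes the crate root "./".
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # The root dataset: the investigation being packaged.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example modelling study",   # hypothetical
            "hasPart": [{"@id": "results.csv"}],
        },
        {   # A data entity inside the crate.
            "@id": "results.csv",
            "@type": "File",
            "name": "Simulation results",        # hypothetical
        },
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)
```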
In this talk I will present the why, and how Research Object packaging eases Metadata Collaboration using examples in big data and mixed object exchange, mixed object archiving and publishing, mass citation, and reproducibility. Some examples come from the HMC, others from EOSC, USA and Australia, and from different disciplines.
Metadata is a love note to the future; RO-Crate is the delivery package.
[1] https://helmholtz-metadaten.de/en
[2] Scott, Jason The Metadata Mania, http://ascii.textfiles.com/archives/3181, June 2011
[3] Soiland-Reyes, Stian et al. “Packaging Research Artefacts with RO-Crate”. Data Science, 2022; 5(2):97-138, DOI: 10.3233/DS-210053
[4] De Smedt K, Koureas D, Wittenburg P. “FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units”. Publications. 2020; 8(2):21. https://doi.org/10.3390/publications8020021
RDAP13 John Kunze: The Data Management Ecosystem - ASIS&T
John Kunze, University of California, Curation Center
California Digital Library (CDL)
The Data Management Ecosystem
Panel: Partnerships between institutional repositories, domain repositories, and publishers
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
EZID makes it simple for researchers and others to obtain and manage long-term identifiers for their digital content. The service can create and resolve identifiers, and it also allows entry and maintenance of information about the identifier (metadata). This presentation was given as part of a webinar series.
Although there is consensus that datasets should be treated like “first class” research objects in how they are discovered, cited, and recognized, this is still far from a reality. Datasets are poorly indexed by search engines, and they are rarely cited in formal reference lists. A solution that a number of journals are implementing is to publish discovery and citation proxy objects in the form of peer-reviewed “data papers.” A strength of this approach is that it requires dataset creators to write up rich and useful metadata for the paper, but an accompanying weakness is that busy creators are not always willing to invest the necessary time and energy. To enhance dataset discoverability without burdening creators, EZID (easy-eye-dee) will begin using dataset metadata to automatically generate lightweight, non-peer reviewed publications that will increase the exposure of the metadata to search engines. EZID (ezid.cdlib.org) maintains public DataCite metadata records for over 167,000 datasets, any of which could be viewed as HTML or as a dynamically generated PDF. In cases where the creator has submitted only the required DataCite metadata, the document will function as a cover-sheet or landing page. If the creator chooses to submit optional Abstract and Methods metadata (over 2,000 records already contain Abstracts), the document expands to more closely resemble a traditional journal article, while retaining the linking functionality of a landing page. A potential bonus is that providing an incrementally improved document in exchange for the effort of submitting incrementally improved metadata may encourage authors to submit more than the minimum required metadata.
Software development should build on the successful work of others. The DMPTool helps researchers with data management planning, but what about other phases of the data life cycle? In this webinar, we will discuss what software integration with the DMPTool might look like, and why it is important. Topics include:
1. Background: why tools integration is important; why we are talking about this in terms of the DMPTool.
2. Details and plans for DMPTool2 regarding software integration and compatibility.
3. Future possibilities for software integration for DMPTool2.
4. Example of successful integration of tools: work at the Center for Open Science.
Data management plans existed long before the NSF started requiring them. DMPs have inherent value despite being relatively unknown to researchers until now. Proper, thorough data management plans are potentially a major time saver and a huge asset for a project. In this webinar, we will cover how to go beyond funder requirements and develop more thorough DMPs. The Gulf of Mexico Research Initiative requires an extensive data management plan for projects it funds; we will hear about their efforts and how they are planning to use the DMPTool going forward.
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
This webinar will discuss the special needs of digital humanities researchers and help you learn how to talk to them about their information management needs.
Topics that will be covered:
What is humanities data?
What special considerations are involved in creating DMPs for humanities data?
Where can you store humanities data?
What will humanities funding agencies be looking for? What regulations apply to humanities data (e.g., data sharing, data management, data availability)?
What librarians should know before meeting with a humanist; how humanists differ from other researchers in the way they think about their data.
The thorough integration of information technology and resources into scientific workflows has nurtured a new paradigm of data-intensive science. However, far too much research activity still takes place in silos, to the detriment of open scientific inquiry and advancement. Data-intensive science would be facilitated by more universal adoption of good data management practices ensuring the ongoing viability and usability of all legitimate research outputs, including data, and the encouragement of data publication and sharing for reuse. The centerpiece of such data sharing is the digital repository, acting as the foundation for external value-added services supporting and promoting effective data acquisition, publication, discovery, and dissemination. Since a general-purpose curation repository will not be able to offer the same level of specialized user experience provided by disciplinary tools and portals, a layered model built on a stable repository core is an appropriate division of labor, taking best advantage of the relative strengths of the concerned systems.
The Merritt repository, operated by the University of California Curation Center (UC3) at the California Digital Library (CDL), functions as a curation core for several data sharing initiatives, including the eScholarship open access publishing platform, the DataONE network, and the Open Context archaeological portal. This presentation will highlight two recent examples of external integration for purposes of research data sharing: DataShare, an open portal for biomedical data at UC San Francisco; and Research Hub, an Alfresco-based content management system at UC Berkeley. They both significantly extend Merritt's coverage of the full research data lifecycle and workflows, both upstream, with augmented capabilities for data description, packaging, and deposit, and downstream, with enhanced domain-specific discovery. These efforts showcase the catalyzing effect that coupled integration of curation repositories and well-known public disciplinary search environments can have on research data sharing and scientific advancement.
More from University of California Curation Center (20)
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing in the different parts of the DevOps infinity loop.
7. How?
• Key identifying elements
• Emerging recommendations
• Variation among the domains
• In common: Persistent identifier
8. DataCite
German National Library of Economics (ZBW)
German National Library of Science and Technology (TIB)
German National Library of Medicine (ZB MED)
GESIS - Leibniz Institute for the Social Sciences, Germany
Australian National Data Service (ANDS)
ETH Zurich, Switzerland
Canada Institute for Scientific and Technical Information (CISTI)
Technical Information Center of Denmark
Institute for Scientific & Technical Information (INIST-CNRS), France
TU Delft Library, The Netherlands
The Swedish National Data Service (SNDS)
The British Library, UK
California Digital Library (CDL), USA
Office of Scientific & Technical Information (OSTI), USA
Purdue University Library
9. What is an identifier?
What you see: alphanumeric string (never changes)
Associated with: location of object (such as a URL)
Optional: who, what, when, etc. (i.e., metadata)
By Joelk75: http://www.flickr.com/photos/75001512@N00/2728233597/
10. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.bologna.edu/biology/xfg/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: University of Bologna
date: 8/31/2011
11. Identifier example
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.state.edu/ecology/783sdr/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: Dryad Data Repository
date: 10/01/2011
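The two snapshots make the key point: the DOI string is fixed while its registered location (and even publisher) can change underneath it. A minimal sketch of checking where a DOI currently resolves, using only Python's standard library; note the DOI above is fictional, per the slides, so it will not actually resolve:

```python
import urllib.request

# Ask the DOI proxy where an identifier currently points. Substitute a
# real DOI to try this out; the one below is the slides' fictional example.
doi = "10.9999/FK40K2GTV"
request = urllib.request.Request(f"https://dx.doi.org/{doi}", method="HEAD")
with urllib.request.urlopen(request) as response:
    # The identifier string never changes; the resolved URL is whatever
    # location is currently registered for it (bologna.edu one year,
    # state.edu the next).
    print("identifier :", f"doi:{doi}")
    print("resolves to:", response.geturl())
```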
12. EZID: long-term identifiers made easy
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation.
Primary Functions
1. Create persistent identifiers
2. Manage identifiers over time
3. Manage associated metadata over time
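For programmatic use, EZID also exposes an HTTP API. Here is a hedged sketch of minting a test identifier; the endpoint, the ark:/99999/fk4 test shoulder, the "apitest" account, and the ANVL body format are assumptions from memory, to verify against the current EZID API documentation before use:

```python
import urllib.request

# ANVL-style metadata: a target URL plus simple who/what/when elements.
# (Assumed request format; check the EZID API docs.)
metadata = "\n".join([
    "_target: http://www.example.edu/dataset/123",  # where the ID resolves
    "erc.who: Example Researcher",
    "erc.what: Example dataset",
    "erc.when: 2011",
])

request = urllib.request.Request(
    "https://ezid.cdlib.org/shoulder/ark:/99999/fk4",  # assumed test shoulder
    data=metadata.encode("utf-8"),
    headers={"Content-Type": "text/plain; charset=UTF-8"},
)

# EZID uses HTTP Basic authentication; "apitest" is assumed to be its
# public test account.
passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passwords.add_password(None, "https://ezid.cdlib.org", "apitest", "apitest")
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(passwords))

with opener.open(request) as response:
    print(response.read().decode("utf-8"))  # e.g. "success: ark:/99999/fk4..."
```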
17. DataCite Metadata V. 2.2
• Small required set = citation elements
• Optional descriptive set:
– extendable lists
– can refer to other standards, schemes
– domain-neutral
– rich ability to describe relationships to other digital objects
• Metadata Search (MDS) is full-text indexed
18. DataCite Metadata V. 2.2
Required properties
1. Identifier (with type attribute)
2. Creator (with name identifier attributes)
3. Title (with optional type attribute)
4. Publisher
5. PublicationYear
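A minimal sketch of a record carrying only these five required properties, built with Python's standard library. The element names and the kernel-2.2 namespace are believed to match the published schema at schema.datacite.org (verify there before use); the values reuse the fictional catfish example from the earlier slides:

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-2.2"  # assumed kernel-2.2 namespace
ET.register_namespace("", NS)

# Build the resource with the five required properties, in schema order.
resource = ET.Element(f"{{{NS}}}resource")
ET.SubElement(resource, f"{{{NS}}}identifier", identifierType="DOI").text = (
    "10.9999/FK40K2GTV"
)
creators = ET.SubElement(resource, f"{{{NS}}}creators")
creator = ET.SubElement(creators, f"{{{NS}}}creator")
ET.SubElement(creator, f"{{{NS}}}creatorName").text = "Kottor, Felix"
titles = ET.SubElement(resource, f"{{{NS}}}titles")
ET.SubElement(titles, f"{{{NS}}}title").text = (
    "Data for chromosomal study of catfish (Ictalurus punctatus)"
)
ET.SubElement(resource, f"{{{NS}}}publisher").text = "Dryad Data Repository"
ET.SubElement(resource, f"{{{NS}}}publicationYear").text = "2011"

print(ET.tostring(resource, encoding="unicode"))
```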
19. DataCite Metadata V. 2.2
Optional properties
6. Subject (with schema attribute)
7. Contributor (with type & name identifier attributes)
8. Date (with type attribute)
9. Language
10. ResourceType (with description attribute)
11. AlternateIdentifier (with type attribute)
12. RelatedIdentifier (with type & relation type attributes)
13. Size
14. Format
15. Version
16. Rights
17. Description (with type attribute)
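Property 12 is the distinctive one: RelatedIdentifier's relationType attribute links a dataset to other digital objects through paired types such as IsCitedBy & Cites or IsSupplementTo & IsSupplementedBy. A hedged sketch of adding one, continuing the ElementTree example above (both identifiers here are fictional, and the element names should again be checked against the published schema):

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-2.2"  # assumed kernel-2.2 namespace
resource = ET.Element(f"{{{NS}}}resource")  # required properties omitted for brevity

# relatedIdentifiers holds one entry per linked object; relationType says
# how this dataset relates to it.
related = ET.SubElement(resource, f"{{{NS}}}relatedIdentifiers")
rel = ET.SubElement(
    related,
    f"{{{NS}}}relatedIdentifier",
    relatedIdentifierType="DOI",
    relationType="IsSupplementTo",  # the dataset supplements a journal article
)
rel.text = "10.9999/EXAMPLE.ARTICLE"  # hypothetical DOI of that article

print(ET.tostring(resource, encoding="unicode"))
```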
21. Data Management Planning
By NASA Goddard Photo and Video: http://www.flickr.com/photos/gsfc/3720663276/
22. A life cycle approach
CDL Curation and Publishing Services
http://www.cdlib.org
Create, edit, share, and save data management plans
Open source add-in for Microsoft Excel as a data collection tool
Create and manage persistent identifiers
Curation repository: store, manage, and share research data
Open access scholarly publishing services: papers, journals, books, seminars & more
Data Publication: an infrastructure to publish and get credit for sharing research data
23. Identifiers and data management
Track your results
Organize your data
Get more citations
Meet funder requirements
24. Next Steps
DataCite
• Dublin Core application profile
• Content Service
• Metadata v. 2.3
EZID
• UI redesign
• Automated link checking
• Exposure for metadata
By Nicola Whitaker http://www.flickr.com/photos/nicolawhitaker/111009156/
25. Next Steps
Library
• service center
• information center
• your ideas here
By Nicola Whitaker http://www.flickr.com/photos/nicolawhitaker/111009156/
26. For more information
EZID
EZID application: http://n2t.net/ezid/
EZID website:
http://www.cdlib.org/services/uc3/ezid/
DataCite
DataCite Home: http://datacite.org/
DataCite Metadata Schema:
http://schema.datacite.org/meta/kernel-2.2/index.html
DataCite Metadata Search: http://search.datacite.org
27. Questions?
by Horia Varlan
http://www.flickr.com/photos/horiavarlan/4273168957/in/photostream/
Joan Starr: uc3@ucop.edu
@joan_starr
Editor's Notes
Thank you for this opportunity to speak with you today about dataset metadata. Let me give special thanks to Meghan for asking me to speak.
Image credits:
By MDB 28: http://www.flickr.com/photos/mdb28/3787828482/
By davecurlee: http://www.flickr.com/photos/davecurlee/4689603488/
By sabarishr: http://www.flickr.com/photos/sabarishr/5422105775/
By rkrichardson: http://www.flickr.com/photos/45126397@N06/4506403367/
By awsheffield: http://www.flickr.com/photos/awsheffield/5932294950/
By Scutter: http://www.flickr.com/photos/scutter/109698478/
By Amy the Nurse: http://www.flickr.com/photos/amyashcraft/4522601466/
By Anita & Greg: http://www.flickr.com/photos/anita__greg/2849453715/
My library: serving the 10 UC campuses, 226,000 students, and 134,000 faculty and staff, working collaboratively with libraries, data centers, museums, archives, faculty, and researchers. CDL has historically provided strategic, integrated technical and program services in a broad portfolio, including groundbreaking licensing agreements, union bibliographic services, data curation & preservation tools, and open access publishing services. CDL: http://www.cdlib.org/
My group: the UC Curation Center is a creative partnership between the CDL, the ten UC campuses, and peer institutions in the community. A community of shared concern and practice: we provide solutions, services, and resources for digital assets, and pool and distribute diverse experience, expertise, and resources.
Access: the researchers' requirements (per ESIP, Earth Science Information Partners: http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines) are: to provide fair credit to those responsible (exposure); to aid scientific reproducibility (re-use); to ensure scientific transparency and reasonable accountability (verification); and to aid in tracking the impact of the work (citation tracking). Preservation: easy to maintain. The funders' requirements are for data management, and the library's charge is to preserve our institutions' scholarly assets.
How are we going to meet these needs? If we go back to what the domains are doing... From ESIP, Earth Science Information Partners (same link):
Author(s): the people or organizations responsible for the intellectual work to develop the data set. The data creators.
Release Date: when the particular version of the data set was first made available for use (and potential citation) by others.
Title: the formal title of the data set.
Version: the precise version of the data used. Careful version tracking is critical to accurate citation.
Archive and/or Distributor: the organization distributing or caring for the data, ideally over the long term.
Locator/Identifier: this could be a URL, but ideally it should be a persistent service, such as a DOI, Handle, or ARK, that resolves to the current location of the data in question.
Access Date and Time: because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when online data were accessed.
From ICPSR, Inter-university Consortium for Political and Social Research (http://www.icpsr.umich.edu/icpsrweb/ICPSR/curation/citations.jsp): Title, Author, Date, Version, Persistent identifier (such as the Digital Object Identifier, Uniform Resource Name URN, or Handle System).
What’s in common: the persistent identifier.
DataCite was formed in 2009 by 10 libraries and research centers with a mission: "Helping you find, access, and reuse data." The number has now grown to 15. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information, so there is a presence in Asia. California Digital Library was one of the founding members. DataCite's primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
DOIs are one kind of persistent identifier. But what is an identifier? An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
Let's take a look at one. So you can see that with just the identifier and a simple set of metadata, you get: location for VERIFICATION; EXPOSURE & CITATION TRACKING. (This is not an actual DOI, nor an actual study.)
And here's that same DOI some time later. THE STRING NEVER CHANGES. This means it can be cited, tracked, and associated with all kinds of metadata. More on that in a minute.
EZID is CDL’s application for offering DataCite DOIs as well as other identifiers.
If you go to the Home Page, you can use the UI to test EZID. CLICK for HELP TAB.
On the Help screen, you have the choice of creating a test ARK or DOI. [CLICK] Click the Create button.
ARKs and DOIs:
ARKs: flexible; case-sensitive; special features support granularity; can be deleted; inexpensive.
DOIs: established brand in publishing; indexed by major A&I citation databases; DataCite policies apply; cannot be deleted; more costly.
DOIs should be assigned to objects that are under good long-term management, and where there is an intention to make the object persistently available. DOIs must be registered exclusively with metadata that is available to public view.
Can DOIs and ARKs work together? Yes. For example, researchers may choose to use ARKs for unpublished materials associated with an object that has been registered with a DOI. These two identifier schemes can work well together, and EZID offers them both, along with policy support consistent across both schemes.
EZID creates the identifier and sends you to the MANAGE tab, where you have the opportunity to enter a target URL and other metadata. UI support: Dublin Kernel, Dublin Core, DataCite Kernel. API support: all of the above, plus the full DataCite Schema.
When you hover over a field, it opens up for editing as you can see here. This is where you would go if you wanted to maintain the metadata or the target URL.
Now let's take a look at the full DataCite Metadata set. MDS = Metadata Search. Remember, we said that any solution needed to: ALLOW the submitter to accurately describe the object so that anyone accessing it knows what they are getting; ALLOW the submitter to give credit where credit is due; and PROVIDE support for *data management*: format, version, rights.
The 5 required properties = basic citation elements. Identifier = DOI now; in the future this may open up. Creator is repeatable; Name can have a nameIdentifier and schema, as in an ORCID ID. Title is repeatable and has an optional type attribute for AlternativeTitle, Subtitle, and TranslatedTitle. Publisher: "In the case of datasets, 'publish' is understood to mean making the data available to the community of researchers." IDENTIFIER = VERIFICATION. ALLOW the submitter to give credit where credit is due: EXPOSURE & CITATION TRACKING. If the Year field isn't quite what you want, use the repeatable Date field in the optional set.
Optional elements. Includes support for data management: FORMAT, VERSION, RIGHTS. In addition, some of these expand the required set: Contributor expands Creator; Date expands PublicationYear. But the distinctive strength comes from number 12. [CLICK]
Optional elements. The family jewels = RelatedIdentifier with relationType: IsCitedBy & Cites; IsSupplementTo & IsSupplementedBy; IsContinuedBy & Continues; IsNewVersionOf & IsPreviousVersionOf; IsPartOf & HasPart; IsDocumentedBy & Documents; IsCompiledBy & Compiles; IsVariantFormOf & IsOriginalFormOf. COMING IN 2.3: IsIdenticalTo.
"Data Management Planning" is a popular phrase these days. As metadata and preservation librarians, I think you'll find many of the concepts to be very familiar, if wearing new clothes. Let me tell you a little story about the life of a dataset. You start out in a laptop (or a tablet), travelling around, or under a desk. Maybe then you get emailed across the country or around the world. Years can go by as you get updated and altered. Eventually, maybe you have a day in the sun: your researcher decides to write up the results and cite you. Then, perhaps, it's back to a server in the dark. Or you move from server to server. Will you be forgotten?
That's why we at California Digital Library have taken a life cycle approach with an array of tools. CDL has developed tools and services ranging from the first stage of developing a data management plan through to formal publication. We encourage researchers to assign an ID early in the process: to provide a credible data management plan for funders; to make the later stages easier; and to manage situations where changes might occur during the course of the research (a researcher changes institutions, or a research team changes the location of their data, for example).
A Dublin Core application profile is available for the DataCite Metadata Schema; we'll keep it up to date and in sync. From the DCMI: "A DCAP is designed to promote interoperability within the constraints of the Dublin Core model and to encourage harmonization of usage and convergence on 'emerging semantics' around its edges."
The Content Service exposes our metadata stored in the DataCite Metadata Store (MDS) using multiple formats. Alpha version: the service can be accessed at http://data.datacite.org
EZID: UI redesign; activity reporting; browse & search; enhanced persistence support; automated link checking in support of our new tombstone pages (a web page returned for a resource no longer found at its target location of record; the tombstone may provide "last known" metadata, including the original owner); exposure for metadata, with evidence that citations will increase (Heather Piwowar's work): Thomson-Reuters (Web of Knowledge), Elsevier (Scopus), OAI? RSS? Google Scholar.
Library as a service center: consulting, EZID, DMP, DCXL, IR. Library as an information center: pointing people to standards and tools; helping make connections.
The next step for you as individuals is to get more information and try things for yourselves.