The one-size-fits-all approach of current metadata standards for scientific data has serious limitations in keeping pace with ever-growing data. This paper reports findings from a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected more than 4,400 unique elements from 16 standards and grouped them into 9 categories. The highest counts of elements occurred in the descriptive category, and many of these overlapped with Dublin Core (DC) elements. The same pattern held for elements that co-occurred across standards: a small number of semantically general elements appeared in the largest number of standards, while the remaining co-occurrences formed a long tail with a wide range of specific semantics. The paper discusses the implications of these findings for metadata portability and infrastructure, and points out that large, complex standards and widely varied naming practices are the major hurdles to building a metadata infrastructure.
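To make the co-occurrence measure concrete, here is a minimal Python sketch of counting how many standards share each normalized element; the standard names and elements are hypothetical stand-ins, not the survey data.

```python
from collections import Counter

# Hypothetical element sets for three standards; the real survey
# normalized 5,800 elements from 16 standards down to 4,434 unique ones.
standards = {
    "StandardA": {"title", "creator", "spatial_coverage"},
    "StandardB": {"title", "creator", "instrument"},
    "StandardC": {"title", "sampling_protocol"},
}

# Count how many standards each (already normalized) element appears in.
occurrences = Counter(
    element for elements in standards.values() for element in elements
)

# Elements used in more than one standard are the "co-occurring" ones.
co_occurring = {e: n for e, n in occurrences.items() if n > 1}
print(co_occurring)  # e.g. {'title': 3, 'creator': 2}
```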
Functional and Architectural Requirements for Metadata: Supporting Discovery... by Jian Qin
The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as evident in Ball’s survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need and explores systems requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper’s goals are stated, followed by the presentation of key systems requirements. The results include a base model with three chief principles: the principle of least effort, infrastructure service, and portability. The principles are intended to support “data user” tasks. Results also include a set of defined user tasks and functions, and application scenarios.
Data grids are an emerging technology that enables the formation of sharable collections from data distributed across multiple storage resources. The integrated Rule Oriented Data System (iRODS) is a data grid developed by the DICE Center at UNC-CH. The iRODS data grid enforces management policies that control properties of the collection. Examples of policies include retention, disposition, distribution, replication, metadata extraction, time-dependent access controls, data processing, data redaction, and integrity checking. Policies can be defined that automate administrative functions (file migration and replication) and that validate assessment criteria (authenticity, integrity, chain of custody). iRODS is used to build data sharing environments, digital libraries, and preservation environments. The iRODS data grid is used at UNC-CH to support the Carolina Digital Repository, the LifeTime Library for the School of Information and Library Science, data grids for the Renaissance Computing Institute (RENCI), collaborations within North Carolina, and both national and international data sharing. At RENCI, the TUCASI data grid supports shared collections between UNC-CH, Duke, and NCSU. The RENCI data grid is federated with ten other data grids including the National Climatic Data Center, the Texas Advanced Computing Center data grid, and the Ocean Observatories Initiative data grid. International applications include the CyberSKA Square Kilometer Array for radio astronomy and the French National Institute for Nuclear Physics and Particle Physics. The collections that are assembled may contain hundreds of millions of files, and petabytes of data. A specific goal is the integration of institutional repositories with the national data infrastructure that is being assembled under the NSF DataNet program. The software is available as an open source distribution from http://irods.diceresearch.org.
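iRODS expresses such policies in its own rule language; as a language-neutral illustration only, the following Python sketch shows the kind of checksum-based integrity check a policy like this automates (the file registry is a hypothetical stand-in for grid state).

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Compute the SHA-256 checksum of a file in streaming fashion."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_collection(registry: dict[str, str]) -> list[str]:
    """Return the files whose current checksum no longer matches the
    checksum recorded at ingest time (registry is hypothetical)."""
    return [
        name for name, recorded in registry.items()
        if sha256(Path(name)) != recorded
    ]

# Usage: the registry maps file paths to checksums captured at ingest.
# damaged = verify_collection({"data/sample.nc": "ab12..."})
```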
FAIR Data Management and FAIR Data Sharing by Merce Crosas
Presentation at the Critical Perspectives on the Practice of Digital Archaeology symposium: http://archaeology.harvard.edu/critical-perspectives-practice-digital-archaeology
This presentation was provided by Marilyn White, Katelynd Bucher, and Briget Wynne, all of NIST, during the NISO webinar, Engineering Access Under the Hood, Part Two, held on November 15, 2017.
This presentation by David Wilcox was part of the NISO Virtual Conference, held on Feb 15, 2017, entitled Institutional Repositories: Ensuring Yours Is Populated, Useful and Thriving.
February 18, 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
From the May 2014 ORCID Outreach Meeting, https://orcid.org/content/orcid-outreach-meeting-and-codefest-may-2014
ORCID at professional associations
For scholarly societies, ORCID can help tie together siloed internal systems, including manuscript submission, membership management, author and reviewer databases, and conferences, improving an organization's ability to serve its members. This session will offer a discussion of integration points, policy issues, data flow between systems, researcher participation, discovered opportunities, and demonstrations by societies and vendors. (A sketch of the kind of public-API lookup these integrations rely on follows the speaker list.)
Moderator: Bernard Rous, Director of Publications, Association for Computing Machinery
Presenters:
Scott Moore, Director of Technology Services, Society for Neuroscience
Reynold Guida, Director, Product Management, IEEE
Gordon MacPherson, Director of Conference Quality, IEEE
Mary Warner, Assistant Director, Publications, American Geophysical Union
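As the sketch promised above: ORCID's public API lets a society system pull a researcher's public record over HTTP. A minimal example, with error handling omitted; the sample ORCID iD is the one used in ORCID's own documentation.

```python
import json
import urllib.request

def fetch_orcid_record(orcid_id: str) -> dict:
    """Fetch a researcher's public record from the ORCID public API."""
    req = urllib.request.Request(
        f"https://pub.orcid.org/v3.0/{orcid_id}/record",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# A society system could pull the public record once and reuse it across
# manuscript submission, membership, and conference databases.
# record = fetch_orcid_record("0000-0002-1825-0097")  # ORCID's sample iD
```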
Trust and Accountability: Experiences from the FAIRDOM Commons Initiative by Carole Goble
Presented at Digital Life 2018, Bergen, March 2018. In the Trust and Accountability session.
In recent years we have seen a change in expectations for the management and availability of all the outcomes of research (models, data, SOPs, software, etc.) and for greater transparency and reproducibility in the method of research. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for stewardship [1] have proved to be an effective rallying cry for community groups and for policy makers.
The FAIRDOM Initiative (FAIR Data Models Operations, http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards and sensitivity to asset sharing and credit anxiety. Our aim is a FAIR Research Commons that blends together the doing of research with the communication of research. The Platform has been installed by over 30 labs/projects and our public, centrally hosted FAIRDOMHub [2] supports the outcomes of 90+ projects. We are proud to support projects in Norway’s Digital Life programme.
2018 is our 10th anniversary. Over the past decade we learned a lot about trust between researchers, between researchers and platform developers and curators and between both these groups and funders. We have experienced the Tragedy of the Commons but also seen shifts in attitudes.
In this talk we will use our experiences in FAIRDOM to explore the political, economic, social, and technical practicalities of Trust.
[1] Wilkinson et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3. doi:10.1038/sdata.2016.18
[2] Wolstencroft et al. (2016). FAIRDOMHub: a repository and collaboration environment for sharing systems biology research. Nucleic Acids Research 45(D1): D404-D407. doi:10.1093/nar/gkw1032
FAIRy stories: tales from building the FAIR Research Commons by Carole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object are a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down, like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
How are we Faring with FAIR? (and what FAIR is not) by Carole Goble
Keynote presented at the workshop FAIRe Data Infrastructures, 15 October 2020
https://www.gmds.de/aktivitaeten/medizinische-informatik/projektgruppenseiten/faire-dateninfrastrukturen-fuer-die-biomedizinische-informatik/workshop-2020/
Remarkably it was only in 2016 that the ‘FAIR Guiding Principles for scientific data management and stewardship’ appeared in Scientific Data. The paper was intended to launch a dialogue within the research and policy communities: to start a journey to wider accessibility and reusability of data and prepare for automation-readiness by supporting findability, accessibility, interoperability and reusability for machines. Many of the authors (including myself) came from biomedical and associated communities. The paper succeeded in its aim, at least at the policy, enterprise and professional data infrastructure level. Whether FAIR has impacted the researcher at the bench or bedside is open to doubt. It certainly inspired a great deal of activity, many projects, a lot of positioning of interests and raised awareness. COVID has injected impetus and urgency to the FAIR cause (good) and also highlighted its politicisation (not so good).
In this talk I’ll make some personal reflections on how we are faring with FAIR: as one of the original principles authors; as a participant in many current FAIR initiatives (particularly in the biomedical sector and for research objects other than data) and as a veteran of FAIR before we had the principles.
Data Citation Implementation Guidelines by Tim Clark (datascienceiqss)
This talk presents a set of detailed technical recommendations for operationalizing the Joint Declaration of Data Citation Principles (JDDCP) - the most widely agreed set of principle-based recommendations for direct scholarly data citation.
We will provide initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
We hope that these recommendations along with the new NISO JATS document schema revision, developed in parallel, will help accelerate the wide adoption of data citation in scholarly literature. We believe their adoption will enable open data transparency for validation, reuse and extension of scientific results; and will significantly counteract the problem of false positives in the literature.
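One widely deployed mechanism for machine-actionable metadata behind an identifier is DOI content negotiation, sketched minimally below. This is an illustration of the general approach, not necessarily the JDDCP's exact recommendation; the example DOI is the FAIR principles paper cited elsewhere on this page.

```python
import json
import urllib.request

def resolve_doi_metadata(doi: str) -> dict:
    """Ask the DOI resolver for machine-readable citation metadata
    via HTTP content negotiation instead of the landing page."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# meta = resolve_doi_metadata("10.1038/sdata.2016.18")
# print(meta["title"], meta["DOI"])
```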
A presentation given by Manjula Patel (UKOLN) at the Repository Curation Environments (RECURSE) Workshop held at the 4th International Digital Curation Conference, Edinburgh, 1st December 2008.
http://www.dcc.ac.uk/events/dcc-2008/programme/
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe... by ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Matthew Spitzer, Center for Open Science
FAIRsharing Keynote - International Workshop on Sharing, Citation and Publica... by Peter McQuilton
A 30-minute presentation on FAIRsharing given at the International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines in Tachikawa, Tokyo, Japan on Tuesday 5th December, 2017
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics by Brett Tully
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
Islam M1,2, Christiansen J3, Mahboob S4, Valova V4, Baker M4, Capes-Davis D4, Hains P4, Balleine R1,4, Zhong Q1,4, Reddel R1,4, Robinson P1,4, Tully B4
1 The University of Sydney, Camperdown, Sydney, NSW, 2050, Australia
2 Intersect, Level 13/50 Carrington St, Sydney, NSW, 2000, Australia
3 Queensland Cyber Infrastructure Foundation Ltd, Axon Building 47, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
4 Children’s Medical Research Institute, Westmead, NSW, 2145, Australia
Background
The ACRF International Centre for the Proteome of Cancer (ProCan) at Children’s Medical Research Institute (CMRI) is an “industrial scale” program specialising in small-sample proteomics analysis from human cancer tissue.
ProCan seeks to generate both a wide and deep analytics pipeline and requires an enabling data framework. The framework must accommodate initial analysis and proteomic profiling of a large number of tumor samples, along with the clinical and demographic information, subsequent multi-omics studies, and any previously recorded responses to treatment. The curated datasets will provide a valuable resource beyond their primary use and ProCan is committed to making its data accessible to collaborators and the wider scientific community.
Objectives
The objective of the project is to establish an efficient, reliable, secure, and ethical data sharing and publication framework based on best-practice data sharing principles such as the FAIR principles. The framework must address challenges stemming from the scale and complexity of the program and from ProCan's focus on human-derived data, which must be shared while maintaining the privacy of research participants.
Method
The project adopted a requirements-driven methodology and engaged with a wide range of ProCan stakeholders nationally and internationally. Together, they explored various industrial-scale proteomics data management and sharing scenarios to ensure robust and ethical sharing of the data.
Results
The project developed a data sharing framework based on the FAIR principles that currently forms the basis of ongoing implementation work within the ProCan program.
Preservation of Research Data: Dataverse / Archivematica Integration by Allan... (datascienceiqss)
Scholars Portal, a program of the Ontario Council of University Libraries (OCUL), provides the technical infrastructure to store, preserve, and provide access to shared digital library collections in Ontario - including hosting a local instance of Dataverse since 2011. As part of a national project known as Portage (a project of the Canadian Association of Research Libraries), Scholars Portal is partnering with Artefactual Systems, Dataverse, the University of British Columbia, the University of Alberta, and others, to integrate Dataverse with preservation software Archivematica. When completed, this project will facilitate the long-term preservation of research data according to the Open Archival Information System (OAIS) Reference Model.
Rots RDAP11 Data Archives in Federal Agencies by ASIS&T
Arnold Rots, VAO; Data Archives in Federal Agencies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Infrastructure, Standards, and Policies for Research Data Management by Jian Qin
This presentation discusses the need for and importance of research data management and introduces the concept of research data management as an infrastructure service. Although many resources have been made available for research data management, most of them have been developed as “islands” and lack linking mechanisms. The lack of integrated and interconnected resources has contributed to high cost and duplicated effort in data management operations. The vision of research data management as an infrastructure service is not only to improve the efficiency of research data management but also to raise the productivity of the research enterprise. Each of the three dimensions, infrastructure, standards, and policies, addresses a critical aspect of research data management needed to make data infrastructure services work.
Data Science: An Emerging Field for Future Jobs by Jian Qin
The data deluge has become a reality in today's scientific research. What does it mean for the future science workforce? How can you prepare yourself to embrace the data challenges and opportunities? This presentation provides an overview of data science and what it means to you as future researchers and career scientists.
Educating a New Breed of Data Scientists for Scientific Data Management by Jian Qin
This presentation reports on the data science curriculum development and implementation at the Syracuse iSchool, which has been shaped by the fast-changing data-intensive environment not only in science but also in business and research at large.
SSDL for managers: how to move your team to secure development without shooting yourself in the foot.
On November 25, PDUG Meetup: SSDL for Management will take place at the Microsoft Technology Center: a meeting for heads of R&D and information security departments who manage large projects and development teams.
https://habrahabr.ru/company/pt/blog/315840/
--
See also: www.slideshare.net/ValeryBoronin/2016-61855208
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc... by ASIS&T
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Part of “Beyond metadata: Supporting non-standardized documentation to facilitate data reuse”
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen... by DuraSpace
“Hot Topics: The DuraSpace Community Webinar Series,” Series Six: “Research Data in Repositories,” curated by David Minor, Research Data Curation Program, UC San Diego Library. Webinar 2: “Metadata and Repository Services for Research Data Curation.”
Presented by Declan Fleming, Chief Technology Strategist; Arwen Hutt, Metadata Librarian; and Matt Critchlow, Manager of Development and Web Services, UC San Diego Library.
What is data discovery and how do people find out about data?
Metadata: What information helps potential users decide whether that data might be useful?
How and why do machines exchange information about research data?
Data without metadata and connections is useless:
Linked data
How Scholix is helping publishers and others to link data with publications and more
Metadata, controlled vocabularies, linked data and crosswalks
Things #11, #12, #13 of 23 Things
How do we make FAIR data? Findable, Accessible, Interoperable, Reusable?
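As a toy illustration of the crosswalks mentioned in this list (the source element names are hypothetical), a mapping onto Dublin Core terms might look like:

```python
# A toy crosswalk mapping elements of a hypothetical domain standard
# onto Dublin Core terms, so records can travel between repositories.
CROSSWALK = {
    "datasetTitle": "dc:title",
    "principalInvestigator": "dc:creator",
    "collectionDate": "dc:date",
    "studyAbstract": "dc:description",
}

def to_dublin_core(record: dict) -> dict:
    """Translate a record's keys via the crosswalk, dropping any
    element that has no Dublin Core counterpart."""
    return {CROSSWALK[k]: v for k, v in record.items() if k in CROSSWALK}

print(to_dublin_core({"datasetTitle": "Lake cores 2014", "siteCode": "X1"}))
# {'dc:title': 'Lake cores 2014'}  -- 'siteCode' has no mapping
```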
This presentation was provided by Priscilla Caplan of The Florida Center for Library Automation and Jeremy York of The University of Michigan Library, during the NISO Webinar "What It Takes To Make It Last: E-Resources Preservation" held on February 10, 2011.
Semantic Similarity and Selection of Resources Published According to Linked ... by Riccardo Albertoni
This position paper discusses the potential of exploiting linked data best practices to provide metadata documenting domain-specific resources created through verbose acquisition-processing pipelines. It argues that resource selection, namely the process of choosing a set of resources suitable for a given analysis or design purpose, must be supported by a deep comparison of their metadata. The semantic similarity proposed in our previous work is discussed for this purpose, and the main issues in making it scale up to the web of data are introduced. These issues matter beyond the re-engineering of our similarity measure, since they apply to virtually any tool that exploits information made available as linked data. A research plan and an exploratory phase addressing these issues are described, along with the lessons we have learned so far.
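As a generic illustration of metadata-based similarity (a simple Jaccard overlap, not the authors' measure), consider:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 1.0 means identical descriptions."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical linked-data descriptions as sets of (property, value) pairs.
r1 = {("dc:subject", "bathymetry"), ("dc:format", "netCDF")}
r2 = {("dc:subject", "bathymetry"), ("dc:format", "GeoTIFF")}

print(f"similarity = {jaccard(r1, r2):.2f}")  # 0.33
```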
Duraspace Hot Topics Series 6: Metadata and Repository Services by Matthew Critchlow
Presented by Declan Fleming, Arwen Hutt, and Matt Critchlow. The second in a three-part webinar series on Research Data Curation at UC San Diego, as part of the larger Research Cyberinfrastructure initiative.
This presentation was provided by Carolyn Hansen of the University of Cincinnati during the NISO Training Thursday event, Metadata and the IR, held on Thursday, February 23, 2017.
Presentation by Dr. Getaneh Alemu (Solent University, UK) at the II Congress on Information, Communication and Research (CICI 2018), "Metadata and Information Organization". Faculty of Philosophy and Letters, Universidad Autónoma de Chihuahua, Mexico. The event was organized by the academic body 'Estudios de la Información' and the disciplinary group 'Información, Lenguaje, Comunicación y Desarrollo Sostenible'. October 29, 2018.
A keynote at the Web Science Conference 2018, held at the VU Amsterdam [1]. It mainly describes the output of the Semantic Technology Institute International (STI2) Summit for senior researchers in the Semantic Web field, held in Crete in September 2017 [2].
1. https://websci18.webscience.org/
2. https://www.sti2.org/events/2017-sti2-semantic-summit
http://wiki.knoesis.org/index.php/MaterialWays
http://www.knoesis.org/?q=research/semMat
Abstract
The sharing, discovery, and application of materials science and engineering data and documents are possible only if domain scientists are able and willing to do so. We need to overcome technological challenges, such as the development of convenient computational tools and repositories conducive to easy exchange, curation, attribution, and analysis of data, and cultural challenges, such as proper protection, control, and credit for sharing data. Our thesis and value proposition is that associating machine-processable semantics with materials science and engineering data and documents can provide a solid foundation for overcoming challenges in data discovery, integration, and interoperability caused by data heterogeneity. Specifically, easy-to-use, low-upfront-cost lightweight semantics in the form of file-level annotation can enable document discovery and sharing, while deeper data-level annotation using standardized ontologies can benefit semantic search and summarization. Machine processability achieved through fine-grained semantic annotation, extraction, and translation can enable data integration, interoperability, and reasoning, ultimately leading to Linked Open Materials Science Data. Thus, different granularities of semantics provide a continuum of trade-offs between cost/ease of use and expressiveness. In this presentation, we also show the application of semantic techniques for content extraction from materials and process specifications, which are semi-structured and table-rich, and the application of semantic web techniques and technologies for materials vocabulary integration and curation (via semantic media wiki), semantic web visualization, efficient representation of provenance metadata and access control (via the singleton property), and biomaterials information extraction.
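A minimal sketch of the file-level annotation described above, using Python's rdflib; the vocabulary namespace is a hypothetical placeholder, not a standardized materials ontology:

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical materials-science vocabulary; a real deployment would
# use a standardized ontology agreed on by the community.
MAT = Namespace("http://example.org/matsci#")

g = Graph()
doc = URIRef("http://example.org/specs/alloy-7075.pdf")

# File-level annotations: cheap to produce, enough for discovery.
g.add((doc, MAT.material, Literal("Al 7075-T6")))
g.add((doc, MAT.process, Literal("heat treatment")))

print(g.serialize(format="turtle"))
```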
Securing your Kubernetes cluster: a step-by-step guide to success! by KatiaHIMEUR1
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic* by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
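For readers unfamiliar with link prediction over knowledge graphs, here is a minimal sketch of one common scoring scheme (TransE-style embeddings, shown purely as background, not as the speaker's method):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Toy embeddings for entities and a relation (randomly initialized here;
# a real model would train them on the graph's known triples).
emb = {name: rng.normal(size=dim) for name in ("Ada", "London", "bornIn")}

def score(head: str, relation: str, tail: str) -> float:
    """TransE-style plausibility: head + relation should land near tail,
    so a smaller distance means a more plausible link."""
    return -float(np.linalg.norm(emb[head] + emb[relation] - emb[tail]))

print(score("Ada", "bornIn", "London"))
```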
UiPath Test Automation using UiPath Test Suite series, part 3 by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Key Trends Shaping the Future of Infrastructure by Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The talk covers the key trends across hardware, cloud, and open source, explores how these areas are likely to mature and develop over the short and long term, and considers how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with Parameters by Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Kubernetes & AI - Beauty and the Beast!?! @ KCD Istanbul 2024 by Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply AI to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
JMeter webinar - integration with InfluxDB and Grafana by RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
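As a hedged sketch of the metric pipeline the webinar demonstrates (JMeter itself ships a Backend Listener for this; the URL, token, org, and bucket below are placeholders), writing one sample to InfluxDB 2.x from Python looks like:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholders: point these at your own InfluxDB 2.x instance.
client = InfluxDBClient(url="http://localhost:8086", token="TOKEN", org="perf")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One response-time sample, tagged so Grafana can slice by transaction.
point = (
    Point("jmeter_sample")
    .tag("transaction", "login")
    .field("elapsed_ms", 182)
)
write_api.write(bucket="loadtests", record=point)
```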
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality by Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Designing Great Products: The Power of Design and Leadership by Chief Designe...
How Portable Are the Metadata Standards for Scientific Data?
1. How Portable Are the Metadata Standards for Scientific Data?
Jian Qin and Kai Li
School of Information Studies, Syracuse University, Syracuse, NY, USA
2. Why study metadata portability?
Complex, very large metadata standards
• are “…unwieldy to apply…”
• are “difficult to understand and enact in its entirety…”
• require customization to tailor them to specific needs
• are costly in time and personnel
3. Each standard has its own schema and tools…
EPA Metadata Editor, Metavist2, Morpho, Protocol Registration System, AVM Tagging Tool
…which leads to duplicated effort and interoperability problems
4. Metadata for scientific data is at the juncture of data-driven science, technical standards (RDF, RDFa, ORCID, XML, OWL, DOI), and infrastructure.
5. A few big questions
• What action should and can we take at this juncture as a community of metadata practices?
• How much do we know about metadata standards for scientific data?
• How can we transform the current metadata standards into an infrastructure-driven service?
7. An attempt to define “metadata infrastructure”
Semantically:
• metadata elements, vocabularies, entities, and other metadata artifacts as the underlying foundation on which to build tools, software, and applications
Technically:
• “a data model for describing the resources, aspects of metadata encoding and storage formats, metadata for web services, metadata tools, usage, modification, transformation, interoperability, and metadata crosswalk” (CLARIN)
8. Portability is the key
Building blocks of metadata:
• Metadata for data citation
• Metadata for data discovery
• Metadata for data archiving
• Metadata for data quality
• Metadata for data provenance
• Metadata for data management
• Metadata generation output
9. How portable are metadata standards for scientific data?
Two measures of metadata portability:
• Co-occurrence of semantic elements: the number of times semantically identical elements are used in multiple standards
• Degree of modularity: the degree of independence and self-descriptiveness of a concept/entity sub-structure in a metadata standard
10. Data
• Element collection: 5,800 elements from 16 scientific metadata standards
• Element de-duplication: 4,434 semantically unique elements
• Categorization: 9 categories based on the functionality of the elements
11. Element distribution by standard
[Chart: element counts per standard; visible labels include the NetCDF Climate and Forecast Metadata Convention (CF) with 2,427 elements and the Ecological Metadata Language]
15. Element co-occurrences across metadata standards
Of the 4,434 unique elements, 539 (12.16%) occurred in more than one standard.
[Chart: number of elements by frequency of co-occurrence]
17. Modularity
Two levels of modularity:
• Level 1: having multiple XML schema files for the whole standard
• Level 2: having separate schemas for entities such as person/organization, dataset, study, instrument, and subject
Of the 6 standards with schema files, all belong to Level 1 modularity.
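A sketch of how such a modularity check might be automated; the directory layout and entity keywords are assumptions for illustration, not the study's actual procedure:

```python
from pathlib import Path

# Entity names whose presence as separate schema files would indicate
# Level 2 modularity (assumed keywords, not the study's exact list).
ENTITY_HINTS = ("person", "organization", "dataset", "study", "instrument")

def modularity_level(schema_dir: str) -> int:
    """0 = single schema file; 1 = multiple files; 2 = entity-level files."""
    files = list(Path(schema_dir).glob("*.xsd"))
    if len(files) <= 1:
        return 0
    names = " ".join(f.stem.lower() for f in files)
    return 2 if any(hint in names for hint in ENTITY_HINTS) else 1

# print(modularity_level("standards/eml"))  # hypothetical path
```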
19. Further research
More questions than answers from this study:
• What should a metadata infrastructure constitute?
• How can the gaps between infrastructure resources and metadata applications be filled or narrowed?
• Is it possible, or is there a need, to streamline metadata scheme design practice toward a metadata infrastructure?
• …and the list can go on