A presentation on the SageCite project given at the JISC MRD International Workshop in March 2011. Describes the application domain and citation challenges in SageCite.
This document discusses two workflows developed for the MetaGEO project to normalize gene expression data sets from the GEO database. It also describes a Taverna plugin that was developed to register workflow results with a DOI using the DataCite service, allowing users to cite and access workflow outputs. Current users and contributors to the MetaGEO project are listed.
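The Taverna plugin and the 2011 DataCite service are not reproduced here, but the core step it performed, registering a DOI and metadata for a workflow output, can be sketched against today's DataCite REST API. Everything below (test endpoint, repository credentials, prefix, landing-page URL, metadata values) is an illustrative assumption rather than detail from the MetaGEO project.

import requests

# Hypothetical repository credentials and test prefix; a real DataCite
# repository account is required for this call to succeed.
DATACITE_API = "https://api.test.datacite.org/dois"
AUTH = ("MY.REPOSITORY", "s3cret")

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "doi": "10.5072/metageo-demo-0001",
            "titles": [{"title": "Normalised GEO expression matrix (workflow output)"}],
            "creators": [{"name": "Example, Researcher"}],
            "publisher": "SageCite demonstrator repository",
            "publicationYear": 2011,
            "types": {"resourceTypeGeneral": "Dataset"},
            "url": "https://example.org/landing/metageo-demo-0001",  # landing page
            "event": "publish",  # mint the DOI and make it findable
        },
    }
}

resp = requests.post(
    DATACITE_API,
    json=payload,
    auth=AUTH,
    headers={"Content-Type": "application/vnd.api+json"},
)
resp.raise_for_status()
print("Registered DOI:", resp.json()["data"]["id"])

Once registered, the DOI can be recorded in the workflow's provenance and cited like any other output.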
This document discusses data citation and how to implement it for publishers and data repositories. It covers how publishers can include data citations in their Crossref metadata and how repositories can link datasets to publications. It also introduces the Crossref Event Data service, which captures these data citations and other relationships between DOIs and makes them openly available via APIs. This allows data citations to be more widely discovered and adopted.
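As a rough illustration of consuming the service, the sketch below queries the Crossref Event Data API for events whose object is a given dataset DOI; the endpoint and parameter names follow the public documentation as best I recall them, and the dataset DOI is a placeholder.

import requests

EVENT_DATA_API = "https://api.eventdata.crossref.org/v1/events"
dataset_doi = "https://doi.org/10.5061/dryad.example"  # hypothetical dataset DOI

params = {
    "obj-id": dataset_doi,        # events pointing at this DOI (e.g. article cites data)
    "rows": 20,
    "mailto": "you@example.org",  # contact address for polite use of the API
}

events = requests.get(EVENT_DATA_API, params=params, timeout=30).json()
for ev in events.get("message", {}).get("events", []):
    print(ev["subj_id"], ev["relation_type_id"], ev["obj_id"])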
FAIR Data Knowledge Graphs – from Theory to Practice (Tom Plasterer)
This document discusses building FAIR data knowledge graphs from theory to practice. It begins by outlining what R&D researchers want to do with data, such as understanding disease mechanisms and using patient data, but that currently data is fragmented across systems. It then introduces the FAIR data principles and describes building a knowledge graph that incorporates data from multiple sources using standards like the Data Catalog vocabulary. The key challenges discussed are determining canonical representations for entities and linking data to public vocabularies through an enrichment process.
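To make the Data Catalog Vocabulary (DCAT) mention concrete, here is a minimal catalogue entry built with rdflib; the dataset URI, theme and distribution are invented for illustration, and a recent rdflib release that ships the DCAT namespace is assumed.

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
ds = URIRef("https://example.org/dataset/compound-screen-42")
dist = URIRef("https://example.org/dataset/compound-screen-42/csv")

# Describe the dataset itself
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Compound screening results, batch 42")))
g.add((ds, DCTERMS.description, Literal("Assay readouts, enriched with links to public vocabularies")))
g.add((ds, DCAT.theme, URIRef("http://purl.obolibrary.org/obo/CHEBI_23888")))  # example public term

# ...and one downloadable distribution of it
g.add((ds, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("https://example.org/files/batch42.csv")))
g.add((dist, DCTERMS.format, Literal("text/csv")))

print(g.serialize(format="turtle"))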
This document discusses efforts to automatically detect data types to enable automatic data processing from large scientific data collections in the cloud. It presents two major processes in scientific data use: data discovery and data processing. Currently, data processing is typically done manually by checking data formats, structures, versions and quality. The document proposes automatically detecting data types using a data type registry connected to metadata about data via persistent identifiers, which would enable shifting from manual to automatic data processing. This could help outsiders process data without extensive expertise in a field's data schemes and tools.
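The document proposes a dedicated data type registry; a related, simpler mechanism that exists today is DOI content negotiation, which returns machine-readable metadata (including resource type and formats) that a processing pipeline could branch on. The DOI below is a placeholder.

import requests

doi = "10.5281/zenodo.1234567"  # hypothetical dataset DOI

resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.datacite.datacite+json"},  # DataCite metadata as JSON
    timeout=30,
)
resp.raise_for_status()
meta = resp.json()

# A pipeline could choose a parser automatically from these fields.
print(meta.get("types", {}).get("resourceTypeGeneral"))
print(meta.get("formats", []))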
Acquisition, Storage and Management of Research Data in Chemical Sciences: De... (LIBER Europe)
This presentation by Claudia Kramer and Nicole Jung was part of the "Research Data Support Meets Disciplines: Opportunities & Challenges" workshop at LIBER's 2017 Annual Conference in Patras, Greece. For more information, see www.libereurope.eu
Why would a publisher care about open data? (Anita de Waard)
A publisher would care about open data for several reasons:
1) Open data increases the value of all parts of the web by allowing programs, not just people, to utilize the data through interconnecting and joining it.
2) Publishers are evolving from linear supply chains focused on content delivery to users, to becoming marketplaces that optimize the number of interactions between users through networked open science.
3) The future of publishing involves networked open science where data is openly accessible, annotated with metadata, and linked together in research objects, increasing findability, accessibility, interoperability, and reusability of research outputs.
The document discusses LOD2 Deliverable 3.1.1, which will publish a survey of tools concerned with knowledge extraction from structured sources. The data will be collected in a Linked Data and SPARQL enabled OntoWiki to allow for sustainability through crowd-sourcing updates and maintenance. The OntoWiki has been deployed at http://data.lod2.eu/2011/tools/ and will need further fine-tuning over the next several weeks. The data is available as Linked Data or through SPARQL queries at the provided URL.
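A query against the tools catalogue might look like the sketch below; the SPARQL endpoint path and the DOAP class used are assumptions (the deliverable only states the base URL), and the 2011 service may no longer be online.

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://data.lod2.eu/2011/tools/sparql")  # assumed endpoint path
endpoint.setQuery("""
    SELECT ?tool ?label WHERE {
        ?tool a <http://usefulinc.com/ns/doap#Project> ;
              <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    } LIMIT 20
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["tool"]["value"], "-", row["label"]["value"])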
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
This document summarizes a bioinformatics workshop on enabling systems biology. It discusses various databases and standards for sharing molecular interaction data, including IntAct, UniProt, Reactome, PRIDE, and PSI formats. It also describes services for programmatically accessing this data, such as PSICQUIC and PSISCORE, which allow querying multiple source repositories using a common standard. Clients have been created to access these services and visualize interaction networks.
Recording and Reasoning Over Data Provenance in Web and Grid Services (Martin Szomszor)
The document discusses the importance of provenance data in distributed computing environments like grids and web services. It proposes a service-oriented architecture and data model for capturing and querying provenance information. The architecture includes a provenance service for storage and analysis of provenance data gathered during workflow executions across multiple services and systems.
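The architecture itself is not shown here, but the flavour of the provenance assertions such a service stores can be sketched with the W3C PROV data model (a later standard in the same space as the models the talk discusses), using the Python prov package; all names are illustrative.

from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/provenance/")

raw = doc.entity("ex:raw-microarray-data")
norm = doc.entity("ex:normalised-matrix")
run = doc.activity("ex:normalisation-run-2011-03-29")
svc = doc.agent("ex:workflow-service")

doc.used(run, raw)              # the activity consumed the raw data
doc.wasGeneratedBy(norm, run)   # ...and produced the normalised matrix
doc.wasAssociatedWith(run, svc) # the run was carried out by a workflow service

print(doc.get_provn())          # PROV-N text; doc.serialize() gives PROV-JSON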
SEEK is an open-source platform for scientists to store, share, and collaborate on heterogeneous data, models, and standard operating procedures. It was developed by researchers in the UK and Germany to facilitate data sharing across multi-group projects. SEEK allows scientists to organize experiments and data using ISA-TAB standards, interlink related assets, and control access to assets at various stages of research from private to public. Key features include hosting and simulating SBML models, exploring and annotating spreadsheets, and finding expertise and collaborators through people profiles.
Role of PIDs in connecting scholarly works (OpenAIRE)
Presentation from a joint FREYA and OpenAIRE webinar, New developments in the field of Persistent Identifiers, by Dr. Amir Aryani, Director, Research Graph Foundation.
OntoChem IT Solutions GmbH was founded in 2015 as a purely IT-oriented offshoot of OntoChem GmbH. Even before then we had many years of experience, and it has always been our mission to provide added value to our customers by helping them navigate today’s complex information world: developing cognitive computing solutions, indexing intranet and internet data, and applying semantic search solutions for pharmaceutical, materials science and technology-driven businesses.
We strive to support our customers with the most useful tools for knowledge discovery possible, encompassing up-to-date data sources, optimized ontologies and high-throughput semantic document processing and annotation techniques.
We create new knowledge from structured and unstructured data by extracting relationships, thereby exploiting the full potential of full-text documents and databases while also scanning social media and news feeds and analyzing web pages.
We aim at an unprecedented machine understanding of text and subsequent knowledge extraction and inference. Applying our methods to chemical compounds and their properties supports our customers in generating intellectual property and in using those compounds as novel therapeutics, agrochemical products, nutraceuticals, cosmetics and novel materials.
It's our mission to provide added value to customers by:
developing and applying cognitive computing solutions
creating intranet and internet data indexing and semantic search solutions
providing Big Data analytics for technology-driven businesses
supporting product development and surveillance.
We deliver useful tools for knowledge discovery for:
creating background knowledge ontologies
high-throughput semantic document processing and annotation
knowledge mining by extracting relationships
exploiting the full potential of full-text documents and databases, while also scanning social media, news feeds and web pages.
Building a Network of Interoperable and Independently Produced Linked and Ope... (Michel Dumontier)
Over 15 years ago, Sir Tim Berners-Lee proclaimed the founding of an exciting new future involving intelligent agents operating over smarter data in order to perform complex tasks at the behest of their human controllers. At the heart of this vision lies an uneasy alliance between tedious formal knowledge representations and powerful analytics over big, but often messy, data. Bio2RDF, our decade-old open source project to create Linked Data for the life sciences, has woven together emergent Semantic Web technologies such as ontologies and Linked Data to generate FAIR (Findable, Accessible, Interoperable, and Reusable) data in the form of billions of machine-accessible statements for use in downstream biomedical discovery.
This revolution in data publication has been strengthened by action from global bioinformatics institutions such as the NCBI, NCBO, EBI, and DBCLS. Notably, NCBI's PubChem has successfully coupled large-scale data integration with community-based standards to offer a remarkable biochemical knowledge resource amenable to data-hungry discovery tools. Yet, in the face of increasing pressure from researchers, funders, and publishers, will these approaches be sufficient for growing and maintaining a comprehensive knowledge graph that is inclusive of all biomedical research?
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki... (OpenAIRE)
Presentation from a joint FREYA and OpenAIRE webinar, "New developments in the field of Persistent Identifiers" (PIDs), that covers the OpenAIRE Content Acquisition Policy, the role of PIDs in OpenAIRE, the OpenAIRE Guidelines and their objectives, and the use of PIDs for different kinds of entities, with some examples.
What is Data Commons and How Can Your Organization Build One? (Robert Grossman)
1. Data commons co-locate large biomedical datasets with cloud computing infrastructure and analysis tools to create shared resources for the research community.
2. The NCI Genomic Data Commons is an example of a data commons that makes over 2.5 petabytes of cancer genomics data available through web portals, APIs, and harmonized analysis pipelines.
3. The Gen3 platform is an open source software stack for building data commons that can interoperate through common APIs and data models to support reproducible, collaborative research across projects.
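As a small illustration of the API access mentioned in point 2, the sketch below asks the public GDC API for a few files from one project; the endpoint and field names follow the GDC documentation as I recall them, so verify against the current API reference before relying on them.

import json
import requests

GDC_FILES = "https://api.gdc.cancer.gov/files"

filters = {
    "op": "and",
    "content": [
        {"op": "in", "content": {"field": "cases.project.project_id", "value": ["TCGA-BRCA"]}},
        {"op": "in", "content": {"field": "data_format", "value": ["VCF"]}},
    ],
}
params = {
    "filters": json.dumps(filters),
    "fields": "file_id,file_name,file_size",
    "size": "10",
    "format": "JSON",
}

hits = requests.get(GDC_FILES, params=params, timeout=60).json()["data"]["hits"]
for f in hits:
    print(f["file_id"], f["file_name"], f["file_size"])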
Presentation from a joint FREYA and OpenAIRE webinar, "New developments in the field of Persistent Identifiers" (PIDs), on FREYA-WP3: New PID developments, by Ketil Koop-Jakobsen, PANGAEA, Bremen University, Germany.
THOR Workshop - Persistent Identifier Linking (Maaike Duine)
The document discusses challenges around linking data and contributors at different levels of granularity. It addresses issues like how to cite datasets at the right level of specificity, how to link datasets and contributors when a dataset may be part of a larger collection, and how to handle multiple versions of datasets. It also covers linking databases across domains and implementing these linkages using persistent identifiers.
A presentation to the New Year's Event for Maastricht University's Knowledge Engineering @ Work Program. https://www.maastrichtuniversity.nl/news/kework-first-10-students-academic-workstudy-track-graduate
Crossing the Analytics Chasm and Getting the Models You Developed Deployed (Robert Grossman)
There are two cultures in data science and analytics: those that develop analytic models and those that deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of some of the approaches that have been developed for managing analytic models and workflows and for deploying them, including using analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps, which has borrowed some of the techniques of DevOps.
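The talk covers PMML, PFA and analytic containers; as a generic sketch of the "deploy" side only, the snippet below wraps an already-trained model behind a small HTTP scoring endpoint, the kind of service an analytic container would package. The model file, feature names and scikit-learn-style predict_proba interface are assumptions, not part of the talk.

import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# A model serialised by the "model development" culture; the path is a placeholder.
with open("churn_model.pkl", "rb") as fh:
    model = pickle.load(fh)

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    features = [[payload["tenure_months"], payload["monthly_spend"]]]
    return jsonify({"score": float(model.predict_proba(features)[0][1])})

if __name__ == "__main__":
    # In an AnalyticOps setting this app would be baked into a container image,
    # versioned, and monitored alongside the model it serves.
    app.run(host="0.0.0.0", port=8080)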
Written and presented by Tom Ingraham (F1000) at the Reproducible and Citable Data and Model Workshop, Warnemünde, Germany, September 14th–16th 2015.
Towards metrics to assess and encourage FAIRness (Michel Dumontier)
With increased interest in FAIR metrics, there is a need to develop tools and approaches that can assess the FAIRness of a digital resource. This talk begins to explore some ideas in this space and invites people to participate in a working group focused on the development, application, and evaluation of FAIR metric efforts.
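A toy example of the kind of automated probe a FAIR metric might run (does the identifier resolve, and is machine-readable metadata retrievable?) is sketched below; it is not an implementation of the working group's metrics, and the content type used is only one common option.

import requests

def basic_fair_probe(doi: str) -> dict:
    url = f"https://doi.org/{doi}"
    # Findable/Accessible: the persistent identifier should at least resolve.
    resolves = requests.get(url, allow_redirects=False, timeout=30).status_code in (301, 302, 303)
    # Interoperable: metadata should be retrievable in a machine-readable form.
    meta = requests.get(url, headers={"Accept": "application/vnd.citationstyles.csl+json"},
                        allow_redirects=True, timeout=30)
    machine_readable = meta.ok and "json" in meta.headers.get("Content-Type", "")
    return {"resolvable": resolves, "machine_readable_metadata": machine_readable}

print(basic_fair_probe("10.1038/sdata.2016.18"))  # DOI of the FAIR principles paper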
How DataCite and Crossref Support Research Data Sharing - Crossref LIVE Hannover (Crossref)
Britta Dreyer from DataCite presents on how DataCite and Crossref collaboratively support research data sharing. Presented at Crossref LIVE Hannover, June 27th 2018.
Crowdsourcing Lay Summaries: Bridging the Gap in Health Research (monicaduke)
Liz Lyon presents Crowdsourcing Lay Summaries: Bridging the Gap in Health Research at the Patients Participate! Workshop at the British Library, 17th June 2011.
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Simons orcid forum canberra 2018 - PIDs in research (ARDC)
The value of persistent identifiers in research - Natasha Simons (ARDC) & Josh Brown (ORCID) - presented at the ORCID forum in Canberra 6th September 2018
Brown Bag Talk with Micah Altman: Integrating Open Data into Open Access Journals (Micah Altman)
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk discusses findings from an analysis of data sharing and citation policies in Open Access journals and describes a set of novel tools for open data publication in open access journal workflows. Bring your lunch and enjoy a discussion fit for scholars, Open Access fans, and students alike.
Dr Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology.
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
Crediting informatics and data folks in life science teams (Carole Goble)
Science Europe LEGS Committee: Career Pathways in Multidisciplinary Research: How to Assess the Contributions of Single Authors in Large Teams, 1-2 Dec 2015, Brussels
The People Behind Research Software: crediting from the informatics and technical point of view.
Linking Data to Publications through Citation and Virtual Archives (Micah Altman)
This document discusses linking data to publications through citation and virtual archives. It argues that data citation and sharing infrastructure are necessary for scientific reproducibility and open data. It outlines elements of data management plans and requirements for data sharing infrastructure, including persistence, provenance, access control and incentives. The document advocates for data citations as first-class objects and emerging practices like assigning DOIs to datasets. It presents several use cases for the Dataverse network, a virtual archive designed for research data sharing through federated and organizational models.
Author's workflow and the role of open access (Paola Gargiulo)
This is a presentation made at the 10th Fiesole Collection Development Retreat Series. The goal of the presentation is to describe some tools and solutions to make self-archiving easier for authors.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
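Roughly what the BioSchemas.org approach looks like in practice: lightweight schema.org Dataset markup embedded in a landing page. The properties below are an illustrative selection, not a validated Bioschemas profile.

import json

dataset_jsonld = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example proteomics dataset",
    "identifier": "https://doi.org/10.5072/example-proteomics",
    "creator": {"@type": "Person", "name": "A. Researcher",
                "identifier": "https://orcid.org/0000-0000-0000-0000"},
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["proteomics", "FAIR", "data citation"],
    "distribution": {"@type": "DataDownload",
                     "contentUrl": "https://example.org/data/proteomics.tsv",
                     "encodingFormat": "text/tab-separated-values"},
}

# Embedded in a <script type="application/ld+json"> block on the landing page,
# this makes the dataset discoverable by search engines and harvesters.
print(json.dumps(dataset_jsonld, indent=2))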
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR-by-Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Exploration of a Data Landscape using a Collaborative Linked Data Framework (Laurent Alquier)
The document discusses using a collaborative linked data framework to explore a data landscape. It describes how the framework helps scientists access and integrate disparate data sources to answer translational research questions. Key components of the framework include a semantic wiki for cataloging data sources, linking data concepts, querying across sources, and visualizing relationships between sources. The goal is to provide scientists with flexible tools to discover and leverage relevant data without needing expertise in data management.
Research Objects: more than the sum of the parts (Carole Goble)
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
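One concrete packaging that grew out of this framework is RO-Crate, a JSON-LD description placed alongside the bundled files. The sketch below writes a minimal ro-crate-metadata.json by hand; the structure is reproduced from memory of the 1.1 specification and the file names are invented, so check it against the published spec.

import json

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file describes the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root dataset: the Research Object as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": "Co-expression study bundle",
            "description": "Data, model and workflow packaged as one Research Object",
            "hasPart": [{"@id": "expression-matrix.csv"}, {"@id": "analysis-workflow.cwl"}],
        },
        {"@id": "expression-matrix.csv", "@type": "File", "encodingFormat": "text/csv"},
        {"@id": "analysis-workflow.cwl", "@type": ["File", "SoftwareSourceCode"]},
    ],
}

with open("ro-crate-metadata.json", "w") as fh:
    json.dump(crate, fh, indent=2)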
Slides from Friday 3rd August - Data in the Scholarly Communications Life Cycle Course, which is part of the FORCE11 Scholarly Communications Institute.
Presenter - Natasha Simons
The document discusses recommendations from a workshop on peer review of research data. It focuses on three key areas:
1. Connecting data review with data management planning by requiring data sharing plans, ensuring adequate funding for data management, and refusing publication without clear data access.
2. Connecting scientific and technical review with data curation by linking articles and data with versioning, avoiding duplicate review efforts, and addressing issues found in data.
3. Connecting data review with article review by requiring methods/software information, providing review checklists, ensuring data access for reviewers, and permanent dataset identifiers from repositories.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018 "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018.
Metadata and Semantics Research Conference, Manchester, UK 2015
Research Objects: why, what and how,
In practice the exchange, reuse and reproduction of scientific experiments is hard, dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Neither should they be viewed just as second-class artifacts tethered to publications, but as the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects. The term has become widespread. However: what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship, sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples?
I’ll present our practical experiences of the why, what and how of Research Objects.
The document discusses the increasing scale and complexity of knowledge generation in science domains like astronomy and medicine over recent centuries. It argues that knowledge generation can be viewed as a systems problem involving many actors and processes. The document proposes a service-oriented approach using web services as an integrating framework to address challenges of scale, complexity, and distributed collaboration in e-Science. Key challenges discussed include semantics, documentation, scaling issues, and sociological factors like incentives.
Open Archives Initiative Object Reuse and Exchange (lagoze)
This document discusses infrastructure to support new models of scholarly publication by enabling interoperability across repositories through common data modeling and services. It proposes building blocks like repositories, digital objects, a common data model, serialization formats, and core services. This would allow components like publications and data to move across repositories and workflows, facilitating reuse and new value-added services that expose the scholarly communication process.
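A tiny sketch of the OAI-ORE style of data model referred to here: a resource map that describes an aggregation of a publication and its dataset, expressed in RDF with rdflib. The URIs are placeholders and the modelling is deliberately simplified.

from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("ore", ORE)

rem = URIRef("https://example.org/rem/article-42")          # the resource map
agg = URIRef("https://example.org/aggregation/article-42")  # the aggregation it describes

g.add((rem, RDF.type, ORE.ResourceMap))
g.add((rem, ORE.describes, agg))
g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, ORE.aggregates, URIRef("https://example.org/article-42.pdf")))
g.add((agg, ORE.aggregates, URIRef("https://example.org/dataset-42.csv")))
g.add((agg, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000")))

print(g.serialize(format="turtle"))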
Mduke sagecite-jisc-march11
1. (Title slide) Monica Duke, Project Manager/Researcher, SageCite Project, UKOLN. 29th March 2011, Aston Business School. http://blogs.ukoln.ac.uk/sagecite/ #sagecite [email_address]
12. (Diagram of the demonstrator, steps 1-6) A Taverna workflow produces citable data; DOIs are minted and metadata registered via the DataCite API; a landing page for the data is generated (Google API) and DOIs are stored in the sagecitedemorepository; the DOI resolves to the landing page. The relationships between data (via DataCite DOIs) and tools are captured by the provenance (OPM) produced by Taverna; workflow metadata is used for referring to data reported in the provenance.
16. (Diagram) Workflow provenance (Open Provenance Model, W3C Provenance Incubator, RDF) linking: gene URI; myExperiment URI; data sets in the GEO database (register/submit to DataCite, DOI); data creator and workflow user (ORCID); scientific publication (DOI, PubMed Id); Sage Bionetworks model (co-expression, Bayesian).
17. (From the Identity Workshop prep-meeting, Helsinki, 27 January 2011; G. A. Thorisson, University of Leicester) Publishing a journal article vs. publishing a dataset.
20. (From the Identity Workshop prep-meeting, Helsinki, 27 January 2011) Centrally-managed informatics infrastructure: i) for researchers to manage and use a profile; ii) for tracking author-to-publication attribution links; iii) for interaction with other systems (e.g. publishers, digital libraries). Example ORCID IDs disambiguating name variants: G-1442-2009 (J. Smith, Univ. North Pole); D-2400-2010 (J. Smith, Luthor Corporation); B-1242-2010 (G. Thorisson / G. A. Thorisson, Univ. Leicester, Cold Spring Harbor Lab., University of Leicester).
Editor's Notes
One of the CLIP group of projects funded by JISC. The focus of the talk is on the domain, as some of the citation issues will be common with other projects. Concentrating on Sage and its data and some of the implications; not covering everything done in the project.
Holy Trinity as written up in the proposal.
There is something missing from the triangle, something that cuts across the 3 areas of data, process and publication and is central to issues of attribution and credit: the contributors! Will come back to them later.
This is an overview of a more complex process (simplified view). Does not show original sources which enter the process. Each part can be looked at in more detail – example coming up. Different people involved as each stage is specialized.
This is just one of the stages from the previous diagram. We have a version of this for each of the stages; not enough time to go into each one. Each stage has input and output, and tools employed, e.g. R scripts.
The previous slide was a simplified view; the actual process is broken down into a workflow, with configuration details, actual scripts, input and output.
The main contribution has been to capture this process, document it better via Taverna, and make it more understandable and re-usable: understanding the domain before we can ask the questions about citation. Lessons learned: publications have many gaps, the Perl scripts are not very user friendly, and we worked from a document shared by Sage Bionetworks. Based on a visit to Sage Bionetworks, funded by Sage, which built relationships and dialogue leading to data sharing.
Moving on to a slightly different perspective, from the domain to the general citation question that we start to address. Started to think about how citation happens; scenarios are on the blog. For each of these stages we will think about the questions that our example raises: I cite others (input data is derived from somewhere); I make my work citable (the main work of the project, Taverna); credit (motivation, least addressed so far in the project).
Identifying the contributor is an issue, e.g. the geographical area of work; we may need to identify the organisation that funded the work. Does the modification change the original, and how do I preserve the link?
This has been the main area of work for SageCite; the next slide shows the role of Taverna.
Assign DataCite DOIs. Generate metadata (an open question; a linked data approach). The store is temporary, for the purposes of the demonstrator.
Extra steps have been added to the workflow, within Taverna.
More slides on the last question…
This was a diagram we had early on in the project: what about other types of publication?
I presented this at SOLO earlier this month [or mention elsewhere, in recent and ongoing ORCID activities]. I reach for Geoff Bilder’s slides again and nick a few things. What I want to do here is replace Geoff’s silly little dude with glasses with my much, much cooler ‘academic dude’, as we call him in the office. ### SWITCH ### I want to show you in the next few slides a hypothetical scenario involving this dude, representing me, submitting a dataset to this digital repository, which is a companion to Geoff’s Psychoceramics Review journal. This will demonstrate some of the practicalities of how we might actually use ORCID in data publication.
Coming back to those people... ORCID is addressed by a presentation later on; focused effort on discussions and building scenarios.
Have started discussions; no service yet. How do tools like myExperiment and Taverna, which are on the desktop and manage identity (not globally), work with a service like ORCID to exchange information, including for validation?
Finally, an advert...
Collaborators on some of the projects provided some of the slides; Sage funded the visit, and shared data and documentation.