Connecting data across our clinical data warehouses: UC-Research eXchange (UC... – CTSI at UCSF
Presented at the UC BRAID Retreat: Imagine a statewide research engine of pooled resources, data, and expertise that accelerates the “translation” of academic research to direct patient benefit. That's the goal of the University of California Biomedical Research Acceleration, Integration, and Development (UC BRAID) program.
Using the Semantic Web to Support Ecoinformatics – ebiquity
We describe our on-going work in using the semantic web in support of ecological informatics, and demonstrate a distributed platform for constructing end-to-end use cases. Specifically, we describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which allows scientists to semi-automatically construct distributed datasets relevant to the queries they want to ask. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with Triple Shop and other semantic web resources.
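The abstract above centres on SPARQL access to OWL-encoded food-web data. As an illustration only (none of this code is from ELVIS or Triple Shop), the following Python sketch shows what a client-side query against such an endpoint could look like; the endpoint URL and the ex: vocabulary terms are hypothetical placeholders.

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical food-web endpoint and vocabulary, for illustration only.
sparql = SPARQLWrapper("http://example.org/elvis/sparql")
sparql.setQuery("""
    PREFIX ex: <http://example.org/foodweb#>
    SELECT ?predator ?prey WHERE {
        ?link ex:predator ?predator ;
              ex:prey ?prey ;
              ex:observedAt ex:ChesapeakeBay .
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["predator"]["value"], "eats", row["prey"]["value"])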
Using and extending Darwin Core for structured attribute data – Cyndy Parr
Presented at the Biodiversity Information Standards (Taxonomic Databases Working Group) 2013 meeting in Florence, Italy on 29 October 2013. Essentially, an introduction to the new trait repository of Encyclopedia of Life.
Global Burden of Animal Diseases: Disease prioritization theme – ILRI
Presentation by Kebede Amenu, Theo Knight-Jones and Delia Grace at the Global Burden of Animal Diseases Ethiopia case study inception workshop, Addis Ababa, Ethiopia, 18 March 2021.
A state-of-the-art biorepository: Challenges and opportunities – ILRI
Presentation by Absolomon Kihara and Steve Kemp at the third Medical and Veterinary Virus Research Symposium (MVVR-3), Nairobi, Kenya, 17 October 2014.
Azizi biorepository: Challenges and opportunities – ILRI
Presentation by Absolomon Kihara and Steve Kemp at an ILRI seminar on managing biological samples held at the ILRI Nairobi campus on 10 September 2013.
International Journal of Advances in Biology (IJAB) – ijabjournal
International Journal of Advances in Biology (IJAB) is a peer-reviewed, open access journal that addresses the impacts and challenges of biology. The journal documents practical and theoretical results that make a fundamental contribution to the development of the biological sciences and their applications.
Brief overview of Bioschemas presented at the 2017 BioHackathon in Japan. The presentation introduces a proposal for a new Schema.org type called BiologicalEntity.
Presentation for the San Francisco #IDCC14 conference (http://www.dcc.ac.uk/events/idcc14/day-two-papers). The presentation covers publishing zooarchaeology data with Open Context (http://opencontext.org) to study the spread of farming from the Near East to Europe through Anatolia. It looks at editorial processes, linked data annotation, and other workflow concerns relating to making raw data more usable for comparative analysis.
Presentation made by Andy Jarvis in the workshop on Adapting agriculture to climate change: The role of crop wild relatives, in Bellagio, Italy, 7th-9th September 2010.
Presentation by Simon Mayo at the KikForum
Abstract:
As part of the CATE project we are developing keys in Lucid 3 to the genera of Araceae, and to the genera Anthurium (ca. 800 spp.), Arum and Philodendron. The key to Arum is already online. These keys will be incorporated into a web-based taxonomic revision of the Araceae family, the plant model group for the project. Anthurium presents a particular challenge, as it is a very large and difficult genus within which it is currently nearly impossible for non-specialists to determine plants to species. We hope the key will go some way towards solving this problem.
A Look into Closed Access Capitalism and LIS Publishing Practices – Robyn Hall
Abstract: Drawing on data from an investigation of 126 academic, peer-reviewed journals in library and information science (LIS), this presentation will discuss ways that those working in LIS can take back control over how their work is disseminated and shared online. For this to happen, however, members of the profession need to recognize and consciously grapple with the ubiquitous capitalist system that informs so many of the services, functions, and expectations that are tied to the profession.
Biomedical data collection for mass gathering research and evaluation: A revi... – Jamie Ranse
Ranse J, Hutton A. (2013). Biomedical data collection for mass gathering research and evaluation: A review of the literature; paper presented at the 18th World Congress on Disaster and Emergency Medicine, Manchester, UK, May.
The life sciences domain has been one of the early adopters of linked data, and a considerable portion of the Linked Open Data cloud consists of datasets from Life Sciences Linked Open Data (LSLOD). The deluge of biomedical data in the last few years, partially caused by the advent of high-throughput gene sequencing technologies, has been a primary motivation for these efforts. This success has led to growth in the size of datasets and to the need to integrate multiple such datasets. This growth requires large-scale distributed infrastructure and specific techniques for managing large linked data graphs. In combination with Semantic Web and Linked Data technologies, this promises to enable the processing of large as well as semantically heterogeneous data sources and the capturing of new knowledge from them. In this tutorial we present the state of the art in large-scale data processing, as well as its amalgamation with Linked Data and Semantic Web technologies for better knowledge discovery and targeted applications. We aim to provide useful information for the Knowledge Acquisition research community as well as the working Data Scientist.
6 Dimensions of Quality Management Maturity – LNS Research
Learn about the different dimensions of quality management maturity in this presentation given by LNS Research's President and Principal Analyst, Matthew Littlefield, to a group of life sciences executives.
IT strategy for life sciences – David Royle
An Information Technology strategy for contract research organisations in Life Sciences. A layered approach to building an Information Technology platform.
Reinventing Life Sciences: How emerging ecosystems fuel innovation – IBM in Healthcare
Persistent disruptive forces in life sciences now threaten traditional business models over the medium to long term. While high rates of return and strong performance may have masked these forces in the past, today they must be recognized and addressed. Organizations need new ways to continue to thrive despite such hurdles.
This latest research study by the IBM Institute for Business Value, in collaboration with the University of California, San Diego and Oxford Economics, led to a target innovation model that can guide organizations to discover operational efficiencies, nurture new growth and position themselves more strategically in the new life sciences and healthcare ecosystem.
Cloud computing promises to fundamentally transform the global life sciences industry. But most life sciences organizations have only just started to understand the power of cloud to not only drive efficiency, but also to redefine collaboration, partnering, and business models.
Life sciences organizations are hungry for the capabilities that cloud can deliver, to meet new competitive pressures and ever-expanding consumer expectations.
This new IBM Institute for Business Value (IBV) Cloud point-of-view (POV) for the life sciences industry explores the opportunities and implications of cloud computing for global life sciences companies. It provides a roadmap to formulate and execute cloud strategies.
Gathering Alternative Surface Forms for DBpedia Entities – Heiko Paulheim
Wikipedia is often used as a source of surface forms, or alternative reference strings for an entity, required for entity linking, disambiguation or coreference resolution tasks. Surface forms have been extracted in a number of works from Wikipedia labels, redirects, disambiguations and anchor texts of internal Wikipedia links, which we complement with anchor texts of external Wikipedia links from the Common Crawl web corpus. We tackle the problem of the quality of Wikipedia-based surface forms, which has not been raised before. We create a gold standard for the dataset quality evaluation, which reveals the surprisingly low precision of the Wikipedia-based surface forms. We propose filtering approaches that boost the precision from 75% to 85% for a random entity subset, and from 45% to more than 65% for the subset of popular entities. The filtered surface form dataset as well as the gold standard are made publicly available.
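To make the filtering idea concrete, here is a small, hypothetical Python sketch of frequency-based filtering of surface forms: a (surface form, entity) pair is kept only if it was observed often enough, both absolutely and relative to the other uses of the same surface form. The toy counts and thresholds are illustrative and not taken from the paper.

from collections import defaultdict

# (surface form, entity) -> number of anchor texts observed with that pairing (toy data)
link_counts = {
    ("Paris", "dbr:Paris"): 9500,
    ("Paris", "dbr:Paris_Hilton"): 120,
    ("the city of light", "dbr:Paris"): 40,
}

totals = defaultdict(int)
for (form, _entity), n in link_counts.items():
    totals[form] += n

def keep(form, entity, min_count=5, min_share=0.05):
    # keep the pair only if it is frequent enough and not dominated by other senses
    n = link_counts.get((form, entity), 0)
    return n >= min_count and n / totals[form] >= min_share

filtered = {pair for pair in link_counts if keep(*pair)}
print(filtered)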
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat... – Stefan Dietze
Presentation from mentoring event of Open Education Europa Challenge (http://www.openeducationchallenge.eu/) about using Linked Data in educational applications.
Evaluating Named Entity Recognition and Disambiguation in News and Tweets – Marieke van Erp
Named entity recognition and disambiguation are important for information extraction and populating knowledge bases. Detecting and classifying named entities has traditionally been taken on by the natural language processing community, whilst linking of entities to external resources, such as DBpedia and GeoNames, has been the domain of the Semantic Web community. As these tasks are treated in different communities, it is difficult to assess the performance of these tasks combined.
We present results on an evaluation of the NERD-ML approach on newswire and tweets for both Named Entity Recognition and Named Entity Disambiguation.
Presented at CLIN 24: http://clin24.inl.nl/
http://nerd.eurecom.fr
https://github.com/giusepperizzo/nerdml
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data – Sören Auer
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Introduction to DBpedia, the most popular and interconnected source of Linked Open Data. Part of EXPLORING WIKIDATA AND THE SEMANTIC WEB FOR LIBRARIES at METRO http://metro.org/events/598/
LDQL: A Query Language for the Web of Linked Data – Olaf Hartig
I used this slideset to present our research paper at the 14th Int. Semantic Web Conference (ISWC 2015). Find a preprint of the paper here:
http://olafhartig.de/files/HartigPerez_ISWC2015_Preprint.pdf
Fast Approximate A-box Consistency Checking using Machine Learning – Heiko Paulheim
Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results is required in some use cases, for many others, a sensible trade-off between computation efforts and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central task in reasoning, i.e., A-box consistency checking, by training a machine learning model which approximates the behavior of that reasoner for a specific ontology. On four different datasets, we show that such learned models constantly achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 minutes, compared to 18 days required by a state of the art ontology reasoner.
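A minimal sketch of the general idea, not the paper's actual pipeline: a small decision tree is trained to imitate a reasoner's A-box consistency verdicts and then used as a fast approximate checker. The bag-of-types/properties feature encoding and the toy labels below are assumptions for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each A-box fragment is summarised as counts of the types/properties it uses;
# labels (1 = consistent, 0 = inconsistent) would come from a real reasoner.
fragments = [
    {"type:Person": 1, "prop:birthDate": 1},
    {"type:Person": 1, "type:Organization": 1},   # disjoint classes
    {"type:Book": 1, "prop:author": 1},
    {"type:Book": 1, "prop:birthDate": 1},        # domain violation
]
labels = [1, 0, 1, 0]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(fragments)

# Keep the tree small, in the spirit of the compact models described above.
tree = DecisionTreeClassifier(max_leaf_nodes=20, random_state=0)
tree.fit(X, labels)

new_fragment = {"type:Person": 1, "prop:author": 1}
print(tree.predict(vec.transform([new_fragment]))[0])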
Data Mining with Background Knowledge from the Web - Introducing the RapidMin... – Heiko Paulheim
Many data mining problems can be solved better if more background knowledge is added: predictive models can become more accurate, and descriptive models can reveal more interesting findings. However, collecting and integrating background knowledge is tedious manual work. In this paper, we introduce the RapidMiner Linked Open Data Extension, which can extend a dataset at hand with additional attributes drawn from the Linked Open Data (LOD) cloud, a large collection of publicly available datasets on various topics. The extension contains operators for linking local data to open data in the LOD cloud, and for augmenting it with additional attributes. In a case study, we show that the prediction error of car fuel consumption can be reduced by 50% by adding additional attributes, e.g., describing the automobile layout and the car body configuration, from Linked Open Data.
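As a rough illustration of the attribute-generation idea (not the RapidMiner extension itself), the sketch below links local records to DBpedia resources and pulls one extra attribute via SPARQL; the example rows and the choice of dbo:manufacturer as the fetched property are assumptions for illustration.

from SPARQLWrapper import SPARQLWrapper, JSON

# Local records already linked to DBpedia resources (toy data).
local_rows = [
    {"model": "Golf", "dbpedia": "http://dbpedia.org/resource/Volkswagen_Golf"},
    {"model": "Prius", "dbpedia": "http://dbpedia.org/resource/Toyota_Prius"},
]

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

for row in local_rows:
    # Fetch one additional attribute from the LOD cloud and join it onto the row.
    sparql.setQuery(f"""
        SELECT ?manufacturer WHERE {{
            <{row['dbpedia']}> <http://dbpedia.org/ontology/manufacturer> ?manufacturer .
        }} LIMIT 1
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    row["manufacturer"] = bindings[0]["manufacturer"]["value"] if bindings else None

print(local_rows)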
Unsupervised Extraction of Attributes and Their Values from Product Description – Rakuten Group, Inc.
Keiji Shinzato and Satoshi Sekine
17th Oct. 2013
The 6th International Joint Conference on Natural Language Processing
This slide shows an unsupervised method for extracting product attributes and their values from an e-commerce product page. Previously, distant supervision has been applied for this task, but it is not applicable in domains where no reliable knowledge base (KB) is available. Instead, the proposed method automatically creates a KB from tables and itemizations embedded in the product’s pages. This KB is applied to annotate the pages automatically and the annotated corpus is used to train a model for the extraction. Because of the incompleteness of the KB, the annotated corpus is not as accurate as a manually annotated one. Our method tries to filter out sentences that are likely to include problematic annotations based on statistical measures and morpheme patterns induced from the entries in the KB. The experimental results show that the performance of our method achieves an average F score of approximately 58.2 points and that filters can improve the performance.
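The following toy Python sketch illustrates the KB-based annotation and filtering step described above, under simplified assumptions: a tiny attribute-value KB harvested from product-page tables tags free-text sentences, and sentences whose matches look ambiguous are dropped. The data, matching rule and threshold are invented for illustration.

# Attribute -> set of values, as if harvested from tables/itemizations.
kb = {
    "color": {"red", "blue"},
    "material": {"cotton", "leather"},
}

sentences = [
    "This bag is made of leather and comes in red.",
    "Red is the new blue this season.",   # conflicting values for one attribute
]

def annotate(sentence):
    # Tag every KB value that appears in the sentence.
    hits = []
    for attr, values in kb.items():
        for v in values:
            if v in sentence.lower():
                hits.append((attr, v))
    return hits

def keep(hits, max_values_per_attr=1):
    # Drop sentences where one attribute matches several conflicting values.
    per_attr = {}
    for attr, v in hits:
        per_attr.setdefault(attr, set()).add(v)
    return all(len(vals) <= max_values_per_attr for vals in per_attr.values())

for s in sentences:
    hits = annotate(s)
    print(s, "->", hits if keep(hits) else "filtered out")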
Workshop finding and accessing data - fiona nadia charlotte - cambridge apr... – Fiona Nielsen
Workshop presentation on finding and accessing human genomics data for research.
Including statistics of publicly available data sources and tips on how to save time in your workflow of data access.
Organised in collaboration between DNAdigest and Open Data Cambridge.
Read more about our work:
http://DNAdigest.org
http://repositive.io
https://uk.linkedin.com/in/fionanielsen
http://www.data.cam.ac.uk
Workshop finding and accessing data - fiona - lunteren april 18 2016 – Fiona Nielsen
Workshop presentation on finding and accessing human genomics data for research.
Including statistics of publicly available data sources and tips on how to save time in your workflow of data access.
Presented at BioSB2016, pre-conference PhD retreat for young researchers in bioinformatics and systems biology at Congrescentrum De Werelt in Lunteren. #BioSB2016 #BioSB16
Link to event:
http://www.youngcb.nl/events/biosb-phd-retreat-2016/
Read more about my work:
http://DNAdigest.org
http://repositive.io
https://uk.linkedin.com/in/fionanielsen
AMIA Webinar - BioSharing - Mapping the landscape of standards in the life sc... – Peter McQuilton
A 45 minute webinar presented to the AMIA (American Medical Informatics Association - www.amia.org) in May 2016 on BioSharing, a curated, searchable portal of inter-related data standards, databases, and policies in the life, environmental and biomedical sciences. We cover how we describe standards, how one can search using our simple, advanced and faceted search, how our wizard can guide you, and how our recommendations from journal data policies can aid your selection of metadata standards and repositories for your data.
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ... – Frederik van den Broek
Slides from my talk at the ACS CINF Symposium on Collaborations & Data Sharing in Rare & Orphan Disease Drug Discovery on 31 March 2019 in Orlando.
Abstract:
For the pharmaceutical industry as a whole, addressing the challenge of rare or orphan diseases is high on the agenda. But for the patients and their families, rare diseases can be very isolating and it can often feel like the potential for new treatments is low. One avenue for potential treatments is to identify drug repurposing candidates for the rare disease in question. This talk will give an overview of various collaborative projects undertaken in the last few years, which involved the combination, normalisation and analysis of data from various disparate sources, including some valuable lessons learnt along the way.
Quantifying the content of biomedical semantic resources as a core for drug d... – Syed Muhammad Ali Hasnain
The biomedical research community is providing large-scale data sources to enable knowledge discovery from the data alone, or from novel scientific experiments in combination with the existing knowledge. Increasingly, Semantic Web technologies are being developed and used, including ontologies, triple stores and combinations thereof. The amount of data is constantly increasing, as is its complexity. Since the data sources are publicly available, the amount of content can be derived, giving an overview of the accessible content but also of the state of the data representation in comparison to the existing content. For a better understanding of the existing data resources, i.e. judgments on the distribution of data triples across concepts, data types and primary providers, we have performed a comprehensive analysis which delivers an overview of the accessible content for Semantic Web solutions. It can be derived that information related to genes, proteins and chemical entities forms the centre, whereas content related to diseases and pathways forms a smaller portion. Further data relates to dietary content and specific questions such as cancer prevention and toxicological effects of drugs.
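A minimal sketch of the kind of content-quantification query such an analysis relies on: counting instances per class at a public SPARQL endpoint. The endpoint URL below is a placeholder; any LS-LOD endpoint could be substituted.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # placeholder LS-LOD endpoint
sparql.setQuery("""
    SELECT ?class (COUNT(?s) AS ?instances) WHERE {
        ?s a ?class .
    }
    GROUP BY ?class
    ORDER BY DESC(?instances)
    LIMIT 20
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["class"]["value"], row["instances"]["value"])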
The Missing Link: The Evolving Current State of Linked Data for Serials – Lauruhn – NASIG
Linked data may hold the potential to solve some classic serials dilemmas like latest vs. successive entry, or single vs. multiple records for print and online. How do these hopes mesh with the evolving current state of linked data projects in the commercial and library sector as well as with LC’s Bibframe initiative? The speakers will provide three different perspectives. An “early experimenter” and member of the Bibframe group modeling serials will discuss her experiences and thoughts on future directions. A publisher from a company that has reorganized some of its infrastructure and processes to facilitate linked data will share the goals and provide examples of the benefits of that project. Finally, the head of the U.S. ISSN Center will take an ISSN perspective as well as compare international work modeling serials according to FRBR-OO (object-oriented) with the Bibframe serials modeling effort. Audience input will be solicited in order to provide an exchange of ideas and viewpoints. (moderated by Laurie Kaplan)
Michael Lauruhn
Disruptive Technology Director, Elsevier Labs
See accompanying presentation by Nancy Fallgren
The Role of Libraries in Data Management and Curation – Nicole Vasilevsky
The Role of Libraries in Data Management and Curation, presented at the American Library Association conference in Las Vegas, NV, 07/29/14.
Abstract:
As increasing amounts of data are being generated, applying best practices in handling data is important, and librarians are well poised to assist users. During this session, we will discuss the role of libraries in assisting with data management, application of metadata, ontologies, data standards, and the publication of data in repositories and on the Semantic Web. This talk will describe best data practices and engage the attendees in interactive activities to demonstrate these principles.
The Learning Health System: Thinking and Acting Across Scales – Philip Payne
A Learning Health System (LHS) can be defined as an environment in which knowledge generation processes are embedded into daily clinical practice in order to continually improve the quality, safety, and outcomes of healthcare delivery. While still largely an aspirational goal, the promise of the LHS is a future in which every patient encounter is an opportunity to learn and improve that patient’s care, as well as the care their family and broader community receives. The foundation for building such an LHS can and should be the Electronic Health Record (EHR), which provides the basis for the comprehensive instrumentation and measurement of clinical phenotypes, as well as a means of delivering new evidence at the patient- and population levels. In this presentation, we will explore the ways in which such EHR-derived phenotypes can be combined with complementary data across a spectrum from biomolecules to population level trends, to both generate insights and deliver such knowledge in the right time, place, and format, ultimately improving clinical outcomes and value.
ASSESSMENT OF BIOMEDICAL LITERATURE
Components of internal and external validity of controlled clinical trials
Internal validity — extent to which systematic error (bias) is minimized in clinical trials
- Selection bias: biased allocation to comparison groups
- Performance bias: unequal provision of care apart from the treatment under evaluation
- Detection bias: biased assessment of outcome
- Attrition bias: biased occurrence and handling of deviations from protocol and loss to follow-up
Requirements, needs
Planning, direction
Information collection
Information Assessment
- Evaluation for accuracy, correctness, relevance, usefulness
- Source reliability assessment (based on competency and past behavior)
- Bias assessment (motivators, interests, funding, objectives)
- Conflicts of interest
- Sources of funding, important business relationships
- Grading of individual items (study, report, analysis, article)
Collation of information
- Exclusion of irrelevant, incorrect, and useless information
- Arrangement of information in a form which enables real-time analysis
- System for rapid retrieval of information
External validity — extent to which results of trials provide a correct basis for generalization to other circumstances
- Patients: age, sex, severity of disease and risk factors, comorbidity
- Treatment regimens: dosage, timing and route of administration, type of treatment within a class of treatments, concomitant treatments
- Settings: level of care (primary to tertiary) and experience and specialization of care provider
- Modalities of outcomes: type or definition of outcomes and duration of follow-up
Similar to A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud (20)
Access to biomedical data is increasingly important to enable data-driven science in the research community. The Linked Open Data (LOD) principles (by Tim Berners-Lee) have been suggested to judge the quality of data by its accessibility (open data access), by its format and structures, and by its interoperability with other data sources. The objective is to use interoperable data sources across the Web with ease. The FAIR (findable, accessible, interoperable, reusable) data principles have been introduced for similar reasons, with a stronger emphasis on achieving reusability. In this presentation we assess the FAIR principles against the LOD principles to determine to what degree the FAIR principles reuse LOD principles, and to what degree they extend them. This assessment helps to clarify the relationship between both schemes and gives a better understanding of what extension FAIR represents in comparison to LOD. We conclude that LOD gives a clear mandate for the openness of data, whereas FAIR asks for a stated license for access and thus includes the concept of reusability under consideration of the license agreement. Furthermore, FAIR makes strong reference to the contextual information required to improve reuse of the data, e.g., provenance information. According to the LOD principles, such metadata would be considered interoperable data as well; however, the requirement of extending data with metadata indicates that FAIR is an extension of LOD (rather than the inverse).
PROV has been adopted by a number of workflow systems for encoding the traces of workflow executions. Exploiting these provenance traces is hampered by two main impediments. Firstly, workflow systems extend PROV differently to cater for system-specific constructs. The difference between the adopted PROV extensions yields heterogeneity in the generated provenance traces. This heterogeneity diminishes the value of such traces, e.g. when combining and querying provenance traces of different workflow systems. Secondly, the provenance recorded by workflow systems tends to be large, and as such difficult to browse and understand by a human user. In this paper, we propose SHARP, a Linked Data approach for harmonizing cross-workflow provenance. The harmonization is performed by chasing tuple-generating and equality-generating dependencies defined for workflow provenance. This results in a provenance graph that can be summarized using domain-specific vocabularies. We experimentally evaluate the effectiveness of SHARP using a real-world omic experiment involving workflow traces generated by the Taverna and Galaxy systems.
SHARP is a Linked Data approach for harmonizing cross-workflow provenance. In this demo, we demonstrate SHARP through a real-world omic experiment involving workflow traces generated by Taverna and Galaxy systems.
SHARP starts by interlinking provenance traces generated by Galaxy and Taverna workflows and then harmonizes the interlinked graphs using OWL and PROV inference rules. The resulting provenance graph can be exploited for answering queries across Galaxy and Taverna workflow runs.
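As a rough illustration of cross-workflow provenance querying (not the SHARP implementation), the sketch below merges two PROV traces into one RDF graph with rdflib and asks a question that spans both. The trace file names are placeholders; the prov: terms come from the standard PROV-O vocabulary.

from rdflib import Graph

merged = Graph()
merged.parse("taverna_run.prov.ttl", format="turtle")   # placeholder trace file
merged.parse("galaxy_run.prov.ttl", format="turtle")    # placeholder trace file

query = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?downstream ?upstream WHERE {
    ?data prov:wasGeneratedBy ?upstream .
    ?downstream prov:used ?data .
}
"""
# Each result pairs an activity with the activity whose output it consumed,
# regardless of which workflow system produced either trace.
for downstream, upstream in merged.query(query):
    print(f"{downstream} consumed output of {upstream}")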
Exploiting Cognitive Computing and Frame Semantic Features for Biomedical Doc... – Syed Muhammad Ali Hasnain
Nowadays, there are plenty of text documents in different domains whose unstructured content makes them hard to analyze automatically. In the medical domain in particular, this problem is even more pronounced and is attracting more and more attention. Medical reports may contain relevant information that can be employed, among many useful applications, to build predictive systems able to classify new medical cases, thus supporting physicians in taking more correct and reliable decisions about diagnosis and care. It is generally hard and time-consuming to infer the information needed for comparing unstructured data and evaluating similarities between various resources. In this work we show how it is possible to cluster medical reports, based on features detected using two emerging tools, IBM Watson and Framester, from a collection of text documents. Experiments and results have demonstrated the quality of the resulting clusterings and the key role that these services can play.
An Approach for Discovering and Exploring Semantic Relationships between Genes – Syed Muhammad Ali Hasnain
This paper presents an approach for extracting, integrating and mining the annotations from a large corpus of gene summaries. It includes: i) a method for extracting annotations from several ontologies, mapping them into concepts and evaluating the semantic relatedness of genes, ii) the definition of a NoSQL graph database that leverages a loosely structured and multifaceted organization of data for storing concepts and their relationships, and iii) a mechanism to support the customized exploration of stored information. A prototype with a user-friendly interface fully enables users to visualize all concepts of their interest and to take advantage of their visualization for formulating biomedical hypotheses and discovering new knowledge.
A single interface for accessing life sciences (LS) data is a natural consequence of the need to master the data deluge in this domain. Data in the LS domain requires integration, and current integrative solutions increasingly rely on the federation of queries over distributed resources. We introduce a federated query processing system named "BioFed", customised for LS-LOD. BioFed federates SPARQL queries over more than 130 public SPARQL endpoints.
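A minimal sketch of the federation style BioFed automates, assuming an engine that supports SPARQL 1.1 SERVICE clauses: one query draws from two life-science endpoints. The endpoint URLs and the example predicate are placeholders, not BioFed configuration.

from SPARQLWrapper import SPARQLWrapper, JSON

federated_query = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?drug WHERE {
  SERVICE <http://example.org/proteins/sparql> {    # placeholder endpoint
    ?protein a up:Protein .
  }
  SERVICE <http://example.org/drugs/sparql> {       # placeholder endpoint
    ?drug <http://example.org/vocab/target> ?protein .
  }
}
LIMIT 10
"""

# Submit the federated query to an engine that performs the SERVICE calls.
sparql = SPARQLWrapper("http://example.org/federation/sparql")  # placeholder mediator
sparql.setQuery(federated_query)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["protein"]["value"], row["drug"]["value"])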
Tutorial at K-Cap 2015: Knowledge Processing with Big Data and Semantic Web Technologies.
Session 0: Motivation
Session 1: Infrastructure
Session 2: Data Curation
Session 3: Query Federation
Session 4: Analyze
Session 5: Visualization
Session 6: Hands On Session
Health care and life sciences research relies heavily on the ability to search, discover, formulate and correlate data from distinct sources. Over the last decade, the deluge of health care and life sciences data and the standardisation of linked data technologies have resulted in the publication of datasets of great importance. This has emerged as an opportunity to explore new ways of biomedical discovery through standardised interfaces.
Although Semantic Web and Linked Data technologies help in dealing with the data integration problem, there remains a barrier to adopting them for non-technical research audiences. In this paper we present FedViz, a visual interface for SPARQL query formulation and execution. FedViz is explicitly designed to increase intuitive data interaction with distributed sources and facilitates the formulation of federated as well as non-federated SPARQL queries. FedViz uses FedX for query execution and results retrieval. We also evaluate the usability of our system using the standard System Usability Scale as well as a custom questionnaire designed specifically to test the usability of the FedViz interface. Our overall usability score of 74.16 suggests that the FedViz interface is easy to learn, consistent, and adequate for frequent use.
Medical Technology Tackles New Health Care Demand - Research Report - March 2... – pchutichetpong
M Capital Group (“MCG”) predicts that with, against, despite, and even without the global pandemic, the medical technology (MedTech) industry shows signs of continuous healthy growth, driven by smaller, faster, and cheaper devices, growing demand for home-based applications, technological innovation, strategic acquisitions, investments, and SPAC listings. MCG predicts that this should reflect itself in annual growth of over 6%, well beyond 2028.
According to Chris Mouchabhani, Managing Partner at M Capital Group, “Despite all economic scenarios that one may consider, beyond overall economic shocks, medical technology should remain one of the most promising and robust sectors over the short to medium term and well beyond 2028.”
There is a movement towards home-based care for the elderly, next generation scanning and MRI devices, wearable technology, artificial intelligence incorporation, and online connectivity. Experts also see a focus on predictive, preventive, personalized, participatory, and precision medicine, with rising levels of integration of home care and technological innovation.
The average cost of treatment has been rising across the board, creating additional financial burdens for governments, healthcare providers and insurance companies. According to MCG, cost-per-inpatient-stay in the United States alone rose on average annually by over 13% between 2014 and 2021, leading MedTech to focus research efforts on optimized medical equipment at lower price points, whilst emphasizing portability and ease of use. Namely, 46% of the 1,008 medical technology companies in the 2021 MedTech Innovator (“MTI”) database are focusing on prevention, wellness, detection, or diagnosis, signaling a clear push for preventive care to also tackle costs.
In addition, there has also been a lasting impact on consumer and medical demand for home care, supported by the pandemic. Lockdowns, closure of care facilities, and healthcare systems subjected to capacity pressure, accelerated demand away from traditional inpatient care. Now, outpatient care solutions are driving industry production, with nearly 70% of recent diagnostics start-up companies producing products in areas such as ambulatory clinics, at-home care, and self-administered diagnostics.
Explore our infographic on 'Essential Metrics for Palliative Care Management' which highlights key performance indicators crucial for enhancing the quality and efficiency of palliative care services.
This visual guide breaks down important metrics across four categories: Patient-Centered Metrics, Care Efficiency Metrics, Quality of Life Metrics, and Staff Metrics. Each section is designed to help healthcare professionals monitor and improve care delivery for patients facing serious illnesses. Understand how to implement these metrics in your palliative care practices for better outcomes and higher satisfaction levels.
Telehealth Psychology Building Trust with Clients.pptx – The Harvest Clinic
Telehealth psychology is a digital approach that offers psychological services and mental health care to clients remotely, using technologies like video conferencing, phone calls, text messaging, and mobile apps for communication.
Health Education on prevention of hypertension – Radhika Kulvi
Hypertension is a chronic condition of concern due to its role in the causation of coronary heart diseases. Hypertension is a worldwide epidemic and important risk factor for coronary artery disease, stroke and renal diseases. Blood pressure is the force exerted by the blood against the walls of the blood vessels and is sufficient to maintain tissue perfusion during activity and rest. Hypertension is sustained elevation of BP. In adults, HTN exists when systolic blood pressure is equal to or greater than 140mmHg or diastolic BP is equal to or greater than 90mmHg. The
ICH Guidelines for Pharmacovigilance.pdf – NEHA GUPTA
The "ICH Guidelines for Pharmacovigilance" PDF provides a comprehensive overview of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) guidelines related to pharmacovigilance. These guidelines aim to ensure that drugs are safe and effective for patients by monitoring and assessing adverse effects, ensuring proper reporting systems, and improving risk management practices. The document is essential for professionals in the pharmaceutical industry, regulatory authorities, and healthcare providers, offering detailed procedures and standards for pharmacovigilance activities to enhance drug safety and protect public health.
Global launch of the Healthy Ageing and Prevention Index 2nd wave – alongside... – ILC-UK
The Healthy Ageing and Prevention Index is an online tool created by ILC that ranks countries on six metrics: life span, health span, work span, income, environmental performance, and happiness. The Index helps us understand how well countries have adapted to longevity and informs decision makers on what must be done to maximise the economic benefits that come with living well for longer.
Alongside the 77th World Health Assembly in Geneva on 28 May 2024, we launched the second version of our Index, allowing us to track progress and give new insights into what needs to be done to keep populations healthier for longer.
The speakers included:
Professor Orazio Schillaci, Minister of Health, Italy
Dr Hans Groth, Chairman of the Board, World Demographic & Ageing Forum
Professor Ilona Kickbusch, Founder and Chair, Global Health Centre, Geneva Graduate Institute and co-chair, World Health Summit Council
Dr Natasha Azzopardi Muscat, Director, Country Health Policies and Systems Division, World Health Organisation EURO
Dr Marta Lomazzi, Executive Manager, World Federation of Public Health Associations
Dr Shyam Bishen, Head, Centre for Health and Healthcare and Member of the Executive Committee, World Economic Forum
Dr Karin Tegmark Wisell, Director General, Public Health Agency of Sweden
How many patients does case series should have In comparison to case reports.pdf – pubrica101
Pubrica’s team of researchers and writers create scientific and medical research articles, which may be important resources for authors and practitioners. Pubrica medical writers assist you in creating and revising the introduction by alerting the reader to gaps in the chosen study subject. Our professionals understand the order in which the hypothesis topic is followed by the broad subject, the issue, and the backdrop.
https://pubrica.com/academy/case-study-or-series/how-many-patients-does-case-series-should-have-in-comparison-to-case-reports/
The dimensions of healthcare quality refer to various attributes or aspects that define the standard of healthcare services. These dimensions are used to evaluate, measure, and improve the quality of care provided to patients. A comprehensive understanding of these dimensions ensures that healthcare systems can address various aspects of patient care effectively and holistically. Dimensions of healthcare quality and performance of care include the following: appropriateness, availability, competence, continuity, effectiveness, efficiency, efficacy, prevention, respect and care, safety, as well as timeliness.
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
1. A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
Ali Hasnain et al.
Insight Centre for Data Analytics
National University of Ireland, Galway
3. Motivation
• Biomedical data is heterogeneous and spread across multiple sources (SPARQL endpoints).
• Navigation is a challenge.
• The data contains trillions of triples and is represented with insufficient vocabulary reuse.
• Biologists sometimes want more information about the data, including its source, creator and publisher, as well as statistics about its size (metadata and provenance).
4. How to deal with heterogeneous data?
DrugBank
DailyMed
ChEBI
KEGG
Reactome
SIDER
BioPAX
Medicare
5. We want to query the content, not the source
Proteins
Molecules
Genes
Diseases
6. A Linked Life Sciences Roadmap
Concepts: Proteins, Molecules, Genes, Diseases (modelled as :Protein, :Molecule, :Gene, :Disease)
Datasets mapped to these concepts: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER
7. Two Possible Solutions
• To assemble queries over multiple graphs at multiple endpoints, either:
  • vocabularies and ontologies are reused, or
  • translation maps between different terminologies are created (“a posteriori integration”)
M: Part of the challenge lies in the fact that, even though multiple datasets talk about the same concepts, they don’t use the same terminologies. Both the URIs are different, and so are the labels.
-> In Granatum, we enable drug discovery by addressing this problem in linked open data.
M: The way linked data is organized still forces us to look up data by its location, not its content! But those who turn to linked data don’t want to query “PDB”; they want to learn more about proteins, genes, etc.
-> Our first task is to catalogue the concepts that are relevant in these various datasets. Providing common access to data is the first pillar of the bridge that crosses the valley of death.
M: When data is catalogued, we can discover new links by cross-referencing with existing datasets.
-> Once we identify these concepts, how do we actually query them together?
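A small sketch of the "translation map" option from the slide above: a mapping graph of owl:equivalentClass / skos:exactMatch statements lets a roadmap-level concept (a generic Protein class) be expanded into the dataset-specific classes it corresponds to. The ROADMAP namespace and the PDB class URI are illustrative assumptions, not the actual roadmap vocabulary.

from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL, SKOS

ROADMAP = Namespace("http://example.org/roadmap#")  # hypothetical roadmap vocabulary

mapping = Graph()
mapping.add((ROADMAP.Protein, OWL.equivalentClass,
             URIRef("http://purl.uniprot.org/core/Protein")))
mapping.add((ROADMAP.Protein, SKOS.exactMatch,
             URIRef("http://example.org/pdb/vocab#PolymerEntity")))  # placeholder

def expand(concept):
    # Expand one concept into the dataset-specific classes it maps to.
    targets = set(mapping.objects(concept, OWL.equivalentClass))
    targets |= set(mapping.objects(concept, SKOS.exactMatch))
    return targets

for cls in expand(ROADMAP.Protein):
    print(f"query source class: {cls}")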
Represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed.
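To make that description concrete, here is a hedged rdflib sketch of a dataset with two distributions, one downloadable CSV file and one SPARQL endpoint, using the DCAT vocabulary; all example URIs are placeholders.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")
EX = Namespace("http://example.org/")  # placeholder namespace

g = Graph()
dataset = EX["dataset/food-webs"]
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Food web observations")))

# Form 1: a downloadable CSV file.
csv_dist = EX["dataset/food-webs/csv"]
g.add((csv_dist, RDF.type, DCAT.Distribution))
g.add((csv_dist, DCAT.downloadURL, URIRef("http://example.org/food-webs.csv")))
g.add((csv_dist, DCAT.mediaType, Literal("text/csv")))
g.add((dataset, DCAT.distribution, csv_dist))

# Form 2: a queryable SPARQL endpoint.
sparql_dist = EX["dataset/food-webs/sparql"]
g.add((sparql_dist, RDF.type, DCAT.Distribution))
g.add((sparql_dist, DCAT.accessURL, URIRef("http://example.org/sparql")))
g.add((dataset, DCAT.distribution, sparql_dist))

print(g.serialize(format="turtle"))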