A Generic Mapping-based Query Translation from SPARQL to Various Target Database Query Languages
1. 1
Franck Michel
A Generic Mapping-based Query Translation
from SPARQL to Various Target Database Query
Languages
F. Michel, C. Faron-Zucker, J. Montagnat
I3S laboratory, CNRS, Univ. Nice Sophia
2. 2
Towards a Web of Data
From a Web of Documents...
...to a Web of (Linked) Data
Publication/interlinking of open datasets
• In a common machine-readable format
• Using common vocabularies
Linking data increases its value
• Produce new knowledge
• Mash up with related data
• Opportunity for new (unexpected) usage
Citizens' demand for access to public data (scientific, government…)
3. 3
Towards a Web of Data
Driven/supported by various initiatives, e.g.:
• General-purpose: Linking Open Data, W3C Data Activity
• Domain-specific: Bio2RDF, BioPortal
• GAFAs: Facebook OG, Google KG, Yahoo!, Microsoft… consume and produce RDF
Linked Open Data Cloud
Linked Datasets as of Aug. 30th 2014. (c) R. Cyganiak and A. Jentzsch
4. 4
Web-scale data integration
Need to access data from the Deep Web
• Structured/unstructured data, hardly indexed by search engines, hardly linked with other data sources
Exponential data growth goes on in
• Various types of DBs: RDB, native XML, LDAP directory, OODB, NoSQL, NewSQL, ...
• Heterogeneous data models and query capabilities
Whatever the type of DB… it can be of interest for the Web of Data
… “Raw Data Now” (T. Berners-Lee)
6. 6
Previous works
Focused on data formats
• HTML: RDFa, Microformats
• XML: using XPath (RML), XQuery (XSPARQL, SPARQL2XQuery), XSLT (Scissor-Lift, GRDDL), XSD-to-OWL (SPARQL2XQuery)
• CSV/TSV/Spreadsheets: CSV on the Web (W3C WG)
• JSON: using JSONPath (RML), JSON-LD
Focused on types of database
• Extensive work on RDBs: D2RQ, Virtuoso, R2RML…
• XML native DBs: SPARQL2XQuery
• NoSQL stores: xR2RML
Integration frameworks: DataLift, RML, Asio Tool Suite…
8. 8
SPARQL rewriting in the general case
Goal: enable SPARQL access to a large range of heterogeneous databases
Previous works: SPARQL rewriting closely coupled with the expressiveness of the target QL (SQL, XQuery): support of joins, unions, nested queries, filtering, string manipulation, etc.
Solution proposed: two-step approach
1. Translate SPARQL into a pivot Abstract Query Language (AQL) under “target DB-to-RDF” mappings: generic mapping language needed
2. Translate from the Abstract QL to the QL of the target database
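The second step of this two-step approach can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `AbstractQuery` class, its field names and the shape of the dictionary returned by `to_mongo` are all hypothetical. The point is that a single pivot query, assumed to have been produced by step 1, can be rendered into several target query languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractQuery:
    """Hypothetical pivot query: what to read, what to return, what to keep."""
    source: str      # table or collection name
    project: tuple   # names of the data elements to return
    where: tuple     # (field, operator, value) conditions


def to_sql(q: AbstractQuery) -> str:
    """Render the pivot query as a SQL string."""
    cols = ", ".join(q.project)
    conds = " AND ".join(f"{f} {op} {v!r}" for f, op, v in q.where)
    return f"SELECT {cols} FROM {q.source}" + (f" WHERE {conds}" if conds else "")


def to_mongo(q: AbstractQuery) -> dict:
    """Render the pivot query as MongoDB-style filter and projection documents."""
    ops = {"=": "$eq", ">": "$gt", "<": "$lt"}
    return {
        "collection": q.source,
        "filter": {f: {ops[op]: v} for f, op, v in q.where},
        "projection": {f: 1 for f in q.project},
    }


# The same pivot query, translated for two different targets.
pivot = AbstractQuery(source="PERSON", project=("ID",), where=(("AGE", ">", 30),))
sql_text = to_sql(pivot)
mongo_query = to_mongo(pivot)
```

Keeping the pivot query free of any target-specific construct is what lets new back ends be added without touching the SPARQL side.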
12. 12
Franck Michel
The xR2RML mapping language
Describe mappings from various types of DB to RDF
• Query the target database
• Pick data elements from query results
• Translate them to (subject, predicate, object) using arbitrary ontologies
Independent of any target database
• Allow any declarative query language
• Allow any syntax to reference data elements within query results (column name,
JSONPath, XPath, attribute name...)
Extends W3C R2RML (backward compatible) and RML
Turtle RDF Syntax
Mapping graph = set of “triples maps” ~ mappings
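The three steps listed above (query the database, pick data elements, translate them to triples) can be sketched in Python. This is only an illustration under stated assumptions: the dictionary below is a hypothetical stand-in for a triples map and is not xR2RML syntax, and the example documents, keys and IRIs are invented.

```python
# Hypothetical, simplified stand-in for one triples map (NOT xR2RML syntax).
TRIPLES_MAP = {
    "subject_template": "http://example.org/person/{id}",
    "predicate": "http://xmlns.com/foaf/0.1/mbox",
    "object_reference": "email",
}

def run_query(documents, wanted_keys):
    """Stand-in for querying the target database: keep documents that have all keys."""
    return [d for d in documents if all(k in d for k in wanted_keys)]

def apply_mapping(triples_map, rows):
    """Pick data elements from each row and emit (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = triples_map["subject_template"].format(**row)
        obj = row[triples_map["object_reference"]]
        triples.append((subject, triples_map["predicate"], obj))
    return triples

# Invented sample data: the second document has no email and yields no triple.
DOCS = [{"id": 1, "email": "john@foo.com"}, {"id": 2}]
ROWS = run_query(DOCS, ["id", "email"])
TRIPLES = apply_mapping(TRIPLES_MAP, ROWS)
```

In a real mapping the data element references would be column names, JSONPath or XPath expressions, depending on the target database.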
18. 18
(1) Triple pattern bindings
1. Initial set of mappings for each triple pattern
• Check compatibility: term type, datatype, language
• Check for unsatisfiable SPARQL filter constraints on a term's type, datatype or
language: isIRI, isLiteral, isBlank, lang(), datatype()…
e.g. “rr:termType rr:Literal” does not match "isIRI(?var)"
2. Reduce bindings
• Consider join constraints implied by shared variables
SELECT ?x WHERE {
  ?y foaf:mbox "john@foo.com". // tp2
  ?x foaf:knows ?y. // tp3
}
Bindings:
(tp2, <#Mbox>)
(tp3, <#Knows>)
Shared variable ?y: check compatibility between <#Mbox>’s subject map and <#Knows>’s object map
M. Rodríguez-Muro, M. Rezk. Efficient SPARQL-to-SQL with R2RML mappings, Web Semant. Sci. Serv. Agents World Wide Web. 33 (2015) 141–169.
J. Unbehauen, C. Stadler, S. Auer. Accessing relational data on the web with SparqlMap, in: Semantic Technol., Springer, 2013: pp. 65–80.
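The two binding steps of this slide can be sketched as follows. This is a minimal sketch that only looks at term types (IRI vs. literal). The mapping descriptions are hypothetical, and <#KnowsName> is an invented extra mapping added only to show a binding being discarded by the shared-variable check.

```python
# Hypothetical mappings: predicate plus the term type of the subject and object maps.
MAPPINGS = {
    "<#Mbox>": {"predicate": "foaf:mbox", "subject": "IRI", "object": "Literal"},
    "<#Knows>": {"predicate": "foaf:knows", "subject": "IRI", "object": "IRI"},
    "<#KnowsName>": {"predicate": "foaf:knows", "subject": "IRI", "object": "Literal"},
}

def term_kind(term):
    """Return None for a variable, else the term type of a constant term."""
    if term.startswith("?"):
        return None
    return "Literal" if term.startswith('"') else "IRI"

def initial_bindings(tp):
    """Step 1: keep the mappings whose predicate and term types fit the triple pattern."""
    s, p, o = tp
    return [name for name, m in MAPPINGS.items()
            if m["predicate"] == p
            and term_kind(s) in (None, m["subject"])
            and term_kind(o) in (None, m["object"])]

def variable_types(tp, mapping_name):
    """Term type that a mapping would give to each variable of the triple pattern."""
    s, _, o = tp
    m = MAPPINGS[mapping_name]
    return {t: m[pos] for t, pos in ((s, "subject"), (o, "object")) if t.startswith("?")}

def reduce_bindings(tp, candidates, other_tp, other_candidates):
    """Step 2: drop candidates that disagree with every candidate of the other pattern."""
    kept = []
    for c in candidates:
        mine = variable_types(tp, c)
        for oc in other_candidates:
            theirs = variable_types(other_tp, oc)
            if all(theirs.get(v, t) == t for v, t in mine.items()):
                kept.append(c)
                break
    return kept

TP2 = ("?y", "foaf:mbox", '"john@foo.com"')
TP3 = ("?x", "foaf:knows", "?y")
B2 = initial_bindings(TP2)
B3 = reduce_bindings(TP3, initial_bindings(TP3), TP2, B2)
```

Here ?y is the subject of tp2, so it must be an IRI; the invented <#KnowsName> mapping would make ?y a literal and is therefore dropped from the bindings of tp3.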
19. 19
(2) Rewrite each Triple Pattern
Bindings:
(tp1, <#Mbox>)
(tp2, <#Mbox>)
(tp3, <#Knows>)
Each triple pattern is matched with its bound mapping (e.g. tp2 with <#Mbox>) to produce an
Atomic Abstract Query { From, Project, Where }
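One possible reading of the { From, Project, Where } structure, sketched in Python. The meaning given to the three parts here is an assumption (From: the data source of the bound mapping, Project: the data elements bound to SPARQL variables, Where: conditions coming from constant terms), and the content of the hypothetical MBOX mapping is invented.

```python
from dataclasses import dataclass, field

@dataclass
class AtomicAbstractQuery:
    from_: str                                    # assumed: data source of the bound mapping
    project: dict = field(default_factory=dict)   # assumed: SPARQL variable -> data element
    where: list = field(default_factory=list)     # assumed: conditions on data elements

# Hypothetical description of the <#Mbox> mapping (source and references are invented).
MBOX = {"source": "people", "subject_ref": "id", "object_ref": "email"}

def rewrite_triple_pattern(tp, mapping):
    """Match a triple pattern with its bound mapping and build an atomic abstract query."""
    s, _, o = tp
    query = AtomicAbstractQuery(from_=mapping["source"])
    for term, ref in ((s, mapping["subject_ref"]), (o, mapping["object_ref"])):
        if term.startswith("?"):
            query.project[term] = ref            # variable: project the data element
        else:
            query.where.append((ref, "equals", term.strip('"')))  # constant: add a condition
    return query

# tp2 from the previous slide, matched with <#Mbox>.
Q = rewrite_triple_pattern(("?y", "foaf:mbox", '"john@foo.com"'), MBOX)
```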
21. 21
(2) Rewrite from SPARQL to the AQL
Usual SPARQL-to-SQL example:
the SPARQL FILTER becomes the WHERE clause of an encapsulating SELECT
SELECT ?x WHERE {
  ?x foaf:age ?age.
  …
  FILTER (?age > 30)
}
SELECT t2.X FROM
  ( SELECT t1.ID AS X, t1.AGE AS AGE
    FROM PERSON t1
    …
  ) AS t2
WHERE (t2.AGE > 30)
Relies on the DB engine to optimize the query
In the general case, no assumption can be made about the target DB:
need to optimize at the earliest stage:
push filter conditions down into the translation of triple patterns
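To make the difference concrete, the sketch below runs both query shapes on an in-memory SQLite database. The table content is invented and both queries are simplified versions written for this illustration (the elided parts of the slide's query are left out). It only shows that the two shapes return the same rows, while the pushed-down version filters inside the inner query instead of relying on the engine's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSON (ID INTEGER, AGE INTEGER)")
# Invented sample rows.
conn.executemany("INSERT INTO PERSON VALUES (?, ?)", [(1, 25), (2, 42), (3, 31)])

# Filter applied by an encapsulating SELECT, as in the usual SPARQL-to-SQL rewriting.
ENCAPSULATING = """
SELECT t2.X FROM
  (SELECT t1.ID AS X, t1.AGE AS AGE FROM PERSON t1) AS t2
WHERE t2.AGE > 30
ORDER BY t2.X
"""

# Same filter pushed down into the inner query: the intermediate result is smaller.
PUSHED_DOWN = """
SELECT t2.X FROM
  (SELECT t1.ID AS X, t1.AGE AS AGE FROM PERSON t1 WHERE t1.AGE > 30) AS t2
ORDER BY t2.X
"""

ENCAPSULATING_ROWS = conn.execute(ENCAPSULATING).fetchall()
PUSHED_DOWN_ROWS = conn.execute(PUSHED_DOWN).fetchall()
```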
22. 22
(2) Rewrite from SPARQL to the AQL
Function trans_m(graph pattern, filter)
Rewrites a well-designed SPARQL graph pattern
into the AQL, under a set of xR2RML mappings m
trans_m(P) → trans_m(P, true)
trans_m(tp, f) → transTP_m(tp, sparqlCond(tp, f))
trans_m(P1 AND P2, f) → trans_m(P1, f) INNER JOIN trans_m(P2, f) ON var(P1) ∩ var(P2)
trans_m(P1 OPTIONAL P2, f) → trans_m(P1, f) LEFT JOIN trans_m(P2, f) ON var(P1) ∩ var(P2)
trans_m(P1 UNION P2, f) → trans_m(P1, f) LEFT JOIN trans_m(P2, f) ON var(P1) ∩ var(P2)
    UNION
    trans_m(P2, f) LEFT JOIN trans_m(P1, f) ON var(P1) ∩ var(P2)
trans_m(P FILTER f’, f) → trans_m(P, f && f’) FILTER sparqlCond(P, f && f’)
sparqlCond: Push filter conditions in the translation of relevant triple patterns
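The rewriting rules above can be read as a recursive function. The Python sketch below applies them to a small pattern tree and returns the abstract query as a string. The tuple encoding of graph patterns is an assumption made for this illustration, and sparql_cond is a simplified stand-in (it keeps a condition when all its variables occur in the pattern), which may differ from the author's definition.

```python
def variables(p):
    """var(P): the variables of a graph pattern."""
    if p[0] == "tp":
        return set(p[2])
    if p[0] == "FILTER":
        return variables(p[1])
    return variables(p[1]) | variables(p[2])

def sparql_cond(p, conds):
    """Simplified stand-in: keep the conditions whose variables all occur in p."""
    kept = [name for name, cond_vars in conds if set(cond_vars) <= variables(p)]
    return " && ".join(kept) if kept else "true"

def join_vars(p1, p2):
    """var(P1) ∩ var(P2), rendered as a set of variable names."""
    return "{" + ",".join(sorted(variables(p1) & variables(p2))) + "}"

def trans(p, conds=()):
    """Apply the rewriting rules; conds is the conjunction of filter conditions f."""
    kind = p[0]
    if kind == "tp":
        return f"transTP({p[1]}, {sparql_cond(p, conds)})"
    if kind == "FILTER":
        all_conds = tuple(conds) + tuple(p[2])      # f && f'
        return f"{trans(p[1], all_conds)} FILTER {sparql_cond(p[1], all_conds)}"
    left, right = p[1], p[2]
    on = join_vars(left, right)
    if kind == "AND":
        return f"{trans(left, conds)} INNER JOIN {trans(right, conds)} ON {on}"
    if kind == "OPTIONAL":
        return f"{trans(left, conds)} LEFT JOIN {trans(right, conds)} ON {on}"
    if kind == "UNION":
        return (f"{trans(left, conds)} LEFT JOIN {trans(right, conds)} ON {on}"
                f" UNION {trans(right, conds)} LEFT JOIN {trans(left, conds)} ON {on}")
    raise ValueError(f"unknown pattern kind: {kind}")

# Invented example: (tp1 OPTIONAL tp2) FILTER c1, where c1 only involves ?age.
TP1 = ("tp", "tp1", ("?x", "?age"))
TP2 = ("tp", "tp2", ("?x", "?mbox"))
PATTERN = ("FILTER", ("OPTIONAL", TP1, TP2), (("c1", ("?age",)),))
RESULT = trans(PATTERN)
```

With this input, the condition c1 is pushed into tp1 only, and the two sub-queries are joined on the shared variable ?x.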
23. 23
(2) Rewrite from SPARQL to the AQL
Function sparqlCond():
Push filter conditions down into the translation of triple patterns
Make inner queries as selective as possible
Limit the size of intermediate results
SELECT ?x WHERE {
  ?x foaf:mbox ?mbox. // tp1
  ?y foaf:mbox "john@foo.com". // tp2
  ?x foaf:knows ?y. // tp3
  FILTER (
    contains(str(?mbox),"foo.com") // c1
    && ?x != ?y // c2
  )}
transTP_m(tp1, c1)
INNER JOIN transTP_m(tp2, true) ON {}
INNER JOIN transTP_m(tp3, c2) ON {?x,?y}
FILTER c2
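The assignment of c1 and c2 to triple patterns in this example can be reproduced with a simple rule: a condition is pushed into a triple pattern when all of its variables occur in that pattern. The sketch below assumes this rule; it is a simplification for illustration, not necessarily the full definition of sparqlCond, and it does not decide which conditions remain in the outer FILTER.

```python
import re

def variables(text):
    """Extract the SPARQL variables (?name) appearing in a piece of query text."""
    return set(re.findall(r"\?\w+", text))

def push_down(triple_patterns, conditions):
    """For each triple pattern, list the conditions whose variables it all contains."""
    pushed = {}
    for tp_name, tp_text in triple_patterns.items():
        tp_vars = variables(tp_text)
        pushed[tp_name] = [name for name, cond_text in conditions.items()
                           if variables(cond_text) <= tp_vars]
    return pushed

TRIPLE_PATTERNS = {
    "tp1": "?x foaf:mbox ?mbox",
    "tp2": '?y foaf:mbox "john@foo.com"',
    "tp3": "?x foaf:knows ?y",
}
CONDITIONS = {
    "c1": 'contains(str(?mbox),"foo.com")',
    "c2": "?x != ?y",
}
PUSHED = push_down(TRIPLE_PATTERNS, CONDITIONS)
```

An empty list corresponds to the condition "true" in the slide's notation, as for tp2.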
25. 25
(3) Optimization
Abstract query: effective but possibly inefficient
• Unnecessary complexity: multiple joins, unions, redundancy
Study/reuse of common query optimization techniques
• Self-Join / Optional-Self-Join Elimination
– When the same mapping is bound to different triple patterns
• Self-Union Elimination
– When multiple mappings are bound to the same triple pattern
• Projection Pushing
– e.g. SELECT DISTINCT ?p WHERE { ?s ?p ?o }
• Filter propagation in joined queries
B. Elliott, E. Cheng, C. Thomas-Ogbuji, Z.M. Ozsoyoglu, A complete translation from SPARQL into efficient SQL, in: Proc. Int. Database Eng. Appl. Symp. 2009, ACM, 2009: pp. 31–42.
M. Rodríguez-Muro, M. Rezk. Efficient SPARQL-to-SQL with R2RML mappings, Web Semant. Sci. Serv. Agents World Wide Web. 33 (2015) 141–169.
J. Unbehauen, C. Stadler, S. Auer. Accessing relational data on the web with SparqlMap, in: Semantic Technol., Springer, 2013: pp. 65–80.
27. 27
Application: SPARQL-to-MongoDB
Prototype implementation for MongoDB
• SPARQL-to-AQL implemented as a DB-independent component
Extendable to other target DBs
• AQL-to-MongoDB QL
Not straightforward due to MongoDB limitations:
– No join, no nested query, union hardly supported
– Limited comparison filters, JavaScript filters discouraged
Much work falls back on the query processing engine
Two concrete use cases
• SKOS representation of a taxonomical reference
• Biological studies on rice phenotype data
29. 29
Conclusions & perspectives
Goal: foster the development of SPARQL interfaces to
heterogeneous databases
Formalized approach:
• Generalize existing works on SQL and XQuery
• Rely on a DB-independent mapping language: xR2RML
• Encompass all DB-independent steps of the rewriting process
• Leave only DB-specific rewriting as a last step
Prototype implementation for MongoDB
• Used in two real-world contexts
Perspectives
• Perform benchmarking
• Use it with a distributed SPARQL query engine
30. 30
Conclusions & perspectives
SW vs. NoSQL: two irreconcilable worlds?
Different paradigms:
• SW manages highly connected graphs,
• NoSQL stores manage isolated documents, joins hardly supported
NoSQL DBs
• pragmatically gave up on consistency and rich query features
• in exchange for high throughput/availability and horizontal elasticity
Filling the gap between the two worlds is not straightforward
The experience with MongoDB shows the challenges.
Huge potential source of LOD, can’t be ignored anymore
31. 31
Contacts:
Franck Michel
Catherine Faron-Zucker
Johan Montagnat
[1] F. Michel, L. Djimenou, C. Faron-Zucker, and J. Montagnat. Translation of Relational and Non-Relational
Databases into RDF with xR2RML. In proc. of WebIST 2015.
[2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, J. Montagnat. Towards a Shared Reference Thesaurus for
Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC’15.
[3] F. Michel, C. Faron-Zucker, and J. Montagnat. Mapping-based SPARQL access to a MongoDB database.
Technical report, CNRS, 2015. https://hal.archives-ouvertes.fr/hal-01245883v4.
https://github.com/frmichel/morph-xr2rml/