Presentation done on the work being done on Data Integration at OEG-UPM (http://www.oeg-upm.net/), for the CredIBLE workshop, in Sophia-Antipolis (October 15th, 2012).
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings must be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation -- cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.
This presentation is a review of the NoSQL spaces I did for the X Jornades de Programari Lliure in Barcelona.
You will see a complete review of the NoSQL movement, use cases, a technology review, and a special review of Graph Databases, and more.
Special thanks to @Hagenburger, @sbitxu, @jannis and the inspiration of the big @jimwebber and the amazing community.
Hadoop World 2011: Hadoop and Graph Data Management: Challenges and Opportunities (Cloudera, Inc.)
As Hadoop rapidly becomes the universal standard for scalable data analysis and processing, it becomes increasingly important to understand its strengths and weaknesses for particular application scenarios in order to avoid inefficiency pitfalls. For example, Hadoop has great potential to perform scalable graph analysis, but recent research from Yale University has shown that a conventional approach to graph analysis is 1300 times less efficient than a more advanced approach. This session will give an overview of the advanced approach and then discuss further changes that are needed in the core Hadoop framework to take performance to the next level.
The Construction of the Internet Geological Data System Using WWW+Java+DB Technique (Channy Yun)
YUN, SEOKCHAN, 1997, The Construction of the Internet Geological Data System Using WWW+Java+DB Technique, Tertiary Deposits of Korea, AAPG Annual Convention Abstracts, Association of American Petroleum Geologists 1997.4.23-26, Dallas, TX, USA, p.420
Exchange and Consumption of Huge RDF Data (Mario Arias)
Huge RDF datasets are currently exchanged in textual RDF formats, so consumers need to post-process them using RDF stores for local consumption, such as indexing and SPARQL querying. This is a painful task requiring great effort in terms of time and computational resources. A first approach to lightweight data exchange is a compact (binary) RDF serialization format called HDT. In this paper, we show how to enhance the exchanged HDT with additional structures to support some basic forms of SPARQL query resolution without the need of "unpacking" the data. Experiments show that (i) the exchange efficiency outperforms universal compression, (ii) post-processing becomes a fast process, and (iii) query performance at consumption time is competitive.
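The core idea that makes HDT compact, a shared term dictionary plus triples encoded as integer IDs, can be sketched in a few lines. This is an illustrative simplification only (the actual HDT format adds bitmap-based indexes and compression on top of the dictionary and ID-triples components):

```python
# Illustrative sketch of the dictionary + ID-triples idea behind HDT.
# Not the real HDT binary format; example terms are hypothetical.

def encode(triples):
    """Map RDF terms to integer IDs and store triples as ID tuples."""
    dictionary = {}  # term -> integer ID
    def term_id(term):
        if term not in dictionary:
            dictionary[term] = len(dictionary) + 1
        return dictionary[term]
    id_triples = [(term_id(s), term_id(p), term_id(o)) for s, p, o in triples]
    return dictionary, id_triples

def decode(dictionary, id_triples):
    """Recover the original triples from the dictionary and ID tuples."""
    reverse = {i: t for t, i in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in id_triples]

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob",   "foaf:knows", "ex:alice"),
]
dictionary, ids = encode(triples)
```

Because repeated terms are stored once and triples become small integer tuples, the representation stays compact while remaining directly queryable by ID.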
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references that are needed. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
Today, we count more than 10,000 datasets made available online following Semantic Web standards.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large-scale knowledge graphs in order to facilitate applications in various domains including life sciences, publishing, and the internet of things.
The main objective of this thesis is to lay foundations for efficient algorithms performing analytics, i.e. exploration, quality assessment, and querying over semantic knowledge graphs at a scale that has not been possible before.
First, we propose a novel approach for statistical calculations of large RDF datasets, which scales out to clusters of machines.
In particular, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark.
Many applications such as data integration, search, and interlinking, may take full advantage of the data when having a priori statistical information about its internal structure and coverage.
However, such applications may suffer from low quality and may be unable to take full advantage of the data when its size exceeds the capacity of the available resources.
Thus, we introduce a distributed approach to the quality assessment of large RDF datasets.
It is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data.
Based on the knowledge of the internal statistics of a dataset and its quality, users typically want to query and retrieve large amounts of information.
As a result, it has become difficult to efficiently process these large RDF datasets.
Indeed, these processes require both efficient storage strategies and query-processing engines to be able to scale in terms of data size.
Therefore, we propose a scalable approach to evaluate SPARQL queries over distributed RDF datasets by translating SPARQL queries into Spark executable code.
We conducted several empirical evaluations to assess the scalability, effectiveness, and efficiency of our proposed approaches.
More importantly, various use cases, i.e. Ethereum analysis, Mining Big Data Logs, and Scalable Integration of POIs, have been developed and leverage our approach.
The empirical evaluations and concrete applications provide evidence that our methodology and techniques proposed during this thesis help to effectively analyze and process large-scale RDF datasets.
All the approaches proposed in this thesis are integrated into the larger SANSA framework.
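The kind of statistical criteria described above can be illustrated with a small, self-contained sketch. In SANSA these computations are expressed as Apache Spark transformations over a distributed collection of triples; the plain-Python version below (with hypothetical example data) only shows the logic of two such criteria:

```python
# Illustrative sketch of two statistical criteria over an RDF dataset
# (predicate usage and class instance counts). In a Spark setting these
# would be map/reduceByKey transformations over a distributed RDD of
# triples; here plain Python over a few hypothetical N-Triples lines.
import re
from collections import Counter

NT_LINE = re.compile(r'(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')
RDF_TYPE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'

def parse_ntriples(lines):
    """Yield (subject, predicate, object) tuples from N-Triples lines."""
    for line in lines:
        m = NT_LINE.match(line)
        if m:
            yield m.groups()

def predicate_counts(triples):
    """Criterion: how often each predicate is used."""
    return Counter(p for _, p, _ in triples)

def class_usage(triples):
    """Criterion: number of instances per class (objects of rdf:type)."""
    return Counter(o for _, p, o in triples if p == RDF_TYPE)

data = [
    '<http://ex/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex/Person> .',
    '<http://ex/b> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex/Person> .',
    '<http://ex/a> <http://ex/knows> <http://ex/b> .',
]
triples = list(parse_ntriples(data))
```

In the distributed setting, parsing becomes a map over a partitioned text file and the counters become key-based aggregations, which is what lets the same logic scale out to clusters of machines.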
Django and Neo4j - Domain modeling that kicks ass (Tobias Lindaaker)
Presentation about using Neo4j from Django presented at OSCON 2010, Portland OR.
Sample code is available at: https://svn.neo4j.org/components/neo4j.py/trunk/src/examples/python/djangosites/blog/
Description of the work done for the Semantic Markup activity of the Semantic Sensor Networks Incubator activity (at W3C).
Presentation made at the Australian Ontology Workshop, Melbourne, December 2009. The full title of the paper is: "Review of semantic enablement techniques used in geospatial and semantic standards for legacy and opportunistic mashups" (and it is available via crpit.com)
Tracking Trends in Korean Information Science Research, 2000-2011 (So Young Yu)
This is a presentation file of "Tracking Trends in Korean Information Science Research, 2000-2011" which was published in COLLNET 2012 proceeding, October 23rd, 2012.
If you need a full paper of it, feel free to contact So Young Yu (soyoung.yu21@gmail.com)
Whitepaper: CHI: Hadoop's Rise in Life Sciences (EMC)
Genomics' large, semi-structured, file-based data is ideally suited for the Hadoop Distributed File System. The EMC Isilon OneFS file system features connectivity to the Hadoop Distributed File System (HDFS) that makes the Hadoop storage "scale-out" and truly distributed. An example from the "CrossBow" project is explored.
Organisational Interoperability in Practice at Universidad Politécnica de Madrid (Oscar Corcho)
Presentation on the EOSC Interoperability Framework in relation to Organisational Interoperability, and how it can be applied to a Research Performing Organisation such as UPM.
Open Data (and Software, and other Research Artefacts) - A proper management (Oscar Corcho)
Presentation at the event "Let's do it together: How to implement Open Science Practices in Research Projects" (29/11/2019), organised by Universidad Politécnica de Madrid, where we discuss the need to take into account not only open access or open research data, but also all the other artefacts that result from our research processes.
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos (Oscar Corcho)
This presentation was given in the context of the workshop on dissemination, accessibility and reuse of official statistics and cartography (http://www.juntadeandalucia.es/institutodeestadisticaycartografia/blog/2019/11/jornada-plan/), organised by the Instituto de Estadística y Cartografía de Andalucía.
Ontology Engineering at Scale for Open City Data SharingOscar Corcho
Seminar at the School of Informatics, The University of Edinburgh.
In this talk we will present how we are applying ontology engineering principles and tools to the development of a set of shared vocabularies across municipalities in Spain, so that they can start homogenising the generation and publication of open data that may be useful for their own internal reuse as well as for third parties who want to develop applications that reuse open data once and deploy them for all municipalities. We will discuss the main challenges for ontology engineering that arise in this setting, as well as present the work that we have done to integrate ontology development tools into common software development infrastructure used by those who are not experts in Ontology Engineering.
Situación de las iniciativas de Open Data internacionales (y algunas recomendaciones) (Oscar Corcho)
Presentation on international and national Open Data initiatives, given in the context of the Universidad de Extremadura summer course "BigData y Machine Learning junto a fuentes de datos abiertos para especializar el sector agroganadero", on 25/09/2018.
General presentation on light pollution, in Spanish, of the STARS4ALL project (www.stars4all.eu). Produced by the project consortium, with special thanks to Lucía García (@shekda) for creating the first version in English, and Miquel Serra-Ricart for its initial translation.
Towards Reproducible Science: a few building blocks from my personal experienceOscar Corcho
Invited keynote given at the Second International Workshop on Semantics for BioDiversity (http://fusion.cs.uni-jena.de/s4biodiv2017/), held in conjunction with ISWC2017 (https://iswc2017.semanticweb.org/)
Publishing Linked Statistical Data: Aragón, a case studyOscar Corcho
Presentation at the Semstats2017 workshop (http://semstats.org/2017/) for the paper "Publishing Linked Statistical Data: Aragón, a Case Study", by Oscar Corcho, Idafen Santana-Pérez, Hugo Lafuente, David Portolés, César Cano, Alfredo Peris, José María Subero.
An initial analysis of topic-based similarity among scientific documents based on their rhetorical discourse parts (Oscar Corcho)
Presentation given at the SemSci2017 workshop (https://semsci.github.io/semSci2017/), for the paper "An Initial Analysis of Topic-based Similarity among Scientific Documents Based on their Rhetorical Discourse Parts" http://ceur-ws.org/Vol-1931/paper-03.pdf
Introductory talk on the usage of Linked Data for official statistics, given at the ESS (Linked) Open Data Workshop 2017, in Malta, January 2017.
In this introductory talk we will discuss the main foundations for the application of Linked Data principles to official statistics. We will briefly introduce what Linked Data is, as well as the main principles, languages and technologies behind it (URIs, RDF, SPARQL). We will also discuss the different formats in which data can be made available on the Web (e.g., RDF Turtle, JSON-LD, CSV on the Web). We will then move into providing a detailed presentation, with step by step examples based on existing Linked Statistical Data sources, of the W3C recommendation RDF Data Cube, which is the basis for the dissemination of statistical data as Linked Data. Finally, we will provide some examples of applications, and the opportunities that this approach offers for the development of the proofs of concept selected by Eurostat and to be discussed during the meeting.
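A minimal, hypothetical RDF Data Cube observation makes the structure concrete. All prefixes, IRIs, and values below are illustrative only, not taken from any of the datasets mentioned:

```turtle
# Hypothetical observation: population of a region in a given period.
@prefix qb:           <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix ex:           <http://example.org/stats#> .

ex:obs1 a qb:Observation ;
    qb:dataSet            ex:populationDataset ;
    ex:refArea            ex:Malta ;
    ex:refPeriod          "2017" ;
    sdmx-measure:obsValue 475000 .   # illustrative value
```

Each observation attaches its dimension values (area, period) and a measure value to a qb:DataSet, which is the structure that SPARQL queries and visualisation tools can then slice and aggregate over.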
Aplicando los principios de Linked Data en AEMET (Oscar Corcho)
Presentation given in one of the panels of the open data workshop organised by AEMET on 13 December 2016, on the application of Linked Data principles to the AEMET REST API.
Ojo Al Data 100 - Call for sharing session at IODC 2016 (Oscar Corcho)
This is the presentation of the #ojoaldata100 initiative (http://ojoaldata100.okfn.es) for the selection of 100 datasets that every city should be publishing in their open data portal. This presentation was used in a call for sharing session at the 4th International Open Data Conference (IODC2016).
Educando sobre datos abiertos: desde el colegio a la universidad (Oscar Corcho)
Presentation given at panel 3 of the Aporta 2016 event, one of the pre-events of open data week in Madrid, on 3 October 2016.
http://datos.gob.es/encuentro-aporta?q=node/654503
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística (Oscar Corcho)
In this presentation we show the work done to generate and publish linked data from the local statistics data of the Instituto Aragonés de Estadística.
Presentación de la red de excelencia de Open Data y Smart Cities (Oscar Corcho)
General presentation of the network of excellence on Open Data and Smart Cities (http://www.opencitydata.es), given at Medialab-Prado on 18 February 2016.
Why do they call it Linked Data when they want to say...? (Oscar Corcho)
The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing with outsiders about the goodness of Linked Data but also when reviewing papers for the COLD workshop series, I find myself, on many occasions, going back to the principles to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
Linked Statistical Data: does it actually pay off? (Oscar Corcho)
Invited keynote at the ISWC2015 Workshop on Semantics and Statistics (SemStats 2015). http://semstats.github.io/2015/
The release of the W3C RDF Data Cube recommendation was a significant milestone towards improving the maturity of the area of Linked Statistical Data. Many Data Cube-based datasets have been released since then. Tools for the generation and exploitation of such datasets have also appeared. While the benefits of using RDF Data Cube and generating Linked Data in this area seem to be clear, there are still many challenges associated with the generation and exploitation of such data. In this talk we will reflect on them, based on our experience in generating and exploiting this type of data, and hopefully provoke some discussion about what the next steps should be.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools -- libxml's xmllint, a tool for parsing XML documents, and binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
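The core idea of pruning uninteresting seed bytes can be sketched as follows. This is a hedged illustration only: the real DIAR works with AFL's coverage instrumentation, whereas toy_coverage below is a purely hypothetical stand-in for the target program's observable behaviour:

```python
# Sketch of the idea behind DIAR: find seed bytes whose mutation never
# changes the observed coverage signal, so a fuzzer can skip them.
# toy_coverage is a hypothetical target, purely for illustration.

def toy_coverage(data: bytes) -> frozenset:
    """Hypothetical coverage signal: which branches a toy parser takes."""
    branches = set()
    if data[:2] == b"MZ":
        branches.add("magic-ok")
        if len(data) > 4 and data[4] % 2 == 0:
            branches.add("even-field")
    return frozenset(branches)

def uninteresting_bytes(seed: bytes, coverage) -> list:
    """Return positions whose bit-flip leaves the coverage unchanged."""
    base = coverage(seed)
    dull = []
    for i in range(len(seed)):
        mutated = bytearray(seed)
        mutated[i] ^= 0xFF                  # flip every bit of one byte
        if coverage(bytes(mutated)) == base:
            dull.append(i)                  # mutation changed nothing
    return dull

seed = b"MZ\x00\x00\x02trailing-junk"
dull = uninteresting_bytes(seed, toy_coverage)
# Bytes 0-1 (magic) and 4 (parity-checked field) affect coverage;
# the remaining bytes are candidates for removal before fuzzing.
```

In practice a single bit-flip per byte is only a cheap approximation of "uninteresting", but it conveys why trimming such bytes shrinks the mutation space the fuzzer has to explore.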
These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transformation (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Data Integration at the Ontology Engineering Group
1. Data integration at our group:
ingredients and some
prospects
Credible workshop
Sophia-Antipolis, October 15th 2012
Oscar Corcho
ocorcho@fi.upm.es
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n. 28660 Boadilla del Monte, Madrid, Spain
With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at
iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda
(OEG-UPM, now at PUC Chile)
2. Our data integration needs, problems (and challenges)
And data may be available from data
streams (e.g., sensors)
Need to submit SPARQL queries to
distributed SPARQL endpoints
Need to access heterogeneous relational
data sources (mainly in the area of Geography)
• Some of the databases are available
in different DBMSs
• And some of the data sources are
available as spreadsheets
• Furthermore, many of these datasets
are already published as Linked Data
2
3. Ingredients
[Figure: layered architecture locating the five ingredients of this talk: 1 RDB2RDF, 2 query rewriting optimisations, 3 sensor-based query rewriting, 4 federated query processing, 5 reasoning. Layers: thin applications (mashups); middleware; semantic data integration and querying; legacy data sources, registries and sensor networks; Linked Open Data; spreadsheets]
From SemsorGrid4Env architecture (http://www.semsorgrid4env.eu/) 3
4. Disclaimer
When I talk about ontology-based querying,
I will be normally talking about SPARQL querying
4
5. 1. RDB2RDF
In other words, how to make relational data available as
RDF (and connected to ontologies)
5
6. RDB2RDF. Motivation
• A majority of dynamic Web content is backed by relational databases (RDB),
and so are many enterprise systems.
[Figure: a transformation engine uses a transformation description to expose the relational data as RDF]
6
7. RDB2RDF. Query rewriting for OBDA with mappings
[Figure: a query Q over the ontology is rewritten into a query Q' over the database]
There may be some mappings to translate between ontology and DB.
The rewriting should consider those mappings.
7
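The mapping-driven rewriting sketched on this slide can be illustrated with a toy example (the mapping table and all names are hypothetical, not the ODEMapster/Morph implementation): each ontology term in the query is looked up in the mappings to produce SQL over the source schema.

```python
# Minimal sketch of mapping-based query rewriting (hypothetical names,
# not an actual OBDA engine).

# Mappings: ontology class/property -> (table, column) in the database.
MAPPINGS = {
    "onto:Person": ("person", None),   # class -> table
    "onto:name": ("person", "name"),   # property -> column
    "onto:email": ("person", "email"),
}

def rewrite(triple_patterns):
    """Rewrite patterns like (?x, rdf:type, onto:Person) and
    (?x, onto:name, ?n) into a single-table SQL query."""
    table = None
    columns = []
    for s, p, o in triple_patterns:
        if p == "rdf:type":
            table = MAPPINGS[o][0]
        else:
            t, col = MAPPINGS[p]
            table = table or t
            columns.append(col)
    return f"SELECT {', '.join(columns)} FROM {table}"

sql = rewrite([("?x", "rdf:type", "onto:Person"),
               ("?x", "onto:name", "?n"),
               ("?x", "onto:email", "?e")])
print(sql)  # SELECT name, email FROM person
```

A real engine additionally handles joins across triples maps, URI templates and SQL views, as R2O and R2RML do.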
8. RDB2RDF. Existing approaches
1. To build a new ontology from a database schema and content (direct mappings)
2. To map the ontology created in approach (1) to a legacy ontology
3. To map an existing DB to a legacy ontology
[Figure: approaches 1 and 2 produce a new ontology; approach 3 targets an existing ontology]
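Approach (1), the direct mapping, can be sketched as follows (the base IRI and IRI scheme are illustrative, not the exact W3C Direct Mapping rules): each row becomes a subject typed by its table, and each non-key cell becomes a triple.

```python
# Sketch of the direct-mapping idea: every table row becomes a subject
# IRI, every non-key cell a triple (IRI scheme here is illustrative).
BASE = "http://example.org/db/"

def direct_map(table, pk, rows):
    triples = []
    for row in rows:
        subj = f"<{BASE}{table}/{row[pk]}>"
        triples.append((subj, "rdf:type", f"<{BASE}{table}>"))
        for col, val in row.items():
            if col != pk:
                # repr() stands in for proper RDF literal serialisation
                triples.append((subj, f"<{BASE}{table}#{col}>", repr(val)))
    return triples

triples = direct_map("person", "id", [{"id": 1, "name": "Alice"}])
for t in triples:
    print(t)
```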
9. OEG’s background knowledge in RDB2RDF
• R2O and ODEMapster
• GaV (global-as-view) wrapper generation (no mediators)
• Syntactic sugar for the generation of SQL queries.
• Simple use of this language and processor in the domains of
fund finding, cultural information, and fisheries.
• NeOn Toolkit plugin for common mappings
Barrasa J, Corcho O, Gómez-Pérez A. (2004)
R2O, an extensible and semantically based
database-to-ontology mapping language. In:
Proceedings of the Second Workshop on
Semantic Web and Databases, SWDB 2004.
9
10. R2O (Relational-to-Ontology) Language
For concepts...
• A view maps exactly one concept in the ontology.
• A subset of the columns in the view map a concept in the ontology.
• A subset (selection) of the records of a database view map a concept in the ontology.
• A subset of the records of a database view map a concept in the ontology, but the selection cannot be made using SQL.
• One or more concepts can be extracted from a single data field (not in 1NF).
For attributes...
• A column in a database view maps directly an attribute or a relation.
• A column in a database view maps an attribute or a relation after some transformation.
• A set of columns in a database view map an attribute or a relation.
11. The W3C RDB2RDF Working Group
• Created in 2007
• W3C Recommendations in September 2012
• R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
• Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
• R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/
• RDB2RDF Implementation Report - http://www.w3.org/2001/sw/rdb2rdf/implementation-report/
11
14. Ongoing work
• Provide a list of common patterns in R2RML
transformations, so that they can be reused
(increasing productivity)
• Sequeda J, Priyatna F, Villazón-Terrazas B. Relational
Database to RDF Mapping Patterns. In: Proceedings of the
3rd Workshop on Ontology Patterns (WOP2012).
• Villazón-Terrazas B, Priyatna F. Building Ontologies by
using Re-engineering Patterns and R2RML Mappings. In:
Proceedings of the 3rd Workshop on Ontology Patterns
(WOP2012).
• http://mappingpedia.linkeddata.es/
• Improve our support at Morph for all test cases
• Adapt existing GUIs for the generation of mappings
(such as NeOn Toolkit’s one).
14
15. 2. R2RML query
rewriting optimisations
In other words, how to optimise this query rewriting, so
that we do not suffer from poor performance
15
15
16. R2RML is now a W3C Recommendation
• That’s very good to ensure wide uptake, but…
• Implementations still suffer from a lack of
efficiency
• UltraWrap has shown that a similar performance can be
obtained with direct mappings on high-end databases
(Oracle, SQL Server)
• What happens with low-end databases (MySQL)?
16
17. Several works on SPARQL to SQL translation
• Barrasa J, Corcho O, Gómez-Pérez A. (2004) R2O, an
extensible and semantically based database-to-ontology
mapping language. In: Proceedings of the Second Workshop on
Semantic Web and Databases, SWDB 2004.
• R. Cyganiak. A relational algebra for SPARQL. Digital Media
Systems Laboratory, HP Laboratories Bristol, HPL-2005-170,
2005.
• B. Elliott, E. Cheng, C. Thomas-Ogbuji, and Z.M. Ozsoyoglu. A
complete translation from SPARQL into efficient SQL. In Proceedings
of the 2009 International Database Engineering & Applications
Symposium, pages 31-42. ACM, 2009.
• A. Chebotko, S. Lu, and F. Fotouhi. Semantics preserving
SPARQL-to-SQL translation. Data & Knowledge Engineering,
68(10):973-1000, 2009.
17
20. An example. BSBM08
NATIVE
SELECT r.title, r.text, r.reviewDate, p.personID, p.name, r.rating1, r.rating2, r.rating3, r.rating4
FROM review r, person p
WHERE r.productID=55547 AND r.personID=p.personID AND r.language='en'
ORDER BY r.reviewDate desc
CHEBOTKO
SELECT var_rating2 AS rating2, var_reviewerName AS reviewerName, var_title AS title, var_rating1
AS rating1, var_reviewDate AS reviewDate, var_reviewer AS reviewer, var_rating3 AS rating3,
var_rating4 AS rating4, var_text AS text
FROM (SELECT *
FROM (SELECT uri_rating41477446315 AS uri_rating41477446315, var_rating2 AS var_rating2,
var_reviewer AS var_reviewer, uri_reviewDate750573656 AS uri_reviewDate750573656, var_rating4
AS var_rating4, var_rating1 AS var_rating1, var_text AS var_text, uri_title1963229325 AS
uri_title1963229325, var_rating3 AS var_rating3, uri_reviewer2088452952 AS
uri_reviewer2088452952, uri_rating21477446253 AS uri_rating21477446253, uri_text1457367120 AS
uri_text1457367120, uri_rating31477446284 AS uri_rating31477446284, uri_rating11477446222 AS
uri_rating11477446222, uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_reviewDate AS
var_reviewDate, var_title AS var_title, uri_language269987354 AS uri_language269987354,
uri_Product555472014519903 AS uri_Product555472014519903, v_7634.var_review AS var_review,
var_reviewerName AS var_reviewerName, uri_name1396749066 AS uri_name1396749066, var_lang
AS var_lang
FROM (SELECT uri_reviewer2088452952 AS uri_reviewer2088452952, v_6537.var_review AS
var_review, uri_rating11477446222 AS uri_rating11477446222, uri_rating31477446284 AS
uri_rating31477446284, uri_Product555472014519903 AS uri_Product555472014519903,
uri_reviewFor1499735727 AS uri_reviewFor1499735727, var_rating2 AS var_rating2,
20
21. An example. BSBM08
OUR APPROACH
SELECT var_rating2 AS rating2, var_reviewDate AS reviewDate, var_rating4 AS rating4, var_rating1
AS rating1, var_reviewer AS reviewer, var_rating3 AS rating3, var_reviewerName AS reviewerName,
var_text AS text, var_title AS title
FROM (SELECT *
FROM (SELECT v_2660.var_reviewer AS var_reviewer, var_reviewDate AS var_reviewDate,
var_review AS var_review, uri_rating31477446284 AS uri_rating31477446284, uri_rating21477446253
AS uri_rating21477446253, uri_title1963229325 AS uri_title1963229325, var_rating3 AS var_rating3,
uri_reviewDate750573656 AS uri_reviewDate750573656, uri_reviewFor1499735727 AS
uri_reviewFor1499735727, uri_language269987354 AS uri_language269987354,
uri_name1396749066 AS uri_name1396749066, var_rating1 AS var_rating1, var_reviewerName AS
var_reviewerName, var_lang AS var_lang, uri_Product555472014519903 AS
uri_Product555472014519903, var_rating2 AS var_rating2, uri_rating41477446315 AS
uri_rating41477446315, var_title AS var_title, var_rating4 AS var_rating4, var_text AS var_text,
uri_rating11477446222 AS uri_rating11477446222, uri_text1457367120 AS uri_text1457367120,
uri_reviewer2088452952 AS uri_reviewer2088452952
FROM (SELECT v_8722.PERSONID AS var_reviewer, 'http://xmlns.com/foaf/0.1/name' AS
uri_name1396749066, v_8722.NAME AS var_reviewerName
FROM PERSON v_8722
WHERE (v_8722.NAME IS NOT NULL) ) v_2660
INNER JOIN (SELECT v_3353.REVIEWDATE AS var_reviewDate, 'http://www4.wiwiss.fu-
berlin.de/bizer/bsbm/v01/vocabulary/rating1' AS uri_rating11477446222, v_3353.REVIEWID AS
var_review, v_3353.TEXT AS var_text, 'http://purl.org/stuff/rev#reviewer' AS uri_reviewer2088452952,
v_3353.RATING1 AS var_rating1, 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2'
AS uri_rating21477446253, v_3353.TITLE AS var_title, 'http://www4.wiwiss.fu-
berlin.de/bizer/bsbm/v01/vocabulary/language' AS uri_language269987354, 'http://www4.wiwiss.fu-
berlin.de/bizer/bsbm/v01/vocabulary/reviewDate' AS uri_reviewDate750573656, 'http://www4.wiwiss.fu-
berlin.de/bizer/bsbm/v01/vocabulary/rating3' AS uri_rating31477446284, 'http://www4.wiwiss.fu-
21
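One family of optimisations that explains the difference between the two translations above is the removal of redundant subquery nesting. A minimal sketch on a toy query representation (illustrative, not our actual implementation):

```python
# Toy sketch of one rewriting optimisation: flattening redundant
# "SELECT * FROM (subquery)" nesting so the DBMS sees fewer levels.

def flatten(q):
    """q: dict {"select": [...], "from": str | nested dict, "where": [...]}"""
    if isinstance(q.get("from"), dict):
        inner = flatten(q["from"])
        if q["select"] == ["*"] and not q.get("where"):
            return inner                       # SELECT * wrapper: drop it
        if inner["select"] == ["*"]:
            return {"select": q["select"],     # merge projection and filters
                    "from": inner["from"],
                    "where": q.get("where", []) + inner.get("where", [])}
    return q

nested = {"select": ["*"], "from":
          {"select": ["*"], "from":
           {"select": ["title", "text"], "from": "review",
            "where": ["productID=55547"]}}}
print(flatten(nested))
# {'select': ['title', 'text'], 'from': 'review', 'where': ['productID=55547']}
```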
23. Ongoing work
• Writing the paper describing our optimisations
• Proposing a comprehensive benchmarking platform
to test R2RML-compliant query rewriting systems
• Extending our current work on the R2RML implementation
testcases
23
24. 3. Ontology-based
sensor query rewriting
In other words, what happens if our data sources are
not static, but data streams? Can we still use similar
techniques?
24
26. Data from the Web
Flood risk alert: South East England
Emergency planner: "I have to make sense out of all this data"
[Figure: wave data, forecasts, environmental defenses]
Heterogeneity
Continuous querying
Streaming data
26
27. Ingredients for Linked Sensor Data
Core ontological model
Additional domain ontologies
Guidelines for generation of identifiers
Sensor Web programming interfaces
Query processing engines
http://www.flickr.com/photos/santos/2252824606/
28. Overview of the SSN ontology
[Figure: overview diagram of the SSN ontology modules: Deployment (Deployment, DeploymentRelatedProcess, PlatformSite, Platform); System (System, hasSubsystem, OperatingRange, SurvivalRange, OperatingRestriction); Device (Device, SensingDevice, Sensor, Sensing, Process, Input, Output); Data Skeleton (Observation, SensorOutput, ObservationValue, SensorInput, Property, FeatureOfInterest, linked by relations such as observedBy, observes, detects, isProxyFor, observedProperty, observationResult, featureOfInterest); MeasuringCapability (MeasurementCapability, Condition, ConstraintBlock, hasMeasurementCapability, inCondition, forProperty)]
Compton M, Barnaghi P, Bermúdez L, García-Castro R, Corcho O, Cox S, Graybeal J, Hauswirth M, Henson C, Herzog A,
Huang V, Janowicz K, Kelsey WD, Le Phuoc D, Lefort L, Leggieri M, Neuhaus H, Nikolov A, Page K, Passant A, Sheth A,
Taylor K. The SSN Ontology of the W3C Semantic Sensor Network Incubator Group. Journal of Web Semantics. In press
29. SSN Ontology with other Ontologies
García-Castro R, Corcho O, Hill C. A Core Ontology Model for Semantic Sensor Web Infrastructures.
International Journal of Semantic Web and Information Systems 8(1):22-42
29
30. Queries to Sensor Data
SNEEql
RSTREAM SELECT id, speed, direction FROM wind [NOW];
Data Stream Mgmt System
Esper QL
SELECT wind_speed FROM wind_sensor.win:time(10 min)
Complex Event Processors
GSN RESTful service
http://montblanc.slf.ch:22001/multidata?vs[0]=wind_sensor&field[0]=wind_speed&
from=15/09/2011+05:00:00&to=15/09/2011+15:00:00
Pachube RESTful service
http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-
02T14:01:46Z&end=2011-09-02T17:01:46Z
Sensor Data Middleware
Querying through ontologies?
30
31. SPARQL-Stream
SELECT ?windspeed ?tidespeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf>
[NOW-10 MINUTES TO NOW-0 MINUTES]
WHERE {
?WaveObs a ssn:Observation;
ssn:observationResult ?windspeed;
ssn:observedProperty sweetSpeed:WindSpeed.
?TideObs a ssn:Observation;
ssn:observationResult ?tidespeed;
ssn:observedProperty sweetSpeed:TideSpeed.
FILTER (?tidespeed<?windspeed)}
Query processing closer to data
Use ontologies as conceptual model
Query virtual stream graphs
31
32. SPARQL-Stream
SELECT ?name ( AVG(?temperature) AS ?avgTemperature )
       ( AVG(?humidity) AS ?avgHumidity )
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS SLIDE 1 HOURS]
FROM <http://www.cwi.nl/SRBench/sensors>
FROM <http://www.cwi.nl/SRBench/geonames>
WHERE {
  ?sensor om-owl:generatedObservation ?temperatureObservation ;
          om-owl:generatedObservation ?humidityObservation ;
          om-owl:hasLocatedNearRel [ om-owl:hasLocation ?nearbyLocation ] .
  ?temperatureObservation om-owl:observedProperty weather:_AirTemperature ;
          om-owl:result [ om-owl:floatValue ?temperature ] .
  ?humidityObservation om-owl:observedProperty weather:_RelativeHumidity ;
          om-owl:result [ om-owl:floatValue ?humidity ] .
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:featureClass ?featureClass ;
                      gn:name | gn:officialName ?name ;
                      gn:population ?population .
      FILTER ( ?population > 15000 && REGEX(?featureClass, "P", "i") )
    }
  }
  UNION
  { SELECT ?name
    WHERE {
      ?nearbyLocation gn:parentFeature+ ?parentFeature .
      ?parentFeature gn:featureClass ?parentClass ;
                     gn:name | gn:officialName ?name ;
                     gn:population ?parentPopulation .
      FILTER ( ?parentPopulation > 15000 && REGEX(?parentClass, "P", "i") )
    }
  }
} GROUP BY ?name
Features highlighted: aggregates; static & streaming data; windows; filters, functions.
Disclaimer: some features NYI
32
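The window clause (e.g. [NOW - 1 HOURS SLIDE 1 HOURS]) can be paraphrased procedurally. A minimal sketch of window-based averaging over a timestamped stream (toy data and function names, not the actual engine):

```python
# Sketch of time-window semantics: windows of a given width, advanced
# by a slide interval, averaging the values that fall in each window.
from statistics import mean

def windows(stream, width, slide):
    """stream: list of (timestamp_seconds, value), assumed sorted."""
    if not stream:
        return []
    start, t_end = stream[0][0], stream[-1][0]
    out = []
    while start <= t_end:
        vals = [v for (t, v) in stream if start <= t < start + width]
        if vals:
            out.append((start, mean(vals)))
        start += slide
    return out

data = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0)]
print(windows(data, width=3600, slide=3600))
# [(0, 15.0), (3600, 40.0)]
```

With width equal to slide this gives tumbling windows; width larger than slide gives overlapping (sliding) windows, as in the query above.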
33. Querying the Observations
SELECT ?waveheight
FROM STREAM <www.ssg4env.eu/SensorReadings.srdf>
[NOW -10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?WaveObs a sea:WaveHeightObservation;
           sea:hasValue ?waveheight; }

The query is rewritten, using the R2RML mappings, into a call to the GSN API:
http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=sp_wind

Mapping used:
:Wan4WindSpeed a rr:TriplesMapClass;
  rr:tableName "wan7";
  rr:subjectMap [ rr:template "http://swissex.ch/ns#WindSpeed/Wan7/{timed}";
                  rr:class ssn:ObservationValue;
                  rr:graph ssg:swissexsnow.srdf ];
  rr:predicateObjectMap [
    rr:predicateMap [ rr:predicate ssn:hasQuantityValue ];
    rr:objectMap [ rr:column "sp_wind" ] ].

[Figure: the client sends a SPARQLStream query; query rewriting uses the R2RML mappings; query processing engines access the sensor network through the GSN API; resulting data are translated from tuples into triples]
33
34. Rewriting to different technologies
SELECT ?windspeed
FROM NAMED STREAM <http://swiss-experiment.ch/data#WannengratSensors.srdf>
[NOW-10 MINUTE TO NOW-0 MINUTE]
WHERE {
  ?WaveObs a ssn:Observation;
           ssn:observationResult ?windspeed;
           ssn:observedProperty sweetSpeed:WindSpeed.
}
Query rewriting, through an algebra representation, targets several technologies:
• Esper (CEP): SELECT wind_speed_scalar_av, timed FROM wan7.win:time(10 min)
• SNEE (DSMS): SELECT wan7.wind_speed_scalar_av AS windspeed, wan7.timed AS windts FROM wan7[FROM NOW-10 MINUTES TO NOW]
• GSN (Middleware): http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&from=15/05/2011+05:00:00&to=15/05/2011+15:00:00
• Pachube (Middleware): http://api.pachube.com/v2/feeds/14321/datastreams/4?start=2011-09-02T14:01:46Z&end=2011-09-02T17:01:46Z
Calbimonte JP, Corcho O, Jeung H, Aberer K. Enabling Query Technologies for the Semantic Sensor Web.
International Journal of Semantic Web and Information Systems 8(1):43-63
34
35. Ongoing work
• Benchmarking of ontology-based streaming data
engines
• Zhang Y, Pham MD, Corcho O, Calbimonte JP. SRBench: A
Streaming RDF/SPARQL Benchmark. Proceedings of the
11th International Semantic Web Conference (ISWC2012)
• Improve optimisations when joining static and
streaming data
• Automatic characterisation of sensor data streams
• Useful in citizen science approaches (e.g., AirQualityEgg)
• Calbimonte JP, Yan Z, Jeung H, Corcho O, Aberer K.
Deriving Semantic Sensor Metadata from Raw
Measurements. 5th International Workshop on Semantic
Sensor Networks (SSN2012), at ISWC2012. CEUR
Workshop Proceedings, Vol-904, http://ceur-ws.org/Vol-904/
35
36. 4. Federated query
processing
In other words, how can we access data from federated
data sources
36
37. Example
• We query the life science domain
1. Using the Pubmed references obtained from the GeneID
gene dataset, retrieve information about genes and their
references in the Pubmed dataset.
2. From Pubmed we access the information in the National
Library of Medicine's controlled vocabulary thesaurus,
stored at the MeSH endpoint, so we have more complete
information about such genes.
3. Finally, we also access the HHPID endpoint, which is the
knowledge base for the HIV-1 protein.
37
38. Introduction
• Question:
• How can we access such an amount of RDF data in an
integrated manner?
• Current approaches
• Replicate data in local stores, access it using existing RDF
databases.
• Execute individual queries and manually join data.
• Use existing distributed query systems (starting to appear).
38
39. Problem
• Existing tools for distributed SPARQL query
processing differ in the way of handling distribution
• The SPARQL WG published the Federated Query Last
Call Working Draft
• It homogenises the access to distributed RDF data
repositories
• SERVICE <http://dbpedia.org/sparql> {...}
• Problems in semantics: SERVICE ?X not well defined
• Current Access to SPARQL endpoints is not optimal
• Work on SPARQL distributed query optimization is beginning
39
40. State of the Art
• ANAPSID, RDF::Query, OpenAnzo, ARQ, Rasqal RDF Query Library
• ANAPSID provides SPARQL optimization based on adaptive query processing operators
• RDF::Query provides basic pattern reordering
• These implement the federation using query predicates: a list of SPARQL endpoints is needed, which helps the user direct queries to remote datasets
• FedX, SPLENDID, SemWIQ, NetworkedGraphs
• All provide basic optimisations: pattern grouping (FedX), cost-based optimizations (SemWIQ, SPLENDID and recently FedX, NetworkedGraphs)
• SPARQL 1.1 is mostly syntactic sugar
40
41. Assumptions & Restrictions
• Assumptions
1. Users know how to create a
query to the endpoints
2. No statistics of any kind are
available for the query
processing system.
3. Data are distributed
• Restrictions
1. We only consider the
Federation Extension of
SPARQL 1.1
2. We are not aware of the
capabilities or implementation
of the remote SPARQL server
3. No registry of endpoints
41
42. SERVICE Semantics
Example:
Without SERVICE:
SELECT ?name ?email
WHERE {
  ?y :name ?name .
  ?y :email ?email
}
With SERVICE:
SELECT ?name ?email
WHERE {
  SERVICE <http://example1.org/sparql> {?y :name ?name} .
  SERVICE <http://example2.org/sparql> {?y :email ?email}
}
• We extend [PAG09] with the semantics for SERVICE:
42
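The semantics of the SERVICE version can be paraphrased operationally: each SERVICE clause produces a set of solution mappings from a remote endpoint, and compatible mappings are joined on their shared variables. A minimal sketch with mocked remote results:

```python
# Sketch of SERVICE evaluation as a join over solution mappings
# (remote endpoint results are mocked as lists of dicts).

def compatible(m1, m2):
    """Two mappings are compatible if they agree on shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(bindings1, bindings2):
    return [{**m1, **m2} for m1 in bindings1 for m2 in bindings2
            if compatible(m1, m2)]

# Mocked results of the two SERVICE calls in the example above.
names  = [{"y": "p1", "name": "Ana"}, {"y": "p2", "name": "Bo"}]
emails = [{"y": "p1", "email": "ana@example.org"}]

print(join(names, emails))
# [{'y': 'p1', 'name': 'Ana', 'email': 'ana@example.org'}]
```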
44. SPARQL Optimisation - OPTIONAL
• We assume that we have no statistics of endpoints
• This means that we cannot use cost-based optimisations
• We will only focus on static optimisations
• Besides the usual static optimisations (e.g., pushing
down filters), SPARQL queries can be optimised if
they contain OPTIONAL operators
• The OPTIONAL operator is responsible for PSPACE-
completeness in SPARQL [PAG09]
• OPTIONAL is a key operator in SPARQL
44
46. Well-designed Patterns
• We extended the notion of well-designed patterns for
the SPARQL 1.1 Federation Extension
• The previous rules also hold for SERVICE
46
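The well-designedness condition can be stated operationally: in every subpattern (P1 OPTIONAL P2), any variable of P2 that also occurs elsewhere in the query must occur in P1. A minimal checker on a toy pattern representation (illustrative, not the SPARQL-DQP code):

```python
# Sketch of the well-designedness check of Pérez, Arenas & Gutierrez
# [PAG09]. Patterns are toy tuples: ("BGP", vars) leaves and
# ("AND" | "OPT", P1, P2) nodes.

def variables(p):
    """All variables mentioned in a pattern."""
    if p[0] == "BGP":
        return set(p[1])
    return variables(p[1]) | variables(p[2])

def well_designed(p, outside=frozenset()):
    """outside: variables occurring outside the current subpattern."""
    if p[0] == "BGP":
        return True
    left, right = p[1], p[2]
    # For OPT: vars of the optional side seen outside must occur in left.
    if p[0] == "OPT" and not (variables(right) & outside) <= variables(left):
        return False
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))

ok  = ("AND", ("BGP", {"x"}), ("OPT", ("BGP", {"x", "y"}), ("BGP", {"x", "z"})))
bad = ("AND", ("BGP", {"x"}), ("OPT", ("BGP", {"y"}),      ("BGP", {"x", "z"})))
print(well_designed(ok), well_designed(bad))  # True False
```

In `bad`, variable x occurs in the optional part and outside it, but not in the mandatory part of the OPTIONAL, which is exactly what the condition forbids.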
47. Implementation: SPARQL-DQP
• SPARQL-DQP is implemented on top of OGSA-DAI and OGSA-
DQP
• OGSA-DAI is a Web service-based framework for accessing
distributed data resources
• OGSA-DQP adds distributed query processing infrastructure
• We reuse some OGSA-DQP operators
• We added RDF and SPARQL endpoint data access
• RDB2RDF data resource
• RDF data resource
• SPARQL endpoint resources
• Good behaviour for large
datasets
Buil C, Arenas M, Corcho O. Semantics and
optimization of the SPARQL 1.1 federation
extension. Proceedings of the 8th Extended
Semantic Web Conference (ESWC2011).
Springer-Verlag LNCS 6644, pages 1-15
47
48. Ongoing Work
• An extensive benchmark has been produced
• Montoya G, Vidal ME, Corcho O, Ruckhaus E, Buil-Aranda
C. Benchmarking Federated SPARQL Query Engines: Are
Existing Testbeds Enough? In: Proceedings of the 11th
International Semantic Web Conference (ISWC2012)
• Focusing now on Adaptive Query Processing
• Query Processing should be adapted to the user's specific
needs and specific network requirements
48
49. 5. Entailment in query
rewriting
In other words, how can we take into account the
existence of ontologies in the query rewriting process,
so as to provide simple entailment
49
50. Main approaches in the state of the art
Expressiveness                  Author                System                        Output
ELHIO¬                          Pérez-Urbina et al.   REQUIEM                       [R] Datalog, UCQ
Sticky-join [linear] datalog±   Gottlob et al.        Nyaya                         UCQ
DL-LiteR, DL-LiteF              Calvanese et al.      QuOnto                        UCQ
DL-LiteR                        Chortaras et al.      Rapid                         UCQ
DL-LiteR [+EBox]                Rosati et al.         Presto & NR-Datalog & Prexto  UCQ
50
51. Optimizations in the rewriting
• The rewriting can be optimized in
several ways
• Ontology preprocessing
• Subsumption checks
• Prioritize inferences
• Constrain the searches
51
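For the simplest entailment regime, a subClassOf hierarchy, the rewriting idea can be sketched: a query atom over a class is expanded into a union of conjunctive queries (UCQ), one disjunct per entailed subclass (toy axioms and names, not one of the systems in the table):

```python
# Sketch of entailment-aware rewriting over a subClassOf hierarchy:
# an atom over a class becomes a UCQ with one disjunct per subclass.

SUBCLASS = {                 # child -> parent axioms (illustrative)
    "Professor": "Person",
    "Student": "Person",
    "PhDStudent": "Student",
}

def subclasses(cls):
    """All classes entailed to be subsumed by cls, including itself."""
    result = {cls}
    changed = True
    while changed:           # fixpoint over the transitive closure
        changed = False
        for child, parent in SUBCLASS.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def rewrite_atom(var, cls):
    return sorted(f"{var} rdf:type {c}" for c in subclasses(cls))

print(rewrite_atom("?x", "Person"))
# ['?x rdf:type Person', '?x rdf:type PhDStudent',
#  '?x rdf:type Professor', '?x rdf:type Student']
```

The optimisations on the previous slide (subsumption checks, prioritised inferences, constrained searches) exist precisely to keep this expansion from blowing up on realistic ontologies.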
53. Conclusion and Future Work
• We have proposed some small incremental
improvements over the current state of the art in
entailment-aware query rewriting
• Need to integrate it with the rest of our work
• This will happen during Fall 2012
53
55. Ingredients
[Figure: the same SemsorGrid4Env layered architecture as in the Ingredients slide at the start of the talk, locating the five ingredients: 1 RDB2RDF, 2 query rewriting optimisations, 3 sensor-based query rewriting, 4 federated query processing, 5 reasoning]
55
56. Data integration at our group:
ingredients and some
prospects
Credible workshop
Sophia-Antipolis, October 15th 2012
Oscar Corcho
ocorcho@fi.upm.es
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n. 28660 Boadilla del Monte, Madrid, Spain
With contributions from: José Mora (OEG-UPM), Boris Villazón-Terrazas (OEG-UPM, now at
iSOCO), Jean Paul Calbimonte (OEG-UPM), Freddy Priyatna (OEG-UPM), Carlos Buil-Aranda
(OEG-UPM, now at PUC Chile)