Money and technology…goodness, wouldn’t it be great to have lots of both? In the words of the science fiction writer William Gibson, “The future has arrived; it’s just not evenly distributed yet.” In other words, off-the-shelf techniques and equipment already exist to take airway management to the next level. Do we need “new” equipment, or do we need a “new” plan for using the existing equipment? Let’s briefly discuss both recent new technologies and some new plans.
Approximately five months after this lecture was delivered at SMACC 2015, the Difficult Airway Society of Great Britain (DAS) published its revised guidelines on difficult airway management, coinciding with the World Airway Meeting in Dublin, Ireland in November 2015. The new guideline is simple and straightforward, and relies on a four-step progression from plan “A” (intended to represent the best initial approach) to plan “D” (the prompt performance of a surgical airway). “A-B-C-D” is thus the progression of the DAS plan. Two things are interesting about the DAS guideline:
#1: The lack of readiness of most Emergency Medicine and Critical Care practitioners to formulate and implement a plan “B” (use of a supraglottic airway for ventilation rescue and as an intubation conduit), and
#2: The limited applicability of plan “C” to the same group, namely, the possibility of allowing the patient to awaken from anesthesia should an airway attempt prove unsuccessful.
The lecture on the Airway Toolbox pays homage to the great “Plan B”: the legacy of the creator of the Laryngeal Mask Airway and of the many clinicians who developed the techniques that make supraglottic airways potentially the most solid method of difficult airway management available. As an adjunct to SGAs, consider the use of video endoscopy (bronchoscopy or video stylets) to assist direct laryngoscopy (DL) and video laryngoscopy (VL).
The document discusses the National Digital Library Program (NDLP) in Pakistan. It provides the following key points:
1. The NDLP is a flagship program of the Higher Education Commission of Pakistan that provides online resources to over 250 public and private institutions through subscriptions with INASP.
2. A digital library provides networked access to digital materials like text, images, video and data, with search features to support research and teaching while ensuring long-term preservation.
3. The NDLP provides access to 31 databases with over 25,000 full-text journals and over 50,000 e-books from 220 publishers to beneficiary institutions in Pakistan.
Ontop: Answering SPARQL Queries over Relational Databases (Guohui Xiao)
We present Ontop, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations; a virtual approach to OBDA that avoids materializing triples and is implemented through query rewriting; extensive optimizations exploiting all elements of the OBDA architecture; its compliance with all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL 2 QL and RDFS ontologies); and its support for all major relational databases.
A tutorial on how to create mappings using Ontop, how inference (OWL 2 QL and RDFS) plays a role in answering SPARQL queries in Ontop, and how Ontop's support for on-the-fly SQL query translation enables scenarios of semantic data access and data integration.
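The virtual, rewriting-based approach can be sketched in miniature: a mapping ties an ontology property to a table and its columns, and a SPARQL triple pattern is translated into SQL over that table at query time. Everything below (table, columns, property name) is an invented toy example, not Ontop's actual mapping syntax.

```python
# Toy sketch of the "virtual" OBDA idea behind Ontop: instead of
# materializing triples, a SPARQL triple pattern is rewritten into SQL
# over the mapped relational source. All names here are hypothetical.

# A mapping saying: rows of table person(id, name) yield triples
# (:person/{id}, foaf:name, {name}).
MAPPINGS = {
    "foaf:name": {"table": "person", "subject": "id", "object": "name"},
}

def rewrite(pattern):
    """Rewrite a (?s, predicate, ?o) triple pattern into a SQL query."""
    _, predicate, _ = pattern
    m = MAPPINGS[predicate]
    return "SELECT {s} AS s, {o} AS o FROM {t}".format(
        s=m["subject"], o=m["object"], t=m["table"])

sql = rewrite(("?s", "foaf:name", "?o"))
print(sql)
```

Real rewriting also handles joins across patterns, ontology inference, and SQL optimization; this only shows the single-pattern core of the idea.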
An API for standardised access to Big Earth Observation data in a landscape of a growing number of EO cloud providers
11:40
16/11/2019
With the opening of global archives of Earth Observation data streams from satellites, we have arrived at a richness of operationally available observations over the whole globe that has never been available before, starting from the Landsat series of satellites and continuing with the plethora of data now coming through Copernicus and its series of Sentinel satellites. This has created huge opportunities for research and business, which can exploit the temporal domain of those observations in a powerful manner, but it also poses challenges in terms of data management and processing capacity. As a consequence, a growing number of cloud services and customized solutions have been developed in various research centres, leading to processing workflows optimized for specific system architectures and back-end infrastructures. This limits the portability and reproducibility of workflows across back ends, for both science and business applications. (...)
This document summarizes research into discovering lost web pages using techniques from digital preservation and information retrieval. Key points include:
- Web pages are frequently lost due to broken links or content being moved/removed, but copies may still exist in search engine caches or archives.
- Techniques like lexical signatures (representing a page's content in a few keywords) and analyzing page titles, tags and link neighborhoods can help characterize lost pages and find similar replacement content.
- Experiments showed that lexical signatures degrade over time but page titles are more stable, and combining techniques improves performance in locating replacement content. The goal is to develop a browser extension to help users find lost web pages.
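The lexical-signature idea is easy to sketch: score a page's terms (for example by TF-IDF) and keep the top few as its signature. The toy corpus below is invented for illustration.

```python
# A minimal sketch of a "lexical signature": the few terms that best
# characterize a page, here scored with TF-IDF over a tiny toy corpus.
import math
from collections import Counter

def lexical_signature(doc, corpus, k=3):
    tf = Counter(doc.split())
    n = len(corpus)
    def idf(term):
        df = sum(1 for d in corpus if term in d.split())
        return math.log((n + 1) / (df + 1))  # smoothed IDF
    scored = {t: c * idf(t) for t, c in tf.items()}
    # Sort by score descending, then alphabetically to break ties.
    return [t for t, _ in sorted(scored.items(), key=lambda x: (-x[1], x[0]))[:k]]

corpus = [
    "digital preservation of web pages",
    "lost web pages and search engine caches",
    "lexical signatures characterize lost pages",
]
sig = lexical_signature(corpus[2], corpus)
print(sig)
```

Terms common to every document (like "pages" here) score zero and drop out, which is exactly why a signature picks out the distinctive vocabulary of a page.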
An academic project developed for the Web Information Retrieval class at Sapienza Università di Roma, Master's Degree in Computer Engineering, A.Y. 2019-20.
The goal of the project is to rank computer scientists by their influence using Google's PageRank algorithm and HITS.
For further information, you can check our repository:
https://github.com/LucaTomei/Computer_Scientists
Members of the project:
Luca Tomei
Andrea Aurizi
Daniele Iacomini
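As a sketch of the ranking technique the project uses, here is a minimal power-iteration PageRank over an invented three-node graph (not the project's actual code or data):

```python
# A minimal power-iteration PageRank in plain Python; the tiny
# citation graph below is made up for illustration.
def pagerank(graph, damping=0.85, iters=100):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Teleport mass plus damped shares from each node's out-links.
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            share = rank[n] / len(outs) if outs else 0
            for m in outs:
                new[m] += damping * share
        rank = new
    return rank

# "a" is linked to by both other nodes, so it should rank highest.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # prints "a"
```

HITS works on the same link graph but computes two mutually reinforcing scores (hubs and authorities) instead of a single one.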
The document discusses PostgreSQL and its capabilities. It describes how PostgreSQL was created in 1982 and became open source in 1996. It discusses PostgreSQL's support for large databases, high-performance transactions using MVCC, ACID compliance, and its ability to run on most operating systems. The document also covers PostgreSQL's JSON and NoSQL capabilities and provides performance comparisons of JSON, JSONB and text fields.
Aqua Browser Implementation at Oklahoma State University (youthelectronix)
On Wednesday, November 7th, Dr. Anne Prestamo discussed "AquaBrowser Implementation at Oklahoma State University Library" as part of a program on Next Generation Catalogs held at the University of Massachusetts at Amherst and co-sponsored by the Five Colleges' Librarians Council and the Simmons College Graduate School of Library and Information Science (GSLIS).
Learning To Rank was the first integration of machine learning techniques with Apache Solr, allowing you to improve the ranking of your search results using training data.
One limitation is that documents have to contain the keywords that the user typed in the search box in order to be retrieved (and then re-ranked). For example, the query “jaguar” won’t retrieve documents containing only the terms “panthera onca”. This is called the vocabulary mismatch problem.
Neural search is an Artificial Intelligence technique that allows a search engine to reach those documents that are semantically similar to the user’s information need without necessarily containing those query terms; it learns the similarity of terms and sentences in your collection through deep neural networks and numerical vector representations (so no manual synonyms are needed!).
This talk explores the first Apache Solr official contribution about this topic, available from Apache Solr 9.0.
We start with an overview of neural search (don’t worry - we keep it simple!): we describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works. We show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how we implemented this feature in Apache Solr, giving usage examples!
Join us as we explore this new exciting Apache Solr feature and learn how you can leverage it to improve your search experience!
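The KNN retrieval described above can be sketched in a few lines: brute-force cosine similarity over toy vectors stands in for what Solr's HNSW index computes approximately at scale. The three-dimensional "embeddings" below are invented; real ones would come from a model such as BERT.

```python
# A sketch of K-Nearest-Neighbor vector retrieval: rank documents by
# cosine similarity of their embedding to the query embedding.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def knn(query, docs, k=2):
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

docs = {
    "jaguar_cat": [0.9, 0.1, 0.0],
    "panthera_onca": [0.8, 0.2, 0.1],  # near "jaguar_cat" in vector space
    "jaguar_car": [0.0, 0.1, 0.9],
}
result = knn([0.85, 0.15, 0.05], docs)
print(result)
```

Note how the semantically related "panthera_onca" is retrieved despite sharing no keywords with "jaguar", which is exactly the vocabulary mismatch problem neural search addresses; HNSW exists because this brute-force scan is too slow for millions of vectors.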
- The document discusses challenges with analyzing data stored in MongoDB, a NoSQL database, using typical analyst tools which expect structured data.
- It presents an open-source solution to synchronize data from MongoDB to PostgreSQL in real time, and to extract the MongoDB schema and normalize it for analysis in SQL and tools like Superset.
- The stack includes MongoDB Connector to replicate data to PostgreSQL, Pymongo-Schema to define the MongoDB schema, and Doc-manager to translate the data model for PostgreSQL. This allows analysts to work with the data using standard SQL and BI tools.
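The schema-normalization step can be illustrated with a small sketch (not the actual Pymongo-Schema or Doc-manager code): flatten a nested MongoDB-style document into the flat column/value shape a SQL table expects.

```python
# A sketch of the normalization idea: flatten nested MongoDB-style
# documents into flat column/value rows an SQL tool can consume.
# The sample document is invented for illustration.
def flatten(doc, prefix=""):
    row = {}
    for key, value in doc.items():
        col = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested sub-documents become underscore-joined column names.
            row.update(flatten(value, prefix=col + "_"))
        else:
            row[col] = value
    return row

doc = {"_id": 1, "user": {"name": "ada", "address": {"city": "Paris"}}}
flat = flatten(doc)
print(flat)
```

Real tools also have to handle arrays (usually by spilling them into child tables) and fields that change type across documents, which is why schema extraction is a project of its own.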
An investigation of how PostgreSQL and its latest capabilities (JSONB data type, GIN indices, Full Text Search) can be used to store, index and perform queries on structured Bibliographic Data such as MARC21/MARCXML, breaking the dependence on proprietary and arcane or obsolete software products.
Talk presented at FOSDEM 2016 in Brussels on 31/01/2016. This is a very practical & hands-on presentation with example code which is certainly not optimal ;)
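The JSONB-query idea can be sketched without a database: below, plain Python dicts stand in for JSONB rows, and a filter over MARC21 fields (tag 245 is the title statement, tag 100 the main author entry) mirrors what a path query like `data->'100'->>'a'` would do in PostgreSQL. The sample records are invented.

```python
# MARC-like bibliographic records stored as JSON and queried by field,
# with Python dicts standing in for PostgreSQL JSONB rows.
import json

records = [
    json.loads('{"245": {"a": "Moby Dick"}, "100": {"a": "Melville, Herman"}}'),
    json.loads('{"245": {"a": "Walden"}, "100": {"a": "Thoreau, Henry David"}}'),
]

def titles_by_author(records, author_substring):
    # Mirrors a JSONB path filter such as: data->'100'->>'a' LIKE '%...%'
    return [r["245"]["a"] for r in records
            if author_substring in r.get("100", {}).get("a", "")]

hits = titles_by_author(records, "Melville")
print(hits)
```

In PostgreSQL itself, a GIN index on the JSONB column is what keeps such lookups fast without scanning every record.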
Towards a rebirth of data science (by Data Fellas) (Andy Petrella)
Nowadays, Data Science is buzzing all over the place.
But what is a so-called Data Scientist?
Some will argue that a Data Scientist is a person able to report and present insights in a data set. Others will say that a Data Scientist can handle a high throughput of values and expose them in services. Yet another definition includes the capacity to create meaningful visualizations on the data.
However, we enter an age where velocity is key. Not only is the velocity of your data high; the time to market is also shortened. Hence, the time separating the moment you receive a set of data and the moment you are able to deliver added value is crucial.
In this talk, we’ll review the legacy Data Science methodologies and what they meant in terms of delivered work and results.
Afterwards, we’ll move on to the different concepts, techniques, and tools that Data Scientists will have to learn and adopt in order to accomplish their tasks in the age of Big Data.
The talk closes by presenting the Data Fellas view of a solution to these challenges, especially through the Spark Notebook and the Shar3 product we develop.
The document discusses the Semantic Web and declarative knowledge representation in information technology. It provides an introduction to key concepts including semantics, ontologies, rules, and logic-based knowledge representation. It also outlines technologies that make up the Semantic Web such as RDF, RDF Schema, OWL, and SPARQL. The goal of these technologies is to represent information on the web in a structured, machine-readable format in order to enable automated processing of data.
What is a distributed data science pipeline? How, with Apache Spark and friends (Andy Petrella)
What was a data product before the world changed and got so complex?
Why distributed computing and data science are the solution.
What problems does that add?
How to solve most of them using the right technologies, like Spark Notebook, Spark, Scala, Mesos, and so on, in an accompanying framework.
The document discusses the Semantic Web and Linked Data. It provides an overview of RDF syntaxes, storage and querying technologies for the Semantic Web. It also discusses issues around scalability and reasoning over large amounts of semantic data. Examples are provided to illustrate SPARQL querying of RDF data, including graph patterns, conjunctions, optional patterns and value testing.
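To make graph patterns and conjunctions concrete, here is a toy evaluator for SPARQL-style basic graph patterns over an in-memory triple set; the data and vocabulary are invented, and real engines add OPTIONAL, FILTER, and indexed storage on top of this core.

```python
# A toy evaluator for SPARQL-style basic graph patterns: terms starting
# with "?" are variables; a conjunction joins pattern solutions in turn.
TRIPLES = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
}

def match(pattern, binding):
    """Yield extensions of `binding` that make `pattern` match a triple."""
    for triple in TRIPLES:
        b = dict(binding)
        ok = True
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if p in b and b[p] != t:  # variable already bound differently
                    ok = False
                    break
                b[p] = t
            elif p != t:                  # constant mismatch
                ok = False
                break
        if ok:
            yield b

def bgp(patterns):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

# Whom does alice know, and whom do they know in turn?
solutions = bgp([("alice", "knows", "?x"), ("?x", "knows", "?y")])
print(solutions)
```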
Dense Retrieval with Apache Solr Neural Search (Sease)
This document provides an overview of dense retrieval with Apache Solr neural search. It discusses semantic search problems that neural search aims to address through vector-based representations of queries and documents. It then describes Apache Solr's implementation of neural search using dense vector fields and HNSW graphs to perform k-nearest neighbor retrieval. Functions are shown for indexing and searching vector data. The document also discusses using vector queries for filtering, re-ranking, and hybrid searches combining dense and sparse criteria.
The document provides an overview of the CS101 Introduction to Computing course. It discusses the course objectives to build an appreciation of fundamental computing concepts, proficiency in productivity software, and beginner web development skills. The course structure is outlined, covering these topics over 15 weeks through lectures, readings, and assignments culminating in a midterm and final exam. Assessment is based on assignments, a midterm exam, and a final exam. Key figures in early computing history and capabilities of modern computers are also introduced.
The document proposes a graphical visualization of knowledge articles from sources like Wikipedia to provide an interactive and connected graph representation. It discusses algorithms to extract keywords, calculate weights, and map keywords to links between articles. A mathematical model and feasibility analysis are presented, along with examples of a proposed user interface and publications/conferences where the paper was submitted. The goal is to provide an easily understandable overview of article topics to reduce reading time.
Neural Search Comes to Apache Solr: Approximate Nearest Neighbor, BERT and Mo... (Sease)
The first integrations of machine learning techniques with search made it possible to improve the ranking of search results (Learning To Rank), but one limitation has always been that documents had to contain the keywords that the user typed in the search box in order to be retrieved. For example, the query “tiger” won’t retrieve documents containing only the terms “panthera tigris”. This is called the vocabulary mismatch problem, and over the years it has been mitigated through query and document expansion approaches.
Neural search is an Artificial Intelligence technique that allows a search engine to reach those documents that are semantically similar to the user’s query without necessarily containing those terms; it avoids the need for long lists of synonyms by automatically learning the similarity of terms and sentences in your collection through the utilisation of deep neural networks and numerical vector representation.
The document outlines an agenda for a conference on Apache Spark and data science, including sessions on Spark's capabilities and direction, using DataFrames in PySpark, linear regression, text analysis, classification, clustering, and recommendation engines using Spark MLlib. Breakout sessions are scheduled between many of the technical sessions to allow for hands-on work and discussion.
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin... (Advanced-Concepts-Team)
Searching for information within large sets of unstructured, heterogeneous scientific data can be very challenging unless an inverted index has been created in advance. Several solutions, mainly based on the Hadoop ecosystem, have been proposed to accelerate the process of index construction. These solutions perform well when data are already distributed across the cluster nodes involved in the computation. On the other hand, the cost of distributing data can introduce noticeable overhead. We propose ISODAC, a new approach aimed at improving efficiency without sacrificing reliability. Our solution reduces to the bare minimum the number of I/O operations by using a stream of in-memory operations to extract and index heterogeneous data. We further improve the performance by using GPUs and POSIX Threads programming for the most computationally intensive tasks of the indexing procedure. ISODAC indexes heterogeneous documents up to 10.6x faster than other widely adopted solutions, such as Apache Spark.
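The core data structure being built here, the inverted index, can be sketched minimally: map each term to the set of documents containing it. ISODAC's contribution is doing this in memory, in parallel, and at scale; the two sample documents below are invented.

```python
# A minimal in-memory inverted index: term -> set of document IDs.
from collections import defaultdict

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "unstructured scientific data",
    2: "heterogeneous scientific archives",
}
index = build_index(docs)
print(sorted(index["scientific"]))  # documents containing "scientific"
```

A query is then answered by intersecting the posting sets of its terms instead of scanning every document, which is why index construction is worth accelerating.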
This document discusses using JSON in Oracle Database 18c/19c. It begins by introducing the presenter and their background. It then covers storing and querying JSON in the database using various SQL and PL/SQL features like JSON_QUERY, JSON_OBJECT, and JSON_TABLE. The document discusses how different SQL data types are converted to JSON. It shows examples of converting rows to CSV, XML, and JSON formats for data exchange. In summary, the document provides an overview of Oracle Database's support for JSON focusing on using it properly for data exchange and storage.
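The row-to-interchange-format conversions the talk covers (there via SQL/PL/SQL functions such as JSON_OBJECT) can be sketched client-side with Python's standard library; the sample rows are invented.

```python
# Converting tabular rows to JSON and CSV for data exchange,
# using only the standard library.
import csv
import io
import json

rows = [{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}]

# Rows -> JSON (what JSON_OBJECT/JSON_ARRAYAGG produce server-side).
as_json = json.dumps(rows)

# Rows -> CSV with a header line.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

print(as_json)
print(as_csv)
```

Doing the conversion in the database, as the talk describes, avoids shipping raw rows to the client just to reformat them.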
This document provides an overview and introduction to IST 380, a data science course taught by Zach Dodds. The course covers topics like R programming, statistical analysis, machine learning algorithms, and a final project. Students will learn skills in data visualization, predictive modeling, and applying data science techniques to real-world datasets. The course emphasizes hands-on learning through weekly assignments completed in R.
Similar to Ontology-Based Data Access with Ontop (20)
Discover the benefits of outsourcing SEO to Indiadavidjhones387
"Discover the benefits of outsourcing SEO to India! From cost-effective services and expert professionals to round-the-clock work advantages, learn how your business can achieve digital success with Indian SEO solutions.
Honeypots Unveiled: Proactive Defense Tactics for Cyber Security, Phoenix Sum...APNIC
Adli Wahid, Senior Internet Security Specialist at APNIC, delivered a presentation titled 'Honeypots Unveiled: Proactive Defense Tactics for Cyber Security' at the Phoenix Summit held in Dhaka, Bangladesh from 23 to 24 May 2024.
Securing BGP: Operational Strategies and Best Practices for Network Defenders...APNIC
Md. Zobair Khan,
Network Analyst and Technical Trainer at APNIC, presented 'Securing BGP: Operational Strategies and Best Practices for Network Defenders' at the Phoenix Summit held in Dhaka, Bangladesh from 23 to 24 May 2024.
HijackLoader Evolution: Interactive Process HollowingDonato Onofri
CrowdStrike researchers have identified a HijackLoader (aka IDAT Loader) sample that employs sophisticated evasion techniques to enhance the complexity of the threat. HijackLoader, an increasingly popular tool among adversaries for deploying additional payloads and tooling, continues to evolve as its developers experiment and enhance its capabilities.
In their analysis of a recent HijackLoader sample, CrowdStrike researchers discovered new techniques designed to increase the defense evasion capabilities of the loader. The malware developer used a standard process hollowing technique coupled with an additional trigger that was activated by the parent process writing to a pipe. This new approach, called "Interactive Process Hollowing", has the potential to make defense evasion stealthier.
HijackLoader Evolution: Interactive Process Hollowing
Ontology-Based Data Access with Ontop
1. Ontology-Based Data Access with Ontop
Benjamin Cogrel
benjamin.cogrel@unibz.it
KRDB Research Centre for Knowledge and Data
Free University of Bozen-Bolzano, Italy
Free University of Bozen-Bolzano
IT4BI,
Blois, 22 April 2016
2. SQL queries Semantic Web OBDA Optique platform Recent work Conclusion References
Ontology-Based Data Access (OBDA)
Outline
1 SQL queries over tables can be hard to write manually
2 RDF and other Semantic Web standards
3 Ontology-Based Data Access
4 Optique platform
5 Recent work
6 Conclusion
Benjamin Cogrel (Free University of Bozen-Bolzano) OBDA/Ontop 22/04/2016 (1/40)
3. Outline
1 SQL queries over tables can be hard to write manually
Toy example
Industrial case: stratigraphic model design
Semantic gap
Solutions
2 RDF and other Semantic Web standards
3 Ontology-Based Data Access
4 Optique platform
5 Recent work
6 Conclusion
4. Toy example: University Information System
Relational source
uni1.student
s_id first_name last_name
1 Mary Smith
2 John Doe
uni1.academic
a_id first_name last_name position
1 Anna Chambers 1
2 Edward May 9
3 Rachel Ward 8
uni1.course
c_id title
1234 Linear algebra
uni1.teaching
c_id a_id
1234 1
1234 2
5. Information need → SQL query
1. First and last names of the students
SELECT DISTINCT "first_name", "last_name"
FROM "uni1"."student"
2. First and last names of the persons
SELECT DISTINCT "first_name", "last_name"
FROM "uni1"."student"
UNION
SELECT DISTINCT "first_name", "last_name"
FROM "uni1"."academic"
3. Course titles and teacher names
SELECT DISTINCT co."title", ac."last_name"
FROM "uni1"."course" co,
"uni1"."academic" ac,
"uni1"."teaching" teach
WHERE co."c_id" = teach."c_id"
AND ac."a_id" = teach."a_id"
4. All the teachers
SELECT DISTINCT "a_id"
FROM "uni1"."teaching"
UNION
SELECT DISTINCT "a_id"
FROM "uni1"."academic"
WHERE "position" BETWEEN 1 AND 8
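To ground the toy example, the four tables and information need 4 can be replayed in an in-memory SQLite database. This is an illustrative sketch only: SQLite has no schema prefixes, so the uni1 qualifier is dropped, and the data is the sample from the slide above.

```python
import sqlite3

# Minimal sketch of the uni1 schema; "position" holds the magic numbers
# (e.g. 1 = full professor) a query writer has to know about.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student  (s_id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE academic (a_id INTEGER, first_name TEXT, last_name TEXT, position INTEGER);
CREATE TABLE course   (c_id INTEGER, title TEXT);
CREATE TABLE teaching (c_id INTEGER, a_id INTEGER);

INSERT INTO student  VALUES (1,'Mary','Smith'), (2,'John','Doe');
INSERT INTO academic VALUES (1,'Anna','Chambers',1), (2,'Edward','May',9), (3,'Rachel','Ward',8);
INSERT INTO course   VALUES (1234,'Linear algebra');
INSERT INTO teaching VALUES (1234,1), (1234,2);
""")

# Information need 4: anyone who teaches a course, plus academics whose
# position code (1..8) implies a teaching role.
teachers = conn.execute("""
    SELECT DISTINCT a_id FROM teaching
    UNION
    SELECT DISTINCT a_id FROM academic WHERE position BETWEEN 1 AND 8
""").fetchall()
print(sorted(t[0] for t in teachers))  # → [1, 2, 3]
```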
6. Integration of a second source
Fusion of two universities
uni2.person
pid fname lname status
1 Zak Lane 8
2 Mattie Moses 1
3 Céline Mendez 2
uni2.course
cid lecturer lab_teacher topic
1 1 3 Information security
7. Translation of information needs I
Information need SQL query
1. First and last names of the students
SELECT DISTINCT "first_name", "last_name"
FROM "uni1"."student"
UNION
SELECT DISTINCT "fname" AS "first_name",
"lname" AS "last_name"
FROM "uni2"."person"
WHERE "status" BETWEEN 1 AND 2
2. First and last names of the persons
SELECT DISTINCT "first_name", "last_name"
FROM "uni1"."student"
UNION
SELECT DISTINCT "first_name", "last_name"
FROM "uni1"."academic"
UNION
SELECT DISTINCT "fname" AS "first_name",
"lname" AS "last_name"
FROM "uni2"."person"
8. Translation of information needs II
Information need → SQL query
3. Course titles and teacher names
SELECT DISTINCT co."title", ac."last_name"
FROM "uni1"."course" co,
"uni1"."academic" ac,
"uni1"."teaching" teach
WHERE co."c_id" = teach."c_id"
AND ac."a_id" = teach."a_id"
UNION
SELECT DISTINCT co."topic" AS "title",
pe."lname" AS "last_name"
FROM "uni2"."person" pe,
"uni2"."course" co
WHERE pe."pid" = co."lecturer"
OR pe."pid" = co."lab_teacher"
9. Translation of information needs III
Information need → SQL query
4. All the teachers
SELECT DISTINCT 'uni1/' || "a_id" AS "id"
FROM "uni1"."teaching"
UNION
SELECT DISTINCT 'uni1/' || "a_id" AS "id"
FROM "uni1"."academic"
WHERE "position" BETWEEN 1 AND 8
UNION
SELECT DISTINCT 'uni2/' || "lecturer" AS "id"
FROM "uni2"."course"
UNION
SELECT DISTINCT 'uni2/' || "lab_teacher" AS "id"
FROM "uni2"."course"
UNION
SELECT DISTINCT 'uni2/' || "pid" AS "id"
FROM "uni2"."person"
WHERE "status" BETWEEN 6 AND 9
10. Industrial case: stratigraphic model design
Users: domain experts
~900 geologists and geophysicists
Data collection: 30-70% of their time
Sources
Exploitation and Production Data Store (EPDS): ~1,500 tables (100s of GB)
Norwegian Petroleum Directorate FactPages
OpenWorks
12. Designing a new (ad-hoc) query
Geologist
Data sources
IT expert
Information needs SQL query
All Norwegian wellbores of this type
nearby this place having a permeability
near this value. [. . . ]
Attributes: completion date, depth, etc.
Takes 4 days on average (with EPDS only)
NB: Simplified information needs
13. Anonymized extract of a typical query
SELECT [...]
FROM
db_name.table1 table1,
db_name.table2 table2a,
db_name.table2 table2b,
db_name.table3 table3a,
db_name.table3 table3b,
db_name.table3 table3c,
db_name.table3 table3d,
db_name.table4 table4a,
db_name.table4 table4b,
db_name.table4 table4c,
db_name.table4 table4d,
db_name.table4 table4e,
db_name.table4 table4f,
db_name.table5 table5a,
db_name.table5 table5b,
db_name.table6 table6a,
db_name.table6 table6b,
db_name.table7 table7a,
db_name.table7 table7b,
db_name.table8 table8,
db_name.table9 table9,
db_name.table10 table10a,
db_name.table10 table10b,
db_name.table10 table10c,
db_name.table11 table11,
db_name.table12 table12,
db_name.table13 table13,
db_name.table14 table14,
db_name.table15 table15,
db_name.table16 table16
WHERE [...]
table2a.attr1=‘keyword’ AND
table3a.attr2=table10c.attr1 AND
table3a.attr6=table6a.attr3 AND
table3a.attr9=‘keyword’ AND
table4a.attr10 IN (‘keyword’) AND
table4a.attr1 IN (‘keyword’) AND
table5a.kinds=table4a.attr13 AND
table5b.kinds=table4c.attr74 AND
table5b.name=‘keyword’ AND
(table6a.attr19=table10c.attr17 OR
(table6a.attr2 IS NULL AND
table10c.attr4 IS NULL)) AND
table6a.attr14=table5b.attr14 AND
table6a.attr2=‘keyword’ AND
(table6b.attr14=table10c.attr8 OR
(table6b.attr4 IS NULL AND
table10c.attr7 IS NULL)) AND
table6b.attr19=table5a.attr55 AND
table6b.attr2=‘keyword’ AND
table7a.attr19=table2b.attr19 AND
table7a.attr17=table15.attr19 AND
table4b.attr11=‘keyword’ AND
table8.attr19=table7a.attr80 AND
table8.attr19=table13.attr20 AND
table8.attr4=‘keyword’ AND
table9.attr10=table16.attr11 AND
table3b.attr19=table10c.attr18 AND
table3b.attr22=table12.attr63 AND
table3b.attr66=‘keyword’ AND
table10a.attr54=table7a.attr8 AND
table10a.attr70=table10c.attr10 AND
table10a.attr16=table4d.attr11 AND
table4c.attr99=‘keyword’ AND
table4c.attr1=‘keyword’ AND
table11.attr10=table5a.attr10 AND
table11.attr40=‘keyword’ AND
table11.attr50=‘keyword’ AND
table2b.attr1=table1.attr8 AND
table2b.attr9 IN (‘keyword’) AND
table2b.attr2 LIKE ‘keyword%’ AND
table12.attr9 IN (‘keyword’) AND
table7b.attr1=table2a.attr10 AND
table3c.attr13=table10c.attr1 AND
table3c.attr10=table6b.attr20 AND
table3c.attr13=‘keyword’ AND
table10b.attr16=table10a.attr7 AND
table10b.attr11=table7b.attr8 AND
table10b.attr13=table4b.attr89 AND
table13.attr1=table2b.attr10 AND
table13.attr20=‘keyword’ AND
table13.attr15=‘keyword’ AND
table3d.attr49=table12.attr18 AND
table3d.attr18=table10c.attr11 AND
table3d.attr14=‘keyword’ AND
table4d.attr17 IN (‘keyword’) AND
table4d.attr19 IN (‘keyword’) AND
table16.attr28=table11.attr56 AND
table16.attr16=table10b.attr78 AND
table16.attr5=table14.attr56 AND
table4e.attr34 IN (‘keyword’) AND
table4e.attr48 IN (‘keyword’) AND
table4f.attr89=table5b.attr7 AND
table4f.attr45 IN (‘keyword’) AND
table4f.attr1=‘keyword’ AND
table10c.attr2=table4e.attr19 AND
(table10c.attr78=table12.attr56 OR
(table10c.attr55 IS NULL AND
table12.attr17 IS NULL))
14. Semantic gap
User
Data sources
IT expert
Information needs SQL query
Large semantic gap
Querying over tables
Requires a lot of knowledge about:
1 Magic numbers
(e.g. 1 → full professor)
2 Cardinalities and normal forms
3 Spreading of closely-related
information across many tables
Data integration
Makes things (much) worse!
Variety: challenge #1 for
most Big Data initiatives
15. High-level translation
Main bottleneck: translation
of the information needs
. . . into a formal query
Goal
Make such a translation easy
(Ideally: IT expertise not required)
User
Data sources
Mediator 1 ?
Inform.
needs
Reduced semantic gap
High-level
query
Derived
data
Mediator 1 could be a user, an IT expert or a GUI
General approach: two steps
1 Translate the information needs into a high-level query
2 Answer the high-level query automatically
16. Choice 1: How to derive data from the data sources
Extract Transform Load (ETL) process
E.g. relational data warehouse, triplestore
User
Data sources
Mediator 1
Data
Warehouse
Inform.
needs
High-level
query
ETL
Virtual views
E.g. virtual databases (Teiid, Apache Drill, Exareme), OBDA (Ontop)
User
Data sources
Mediator 1 Mediator 2
Inform.
needs
High-level
query
Native
query
(SQL)
17. Choice 2: How to represent the derived data
New representation Corresponding query language
Relational schema SQL
JSON document Mongo Aggregate, SQL (with e.g. Drill or Teiid)
XML document XPath, XQuery, SQL (with e.g. Teiid)
RDF graph SPARQL
18. Ontology-Based Data Access (OBDA)
User
Data sources
Mediator 1
OBDA
system
Inform.
needs
SPARQL
query
Native
query
(SQL)
Choice 1: How to derive data from the DBs
1 Extract Transform Load (ETL) process
2 Virtual views
Choice 2: How to represent the derived data
1 New relational schema, JSON or XML documents
2 Resource Description Framework (RDF)
19. Outline
1 SQL queries over tables can be hard to write manually
2 RDF and other Semantic Web standards
RDF
SPARQL
Ontologies
Mappings
3 Ontology-Based Data Access
4 Optique platform
5 Recent work
6 Conclusion
22. Resource Description Framework (RDF)
W3C standard
Subject Predicate/Property Object/Attribute
<http://example.org/uni2/person/3>
<http://xmlns.com/foaf/0.1/firstName>
"Céline"^^xsd:string
With the base IRI http://example.org/ and some prefixes:
<uni2/person/3> foaf:lastName "Mendez"^^xsd:string
<uni2/person/1> rdf:type :AssociateProfessor
<uni2/person/1> :givesLecture <uni2/course/1>
Some characteristics
Use global identifiers (IRI)
Fixed arity (ternary)
Self-descriptive
Advanced: blank nodes
RDF graph
Labelled directed graph
Set of triples
Trivial to merge
Advanced: named graph
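The "trivial to merge" point can be made concrete with a minimal sketch (plain Python, not an RDF library): if a graph is modeled as a set of (subject, predicate, object) triples, merging two graphs is plain set union. The sample triples reuse the slides' IRIs.

```python
# An RDF graph as a set of triples; no schema alignment is needed to merge.
uni1 = {
    ("<uni1/academic/1>", "rdf:type", ":FullProfessor"),
    ("<uni1/academic/1>", "foaf:lastName", '"Chambers"'),
}
uni2 = {
    ("<uni2/person/3>", "foaf:firstName", '"Céline"'),
    ("<uni2/person/1>", ":givesLecture", "<uni2/course/1>"),
}

merged = uni1 | uni2  # set union == graph merge

# Graph view: each triple is a labelled directed edge.
for s, p, o in sorted(merged):
    print(f"{s} --{p}--> {o}")
```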
23. RDF graph
<uni2/person/1> :givesLecture <uni2/course/1>
<uni2/person/1> rdf:type :AssociateProfessor
<uni2/person/1> :lastName "Lane"
<uni2/person/3> :givesLab <uni2/course/1>
<uni2/person/3> :firstName "Céline"
<uni2/person/3> :lastName "Mendez"
<uni2/course/1> :title "Information security"
24. Semantic Web technologies
[Figure: the Semantic Web layer cake]
25. SPARQL
SPARQL Protocol and RDF Query Language
Title of courses taught by a professor and
professor names
PREFIX : <http://example.org/voc#>
# Other prefixes omitted
SELECT ?title ?fName ?lName {
?teacher rdf:type :Professor .
?teacher :teaches ?course .
?teacher foaf:lastName ?lName .
?course :title ?title .
OPTIONAL {
?teacher foaf:firstName ?fName .
}
}
Algebra
Basic Graph Patterns
OPTIONAL
UNION
GROUP BY
MINUS
FILTER NOT EXISTS
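As an illustration of how such a query is evaluated, here is a miniature basic-graph-pattern matcher over a set of triples. This is a didactic sketch, not a SPARQL engine: it handles only the join of triple patterns (no OPTIONAL, FILTER, or datatypes), and names such as eval_bgp are invented for the example.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`.
    Variables are strings starting with '?'. Returns None on mismatch."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.get(p, t) != t:   # already bound to something else?
                return None
            b[p] = t
        elif p != t:
            return None
    return b

def eval_bgp(patterns, graph):
    """Evaluate a basic graph pattern: the join of all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match_pattern(pattern, t, b)) is not None]
    return bindings

graph = {
    ("<teacher1>", "rdf:type", ":Professor"),
    ("<teacher1>", ":teaches", "<course1>"),
    ("<course1>", ":title", '"Linear algebra"'),
}
# ?teacher rdf:type :Professor . ?teacher :teaches ?course . ?course :title ?title
bgp = [("?teacher", "rdf:type", ":Professor"),
       ("?teacher", ":teaches", "?course"),
       ("?course", ":title", "?title")]
print(eval_bgp(bgp, graph))  # one solution binding ?teacher, ?course, ?title
```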
27. Web Ontology Language (OWL)
Some constructs
owl:inverseOf
:isTaughtBy owl:inverseOf :teaches .
<uni2/academic/2> :teaches <uni2/course/1> .
⇒ <uni2/course/1> :isTaughtBy <uni2/academic/2> .
owl:disjointWith
:Student owl:disjointWith :Professor .
<uni1/academic/19> rdf:type :Professor .
<uni1/academic/19> rdf:type :Student .
⇒ Inconsistent RDF graph
owl:sameAs
<uni2/person/2> owl:sameAs <uni1/academic/21> .
<uni2/person/2> :teaches <uni2/course/1> .
⇒ <uni1/academic/21> :teaches <uni2/course/1> .
Full OWL 2 is very expressive
Many more constructs
Computation costs become easily prohibitive
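The two inference rules above (owl:inverseOf and owl:sameAs) can be sketched as naive forward chaining to a fixpoint over a triple set. This is a toy illustration of what saturation means, not how a reasoner is implemented; only one direction of owl:sameAs propagation is shown.

```python
def saturate(triples):
    """Apply the owl:inverseOf and owl:sameAs rules until nothing new is derived."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        # declared inverse pairs, in both directions
        inverses = {pair for s, pred, o in triples if pred == "owl:inverseOf"
                    for pair in [(s, o), (o, s)]}
        for s, p, o in triples:
            # owl:inverseOf:  x p y  =>  y q x  for each declared inverse q of p
            for p1, q in inverses:
                if p == p1:
                    new.add((o, q, s))
            # owl:sameAs: y inherits x's outgoing edges (one direction only here)
            if p == "owl:sameAs":
                for s2, p2, o2 in triples:
                    if s2 == s and p2 != "owl:sameAs":
                        new.add((o, p2, o2))
        if not new <= triples:
            triples |= new
            changed = True
    return triples

g = saturate({
    (":isTaughtBy", "owl:inverseOf", ":teaches"),
    ("<uni2/academic/2>", ":teaches", "<uni2/course/1>"),
    ("<uni2/person/2>", "owl:sameAs", "<uni1/academic/21>"),
    ("<uni2/person/2>", ":teaches", "<uni2/course/1>"),
})
```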
28. Profile OWL 2 QL
Based on the Description Logic DL-LiteR
Supported constructs
Class and property hierarchies
(rdfs:subClassOf and rdfs:subPropertyOf)
Property domain and range
(rdfs:domain, rdfs:range)
Inverse properties (owl:inverseOf)
Class disjointness (owl:disjointWith)
Mandatory participation (advanced)
Not supported
Individual identities
(owl:sameAs)
Cardinality constraints
(functional property,
etc.)
Many other constructs
Summary
Lightweight ontologies
A bit more than RDFS
First-order rewritability
(rewritable into a SQL query)
29. Mappings RDB-RDF
Ontop native format (similar to the R2RML standard)
Source (SQL)
SELECT s_id, firstName, lastName
FROM uni1.student
Target (RDF, Turtle-like)
ex:uni1/student/{s_id} a :Student ;
foaf:firstName "{firstName}"^^xsd:string ;
foaf:lastName "{lastName}"^^xsd:string .
Result
DBs unified into one RDF graph
This graph can be queried with SPARQL
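What a mapping assertion does can be sketched in a few lines: run the source SQL query, then instantiate the target templates once per returned row. This is an illustration of the semantics only, not Ontop's implementation (which precisely avoids materializing the triples).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?,?,?)",
                 [(1, "Mary", "Smith"), (2, "John", "Doe")])

triples = set()
for s_id, first, last in conn.execute("SELECT s_id, first_name, last_name FROM student"):
    subj = f"ex:uni1/student/{s_id}"               # the IRI template {s_id}
    triples.add((subj, "rdf:type", ":Student"))
    triples.add((subj, "foaf:firstName", f'"{first}"^^xsd:string'))
    triples.add((subj, "foaf:lastName", f'"{last}"^^xsd:string'))

print(len(triples))  # 2 rows x 3 target templates = 6 triples
```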
31. Mappings RDB-RDF
Other mappings
Object property (:teaches)
Target (RDF)
ex:uni1/academic/{a_id} :teaches
ex:uni1/course/{c_id} .
Source
SELECT *
FROM "uni1"."teaching"
Magic number
Target (RDF)
ex:uni1/academic/{a_id} a :FullProfessor .
Source
SELECT * FROM "uni1"."academic"
WHERE "position" = 1
32. Outline
1 SQL queries over tables can be hard to write manually
2 RDF and other Semantic Web standards
3 Ontology-Based Data Access
Querying the saturated RDF graph
Query reformulation
SQL query optimization
Ontop
4 Optique platform
5 Recent work
6 Conclusion
35. Querying the saturated RDF graph
With SPARQL
Saturated RDF graph
Saturation of the RDF graph derived from the mappings
According to the ontology constraints
Usually much bigger graph!
[Figure: R2RML mappings produce an RDF graph, which is then saturated]
Materialized RDF graph
ETL + saturation
− Maintenance
+ Expressive ontology profiles
(like OWL 2 RL)
Virtual RDF graph
Query reformulation
+ No materialization
− Limited profiles like OWL 2 QL*
(*) Includes an inference mechanism not present in the OWL 2 RL profile
38. Query reformulation
Pipeline: SPARQL → rewriting (optional) → SPARQL → query unfolding (based on saturated mappings) → SQL → optimization → SQL
Role of the OWL 2 QL ontology
Minor: SPARQL query rewriting (very specific cases)
Main: mapping saturation (offline)
Mapping saturation
Query containment optimization
Not only OWL 2 QL:
Horn fragment of OWL 2 [Botoeva et al., 2016c]
SWRL with linear recursion [Xiao et al., 2014]
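A very rough sketch of unfolding under these assumptions: associate each class with the SQL queries of its (saturated) mappings, and replace the triple pattern `?x rdf:type C` by their UNION. The :Teacher mapping and the unfold_class_pattern helper are invented for illustration; real unfolding handles arbitrary patterns, joins, and IRI templates, and Ontop then optimizes the result.

```python
# Saturated mappings: each class maps to one or more source SQL queries
# (SQL reused from the toy example earlier in the deck).
mappings = {
    ":Teacher": [
        'SELECT DISTINCT "a_id" FROM "uni1"."teaching"',
        'SELECT DISTINCT "a_id" FROM "uni1"."academic" WHERE "position" BETWEEN 1 AND 8',
    ],
}

def unfold_class_pattern(cls):
    """Unfold the SPARQL pattern `?x rdf:type <cls>` into one SQL query."""
    sources = mappings.get(cls, [])
    return "\nUNION\n".join(sources) if sources else None

print(unfold_class_pattern(":Teacher"))
```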
42. SQL query optimization
Objective: produce a SQL query...
Similar to manually written ones
Adapted to existing query planners
Structural optimization
From Join-of-unions to
union-of-joins
IRI decomposition to improve
joining performance
Semantic optimization
Redundant join elimination
Redundant union elimination
Using functional constraints
Functional constraints
Primary and foreign keys, unique constraints
Implicit in the business processes (Statoil)
Vital for query reformulation!
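The join-of-unions to union-of-joins rewriting rests on the algebraic identity (A ∪ B) ⋈ C = (A ⋈ C) ∪ (B ⋈ C), which can be checked on toy tables in SQLite (an illustrative sketch; the table names are invented). The second shape often suits existing query planners better, since each branch is a plain join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);  INSERT INTO a VALUES (1), (2);
CREATE TABLE b (id INTEGER);  INSERT INTO b VALUES (2), (3);
CREATE TABLE c (id INTEGER, v TEXT);
INSERT INTO c VALUES (1, 'x'), (3, 'y');
""")

# (A UNION B) JOIN C
join_of_unions = conn.execute("""
    SELECT u.id, c.v
    FROM (SELECT id FROM a UNION SELECT id FROM b) u
    JOIN c ON c.id = u.id
    ORDER BY u.id
""").fetchall()

# (A JOIN C) UNION (B JOIN C)
union_of_joins = conn.execute("""
    SELECT a.id, c.v FROM a JOIN c ON c.id = a.id
    UNION
    SELECT b.id, c.v FROM b JOIN c ON c.id = b.id
    ORDER BY 1
""").fetchall()

assert join_of_unions == union_of_joins == [(1, 'x'), (3, 'y')]
```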
43. Ontop
http://ontop.inf.unibz.it
Ontop framework
Started in 2010
Open-source (Apache 2)
W3C standard compliant
(SPARQL, OWL 2 QL, R2RML)
Supports all major relational DBs
(Oracle, DB2, Postgres, MySQL, etc.)
and some virtual DBs (Teiid, Exareme)
Components
Java APIs
Protégé extension (GUI)
Sesame endpoint
Integration
Optique platform
Stardog 4.0 (virtual graphs)
44. Outline
1 SQL queries over tables can be hard to write manually
2 RDF and other Semantic Web standards
3 Ontology-Based Data Access
4 Optique platform
5 Recent work
6 Conclusion
45. Optique platform
User Application
Data sources
Ontop
1. SPARQL query
(high-level)
4. SPARQL results
2. SQL query
(low-level)
3. SQL results
Ontology
Mappings
47. Outline
1 SQL queries over tables can be hard to write manually
2 RDF and other Semantic Web standards
3 Ontology-Based Data Access
4 Optique platform
5 Recent work
Cross-linked datasets
Beyond OWL 2 QL
MongoDB support
6 Conclusion
48. Cross-linked datasets
[Calvanese et al., 2015]
Linking tables
Different identifiers used across datasets
Tables keeping track of the equivalence
Support for linking tables
SPARQL query rewriting
owl:sameAs properties specified in the mappings
Pruning based on incompatible URI templates
49. Beyond OWL 2 QL (I)
Framework for rewriting and approximation of OBDA specifications [Botoeva et al., 2016c]
50. Beyond OWL 2 QL (II)
51. Beyond OWL 2 QL (III)
New tool: ontoprox
52. MongoDB
A popular document database
A JSON document describing an award-winning scientist
{ "_id": 4,
"awards": [ {"award": "Rosing Prize", "year": 1999, "by": "Norwegian Data Association"},
{"award": "Turing Award", "year": 2001, "by": "ACM" },
{"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"} ],
"birth": "1926-08-27",
"contribs": ["OOP", "Simula"],
"death": "2002-08-10",
"name": {"first": "Kristen", "last": "Nygaard"}
}
Persons who received two awards in the same year
db.bios.aggregate([
{$project : {"name": true, "award1": "$awards", "award2": "$awards" }},
{$unwind: "$award1"}, {$unwind: "$award2"},
{$project: {"name": true, "award1": true, "award2": true,
"twoInOneYear": { $and: [ {$eq: ["$award1.year", "$award2.year"]},
{$ne: ["$award1.award", "$award2.award"]} ]}}},
{$match: {"twoInOneYear": true} },
{$project : {"firstName": "$name.first", "lastName": "$name.last" ,
"awardName1": "$award1.award", "awardName2": "$award2.award",
"year": "$award1.year" }}
])
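To see what the pipeline computes, the same logic can be replayed in plain Python on the sample document (a sketch of the semantics, not pymongo): the two $unwind stages correspond to pairing every award with every other award.

```python
from itertools import product

bios = [{
    "_id": 4,
    "awards": [
        {"award": "Rosing Prize", "year": 1999, "by": "Norwegian Data Association"},
        {"award": "Turing Award", "year": 2001, "by": "ACM"},
        {"award": "IEEE John von Neumann Medal", "year": 2001, "by": "IEEE"},
    ],
    "name": {"first": "Kristen", "last": "Nygaard"},
}]

results = []
for doc in bios:
    # two $unwind stages == Cartesian product of the awards array with itself
    for a1, a2 in product(doc["awards"], repeat=2):
        # $match on "twoInOneYear": same year, different award
        if a1["year"] == a2["year"] and a1["award"] != a2["award"]:
            results.append({"firstName": doc["name"]["first"],
                            "lastName": doc["name"]["last"],
                            "awardName1": a1["award"],
                            "awardName2": a2["award"],
                            "year": a1["year"]})

# Nygaard's 2001 Turing Award / von Neumann Medal pair, in both orders
print(results)
```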
53. MongoDB support
[Botoeva et al., 2016a] [Botoeva et al., 2016b]
MongoDB
Document database
JSON-like documents
Does not respect first normal form (arrays)
Mongo Aggregation Framework
Query language
Uses absolute paths
At least as expressive as relational algebra (MUPGL fragment)
Integration in the OBDA setting
JSON-RDF mapping language
(First normal form) relational views over MongoDB
Translation from relational algebra
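The "first normal form relational views over MongoDB" idea can be sketched by flattening an array-valued field into flat rows, one per element, which a relational-algebra translation can then target. The view and field names here are illustrative, reusing the sample document.

```python
import json

doc = json.loads("""{"_id": 4,
                     "name": {"first": "Kristen", "last": "Nygaard"},
                     "contribs": ["OOP", "Simula"]}""")

def contribs_view(d):
    """1NF view bios_contribs(_id, contrib): one row per array element."""
    return [(d["_id"], c) for c in d["contribs"]]

rows = contribs_view(doc)
print(rows)  # → [(4, 'OOP'), (4, 'Simula')]
```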
54. Evolution of the Ontop architecture
For supporting non-relational databases
In red: components that are DB-specific.
55. Outline
1 SQL queries over tables can be hard to write manually
2 RDF and other Semantic Web standards
3 Ontology-Based Data Access
4 Optique platform
5 Recent work
6 Conclusion
56. Conclusion
Main message: we need high-level access to data
1 SQL queries over tables can be difficult to write manually (low-level)
2 OBDA is a powerful solution for high-level data access
3 Ontop is an open-source OBDA framework
Work in progress
SPARQL aggregation
MongoDB
Better SPARQL OPTIONAL
SPARQL MINUS
Links
GitHub: ontop/ontop
ontop4obda@googlegroups.com
Twitter: @ontop4obda
http://ontop.inf.unibz.it
http://optique-project.eu
57. Ontop team
Diego Calvanese
Guohui Xiao
Elena Botoeva
Roman Kontchakov (Birkbeck, London)
Sarah Komla-Ebri
Elem Güzel Kalayci
Ugur Dönmez
Davide Lanti
Dag Hovland (Oslo)
Mariano Rodriguez-Muro (now at IBM Research, NY)
Martin Rezk (now at Rakuten, Tokyo)
Me
Introductory resources
Journal paper [Calvanese et al., 2016]
Ontop: Answering SPARQL Queries over Relational Databases.
Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao.
Semantic Web Journal, 2016.
http://www.semantic-web-journal.net/content/ontop-answering-sparql-queries-over-relational-databases-1
Tutorial
https://github.com/ontop/ontop-examples/tree/master/swj-2015
University example
https://github.com/ontop/ontop-examples/tree/master/university
EPNET SPARQL endpoint
http://136.243.8.213/epnet-pleiades-edh/
References I
[Botoeva et al., 2016a] Elena Botoeva, Diego Calvanese, Benjamin Cogrel, Martin Rezk, and
Guohui Xiao.
A formal presentation of MongoDB (Extended version).
CoRR Technical Report abs/1603.09291, arXiv.org e-Print archive, 2016.
Available at http://arxiv.org/abs/1603.09291.
[Botoeva et al., 2016b] Elena Botoeva, Diego Calvanese, Benjamin Cogrel, Martin Rezk, and
Guohui Xiao.
OBDA beyond relational DBs: A study for MongoDB.
In Proc. of the Int. Workshop on Description Logics (DL), 2016.
[Botoeva et al., 2016c] Elena Botoeva, Diego Calvanese, Valerio Santarelli, Domenico F.
Savo, Alessandro Solimando, and Guohui Xiao.
Beyond OWL 2 QL in OBDA: Rewritings and approximations.
In Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI), 2016.
[Calvanese et al., 2015] Diego Calvanese, Martin Giese, Dag Hovland, and Martin Rezk.
Ontology-based integration of cross-linked datasets.
In LNCS, volume 9366, pages 199–216. Springer, 2015.
References II
[Calvanese et al., 2016] Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman
Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao.
Ontop: Answering SPARQL queries over relational databases.
Semantic Web J., 2016.
DOI: 10.3233/SW-160217.
[Xiao et al., 2014] Guohui Xiao, Martin Rezk, Mariano Rodriguez-Muro, and Diego
Calvanese.
Rules and ontology based data access.
In LNCS, volume 8741, pages 157–172. Springer, 2014.