LOD2 plenary meeting in Paris: presentation of WP2: State of Play (Storing and Querying Very Large Knowledge Bases) by Peter Boncz (CWI) and Orri Erling (OpenLink Software)
eXTend DB. An embedded extensible document database. Extend with custom queries and object modifiers. Learn More ».
Morph DB. A Key-Value pair database. Allows fast in-place updates / object expansion. Learn More ».
Block Manager
An innovative library which manages on-disk blocks inside a file and provides a very simple interface for building a variety of on-disk data structures.
http://sscreation.net.in
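The description above is brief; purely as a rough illustration, the core of such a block manager can be sketched in a few lines of Python (the class, block size and method names are hypothetical, not the library's actual API):

import os

BLOCK_SIZE = 4096  # assumed fixed block size; the real library's layout is not documented here

class BlockManager:
    """Minimal sketch: allocate, read and write fixed-size blocks inside one file."""

    def __init__(self, path):
        # open (or create) the backing file in binary read/write mode
        self.f = open(path, "r+b" if os.path.exists(path) else "w+b")

    def alloc(self):
        """Append a zeroed block and return its block number."""
        self.f.seek(0, os.SEEK_END)
        block_no = self.f.tell() // BLOCK_SIZE
        self.f.write(b"\x00" * BLOCK_SIZE)
        return block_no

    def write(self, block_no, data):
        assert len(data) <= BLOCK_SIZE
        self.f.seek(block_no * BLOCK_SIZE)
        self.f.write(data.ljust(BLOCK_SIZE, b"\x00"))

    def read(self, block_no):
        self.f.seek(block_no * BLOCK_SIZE)
        return self.f.read(BLOCK_SIZE)

On-disk structures such as B-trees or heaps would then be built on top of alloc/read/write rather than raw file offsets.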
An Approach for the Incremental Export of Relational Databases into RDF Graphs (Nikolaos Konstantinou)
Several approaches have been proposed in the literature for offering RDF views over databases. In addition to these, a variety of tools exist that allow exporting database contents into RDF graphs. The approaches in the latter category have often been shown to perform better than those in the former. However, when database contents are exported into RDF, it is not always optimal, or even necessary, to export (or dump, as this procedure is often called) the whole database contents every time. This paper investigates the problem of incremental generation and storage of the RDF graph that results from exporting relational database contents. In order to express mappings that associate tuples from the source database with triples in the resulting RDF graph, an implementation of the R2RML standard is put to the test. Next, a methodology is proposed and described that enables incremental generation and storage of the RDF graph that originates from the source relational database contents. The performance of this methodology is assessed through an extensive set of measurements. The paper concludes with a discussion of the authors' most important findings.
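The abstract does not reproduce the paper's methodology, but the core idea of an incremental dump can be sketched: re-export only the tuples changed since the last dump and replace the corresponding triples in the stored graph. A minimal Python sketch, assuming a 'modified' timestamp column and a toy mapping function in place of a real R2RML processor (table and column names are illustrative):

import sqlite3
from rdflib import Graph, URIRef, Literal, Namespace

EX = Namespace("http://example.org/")

def tuple_to_triples(row):
    # illustrative stand-in for an R2RML mapping of one tuple
    s = URIRef(EX[f"person/{row['id']}"])
    return [(s, EX.name, Literal(row["name"]))]

def incremental_dump(conn, graph, last_dump_ts):
    conn.row_factory = sqlite3.Row  # access columns by name
    # assumption: the source table carries a 'modified' timestamp column
    rows = conn.execute(
        "SELECT id, name, modified FROM person WHERE modified > ?", (last_dump_ts,)
    )
    for row in rows:
        s = URIRef(EX[f"person/{row['id']}"])
        graph.remove((s, None, None))    # drop the stale triples for this tuple
        for t in tuple_to_triples(row):  # re-generate only the changed part
            graph.add(t)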
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
Today, we count more than 10,000 datasets made available online following Semantic Web standards.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large-scale knowledge graphs in order to facilitate applications in various domains including life sciences, publishing, and the internet of things.
The main objective of this thesis is to lay foundations for efficient algorithms performing analytics, i.e. exploration, quality assessment, and querying over semantic knowledge graphs at a scale that has not been possible before.
First, we propose a novel approach for statistical calculations of large RDF datasets, which scales out to clusters of machines.
In particular, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark.
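SANSA itself is implemented in Scala on top of Spark; purely as an illustration of the distributed style of computation described here, two simple criteria (distinct subjects and property usage counts) can be sketched in PySpark over an N-Triples file (the whitespace split is a stand-in for a real parser):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdf-stats").getOrCreate()

# assumption: data.nt is an N-Triples file; splitting on whitespace into
# (subject, predicate, rest-of-line) stands in for a real N-Triples parser
triples = (spark.sparkContext.textFile("data.nt")
           .filter(lambda line: line.strip() and not line.startswith("#"))
           .map(lambda line: line.split(None, 2)))

distinct_subjects = triples.map(lambda t: t[0]).distinct().count()
property_usage = (triples.map(lambda t: (t[1], 1))
                  .reduceByKey(lambda a, b: a + b)  # counts combined per partition, then shuffled
                  .takeOrdered(10, key=lambda kv: -kv[1]))

print("distinct subjects:", distinct_subjects)
print("top properties:", property_usage)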
Many applications, such as data integration, search, and interlinking, can take full advantage of the data only when a priori statistical information about its internal structure and coverage is available.
However, such applications may suffer from low data quality and may be unable to leverage the data fully when its size exceeds the capacity of the available resources.
Thus, we introduce a distributed approach to quality assessment of large RDF datasets.
It is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data.
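The thesis' actual metric definitions are not listed here; as a minimal sketch of what one scalable metric can look like in the same PySpark style, consider the fraction of triples whose object is an IRI, a simple interlinking-style indicator:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdf-quality").getOrCreate()

# same toy N-Triples loading as in the statistics sketch above
triples = (spark.sparkContext.textFile("data.nt")
           .map(lambda line: line.split(None, 2))
           .filter(lambda t: len(t) == 3))

total = triples.count()
iri_objects = triples.filter(lambda t: t[2].lstrip().startswith("<")).count()

# share of triples whose object is an IRI rather than a literal
print(f"objects that are IRIs: {iri_objects / max(total, 1):.2%}")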
Based on the knowledge of the internal statistics of a dataset and its quality, users typically want to query and retrieve large amounts of information.
As a result, it has become difficult to efficiently process these large RDF datasets.
Indeed, these processes require both efficient storage strategies and query-processing engines that are able to scale with the data size.
Therefore, we propose a scalable approach to evaluate SPARQL queries over distributed RDF datasets by translating SPARQL queries into Spark executable code.
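The compiler described in the thesis is not reproduced here, but the general shape of such a translation can be sketched: each triple pattern of a basic graph pattern becomes a filtered projection of the triples relation, and shared variables become equi-join conditions. A toy PySpark illustration (IRIs and data are made up):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sparql-to-spark").getOrCreate()

# toy triples table (s, p, o); in practice this is the loaded RDF dataset
df = spark.createDataFrame([
    ("<http://ex.org/alice>", "<http://ex.org/type>", "<http://ex.org/Person>"),
    ("<http://ex.org/alice>", "<http://ex.org/name>", '"Alice"'),
    ("<http://ex.org/bob>",   "<http://ex.org/name>", '"Bob"'),
], ["s", "p", "o"])

# SPARQL: SELECT ?person ?name WHERE { ?person ex:type ex:Person . ?person ex:name ?name }
tp1 = df.filter((F.col("p") == "<http://ex.org/type>") &
                (F.col("o") == "<http://ex.org/Person>")).select(F.col("s").alias("person"))
tp2 = df.filter(F.col("p") == "<http://ex.org/name>") \
        .select(F.col("s").alias("person"), F.col("o").alias("name"))

# the shared variable ?person becomes an equi-join between the two patterns
tp1.join(tp2, "person").show()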
We conducted several empirical evaluations to assess the scalability, effectiveness, and efficiency of our proposed approaches.
More importantly, various use cases, i.e. Ethereum analysis, mining big data logs, and scalable integration of POIs, have been developed and leverage our approach.
The empirical evaluations and concrete applications provide evidence that the methodology and techniques proposed in this thesis help to effectively analyze and process large-scale RDF datasets.
All the approaches proposed in this thesis are integrated into the larger SANSA framework.
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S... (Gezim Sejdiu)
Over the past decade, vast amounts of machine-readable structured information have become available through the automation of research processes as well as the increasing popularity of knowledge graphs and semantic technologies.
A major and yet unsolved challenge that research faces today is to perform scalable analysis of large-scale knowledge graphs in order to facilitate applications like link prediction, knowledge base completion, and question answering.
Most machine learning approaches that scale horizontally (i.e. can be executed in a distributed environment) work on simpler feature-vector-based input rather than on more expressive knowledge structures.
On the other hand, the learning methods which exploit expressive structures, e.g. Statistical Relational Learning and Inductive Logic Programming approaches, usually do not scale well to very large knowledge bases owing to their inherent complexity.
This talk gives an overview of the ongoing project Semantic Analytics Stack (SANSA), which aims to bridge this research gap by creating an out-of-the-box library for scalable, in-memory, structured learning.
B2SHARE REST API Hands-on - EUDAT Summer School (Hans van Piggelen, SURFsara)
This hands-on session will make use of the scenario demonstrated previously to show how the same results can be achieved programmatically. The ability to use the services via the API is essential for automating the data management process when dealing with large volumes of data, potentially from many different sources. This will require hands-on coding (the demonstrations will be given using Python, but confident users may choose their own language). By the end of this session, attendees should be able to understand how to use the B2 services within their own scientific workflows to allow automated data management.
Visit: https://www.eudat.eu/eudat-summer-school
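For orientation, a minimal Python call of the kind the session builds up to, listing records via the B2SHARE REST API, might look as follows (endpoint paths, parameters and response fields should be checked against the current API documentation; the token is a placeholder):

import requests

B2SHARE = "https://b2share.eudat.eu/api"  # training instances use a different host
TOKEN = "YOUR-ACCESS-TOKEN"               # personal access token from the B2SHARE profile page

# list records; the token is required for drafts and non-public data
resp = requests.get(f"{B2SHARE}/records/",
                    params={"access_token": TOKEN, "size": 5, "page": 1})
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["id"], hit["metadata"].get("titles"))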
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes (MongoDB)
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
Although the amount of Linked Data published on the web is steadily increasing, its consumption is still mainly limited to technical users and domain experts. Thus, it is necessary to foster intuitive visualizations of Linked Data in order to support users without a technical background. DBpedia Mobile Explorer is a visualization framework that enables non-experts to visualize Linked Data on mobile devices, relying on DBpedia (the Linked Data version of Wikipedia).
The webinar will be based on the LODE-BD Recommendations, which aim at providing bibliographic data providers of open repositories with a set of recommendations that support the selection of appropriate encoding strategies for producing meaningful Linked Open Data (LOD)-enabled bibliographical data (LODE-BD).
Querying Mongo Without Programming Using Funql (MongoDB)
Querying Mongo Without Programming Using Funql
Hans Marggraff, CTO, Qint Software
Funql is the federated unified query language. It is designed to make it fun to query and aggregate data from non-relational databases. MongoDB is the reference document database used for developing Funql. This talk explains Funql and how it maps onto the MongoDB document model. We show how easy it can be to access MongoDB, even for a non-technical person. We also explain the insights we gained developing the driver for MongoDB and using the language with MongoDB.
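Funql's own syntax is not shown in this abstract; as a point of reference, the kind of MongoDB aggregation such a higher-level query language would compile down to can be sketched with pymongo (database, collection and field names are invented for illustration):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]  # hypothetical database/collection

# total revenue per customer, descending: the kind of aggregation a
# higher-level query language would compile to
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]
for doc in orders.aggregate(pipeline):
    print(doc["_id"], doc["revenue"])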
Big Data is an evolution of Business Intelligence (BI). Whereas traditional BI relies on data warehouses limited in size (some terabytes) and hardly manages unstructured data and real-time analysis, the era of Big Data opens up a new technological period, offering advanced architectures and infrastructures that allow sophisticated analyses taking into account these new data integrated into the business ecosystem. In this article, we present the results of an experimental study on the performance of the leading Big Analytics framework (Spark) with the most popular NoSQL database (MongoDB) and Hadoop. The objective of this study is to determine the software combination that allows sophisticated analysis in real time.
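As a sketch of the kind of pipeline such a study benchmarks, here is how Spark can read a MongoDB collection through the MongoDB Spark connector (connector version, URI and collection are illustrative assumptions):

from pyspark.sql import SparkSession

# assumes the MongoDB Spark connector package is supplied via --packages,
# e.g. org.mongodb.spark:mongo-spark-connector_2.12:3.0.1
spark = (SparkSession.builder.appName("spark-mongo-bench")
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost:27017/shop.orders")  # hypothetical collection
         .getOrCreate())

df = spark.read.format("mongo").load()  # pull the collection as a DataFrame
df.groupBy("status").count().show()     # a simple aggregate to exercise the pipeline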
Database Integrated Analytics using R: Initial Experiences with SQL-Server + R (OllieShoresna)
Database Integrated Analytics using R: Initial Experiences with SQL-Server + R
Josep Ll. Berral and Nicolas Poggi
Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (BarcelonaTech), Barcelona, Spain
Abstract—Most data scientists nowadays use functional or semi-functional languages like SQL, Scala or R to treat data obtained directly from databases. Such a process requires fetching the data, processing it, and then storing it again, and it tends to be done outside the DB, in often complex data-flows. Recently, database service providers have decided to integrate "R-as-a-Service" in their DB solutions. The analytics engine is called directly from the SQL query tree, and results are returned as part of the same query. Here we show a first taste of such technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the recently released SQL+R solutions. In this work we discuss some data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing especially on the new DB Integrated Analytics approach, and commenting on the first experiences in usability and performance obtained from such new services and capabilities.
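In SQL-Server 2016, the "R-as-a-Service" pattern the abstract refers to is exposed through the sp_execute_external_script procedure. A minimal sketch of invoking it from Python via pyodbc (connection settings, table and column names are placeholders):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=aloja;Trusted_Connection=yes"  # placeholder connection settings
)

# the R script runs inside the database engine; InputDataSet/OutputDataSet are
# the default data frames exchanged with the SQL query tree
tsql = """
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(mean_time = mean(InputDataSet$exec_time))',
    @input_data_1 = N'SELECT exec_time FROM benchmark_runs'
WITH RESULT SETS ((mean_time FLOAT));
"""
for row in conn.cursor().execute(tsql):
    print(row.mean_time)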
I. INTRODUCTION
Current data mining methodologies, techniques and algorithms are based on heavy data browsing, slicing and processing. For data scientists, who are also users of analytics, the capability of defining in an easy way the data to be retrieved and the operations to be applied over this data is essential. This is the reason why functional languages like SQL, Scala or R are so popular in such fields: although these languages allow high-level programming, they free the user from programming the infrastructure for accessing and browsing data.
The usual trend when processing data is to fetch the data from the source or storage (file system or relational database), bring it into a local environment (memory, distributed workers, ...), treat it, and then store back the results. In such a schema, functional-language applications are used to retrieve and slice the data, while imperative-language applications are used to process the data and manage the data-flow between systems. In most languages and frameworks, database connection protocols like ODBC or JDBC are available to enhance this data-flow, allowing applications to directly retrieve data from DBs. And although most SQL-based DB services allow user-written procedures and functions, these do not include a high variety of primitive functions or operators.
The arrival of Big Data favored distributed frameworks like Apache Hadoop and Apache Spark, where the data is distributed "in the Cloud" and the data processing can also be distributed where the data is placed, with results then joined and aggregated. Such technologies have the advantage of distributed computing, but when the schema for accessing data and using it is still the same, ...
Generating Executable Mappings from RDF Data Cube Data Structure Definitions (Christophe Debruyne)
Data processing is increasingly the subject of various internal and external regulations, such as the GDPR, which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of, and for describing, datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset "just in time". We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes regulations into account so as to generate compliant mappings. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and we demonstrate it using a running example. Some of the more technical aspects are also described.
Reference: Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350
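The paper's actual queries are not reproduced here; a toy rdflib sketch can still convey the declarative pattern: a SPARQL CONSTRUCT query reads a data structure definition and emits R2RML terms. The ex:sourceColumn property is a made-up stand-in for the vocabulary extension the paper describes:

from rdflib import Graph

dsd = Graph().parse("dsd.ttl")  # the data structure definition plus extension annotations

# one dimension or measure component becomes one predicate-object map;
# ex:sourceColumn is a made-up stand-in for the paper's vocabulary extension
query = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX ex: <http://example.org/ext#>
CONSTRUCT {
  [] rr:predicate ?prop ;
     rr:objectMap [ rr:column ?col ] .
}
WHERE {
  ?dsd qb:component ?comp .
  ?comp (qb:dimension|qb:measure) ?prop ;
        ex:sourceColumn ?col .
}
"""
mapping = dsd.query(query).graph  # the CONSTRUCT result is itself an rdflib Graph
print(mapping.serialize(format="turtle"))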
As part of the final BETTER Hackathon, project partners prepared 4 hackathon exercises. Fraunhofer IAIS organised this exercise in conjunction with external partner MKLab ITI-CERTH (EOPEN project). This step-by-step exercise featured the setup of local Docker images on Linux, using Docker Compose and (pre-installed) Python, SANSA, Hadoop, Apache Spark and Apache Zeppelin. It featured semantic transformation and the use of the SANSA (Scalable Semantic Analytics Stack - http://sansa-stack.net/) libraries on a sample of tweets ahead of geo-clustering.
Project website (Hackathon information): https://www.ec-better.eu/pages/2nd-hackathon
Github repository: https://github.com/ec-better/hackathon-2020-semanticgeoclustering
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It was mainly developed at Charles University in Prague as a student project called ODCleanStore (version 2). It is based on the experience SWC obtained with the LOD Management Suite (LODMS) used in WP7 and on ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the ETL tool, and the tool has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
In this Webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE as well as DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighboring tools in the LOD2 Stack serve classification and the subsequent quality analysis of Linked Data.
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present the release 3.0 of the LOD2 stack, which contains updates to
*) Virtuoso 7 [Openlink]: the original row store of the Virtuoso 6 universal server has now been replaced by a column store, increasing the performance of SPARQL queries significantly; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Management Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming linked data through the use of its many extensions. It also offers operations for extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store.
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary amount of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of the Data Cube linked data representation of statistical data. It has support for the more advanced DataCube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly into the lod2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
(http://lod2.eu/BlogPost/webinar-series) In this Webinar Michael Martin presents CubeViz - a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, which is the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers appropriate chart types and options which can be selected by users.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
This webinar in the course of the LOD2 webinar series will present Virtuoso 7: Virtuoso Column Store, Adaptive Techniques for RDF Graph Databases. In this webinar we shall discuss the application of column store techniques to both graph (RDF) and relational data for mixed workloads ranging from lookup to analytics.
Virtuoso is an innovative enterprise-grade multi-model data server for agile enterprises and individuals. It delivers an unrivaled platform-agnostic solution for data management, access, and integration. The unique hybrid server architecture of Virtuoso enables it to offer traditionally distinct server functionality within a single product.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
http://lod2.eu/BlogPost/webinar-series
DBpedia Spotlight is a tool employed in the Extraction stage of the LOD Life Cycle, performing Entity Recognition and Linking. Although the tool currently specializes in the English language, support for other languages is being tested, and demos for German, Dutch and others are available or underway. The tool can be used to enable faceted browsing and semantic search, among other applications. In this webinar we will describe what DBpedia Spotlight is, how it works, and how you can benefit from it in your application.
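A minimal Python call against a public Spotlight endpoint illustrates the annotation step (the endpoint URL and parameters reflect the commonly documented web service and may change over time):

import requests

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": "Berlin is the capital of Germany.", "confidence": 0.5},
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])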
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series!
http://lod2.eu/BlogPost/webinar-series
PublicData.eu is striving to become the pan-European one-stop shop, providing access to open, freely reusable datasets from numerous local, regional and national public bodies across Europe.
After the first release of the PublicData.eu website (Alpha release in January 2011, Beta release in June 2011) and its subsequent upgrades (a significant upgrade was carried out in March 2012), OKFN worked towards the deployment of various personalization features meant to improve the user experience on PublicData.eu and spur more interest and interaction around the official datasets.
This webinar in the course of the LOD2 webinar series will present Zemanta and its LODRefine - a LOD-enabled version of OpenRefine (previously Google Refine), which is a part of the LOD2 stack. LODRefine extends the cleansing and linking functionalities of OpenRefine by providing means to reconcile and augment your data with DBpedia or any other SPARQL endpoint, extract named entities using the Zemanta API, export data in one of the RDF formats, and, recently, also exploit available crowdsourcing services. In the webinar we will demonstrate several tasks that show the ease of use and versatility of LODRefine.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series: http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present the implications of Linked Open Data and Semantic Web Technologies in the information and publishing industry.
The publishing industry is struggling with too much information on the one hand and too few resources to bring meaning to this information on the other. As an industrial use case partner in LOD2, Wolters Kluwer Deutschland GmbH investigates in detail how LOD and the Semantic Web have the potential to solve this critical issue for their business. The presentation will show what parts of the LOD2 stack are used within the use case and what challenges had to be addressed in the last two years. Interesting future areas like natural language processing will also be mentioned. The topics covered are relevant for any industry that deals with a lot of data and documents, not only publishing.
This series will provide a monthly webinar about Linked (Open) Data tools and services around the LOD2 project, the LOD2 Stack and the Linked Open Data Life Cycle, also in the form of 3rd party tools. Please find continuously updated information here: http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present use cases and live demos of PoolParty (by Semantic Web Company).
Knowledge organization systems like taxonomies or thesauri can benefit from linked data approaches and vice versa. In recent years SKOS became very popular in various industries due to its simplicity; SKOS turned out to be the entry point to the Semantic Web. Learn more about the possibilities to link your enterprise metadata with the web of data, and about PoolParty as a means for linked data management!
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the LOD2 webinar series!
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present use cases and live demos of D2R (Free University Berlin) and Sparqlify (University of Leipzig).
D2R Server is a tool for publishing relational databases on the Semantic Web. It enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using the SPARQL query language.
Sparqlify is a tool enabling one to define expressive RDF views on relational databases and query them with a subset of the SPARQL query language. By featuring a novel RDF view definition syntax, it aims at simplifying the RDB-RDF mapping process.
more to be found at:
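Both tools ultimately expose the relational database behind a SPARQL endpoint, so clients can stay store-agnostic. A short sketch with SPARQLWrapper (the endpoint URL is a placeholder for a locally running D2R Server or Sparqlify instance):

from SPARQLWrapper import SPARQLWrapper, JSON

# placeholder: a locally running D2R Server or Sparqlify endpoint
sparql = SPARQLWrapper("http://localhost:2020/sparql")
sparql.setQuery("""
    SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for b in results["results"]["bindings"]:
    print(b["s"]["value"], b["p"]["value"], b["o"]["value"])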
Born from the wish to make linking tractable, the Link Discovery Framework for Metric Spaces (LIMES) is tailored towards the time-efficient and lossless discovery of links across knowledge bases. LIMES is an extensible declarative framework that encapsulates manifold algorithms dedicated to the processing of structured data of any sort. Built with extensibility and easy integration in mind, LIMES allows implementing applications that integrate, consume and/or generate Linked Data. Within LOD2, it will be used for discovering links between knowledge bases.
This webinar will be presented by the LOD2 Partner: University of Leipzig (ULEI), Germany.
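LIMES' time efficiency in metric spaces rests on the triangle inequality: with distances to a fixed exemplar precomputed, |d(x,e) - d(y,e)| is a lower bound on d(x,y), so pairs whose bound already exceeds the acceptance threshold can be discarded, losslessly, without computing their actual distance. A small self-contained Python illustration of that pruning idea (not LIMES' actual implementation):

import math

def dist(a, b):
    # Euclidean distance; any metric satisfying the triangle inequality works
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def link(sources, targets, exemplar, threshold):
    # precompute distances to the exemplar once per instance
    d_s = {s: dist(s, exemplar) for s in sources}
    d_t = {t: dist(t, exemplar) for t in targets}
    links, computed = [], 0
    for s in sources:
        for t in targets:
            # triangle-inequality lower bound: skip hopeless pairs losslessly
            if abs(d_s[s] - d_t[t]) > threshold:
                continue
            computed += 1
            if dist(s, t) <= threshold:
                links.append((s, t))
    print(f"full distance computed for {computed} of {len(sources) * len(targets)} pairs")
    return links

src = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
tgt = [(0.1, 0.2), (8.8, 1.1), (4.0, 4.0)]
print(link(src, tgt, exemplar=(0.0, 0.0), threshold=1.5))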
State of Play presentation at the LOD2 Plenary Vienna 2012: WP10 - Training, Dissemination, Community Building, Fertilization by Martin Kaltenböck, Semantic Web Company (SWC)
State of Play presentation at the LOD2 Plenary Vienna 2012: WP9A - LOD for a Distributed Marketplace for Public Sector Contracts by Vojtěch Svátek (UEP)
State of Play presentation at the LOD2 Plenary Vienna 2012: WP9 publicdata.eu – Publishing Governmental Information as Linked Data by Irina Bolychevsky, OKFN
State of Play presentation at the LOD2 Plenary Vienna 2012: WP8: Linked Open Data for Enterprise Data Web by Amar-Djalil Mezaour, Dassault Systèmes Exalead.
4. WP2 Storing and Querying Very Large Knowledge Bases. Goal: enabling large-scale, feature-rich & enterprise-ready Linked Data management solutions. Database Partners in LOD2 - CWI: Leading open source analytics RDBMS; OpenLink: Leading Linked Data deployment platform. Technological Excellence: Creating and publishing metrics for choosing RDF solutions; Bringing Column Store Technology for Business Intelligence to RDF; Ground-breaking database innovations for RDF stores (Dynamic Query Optimization, Adaptive Caching of Joins, Optimized Graph Processing, Cluster/Cloud Scalability).
5. WP2 Linked Open Data for real in your Apps. Business Advantages: Enrich your application with (free & rich) Linked Open Data; RDF store technology has 10x lower deployment costs than relational for ragged data. Technological Flexibility: Deliver Schema-Last Flexibility and Inference at Relational Data Warehouse Cost and Performance; Grow as you go: the LOD2 platform dynamically adapts to your usage patterns and the structure of your data; Integrate, resolve, align anything: schema, instance identity. Rich Features for complex Applications: Advanced SPARQL and SQL query processing; SPARQL and SQL Federation; Full Text, Geospatial, Text Search; Scale-Out on Clusters, Replication.
7. Task 2.1: State of the Art, Evaluation & Benchmarking. This task reviews the state of the art in RDF and relational analytics databases and creates a laboratory with the leading products of both categories installed. This can serve as a testing and benchmarking resource for constantly measuring the project's progress against the baseline of the best in the market. Benchmarking in LOD2 serves two purposes: measuring the relative cost of RDF versus equivalent relational functionality, and measuring RDF performance in applications which are RDF's home terrain, e.g. integration of highly heterogeneous, "ragged" content with alignment at preprocessing/run time by rules and machine learning approaches. For the first case, we can use TPC-H and its star schema derivative (SSBM). For the second case, new benchmarks need to be developed, encompassing different functionality.
8. Task 2.1: State of the Art, Evaluation & Benchmarking. The benchmarks will be developed primarily during the first year, with work on integration quality metrics extending over the second year. The benchmarks will be run and results published at each milestone of the project. Huge data size scalability (e.g. a trillion triples) is expected to require a cluster, most feasibly a temporary deployment in a cloud system, and the goal of the DB work in LOD2 is to reduce the cost of deployment as much as possible by devising techniques that reduce the memory requirements of large RDF deployments. We currently envision Oracle 11g R2, BigOWLIM, YARS, Vertica, AllegroGraph, VectorWise and MonetDB to be deployed in the LOD2 benchmarking laboratory. As benchmarks we envision TPC-H, LUBM, UOBM, BSBM, SP2Bench and SSBM; and, as described above, we propose the creation of a new benchmark patterned after social networking data.
13. BSBM V3 Results. Benchmarked at FUB: 4store 1.1.2 (Garlik) http://4store.org/; BigData r4169 (SYSTAP LLC) http://www.systap.com/bigdata.htm; BigOwlim 3.4.3129 (OntoText) http://www.ontotext.com/owlim/; Jena TDB 0.8.9 (openjena.org) http://www.openjena.org/TDB/; Fuseki 0.1.0 (openjena.org) http://openjena.org/wiki/Fuseki; Virtuoso 7.0 (OpenLink) http://virtuoso.openlinksw.com/. Main new conclusions: we ran into several technical problems for BI. To give the store vendors time to fix and optimize their stores, we considered running the tests again in about three or four months. For the next test runs we will also modify query 4, because of its quadratic complexity and the resulting poor scalability characteristics.
15. using new or upcoming releases (not yet public)
16. using properly tuned settings and hardware to their solution
34. Interactive Query Mix (2/2)
12. Return all posts about an event (e.g., Unrest in Tunisia) in the 10 most recent days.
13. Show all posts about a specific location, e.g., Egypt, in the 10 most recent days (use information from DBpedia to identify the location, e.g., Cairo is the capital of Egypt, Tahrir Square is in Cairo).
14. Find the number of inactive users: all users whose accounts have existed for at least 30 days but who have no posts, or all users with no new posts for 60 days.
15. Show all photos posted by friends of a user in which she was tagged.
16. Show the list of a user's top-10 close friends: sort the friendships by the number of photos in which the user and her friend are both tagged, then by the number of the user's tags for each friend.
17. Find the top-10 friends, or all friends of friends, who share a common interest with you (based on the similarity between the tags in your posts and the tags in their posts).
18. What are the current hottest events/problems? (Get the hash tags from posts and order by the number of their appearances in the 10 most recent days.)
19. Which area is the most active? (Order by the total number of posts per location in the 5 most recent days.)
20. Return the top-10 locations with the fastest growth in the number of users. (Count the number of people who joined before the last 10 days and those who joined during the 10 most recent days, then compute the growth rate.)
41. Update Query Mix
Add/Remove a comment.
6. Remove all the tags of a user from the pictures or posts of her friends.
7. Remove all friends of a user who do not have any interaction with her.
8. Send a group invitation message to the top-10 close friends of a user (write a post on the wall of each of these users).
42. Analysis Query Mix
1. The fastest propagating ideas: the topic with the most users who have joined in the last day.
2. Wildfire: find the first mentions of a concept in the last day such that it was not mentioned before and occurs in more than 10% of new posts in groups involved with politics.
3. Product advertisement: where and when to advertise Hello Kitty?
4. Challengers: which fictional entities are challengers of Hello Kitty?
5. Potential clients: who are iPhone users or potential iPhone clients?
6. Associated products: people who consider or mention the iPhone also mention which other products?
7. Product lifetime: when is the right time to release information about the new iPhone version?
8. Troublemakers and duplicates: find troublemakers and duplicated identities based on behavior patterns.
9. Application accounts: accounts created only for applications, e.g., for playing games, in social networks.
10. Expert finding: find a user who is an expert in computer science and has friends who are experts in Maths and Physics.
11. New idol: the user whose fan page has the fastest increase in the number of members during the last 30 days.