Presentation at JIST 2012. I forgot to add a link to http://en.wikipedia.org/wiki/Knowledge_extraction, which I mentioned during the presentation because some of those tools' output would be compatible with SPARQL.
JORUM is an online repository for learning and teaching materials that started as a research project in 2002 and now provides a service to further and higher education institutions. It uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to allow its content to be harvested by other systems, but does not do any harvesting itself. Content comes from various sources, including JISC projects, and is harvested back into the Connect learning and teaching portal.
Closing the Gap: Data Models for Documentary Linguistics - Baden Hughes
This document discusses challenges in managing linguistic data electronically and proposes formal data encoding models. It notes the increasing amounts of linguistic data from fieldwork and issues with disparate encoding formats. Recently developed models address lexicons, interlinear texts, paradigms, syntactic trees, and annotation standards. These new models enable new types of data exploration and manipulation while reducing barriers to use. They may affect linguistic analysis by making some types of analysis easier and by opening up new possibilities and challenges.
Jorum: Increasing Access to Institutional e-Learning - Adrian Stevenson
The document summarizes JORUM, a JISC-funded project to create an online repository for learning and teaching materials from UK higher education institutions. The repository will allow users to search, preview, borrow and upload learning objects and other digital resources. It will use standards like IEEE Learning Object Metadata and IMS Content Packaging to facilitate sharing and reuse of educational content. Potential issues include ensuring high-quality metadata and addressing licensing and legal questions around sharing copyrighted materials.
Merging Models with the Epsilon Merging Language - A Decade Later - Dimitris Kolovos
Slides from my 10-year Most Influential Paper award presentation at ACM/IEEE MoDELS 2016 in St Malo, France. Original paper: http://link.springer.com/chapter/10.1007%2F11880240_16
Map of the CETIS metadata and digital repository interoperability domain - Phil Barker
Slides used at various CETIS metadata and digital repository SIG meetings to describe the area of interest of the SIG. Shows topics and specifications relevant to metadata and digital repository interoperability.
Embedding Metadata In Word Processing Documents - Jim Downing
The document discusses embedding metadata and semantics in word processing documents in a way that ensures interoperability. It proposes using microformats like styles, tables, and links encoded in the documents. Styles are seen as the best approach as they are simple, schema-agnostic, extensible and don't require any specialized software. Toolbars are also proposed to make applying the microformats easy for authors. Examples shown include encoding author and affiliation information as well as encoding chemistry data and entities. The goal is to enable semantic and rich documents while working within real-world constraints of current word processors and document formats.
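To make the style-as-microformat idea concrete, here is a minimal sketch (not from the slides) that reads metadata out of a .docx file with the python-docx library; the style names "Author" and "Affiliation" are hypothetical conventions that a document template would have to define:

```python
# A sketch of style-based metadata extraction, assuming a template that
# defines the (hypothetical) paragraph styles "Author" and "Affiliation".
from docx import Document

STYLE_TO_FIELD = {"Author": "author", "Affiliation": "affiliation"}

def extract_metadata(path):
    """Collect metadata from paragraphs whose named style encodes a field."""
    metadata = {}
    for paragraph in Document(path).paragraphs:
        field = STYLE_TO_FIELD.get(paragraph.style.name)
        if field is not None:
            metadata.setdefault(field, []).append(paragraph.text)
    return metadata

print(extract_metadata("paper.docx"))  # placeholder file name
```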
This document discusses leveraging linked open data and semantic web technologies for natural language processing. It proposes three ways: 1) Using linked open data as background knowledge for NLP tasks. 2) Using RDF and ontologies to integrate different NLP tools and approaches. 3) Making NLP output available on the web through standards like the NLP Interchange Format to combine NLP results with the larger web of data. The goal is to close the "semantic gap" between isolated NLP systems and the wealth of linguistic knowledge available in the linked open data cloud.
This document provides an overview of a presentation on representing and connecting language data and metadata using linked data. It discusses the technological background of linked data and the collaborative research opportunities it provides for linguistics. It also outlines prospects for using linked data in linguistics by connecting annotated corpora, lexical-semantic resources, and linguistic databases to build a linguistic linked open data cloud.
The document discusses LOD2 Deliverable 3.1.1, which will publish a survey of tools concerned with knowledge extraction from structured sources. The data will be collected in a Linked Data and SPARQL enabled OntoWiki to allow for sustainability through crowd-sourcing updates and maintenance. The OntoWiki has been deployed at http://data.lod2.eu/2011/tools/ and will need further fine-tuning over the next several weeks. The data is available as Linked Data or through SPARQL queries at the provided URL.
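As an illustration of the SPARQL access mentioned above, a client could query the tool survey along these lines; the endpoint path and the use of rdfs:label are assumptions, since the deliverable only states that the data at http://data.lod2.eu/2011/tools/ is available as Linked Data or via SPARQL:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint path for the tools OntoWiki; adjust to the real one.
sparql = SPARQLWrapper("http://data.lod2.eu/2011/tools/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?tool ?label WHERE { ?tool rdfs:label ?label } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["tool"]["value"], row["label"]["value"])
```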
The document describes a reference architecture for a linguistic linked data ecosystem. It proposes standards and best practices for publishing, linking, and accessing multilingual data as linked open data. The key components of the architecture include publishing and hosting linguistic linked data, metadata standards, vocabularies for describing different resource types, linking of open and closed data, discovery layers, and semantic web service composition. The architecture supports decentralization, interoperability, and the development of language technologies and analytics services over linked data.
This document summarizes Sebastian Hellmann's PhD thesis on integrating natural language processing (NLP) data, tools, and applications with RDF and OWL. The thesis proposes creating datasets in RDF to facilitate data integration and linking. It describes converting Wiktionary and the Wortschatz corpus to RDF to create a linguistic linked data web. Standardized formats like POWLA are discussed for representing corpora on the web. The thesis also covers knowledge acquisition from resources like the Tiger Corpus Navigator and ontology learning from text using techniques like LExO.
1) Sebastian Hellmann presented on Linked Open Data and Natural Language Processing.
2) He discussed DBpedia and using it for NLP tasks like named entity recognition.
3) Hellmann proposed ways for the ULI and AKSW to collaborate on projects like adding CLDR data to the Linguistic Linked Open Data cloud and creating open benchmarks.
This document summarizes a presentation on Navigation-induced Knowledge Engineering (NKE). NKE starts by interpreting user navigation behavior to infer ontological concepts. It supports users in interactively refining example concepts. Users can then save refined concepts for later retrieval. The presentation introduces the current NKE prototype and DL-Learner tool, shows mockups of potential interfaces, and evaluates NKE's ability to learn concepts from Wikipedia categories compared to keyword searches. Future work includes integrating NKE into applications and creating interfaces for different user groups.
In this webinar Lorenz Bühmann presents the ontology repair and enrichment tool ORE as well as DL-Learner, a machine learning tool that solves supervised learning tasks and supports knowledge engineers in constructing knowledge. These two neighbouring tools in the LOD2 Stack are used for classification and the subsequent quality analysis of Linked Data.
(http://lod2.eu/BlogPost/webinar-series) In this webinar Michael Martin presents CubeViz, a faceted browser for statistical data utilizing the RDF Data Cube vocabulary, which is the state of the art in representing statistical data in RDF. This vocabulary is compatible with SDMX and is increasingly being adopted. Based on the vocabulary and the encoded Data Cube, CubeViz generates a faceted browsing widget that can be used to interactively filter the observations to be visualized in charts. Based on the selected structure, CubeViz offers suitable chart types and options that users can select.
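For readers unfamiliar with the vocabulary, the following sketch builds a single Data Cube observation of the kind CubeViz browses; only the qb: namespace is standard, while the dataset, dimension, and measure URIs are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")  # standard Data Cube vocabulary
EX = Namespace("http://example.org/stats/")          # invented for this example

g = Graph()
g.bind("qb", QB)
obs = EX["obs1"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["population-cube"]))                # hypothetical dataset
g.add((obs, EX.refYear, Literal("2012", datatype=XSD.gYear)))  # dimension value
g.add((obs, EX.population, Literal(520000)))                   # measure value
print(g.serialize(format="turtle"))
```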
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
The document discusses using linked data and the NLP Interchange Format (NIF) to improve interoperability in natural language processing. It describes NIF as an RDF/OWL-based format that aims to allow interoperability between NLP tools, resources, and annotations. The document outlines several use cases for NIF, including part-of-speech tagging and entity linking, and evaluates its adoption and impact on lowering barriers to NLP tool integration and reuse. It encourages more ontology creators and developers to join the NLP2RDF community and use NIF.
NIF 2.0 Tutorial: Content Analysis and the Semantic Web - Sebastian Hellmann
This tutorial is held by Sebastian Hellmann from the NLP2RDF Group at AKSW:
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. NIF consists of specifications, ontologies and software (overview), which are combined under the version identifier "NIF 2.0". Links:
http://nlp2rdf.org
http://persistence.uni-leipzig.org/nlp2rdf/
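To give a flavour of the format, the sketch below builds a NIF context and one substring annotation with rdflib; the document URI is invented, while the nif-core namespace and the offset-based "#char=begin,end" URI scheme are the published NIF 2.0 conventions:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
DOC = "http://example.org/doc1"  # hypothetical document URI

text = "Leipzig is a city in Germany."
g = Graph()
g.bind("nif", NIF)

# The whole document becomes a nif:Context identified by its offsets.
context = URIRef(f"{DOC}#char=0,{len(text)}")
g.add((context, RDF.type, NIF.Context))
g.add((context, NIF.isString, Literal(text)))

# Annotate the substring "Leipzig" (offsets 0-7) as a NIF string.
word = URIRef(f"{DOC}#char=0,7")
g.add((word, RDF.type, NIF.String))
g.add((word, NIF.referenceContext, context))
g.add((word, NIF.anchorOf, Literal(text[0:7])))
g.add((word, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((word, NIF.endIndex, Literal(7, datatype=XSD.nonNegativeInteger)))
print(g.serialize(format="turtle"))
```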
The document provides an overview of the activities and direction of Work Package 1 (WP1) of the NoTube project. WP1 focuses on developing shared datasets and services to support data integration and use case scenarios. In year 3, WP1 has moved from a single data warehouse to a more distributed model and is planning for sustainability beyond the project lifetime. WP1 datasets and services are being used to support other work packages and end-to-end demonstrations. WP1 also conducts outreach activities to promote metadata sharing and adoption of standards.
UnifiedViews is a joint project currently maintained by Semantic Web Company (SWC) and Semantica.cz. It was mainly developed at Charles University in Prague as a student project called ODCleanStore (version 2). It builds on the experience SWC gained with the LOD Management Suite (LODMS) used in WP7 and on ODCleanStore (version 1), developed by Charles University in Prague for the WP9a use case of the LOD2 FP7 project. In the next release of the LOD2 stack, UnifiedViews will replace LODMS as the ETL tool in the stack, and the tool has already been adopted in other projects.
In the webinar we will give a brief overview of the UnifiedViews project (Helmut Nagy). The main part will be a presentation of the tool and its capabilities (Tomas Knap).
http://lod2.eu/BlogPost/webinar-series
This webinar in the course of the LOD2 webinar series will present release 3.0 of the LOD2 stack, which contains updates to:
*) Virtuoso 7 [Openlink]: the original row store of the Virtuoso 6 universal server has been replaced by a column store, increasing the performance of SPARQL queries significantly; the store is now up to three times as fast as the previous major version.
*) Linked Open Data Management Suite [SWC]: the 'lodms' application allows the user to quickly set up pipelines for transforming Linked Data through the use of its many extensions. It also provides operations for extracting RDF from other types of data.
*) dbpedia-spotlight-ui [ULEI]: a graphical user interface component that allows the user to use a remote DBpedia Spotlight instance to annotate a text with DBpedia concepts.
*) sparqlify [ULEI]: a scalable SPARQL-SQL rewriter, allowing you to query an SQL database as if it were a triple store (see the sketch after this list).
*) SIREn [DERI]: a Lucene plugin that allows you to efficiently index and query RDF, as well as any textual document with an arbitrary number of metadata fields.
*) CubeViz [ULEI]: CubeViz allows visualization of statistical data represented as Linked Data with the Data Cube vocabulary. It supports the more advanced Data Cube features, such as slices. It also allows the selection of a remote SPARQL endpoint and export of a modified cube.
*) R2R [UMA]: the R2R mapping API is now included directly in the lod2 demonstrator application, allowing users to experience the full effect of the R2R semantic mapping language through a graphical user interface.
*) ontowiki-csvimport [ULEI]: an OntoWiki extension that transforms CSV files to RDF. The extension can create Data Cubes that can be visualized by CubeViz.
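The sparqlify entry above refers to this sketch: a deliberately tiny illustration of the SPARQL-to-SQL rewriting idea, reduced to a single triple pattern over a hand-written mapping. Real Sparqlify uses a dedicated mapping language and supports full SPARQL; the table and column names here are invented:

```python
# Maps an RDF predicate to (table, subject column, object column);
# the person table and its columns are invented for this example.
MAPPING = {
    "http://xmlns.com/foaf/0.1/name": ("person", "id", "name"),
}

def rewrite(subject_var, predicate, object_var):
    """Rewrite the triple pattern (?s <predicate> ?o) into a SQL query."""
    table, s_col, o_col = MAPPING[predicate]
    return (f"SELECT {s_col} AS {subject_var}, {o_col} AS {object_var} "
            f"FROM {table}")

print(rewrite("s", "http://xmlns.com/foaf/0.1/name", "o"))
# -> SELECT id AS s, name AS o FROM person
```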
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series!
This webinar in the course of the LOD2 webinar series will present the implications of Linked Open Data and Semantic Web Technologies in the information and publishing industry.
The publishing industry is struggling with too much information on the one hand and too few resources to bring meaning to this information on the other. As an industrial use case partner in LOD2, Wolters Kluwer Deutschland GmbH investigates in detail how LOD and Semantic Web technologies have the potential to solve this critical issue for their business. The presentation will show which parts of the LOD2 stack are used within the use case and what challenges had to be addressed in the last two years. Interesting future areas like natural language processing will also be mentioned. The topics covered are relevant for any industry that deals with a lot of data and documents, not only publishing.
This series provides a monthly webinar about Linked (Open) Data tools and services around the LOD2 project, the LOD2 Stack and the Linked Open Data Life Cycle, including 3rd-party tools. Please find continuously updated information here: http://lod2.eu/BlogPost/webinar-series
Learning Outcomes & Learner Achievements Management in Higher Ed - Jad Najjar
This is a presentation of the ICOPER project's approach to the management of learning outcome definitions, learners' achieved learning outcomes, and learning opportunities.
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017
The LOD2 project aims to improve tools for publishing Linked Open Data. It comprises researchers and companies from 12 countries who are developing an integrated stack of Linked Data tools. The stack demonstrates the benefits of Linked Data in media/publishing, corporate intranets, and eGovernment. It provides monthly webinars on tools for acquiring, editing, composing, and publishing Linked Data.
Pal gov.tutorial2.session14.lab rdf-dataintegration - Mustafa Jarrar
This document provides information about a practical session on integrating and fusing heterogeneous data using RDF. Students will work in groups of two, with each group containing students from different universities. Each group will construct three hypothetical student record databases based on different university data schemes. They will populate the databases with sample data and then integrate and fuse all the data into a single RDF dataset. Students are expected to use existing ontologies and write SPARQL queries on the integrated RDF data. The final deliverable will include snapshots of the original databases, the RDF mappings, the integrated RDF dataset, executed SPARQL queries and results.
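A minimal sketch of the integration step, assuming each group exports its database to Turtle: the files are loaded into one rdflib graph and queried with SPARQL (the file names and the choice of foaf:name are illustrative):

```python
from rdflib import Graph

merged = Graph()
merged.parse("university_a_students.ttl", format="turtle")  # hypothetical exports
merged.parse("university_b_students.ttl", format="turtle")

# Query the fused data across both sources.
query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?student ?name WHERE { ?student foaf:name ?name }
"""
for student, name in merged.query(query):
    print(student, name)
```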
This document contains information about current and past projects and members of the Sina Institute. For current projects, it lists the names, descriptions, and URLs of 7 active projects. For past projects, it lists the name and URL of 1 past project. It also provides details on current members, including names, titles, email addresses, and profiles for 4 members.
This document provides information about a practical session on building an ontology for a national student registry in Palestine. The session aims to have students specify a shared ontology in OWL that universities can use to exchange student profile data in RDF format. Each student will work individually to extend their existing RDF model with OWL constructs to better define and constrain the semantics of the exchanged data. Students will then present their OWL model and build example RDF data files from two universities to validate the model.
This webinar in the course of the LOD2 webinar series will present Zemanta and its LODRefine, a LOD-enabled version of OpenRefine (previously Google Refine), which is part of the LOD2 stack. LODRefine extends the cleansing and linking functionalities of OpenRefine by providing means to reconcile and augment your data with DBpedia or any other SPARQL endpoint, extract named entities using the Zemanta API, export data in one of the RDF formats, and recently also to exploit available crowdsourcing services. In the webinar we will demonstrate several tasks that show the ease of use and versatility of LODRefine.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series: http://lod2.eu/BlogPost/webinar-series
A summary of DBpedia's history and a detailed analysis of challenges and solutions.
We show how the Linked Data Cloud evolved around DBpedia and also what problems we and other data projects encountered. We included a section on the new solutions that will lead DBpedia into a bright future.
Linguistic Linked Open Data, Challenges, Approaches, Future Work - Sebastian Hellmann
Hellmann's keynote at TKE 2016: Challenges, Approaches and Future Work for Linguistic Linked Open Data (LLOD)
While the Linguistic Linked Open Data (LLOD) Cloud (http://linguistic-lod.org/) has evolved beyond expectations - thanks to the effort of a vibrant community - overall progress has to be seen under a more scrutinizing light.
Initial challenges, which were formulated by Christian Chiarcos, Sebastian Nordhoff and me as early as 2011[1][2], have been discussed extensively in the LDL, MLODE and NLP & DBpedia workshop series and in several W3C community groups. In particular, the LIDER FP7 project (http://www.lider-project.eu/) - originally conceived to tackle these challenges and build a Linguistic Linked Open Data Cloud - rather gave them more shape and revealed that there is still quite a long road ahead to solve problems such as proper metadata, contextualisation of knowledge, data quality, hosting, open licensing and provenance, timely updated network links, and knowledge integration and interoperability at the largest possible scale - the Web.
The invited talk attempts to give a full account of the above-mentioned challenges and presents and critically evaluates pertinent efforts and approaches, including evolving standards such as the NLP Interchange Format (NIF)[3][4], DataID[5], SHACL[6], lemon[7] and the LIDER guidelines[8], as well as practical services such as LingHub[9], LODVader[10] and RDFUnit[11] (just to mention a few).
As a glimmer of hope, the talk will conclude with the recent efforts of the DBpedia community to coordinate the creation of a public data infrastructure for a large, multilingual, semantic knowledge graph, which is, of course, no panacea, but a potential step in the right direction to bridge the gap between language and knowledge.
________________
[1] Towards a Linguistic Linked Open Data cloud: The Open Linguistics Working Group (http://www.atala.org/IMG/pdf/Chiarcos-TAL52-3.pdf), Christian Chiarcos, Sebastian Hellmann, and Sebastian Nordhoff. TAL 52(3):245-275 (2011)
[2] Linked Data in Linguistics. Representing Language Data and Metadata (http://www.springer.com/computer/ai/book/978-3-642-28248-5), Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg (2012)
[3] http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core
[4] https://www.w3.org/community/ld4lt/
[5] http://wiki.dbpedia.org/projects/dbpedia-dataid
[6] http://w3c.github.io/data-shapes/shacl/
[7] https://www.w3.org/2016/05/ontolex/
[8] http://www.lider-project.eu/guidelines
[9] http://linghub.lider-project.eu/
[10] http://lodvader.aksw.org/
[11] http://aksw.org/Projects/RDFUnit
DBpedia is a crowd-sourced effort to extract structured data from Wikipedia and Wikidata. It provides a public SPARQL endpoint to query this multi-domain, multilingual dataset. The DBpedia Association was founded in 2014 as a non-profit to oversee DBpedia and aims to improve uptime, data quality, and integration with other sources. It relies on funding and contributions from members to achieve goals like 99.99% uptime across languages and domains. The document promotes joining the DBpedia Association and participating in future events like a DBpedia meeting at the SEMANTiCS 2016 conference.
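For readers who have not used the endpoint, a query against DBpedia's public SPARQL endpoint can be as small as this sketch (SPARQLWrapper is one common Python client; the class URI is from the DBpedia ontology):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?city WHERE { ?city a <http://dbpedia.org/ontology/City> } LIMIT 5
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["city"]["value"])
```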
This document outlines the key points of the LIDER Roadmap, which was developed through workshops with hundreds of stakeholders to outline a vision for linked data and content analytics over the next 3, 5, and 10 years. The roadmap addresses challenges and opportunities in several sectors, including global customer engagement, content publishing and delivery, marketing and customer relationship management, and the public sector/civil society. Cross-cutting issues like localization and translation, data privacy, licensing, and developing linguistic linked data ecosystems are also discussed.
DBpedia: A Public Data Infrastructure for the Web of Data - Sebastian Hellmann
The document discusses the DBpedia project, which extracts structured data from Wikipedia to build a multilingual knowledge graph. It describes DBpedia's goals of making this data openly available and supporting its community. The DBpedia Association is being formed as a non-profit to oversee the infrastructure and support contributors. Funding will come from donations and sponsorships. Upcoming events include the DBpedia Community Meeting coinciding with the SEMANTiCS conference in September.
The document discusses the NLP Interchange Format (NIF) 2.0, which aims to achieve interoperability between NLP tools, resources, and annotations. It notes that NIF 2.0 will be published within 6-8 weeks and is highly likely to become the de facto standard for modeling RDF tool output in NLP. The document analyzes problems in the current NLP landscape such as heterogeneity of technologies, formats, languages, and lack of open collaboration and standards. It presents NIF as a solution to these problems by defining text normalization, an RDF-based core ontology, modular ontologies, and infrastructure for validation, hosting, and adoption. Evaluation approaches and the positive impact and reception of NIF are also presented.
This document summarizes a research project aiming to formalize text for machines in a transparent, efficient, and web-compatible way. It introduces the areas of scientific focus, including defining contexts and strings as URIs, formalizing this in OWL, and implications for natural language processing frameworks. The goal is for the representation to be understandable by machines and support tasks like ambiguity resolution. The evaluation plan involves comparing expressivity and performance to other models, and measuring developer adoption through a mailing list. The work seeks to reduce the "semantic gap" between data and questions by clarifying structural and representational issues.
The document discusses NIF (NLP Interchange Format), which aims to integrate natural language processing (NLP) tools via a common output format. NIF addresses issues with current NLP pipelines by allowing tools to be combined ad hoc. It represents NLP annotations as URIs in RDF to allow merging of output. Ontologies provide interfaces to integrate different layers of annotation. The goals are to make NLP components interchangeable and to reuse past annotations.
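The merging idea can be sketched in a few lines: because both tools annotate the same offset-based string URI, their RDF output combines by plain graph union. The POS property below is a made-up placeholder; itsrdf:taIdentRef is the ITS 2.0 property commonly used for entity links:

```python
from rdflib import Graph, Literal, Namespace, URIRef

ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")
EX = Namespace("http://example.org/nlp#")  # placeholder for a POS vocabulary
word = URIRef("http://example.org/doc1#char=0,7")  # the string "Leipzig"

pos_tagger = Graph()
pos_tagger.add((word, EX.posTag, Literal("NNP")))  # hypothetical POS property

entity_linker = Graph()
entity_linker.add((word, ITSRDF.taIdentRef,
                   URIRef("http://dbpedia.org/resource/Leipzig")))

merged = pos_tagger + entity_linker  # graph union merges both annotation layers
for triple in merged:
    print(triple)
```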
Improving the Performance of the DL-Learner SPARQL Component for Semantic Web Applications
1. Creating Knowledge out of Interlinked Data
Improving the Performance of the DL-Learner SPARQL Component for Semantic Web Applications
Didier Cherix, Sebastian Hellmann, Jens Lehmann
http://slideshare.net/kurzum
http://dl-learner.org
http://lod2.eu
AKSW, Universität Leipzig
2. Motivation: 2007 - 2012
DL-Learner has been developed in parallel to DBpedia at the University of Leipzig since 2007. DL-Learner is a tool for learning concepts in Description Logics (DLs) from user-provided examples.
Worked very well for small to medium sized data sets, e.g. Carcinogenesis and other ML problems from the UCI ML repository
The limit is the capacity of current OWL-DL reasoners
The challenge was (and is) to do reasoning-based, supervised Machine Learning on the DBpedia dataset (> 200 million triples) or larger datasets
6. Introduction DL-Learner
DL-Learner heavily relies on instance checks for machine learning, so the OWL reasoner is the bottleneck
Underlying idea:
Only select relevant data for the Machine Learning Problem based on user-given examples
→ Reduces the number of triples that have to be given to a reasoner
→ Reduces complexity and size of the OWL schema
Brute-force approach:
Load all data into the OWL Reasoner, then do instance checks
→ infeasible for DBpedia
Iterative approach (old component):
Iterate over all instances and fetch the data recursively
→ inefficient even with caching
12. Introduction DL-Learner
Challenge:
What is the most efficient way to retrieve such a fragment?
13. Improvements of the New Component
• Step 1: Indexing the T-Box:
• Download the OWL Schema and index it in memory
• either via SPARQL or OWL file
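A minimal sketch of this step, assuming the schema is available as a local OWL file: parse it with rdflib and index direct superclasses in memory (the file name is a placeholder; the SPARQL variant would fetch the same rdfs:subClassOf triples from the endpoint):

```python
from collections import defaultdict
from rdflib import Graph
from rdflib.namespace import RDFS

schema = Graph()
schema.parse("dbpedia-schema.owl", format="xml")  # placeholder local copy

# In-memory T-Box index: class -> set of direct superclasses.
superclasses = defaultdict(set)
for cls, _, parent in schema.triples((None, RDFS.subClassOf, None)):
    superclasses[cls].add(parent)
```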
14. Improvements of the New Component
• Step 2: A-Box Queries
Parameter recursion depth:
Retrieve newly discovered bindings to ?o until a certain depth is reached.
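A sketch of this retrieval loop under simple assumptions (public DBpedia endpoint, no caching or error handling): outgoing triples of the frontier instances are fetched, and every URI bound to ?o joins the next frontier until the recursion depth is reached:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = SPARQLWrapper("http://dbpedia.org/sparql")

def fetch_fragment(seeds, depth):
    """Recursively collect outgoing triples of the seed instances."""
    triples, frontier, seen = [], set(seeds), set()
    for _ in range(depth):
        next_frontier = set()
        for s in frontier - seen:
            seen.add(s)
            ENDPOINT.setQuery(f"SELECT ?p ?o WHERE {{ <{s}> ?p ?o }}")
            ENDPOINT.setReturnFormat(JSON)
            for row in ENDPOINT.query().convert()["results"]["bindings"]:
                triples.append((s, row["p"]["value"], row["o"]["value"]))
                if row["o"]["type"] == "uri":  # follow newly discovered bindings
                    next_frontier.add(row["o"]["value"])
        frontier = next_frontier
    return triples

fragment = fetch_fragment(["http://dbpedia.org/resource/Leipzig"], depth=2)
print(len(fragment), "triples retrieved")
```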
15. Improvements of the New Component
• Step 3: Typing the retrieved instances
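A possible implementation of this step, assuming the instances arrived untyped in the fragment: fetch all rdf:type triples for them in one query with a VALUES block (the instance URIs are illustrative):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

instances = ["http://dbpedia.org/resource/Leipzig",
             "http://dbpedia.org/resource/Dresden"]
values = " ".join(f"<{uri}>" for uri in instances)

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery(f"SELECT ?s ?type WHERE {{ VALUES ?s {{ {values} }} ?s a ?type }}")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["type"]["value"])
```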
16. Improvements of the New Component
• Step 4: T-Box Index:
All “relevant” T-Box information is added via the index to the fragment.
For each class already in the fragment, all superclasses and their equivalentClass axioms are added
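A sketch of the closure step, reusing the in-memory index from Step 1: walk upwards from every class in the fragment and add the subClassOf links along the way (equivalentClass axioms from the index would be copied analogously; the two-level hierarchy is invented):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

def add_tbox_closure(fragment, classes, superclasses):
    """Add rdfs:subClassOf triples for all ancestors of the given classes."""
    for cls in classes:
        stack, seen = [cls], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for parent in superclasses.get(current, ()):
                fragment.add((current, RDFS.subClassOf, parent))
                stack.append(parent)

# Toy demonstration with an invented two-level hierarchy.
EX = Namespace("http://example.org/")
index = {EX.City: {EX.Place}, EX.Place: {EX.Thing}}
fragment = Graph()
add_tbox_closure(fragment, [EX.City], index)
print(fragment.serialize(format="turtle"))
```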
17. Benchmarking - Speed
For each class in the DBpedia Ontology:
- 30 instances as positives
- 30 negatives from a sister class
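The sampling itself can be sketched as two SPARQL queries per class; LIMIT stands in for proper random sampling, and the sister class is hard-wired here, whereas the benchmark derives it from the ontology:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

def sample_instances(cls, limit=30):
    """Fetch up to `limit` instances of the given class."""
    sparql.setQuery(f"SELECT ?x WHERE {{ ?x a <{cls}> }} LIMIT {limit}")
    sparql.setReturnFormat(JSON)
    return [r["x"]["value"]
            for r in sparql.query().convert()["results"]["bindings"]]

positives = sample_instances("http://dbpedia.org/ontology/City")
negatives = sample_instances("http://dbpedia.org/ontology/Village")  # sister class
```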
18. Benchmarking – F-Measure on the training data
70% of the results for each class had an F-measure of 90-100% on the training data
19. SPARQL Retrieval Component Impact
• DL-Learner – http://dl-learner.org
• DBpedia Navigator
• Tiger Corpus Navigator
• AutoSPARQL - http://autosparql.dl-learner.org/
• HANNE – http://hanne.aksw.org
• ORE - http://aksw.org/Projects/ORE
Sebastian Hellmann, Jens Lehmann and Sören Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. In: International Journal on Semantic Web and Information Systems, 2009
Web Applications: Active Learning → User Interaction and Feedback
20. Future Work
• Research Paper in Session 4b (tomorrow at 15:10): Navigation-induced Knowledge Engineering by Example
• Caching + more sophisticated options
• Large scale learning problems
http://slideshare.net/kurzum
Homepage: http://dl-learner.org
Source code:
http://sourceforge.net/projects/dl-learner/
21. Example
Sebastian Hellmann, Jens Lehmann, Jörg Unbehauen, Claus Stadler, Thanh Nghia Lam and Markus Strohmaier: Navigation-induced Knowledge Engineering by Example. In: JIST 2012
22. Example
Sebastian Hellmann, Jens Lehmann and Sören Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. In: International Journal on Semantic Web and Information Systems, 2009