This document outlines several use cases for data transformation and processing on data.gov.uk. It describes how XML data is transformed to RDF using XSLT with parameters. It also discusses on-the-fly data transformations and complex nested data processing pipelines that include multiple steps like data enrichment. The challenges of representing provenance for non-digital and heterogeneous data from different systems are also summarized.
Information Extraction in the TalkOfEurope Creative Camp - Wim Peters
The CLARIN Talk of Europe Creative Camp event in March 2015 invited people to work on the EuroParliament data of the Talk of Europe data set (http://linkedpolitics.ops.few.vu.nl/home).
Our work during that event covers the conceptualization of the content of two data sets:
- English EuroParliament speeches from the Talk of Europe data set and
- UK Parliament speeches.
We performed term extraction, term organisation and the linking of terminology between these two data sets.
Knowledge Patterns for the Web: extraction, transformation, and reuse - Andrea Nuzzolese
KPs are an abstraction of frames as introduced by Fillmore and Minsky. KP discovery needs to address two main research problems: the heterogeneity of sources, formats and semantics on the Web (i.e., the knowledge soup problem) and the difficulty of drawing a relevant boundary around data that captures the meaningful knowledge with respect to a certain context (i.e., the knowledge boundary problem). Hence, we introduce two methods that provide different solutions to these problems by tackling KP discovery from two different perspectives: (i) the transformation of KP-like artifacts (i.e., top-down defined artifacts that can be compared to KPs, such as FrameNet frames or Ontology Design Patterns) into KPs formalized as OWL 2 ontologies; (ii) the bottom-up extraction of KPs by analyzing how data are organized in Linked Data. The two methods address the knowledge soup and boundary problems in different ways. The first method is based on a purely syntactic transformation of the original source to RDF, followed by a refactoring step whose aim is to add semantics to the RDF by selecting meaningful RDF triples. The second method draws boundaries around RDF data in Linked Data by analyzing type paths. A type path is a possible route through an RDF graph that takes into account the types associated with the nodes of a path. Unfortunately, type paths are not always available. In fact, Linked Data is a knowledge soup because of the heterogeneous semantics of its datasets and because of the limited intensional as well as extensional coverage of ontologies (e.g., the DBpedia ontology, YAGO) and other controlled vocabularies (e.g., SKOS, FOAF). Thus, we propose a solution for enriching Linked Data with additional axioms (e.g., rdf:type axioms) by exploiting the natural language available, for example, in annotations (e.g., rdfs:comment) or in corpora on which Linked Data datasets are grounded (e.g., DBpedia is grounded on Wikipedia). Then we present K∼ore, a software architecture conceived as the basis for developing KP discovery systems and designed according to two software architectural styles, i.e., component-based and REST. K∼ore is the architectural binding of a set of tools, the K∼tools, which implement the methods for KP transformation and extraction. Finally, we provide an example of KP reuse based on Aemoo, an exploratory search tool that exploits KPs for entity summarization.
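The type-path idea lends itself to a short illustration. Below is a minimal sketch in Python with rdflib over a tiny hand-made graph; the entities, properties and namespace are invented for the example and do not come from the paper.

```python
from rdflib import Graph, Namespace, RDF

# Illustrative data; the individuals and properties are made up for this sketch.
EX = Namespace("http://example.org/")
g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:obama a ex:Politician ; ex:memberOf ex:democraticParty .
ex:democraticParty a ex:PoliticalParty .
""", format="turtle")

def type_paths(graph, start, predicate):
    """Follow `predicate` from `start` and report the route as a sequence of
    node types rather than individuals: the essence of a type path."""
    for obj in graph.objects(start, predicate):
        for s_type in graph.objects(start, RDF.type):
            for o_type in graph.objects(obj, RDF.type):
                yield (s_type, predicate, o_type)

# Prints (ex:Politician, ex:memberOf, ex:PoliticalParty)
for path in type_paths(g, EX.obama, EX.memberOf):
    print(path)
```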
The Open Knowledge Extraction Challenge focuses on the production of new knowledge aimed at either populating and enriching existing knowledge bases or creating new ones. The defined tasks therefore focus on extracting concepts, individuals, properties, and statements that do not necessarily already exist in a target knowledge base, and on representing them according to Semantic Web standards so that they can be directly injected into linked datasets and their ontologies. The OKE challenge has the ambition to advance a reference framework for research on Knowledge Extraction from text for the Semantic Web by re-defining a number of tasks (typically from information and knowledge extraction) while taking into account specific SW requirements. The Challenge is open to everyone from industry and academia.
Methods and experiences in cultural heritage enhancement - Francesca Tomasi
Selected paper presented at W3C LOD2014 with F. Ciotti, M. Lana, D. Magro, S. Peroni and F. Vitali: "Linked Open Data: where are we?", Archivio Centrale dello Stato (Rome, 20 February 2014).
Healthcare Data Management using Domain Specific Languages for Metadata Manag... - David Milward
A talk looking at the human factors of DSL usage, tool support for DSL users, and the benefits of using DSLs. In essence, an experience report on the use of DSLs in the development of a metadata management toolkit, with a focus on DSL development in general.
EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org - Jindřich Mynarz
The presentation describes a tool for validating and previewing instances of Schema.org JobPosting described in structured data markup embedded in web pages. The validator and preview were developed to assist users of Schema.org in producing data of better quality; in this way, the tool tries to enhance the usability of the part of Schema.org covering the domain of job postings. The paper discusses the implementation of the tool and the design of its validation rules based on SPARQL 1.1. Results of experimental validation of a job posting corpus harvested from the Web are presented. Among other findings, the results indicate that publishers of Schema.org JobPosting data often misunderstand the precedence rules employed by markup parsers and ignore the case-sensitivity of vocabulary names.
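The paper's actual rules are not reproduced here, but a minimal sketch of what one SPARQL 1.1 validation rule for JobPosting might look like, executed with rdflib, is shown below. The specific rule (requiring schema:hiringOrganization) and the sample data are assumptions for illustration.

```python
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix schema: <http://schema.org/> .
<http://example.org/job/1> a schema:JobPosting ;
    schema:title "Data Engineer" .
""", format="turtle")

# Hypothetical rule: every JobPosting should carry a hiringOrganization.
# Note that schema.org terms are case-sensitive: schema:jobPosting would not match.
rule = """
PREFIX schema: <http://schema.org/>
SELECT ?job WHERE {
    ?job a schema:JobPosting .
    FILTER NOT EXISTS { ?job schema:hiringOrganization ?org }
}
"""
for row in g.query(rule):
    print(f"{row.job} is missing schema:hiringOrganization")
```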
An exploration of a possible pipeline for RDF datasets from Timbuctoo instances to the digital archive EASY.
- Get, verify, ingest, archive and disseminate (linked) data and metadata (a fetch-and-verify sketch follows this list).
- What are the implications for an archive: serving linked data over (longer periods of) time
- Practical stuff.
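As a rough illustration of the "get" and "verify" steps, here is a minimal Python sketch assuming a plain HTTP endpoint and a published SHA-256 checksum; the URL and the checksum mechanism are hypothetical, not part of the Timbuctoo or EASY APIs.

```python
import hashlib
import requests

def get_and_verify(url, expected_sha256):
    """Fetch an RDF dataset and verify its integrity before ingest."""
    resp = requests.get(url, headers={"Accept": "text/turtle"}, timeout=30)
    resp.raise_for_status()
    digest = hashlib.sha256(resp.content).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch: got {digest}")
    return resp.content  # ready to hand to the archive's ingest step

# Hypothetical usage:
# data = get_and_verify("http://example.org/timbuctoo/dataset.ttl", "<sha256>")
```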
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017
Viaggio alla scoperta della Malattia di Crohn e della Colite Ulcerosa - Antonio Maria Ricci
It is not always easy to explain to young patients what IBDs are (chronic inflammatory bowel diseases: Crohn's disease and ulcerative colitis) and how best to deal with them.
To help the youngest patients and their parents, the manga-style comic "Viaggio alla scoperta della Malattia di Crohn e della Colite Ulcerosa" ("Journey to Discover Crohn's Disease and Ulcerative Colitis") has been published, promoted by the AMICI association.
http://goo.gl/dIkMtq
Manifesto in difesa dei bambini migranti nel Mar Mediterraneo, società itali... - Antonio Maria Ricci
Manifesto of the Italian Society of Paediatrics and of the scientific and professional associations in the paediatric field, calling for a general mobilisation in defence of migrant children in the Mediterranean Sea.
Préventica Lyon: Management and Psychosocial Risks - PresenceConseil
Management: victim of and factor in psychosocial risks
- Managers: a population at risk and a risk for other populations
- Psychosocial risk factors
- Management as a victim of psychosocial risks
- Management as a factor in psychosocial risks
Instruction on how to use Google Forms, as well as how to deploy Google Forms in a classroom. Instruction on Quizlet and ideas for PowerPoint Karaoke (PPTK) using all three tools.
Building RESTful Applications with OData - Todd Anglin
Applications today are expected to expose their data and consume data-centric services via REST. In this session we discuss what REST is, give an overview of WCF Data Services, and see how you can REST-enable your data using the Open Data Protocol (OData). Then you will learn how to leverage existing skills in Visual Studio, LINQ and data access to customize the behavior, control flow, security model and experience of your data service. We will then see how to enable data binding to traditional ASP.NET controls as well as Silverlight and Excel PowerPivot. We'll then turn to consuming SharePoint and other OData-based applications in .NET as well as from a non-Microsoft client. This is a very demo-intensive session.
Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products span converters, authoring environments, Web services middleware, and, eventually, ontologies, user interfaces and applications.
The Open Data Protocol (OData) is an open protocol for sharing data. It provides a way to break down data silos and increase the shared value of data by creating an ecosystem in which data consumers can interoperate with data producers in a way that is far more powerful than currently possible, enabling more applications to make sense of a broader set of data. Every producer and consumer of data that participates in this ecosystem increases its overall value.
OData is consistent with the way the Web works – it makes a deep commitment to URIs for resource identification and commits to an HTTP-based, uniform interface for interacting with those resources (just like the Web). This commitment to core Web principles allows OData to enable a new level of data integration and interoperability across a broad range of clients, servers, services, and tools.
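That commitment to URIs and a uniform HTTP interface can be illustrated with a short sketch: an OData entity set is just an HTTP resource that understands query options such as $filter and $top. The endpoint and entity names below are hypothetical; any conforming service works the same way.

```python
import requests

# Hypothetical OData endpoint exposing a Products entity set.
BASE = "http://example.org/odata/Products"

resp = requests.get(
    BASE,
    params={
        "$filter": "Price lt 20",  # server-side filtering
        "$top": "5",               # paging
        "$format": "json",
    },
    timeout=30,
)
resp.raise_for_status()
for product in resp.json()["value"]:  # OData JSON wraps collections in "value"
    print(product["Name"], product["Price"])
```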
Stefano Nativi presents the RDA.
Workshop title: Organising high-quality research data management services
Workshop abstract:
Open science needs high-quality data management, where researchers can create, use and share data according to well-defined standards and practices; this is one of the pillars of Open Science. In the data management landscape we find quite a few organisations that aim at achieving this; however, to get it right, a collaboration is called for in which all can play a suitable role and present this in a consistent way to the researcher.
The proposed workshop brings together representatives of standards organisations (RDA), eInfrastructures (EUDAT) and libraries (LIBER) that together can organise high-quality data management for research.
DAY 1 - PARALLEL SESSION 2
http://opensciencefair.eu/workshops/organising-high-quality-research-data-management-services
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes - Ontotext
This presentation will provide a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on its promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories (a minimal reasoning sketch follows the list):
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches that work well with taxonomies and conceptual models of a few thousand concepts, but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One example is reasoning over a virtual KG, which is often infeasible.
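As a small illustration of the lightweight end of this trade-off, here is a sketch that materialises RDFS inferences with rdflib and the owlrl package, rather than invoking a heavier OWL 2 DL reasoner; the sample classes are invented.

```python
from rdflib import Graph, Namespace, RDF
import owlrl

EX = Namespace("http://example.org/")
g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Dog rdfs:subClassOf ex:Animal .
ex:rex a ex:Dog .
""", format="turtle")

# Forward-chaining materialisation under RDFS semantics only: cheap,
# predictable, and easy to verify compared with expressive OWL reasoning.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# rex is now also typed as ex:Animal via the subclass axiom.
print((EX.rex, RDF.type, EX.Animal) in g)  # True
```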
Access Control for Linked Data: Past, Present and Future - Sabrina Kirrane
In recent years we have seen significant advances in the technology used to both publish and consume structured data using the existing web infrastructure, commonly referred to as the Linked Data Web. However, in order to support the next generation of e-business applications on top of Linked Data, suitable forms of access control need to be put in place. In this talk we examine the various access control models, standards and policy languages, and the different access control enforcement strategies for the Resource Description Framework (the data model underpinning the Linked Data Web). We propose a set of access control requirements that can be used to categorise existing access control strategies, and identify a number of challenges that still need to be overcome.
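One common enforcement strategy in this space is filtering by named graph against a per-user ACL. The sketch below illustrates that idea in Python with rdflib; the ACL structure, users and graph names are assumptions for the example, not the talk's proposal.

```python
from rdflib import Dataset, Namespace

EX = Namespace("http://example.org/")

# Two named graphs; the ACL below is a made-up in-memory policy.
ds = Dataset()
public = ds.graph(EX.publicGraph)
private = ds.graph(EX.hrGraph)
public.add((EX.alice, EX.role, EX.engineer))
private.add((EX.alice, EX.salary, EX.confidential))

ACL = {"anonymous": {EX.publicGraph}, "hr_admin": {EX.publicGraph, EX.hrGraph}}

def visible_triples(dataset, user):
    """Yield only triples from graphs the user is allowed to read."""
    for graph in dataset.graphs():
        if graph.identifier in ACL.get(user, set()):
            yield from graph

print(len(list(visible_triples(ds, "anonymous"))))  # 1
print(len(list(visible_triples(ds, "hr_admin"))))   # 2
```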
This presentation was provided by Vinod Chachra of VTLS Inc. during the NISO event "Next Generation Discovery Tools: New Tools, Aging Standards," held March 27 - March 28, 2008.
1. Open Provenance Model Tutorial, Session 4: Use cases from data.gov.uk - Jun Zhao, University of Oxford
4. XML -> RDF: an XSLT processor takes an input XML file, an XSLT stylesheet (containing XSLT templates) and an XSLT parameter binding, and produces an RDF file as output; the provenance to record is who, when, which version, and how. Contributed by Jeni Tennison.
5. The same XSLT processor pipeline, with the provenance extended to cover the inputs as well: where each was downloaded from, unzipped from, etc., and how it was made accessible, in addition to who, when, which version, and how. Contributed by Jeni Tennison.
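A minimal sketch of the pattern on slides 4 and 5, driven from Python with lxml: an XSLT stylesheet with a runtime parameter binding transforms an XML input into RDF/XML. The stylesheet, parameter and input are stand-ins, not the actual data.gov.uk artifacts.

```python
import lxml.etree as ET

# Stand-in stylesheet: emits RDF/XML and accepts one runtime parameter.
xslt = ET.XML(b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <xsl:param name="baseUri"/>
  <xsl:template match="/item">
    <rdf:RDF>
      <rdf:Description rdf:about="{$baseUri}/{@id}"/>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>
""")

transform = ET.XSLT(xslt)
doc = ET.XML(b'<item id="42"/>')
# Parameter binding happens at invocation time, as on slide 4.
result = transform(doc, baseUri=ET.XSLT.strparam("http://example.org"))
print(ET.tostring(result, pretty_print=True).decode())
```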
6. On-the-fly transformation: a data transformation wrapper produces results on request (example resource: http://mytransportatio.db/j10), again recording who, when, which version, and how. Contributed by Stuart Williams.
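The wrapper idea can be sketched as a higher-order function that performs the transformation on request and emits the who/when/which-version record alongside the output; all names here are illustrative, not from the tutorial.

```python
import datetime

def transforming_wrapper(transform, version, agent):
    """Wrap a transformation so every call also yields a provenance record."""
    def wrapped(data):
        output = transform(data)
        provenance = {
            "who": agent,
            "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "which_version": version,
            "how": transform.__name__,
        }
        return output, provenance

    return wrapped

# Illustrative use: any callable transformation can be wrapped this way.
to_upper = transforming_wrapper(str.upper, version="1.0", agent="data.gov.uk")
print(to_upper("on-the-fly"))
```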
7. Complex data creation pipeline: a GATE pipeline followed by GateXMLRegressionTransformation, GateXMLRdfaTransformation and RdfaRdfXmlTransformation. Courtesy of Paul Appleby from TSO (Data Enrichment Service).
8. The same pipeline with the GATE stage expanded into its components: Document Reset PR, ANNIE English Tokeniser, ANNIE English Splitter, ANNIE POS Tagger, Data.gov.uk Morphological Analyzer, Data.gov.uk Flexible Roof Gazetteer, Data.gov.uk Generic Gazeteer, GATE Noun Phrase Chunker, Data.gov.uk Generic Transducer, TSO Coreference. Courtesy of Paul Appleby from TSO (Data Enrichment Service).
9. Two levels of provenance: Level 1 records provenance of execution at a higher level and Level 0 at a detailed level, connected through hasParentProcess and iterationOfProcess. Artifacts (e.g., a data collection) are linked to processes by wasGeneratedBy and to one another by wasDerivedFrom; processes are chained by wasTriggeredBy, and the services used by executions are recorded with accessedService.
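Slide 9's structure can be sketched as RDF triples. The relation names come from the slide, but the namespace URIs and run identifiers are hypothetical, since the slide does not give them.

```python
from rdflib import Graph, Namespace

# Hypothetical namespaces; the slide names the relations but not the URIs.
OPM = Namespace("http://example.org/opm#")
EX = Namespace("http://example.org/run/")

g = Graph()
# Level 1 vs Level 0: a detailed step belongs to a higher-level pipeline run.
g.add((EX.step1, OPM.hasParentProcess, EX.pipelineRun))
g.add((EX.step1, OPM.iterationOfProcess, EX.stepTemplate))
# Artifacts and their derivation.
g.add((EX.rdfOutput, OPM.wasGeneratedBy, EX.step1))
g.add((EX.rdfOutput, OPM.wasDerivedFrom, EX.xmlInput))
# Triggering between processes and the services they accessed.
g.add((EX.step2, OPM.wasTriggeredBy, EX.step1))
g.add((EX.step1, OPM.accessedService, EX.gateService))

print(g.serialize(format="turtle"))
```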
13. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
Editor's Notes
Tooling: Converting & generating; Linked data API; Syndication & merging; Validation; Predicate-based services; Data enrichment services; Visualisations
A stylesheet could import another stylesheet; one stylesheet could be included in another stylesheet.