Open data is a crucial prerequisite for inventing and disseminating the innovative practices needed for agricultural development. To be usable, data must not just be open in principle—i.e., covered by licenses that allow re-use. Data must also be published in a technical form that allows it to be integrated into a wide range of applications. The webinar will be of interest to any institution seeking ways to publish and curate data in the Linked Data cloud.
This webinar describes the technical solutions adopted by a widely diverse global network of agricultural research institutes for publishing research results. The talk focuses on AGRIS, a central and widely-used resource linking agricultural datasets for easy consumption, and AgriDrupal, an adaptation of the popular, open-source content management system Drupal optimized for producing and consuming linked datasets.
Agricultural research institutes in developing countries share many of the constraints faced by libraries and other documentation centers, and not just in developing countries: institutions are expected to expose their information on the Web in a re-usable form with shoestring budgets and with technical staff working in local languages and continually lured by higher-paying work in the private sector. Technical solutions must be easy to adopt and freely available.
Library Linked Data in Latvia - #LIBER2014 poster (Uldis Bojars)
Library Linked Data poster at the LIBER-2014 conference in Riga, Latvia: http://liber2014.wp.lnb.lv/programme/posters/abstracts-and-biographies/#LibraryLinkedDatainLatvia
The slides show what linked data is and how we experiment with it in the area of legislative documents (in the Czech Republic).
Download the slides for detailed embedded comments.
Libraries around the world have a long tradition of maintaining authority files to assure the consistent presentation and indexing of names. As library authority files have become available online, the authority data has become accessible -- and many have been published as Linked Open Data (LOD) -- but names in one library authority file typically had no link to corresponding records for persons and organizations in other library authority files. After a successful experiment in matching the Library of Congress/NACO authority file with the German National Library's authority file, an online system called the Virtual International Authority File was developed to facilitate sharing by ingesting, matching, and displaying the relations between records in multiple authority files.
The Virtual International Authority File (VIAF) has grown from three source files in 2007 to more than two dozen files today. The system harvests authority records, enhances them with bibliographic information, and brings them together into clusters when it is confident the records describe the same identity. Although the most visible part of VIAF is an HTML interface, the API beneath it supports a linked data view of VIAF with URIs representing the identities themselves, not just URIs for the clusters. It supports names for persons, corporations, geographic entities, works, and expressions. With English, French, German, and Spanish interfaces (and a Japanese one in progress), the system is used around the world, with over a million queries per day.
Speaker
Thomas Hickey is Chief Scientist at OCLC, where he helped found OCLC Research. Current interests include metadata creation and editing systems, authority control, parallel systems for bibliographic processing, and information retrieval and display. In addition to implementing VIAF, his group explores Web access to metadata, identification of FRBR works and expressions in WorldCat, the algorithmic creation of authorities, and the characterization of collections. He has an undergraduate degree in Physics and a Ph.D. in Library and Information Science.
This presentation was given by Ted Lawless of Thomson Reuters during the NISO Virtual Conference, BIBFRAME & Real World Applications of Linked Bibliographic Data, held on June 15, 2016.
Case study: Towards a linked digital collection of Latvian Cultural Heritage (Uldis Bojars)
A talk about the Linked Digital Collection pilot project "Rainis and Aspazija" presented at WHiSe 2016 (1st Workshop on Humanities in the Semantic Web).
Paper: http://ceur-ws.org/Vol-1608/paper-04.pdf
Project: http://runa.lnb.lv/
SEMANTIC WEB SOURCES – comparison of open-source Knowledge Graphs (MatteoBelcao)
A theoretical & practical comparison of the currently most used open-source knowledge graphs: DBpedia, Wikidata, YAGO
A practical explanation of how to query each knowledge graph with SPARQL and the available sandboxes
Towards digitizing scholarly communication (Sören Auer)
Slides of the VIVO 2016 Conference keynote: Despite the availability of ubiquitous connectivity and information technology, scholarly communication has not changed much in the last hundred years: research findings are still encoded in and decoded from linear, static articles, and the possibilities of digitization are rarely used. In this talk, we will discuss strategies for digitizing scholarly communication. This comprises in particular: the use of machine-readable, dynamic content; the description and interlinking of research artifacts using Linked Data; and the crowd-sourcing of multilingual educational and learning content. We discuss the relation of these developments to research information systems and how they could become part of an open ecosystem for scholarly communication.
As described in the April NISO/DCMI webinar by Dan Brickley, schema.org is a search-engine initiative aimed at helping webmasters use structured data markup to improve the discovery and display of search results. Drupal 7 makes it easy to mark up HTML pages with schema.org terms, allowing users to quickly build websites with structured data that can be understood by Google and displayed as Rich Snippets.
Improved search results are only part of the story, however. Data-bearing documents become machine-processable once you find them. The subject matter, important facts, calendar events, authorship, licensing, and whatever else you might like to share are there for the taking. Sales reports, RSS feeds, industry analyses, maps, diagrams, and process artifacts can now connect back to other data sets, providing linkage to context and related content. The key to this is the adoption of standards for both the data model (RDF) and the means of weaving it into documents (RDFa). Drupal 7 has become the leading content platform to adopt these standards.
This webinar will describe how RDFa and Drupal 7 can improve how organizations publish information and data on the Web for both internal and external consumption. It will discuss what is required to use these features and how they impact publication workflow. The talk will focus on high-level and accessible demonstrations of what is possible. Technical people should learn how to proceed while non-technical people will learn what is possible.
This presentation was delivered by Beacher Wiggins of the Library of Congress during the NISO Virtual Conference, BIBFRAME & Real World Applications of Linked Bibliographic Data, held on June 15, 2016.
Existing data management approaches assume control over schema, data, and data generation, which is not the case in open, decentralised environments such as the Web. The lack of control means that social processes are necessary to generate 'ordo ab chao', and hence a new life cycle model is needed.
Based on our experience in Linked Data publishing and consumption over the past years, we have identified the involved parties and fundamental phases, which provide for a multitude of so-called Linked Data life cycles.
If you want to hear me speak to the slides, you might want to check out the following videos on YouTube:
Part 1: http://www.youtube.com/watch?v=AFJSMKv5s3s
Part 2: http://www.youtube.com/watch?v=G6YJSZdXOsc
Part 3: http://www.youtube.com/watch?v=OagzNpDEPJg
Usage of Linked Data: Introduction and Application Scenarios (EUCLID project)
This presentation introduces the main principles of Linked Data, the underlying technologies and background standards. It provides basic knowledge for how data can be published over the Web, how it can be queried, and what are the possible use cases and benefits. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
This presentation was provided by Jackie Shieh of The Smithsonian Libraries, during the NISO webinar "Implementing Linked Library Data," held on November 13, 2019.
This presentation was provided by Jean Godby of The OCLC Online Computer Library Center, during the NISO webinar "Implementing Linked Library Data," held on November 13, 2019.
Linked data for Enterprise Data Integration (Sören Auer)
The Web is evolving into a Web of Data. In parallel, the intranets of large companies will evolve into data intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a light-weight, adaptive data integration approach.
This tutorial explains the Data Web vision, some preliminary standards and technologies, as well as some tools and technological building blocks developed by the AKSW research group at Universität Leipzig.
Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products span converters, authoring environments, Web services middleware, and eventually ontologies, user interfaces, and applications.
The Semantic Web and Libraries in the United States: Experimentation and Achievements (New York University)
This presentation reflects the paper titled "The Semantic Web and Libraries in the United States: Experimentation and Achievements," published in the proceedings of 75th IFLA General Conference and Assembly, Satellite Meeting: Emerging Trends in Technology: Libraries between Web 2.0, Semantic Web and Search Technology 8/19-20/2009, in Florence, Italy, presented by Sharon Yang, Rider University, Yanyi Lee, Wagner College, and Amanda Xu, St. John's University. Here is the URL to the full paper: http://www.ifla2009satelliteflorence.it/meeting3/program/assets/SharonYang.pdf
Presentation at ELAG 2011, European Library Automation Group Conference, Prague, Czech Republic. 25th May 2011
http://elag2011.techlib.cz/en/815-lifting-the-lid-on-linked-data/
Similar to Legislative data portals and linked data quality
Although RDF can be considered the cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who want to safely create and access well-structured data. There is a lack of the common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL). In the talk, we will briefly introduce both technologies using some examples and compare them. We will also present some challenges and applications related to RDF data shapes.
Talk given at: KTH Royal Institute of Technology, School of Industrial Engineering and Management, Mechatronics Division, 7th February, 2020
Although RDF is a cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who need to safely create and access well-structured data. There is a lack of the common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL). In the talk, we will review the history and motivation of both technologies. We will also enumerate some challenges and future work with regard to RDF validation.
How to publish data: open and linked data
Talk given at the Open Data and Transparency Conference (Jornadas Open Data y Transparencia), Oviedo City Council
11 September 2017
Legislative data portals and linked data quality
1. Legislative data portals and
linked data quality
Jose Emilio Labra Gayo
WESO
WEb Semantics Oviedo
University of Oviedo, Spain
2. About me…
In 2004, founded WESO (Web Semantics Oviedo) research group
Goal: Practical applications of semantic technologies
Several domains: e-Government, e-Health,...
2 books:
"Web semántica" (in Spanish), 2012
"Validating RDF data", 2017
…and software:
SHaclEX (Scala library, implements ShEx & SHACL)
RDFShape (RDF playground)
WikiShape (Wikidata playground)
HTML version: http://book.validatingrdf.com
Examples: https://github.com/labra/validatingRDFBookExamples
3. About this talk...
The talk is divided into 2 parts:
1st part: Legislative linked data portals (Chilean National Library of Congress, presented at ESWC'19)
2nd part: Linked data quality (RDF description and validation, ShEx & SHACL overview)
4. Legislative linked data portals
Processing the History of the Law in Chile
Francisco Cifuentes Silva
Library of Congress, Chile
PhD Student
WESO research group
Jose Emilio Labra Gayo
WESO research group
University of Oviedo, Spain
More information:
Cifuentes-Silva F., Labra Gayo J.E. (2019) Legislative Document Content Extraction Based on Semantic Web Technologies. In:
Hitzler P. et al. (eds) The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham
5. Chilean Library of Congress
In Spanish: BCN (Biblioteca del Congreso Nacional de Chile)
Political powers: Executive, Judiciary, Legislative
Independent body inside the Legislative power
Advises the parliament and provides services to citizens
http://www.bcn.cl
6. Two projects at the Library of Congress (BCN)
History of the Law
Parliamentary work
7. History of the Law (LeyChile)
Collect all documents generated during a law legislative process
Phases:
An initiative comes to life as a draft bill
Subject to debates
Validity period (the law is published)
Modifications, additions,...
Derogation
Goal:
Capture the spirit of the law
Traceability
https://www.bcn.cl/historiadelaley
8. Parliamentary work
Collect all legislative activity by each Member of Parliament
Retrieve all interventions made
Parliamentary motion
Session journal
Commission report
Ordered and categorised
https://www.bcn.cl/laborparlamentaria/
9. Both projects adopted semantic technologies
Some initial reasons:
Semantic technologies considered one pillar of the strategic plan (in 2014)
Innovative action to generate new products
Improve interoperability mechanisms
Sem. Web aligned well with open & public data
10. Which semantic technologies?
Text mining and content enrichment
Entity extraction
Topic identification
Automatic markup
Classification
Machine readable info
XML & URIs
RDF
Ontologies
Linked Open Data
12. Workflow overview
Source: legislative documents from the National Library
• Some docs on paper (require OCR)
• Text documents
Pipeline: Automatic XML marker → SVN repository (Akoma-Ntoso) → XML editor & tools → Publishing (RDF extraction from Akoma-Ntoso) → Services layer → Content portals, Linked Open Data, Query DB
14. Automatic XML marker
4 phases:
1. Named Entity Recognizer: Text → Entity, Type
2. Mediator (backed by the Legal Knowledge Base): Entity, Type → URI
3. Structural marker: marked-up text → internal XML representation
4. Converter: internal XML representation → AKN XML
15. 1. Named Entity Recognizer
Detection of entities & types of entities
Web service implementing the Stanford NER with a CRF classifier
Evaluation in production: detects 97% of entities

Type          Some examples                                          # of entities
Person        Salvador Allende, Sebastián Piñera                     5,139
Organization  Ministerio de Salud, SERNATUR                          2,848
Location      Valparaíso, Santiago de Chile                          1,251
Document      Ley 20.000, Diario de sesión nº 12                     732,497
Role          Senador, Diputado, Alcalde                             428
Events        Nacimiento de Eduardo Frei, Sesión Nº 23               14,389
Law           Boletín 11536-04, Prohíbe fumar en espacios cerrados   12,737
Dates         27 de febrero de 2010, el próximo año, ...             20,632
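The actual first phase is a Stanford NER web service with a CRF classifier; as a purely illustrative sketch of its input/output contract (text in, typed entities out), a toy gazetteer lookup can stand in for the model. The entity names below come from the examples table; the function itself is hypothetical, not the BCN implementation:

```python
# Toy sketch of the Text -> (entity, type) contract of the NER phase.
# The real system uses a Stanford NER web service with a CRF classifier;
# this gazetteer lookup only illustrates the interface, not the method.
GAZETTEER = {
    "Salvador Allende": "Person",
    "Ministerio de Salud": "Organization",
    "Valparaíso": "Location",
    "Ley 20.000": "Document",
}

def recognize(text):
    """Return (surface form, entity type, start offset) for known entities."""
    found = []
    for surface, etype in GAZETTEER.items():
        pos = text.find(surface)
        if pos != -1:
            found.append((surface, etype, pos))
    return sorted(found, key=lambda t: t[2])

print(recognize("Salvador Allende visitó Valparaíso según la Ley 20.000"))
```

A CRF classifier generalizes beyond a fixed list, which is why the production system reaches 97% on unseen text; the gazetteer here only shows the shape of the data flowing to the next phase.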
16. 2. Mediator
Entity linking and disambiguation: maps each Entity, Type pair to a URI using the Legal Knowledge Base
Text similarity algorithms based on Apache Lucene
In-house development:
- Use of context information to narrow the list of candidates
- Custom filters and association heuristics
- Specialized web services
17. 3. Structural marker
Detect structures in the text: titles, subtitles, paragraphs, sections,...
Special structure for debates: participation
Regular expressions + custom rules
Output: internal XML representation
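A minimal sketch of the "regular expressions + custom rules" idea: classify each line of a legislative text by the first pattern it matches. The patterns below are hypothetical illustrations, not the BCN rule set:

```python
import re

# Illustrative structural-marker rules: the label of the first matching
# pattern wins; anything unmatched is treated as an ordinary paragraph.
# These patterns are hypothetical, not the actual BCN rules.
RULES = [
    ("title",   re.compile(r"^TÍTULO\s+[IVXLC]+", re.IGNORECASE)),
    ("section", re.compile(r"^Artículo\s+\d+", re.IGNORECASE)),
    ("speech",  re.compile(r"^El señor\s+\S+\.-")),  # debate participation
]

def mark(line):
    """Classify one line of text into a structural category."""
    for label, pattern in RULES:
        if pattern.match(line):
            return label
    return "paragraph"
```

In practice such rules feed the internal XML representation, e.g. wrapping a "section" line in the corresponding element before the AKN conversion.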
18. 4. XML converter to Akoma-Ntoso
Programmatic approach
Internal XML representation similar to DOM
Each node converted to text in AKN-XML
19. Human editing of AKN documents
Quality assurance by human analysts
They review the generated XML documents
2 editors:
Ad-hoc XML editor
Commercial editor: LegisPro (Xcential)
20. Linked data generation
The pilot project (2011) carefully defined a stable URI model
URIs have been maintained since then
URIs = IDs in the whole system
URIs are dereferenceable
Content negotiation
Custom linked data browser
Documentation (in Spanish)
http://datos.bcn.cl/es/documentacion
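Content negotiation on dereferenceable URIs means the same resource URI can serve RDF to machines and HTML to browsers, depending on the HTTP Accept header. A minimal server-side sketch, with an assumed MIME table (not BCN's actual stack):

```python
# Minimal sketch of server-side content negotiation for a linked data URI.
# The supported MIME types and the fallback behaviour are assumptions for
# illustration, not a description of the BCN server.
SUPPORTED = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "text/html": "html",
}

def negotiate(accept_header, default="html"):
    """Return the first supported representation listed in Accept."""
    for part in accept_header.split(","):
        mime = part.split(";")[0].strip()  # drop q-values like ;q=0.9
        if mime in SUPPORTED:
            return SUPPORTED[mime]
    return default
```

A client would then request, say, Turtle with `Accept: text/turtle` against the same URI a browser uses for HTML.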
21. AKN2RDF
RDF extraction from Akoma-Ntoso XML
● Custom-made converter (XSLT discarded due to perceived complexity)
● Each XML tag is implemented by one class
● Extracted data saved into multiple databases (relational and RDF)
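The converter maps each XML tag to a dedicated class; as a rough illustration of the idea only, one can walk an Akoma-Ntoso-like tree and emit one N-Triples line per element of interest. The element names, predicate URIs, and string-based output below are all simplifying assumptions (a real converter would use a proper RDF library and the actual AKN vocabulary):

```python
import xml.etree.ElementTree as ET

# Hypothetical Akoma-Ntoso-like fragment; the literals reuse examples
# that appear elsewhere in the slides.
AKN = """<akomaNtoso>
  <bill>
    <title>Prohíbe fumar en espacios cerrados</title>
    <num>Boletín 11536-04</num>
  </bill>
</akomaNtoso>"""

DOC = "<http://datos.bcn.cl/recurso/cl/documento/579095>"
PRED = {  # illustrative tag -> predicate mapping, not the BCN vocabulary
    "title": "<http://purl.org/dc/terms/title>",
    "num": "<http://purl.org/dc/terms/identifier>",
}

def akn_to_ntriples(xml_text):
    """Emit one N-Triples line per mapped element, in document order."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root.iter():
        if elem.tag in PRED and elem.text:
            triples.append(f'{DOC} {PRED[elem.tag]} "{elem.text.strip()}" .')
    return triples
```

The per-tag dictionary plays the role of the per-tag classes: adding support for a new AKN element means adding one mapping entry rather than touching the traversal.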
22. Linked data generation
Source: AKN XML documents
Linked data browser (WESO-DESH)
Target: RDF data
http://datos.bcn.cl/recurso/cl/documento/579095/
http://datos.bcn.cl/recurso/cl/documento/579095.xml
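The generated RDF can be explored directly. An illustrative SPARQL query (suitable for the project's public endpoint) that lists everything asserted about the document resource above:

```sparql
# List all properties and values of one document resource.
SELECT ?p ?o
WHERE {
  <http://datos.bcn.cl/recurso/cl/documento/579095> ?p ?o
}
LIMIT 50
```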
24. Content delivery
Web portals using Open Source Technologies
CMS (Typo3)
Python/Java
Varnish
Apache Lucene
REST Web service layers which connect to RDF triplestore and DB
Data exports to PDF, Doc and XML formats
URIs of parliamentary profiles = URIs in triplestore
25. History of the Law portal
https://www.bcn.cl/historiadelaley
- Links to Members of Parliament
- Each article has a link
- Different versions of a law
26. History of the Law portal
https://www.bcn.cl/historiadelaley
Compare different versions
32. Regions mentioned by law
Result of a legislative hackathon
http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html
In 2010 there was an earthquake in the Biobío region
33. Statistics
24,368 documents (Nov. 2018)
Number of RDF triples: 28 million
According to Google Analytics:
Average browsing time: 2 min 26 s
Visits: 331,481 (Nov. 2016 - Nov. 2017), 476,241 (Nov. 2017 - Nov. 2018)
34. ...and some findings...
Question: why are there some valleys? Answer: the dictatorship period.
[Charts: session attendance by year; RDF triples generated by year]
35. Lessons learnt
RDF granularity & inference trade-off
RDF statements + runtime inference: long running times (some queries didn't terminate)
A priori inferred triples added to the triple store: high response times for large docs
Final choice: a small subset of RDF triples (structural parts of docs and metadata)
Performance problems in the XML editor when browsing long docs (>1,000 pages)
Low SPARQL endpoint usage by external apps
If we could start again, I would recommend ShEx
Personal note: this kind of project led to my interest in ShEx
36. Conclusions & future projects
Well-designed URIs can act as a perfect glue for interoperability
Automatic workflow pipelines help long-term survival of LD-based projects
SPARQL endpoint since 2011
Future projects on top of existing ones
National Budget as Linked data
Diana Project: Members of Parliament linked to social network analysis
New portal: User customization & recommender systems
38. RDF, the good parts...
RDF as an integration language
RDF as a lingua franca for semantic web and linked data
RDF flexibility & integration
Data can be adapted to multiple environments
Open and reusable data by default
RDF for knowledge representation
RDF data stores & SPARQL
39. RDF, the other parts
Consuming & producing RDF
Multiple syntaxes: Turtle, RDF/XML, JSON-LD, ...
Embedding RDF in HTML
Describing and validating RDF content
Producer Consumer
40. Why describe & validate RDF?
For producers
Developers can understand the contents they are going to produce
Ensure they produce the expected structure
Advertise and document the structure
Generate interfaces
For consumers
Understand the contents
Verify the structure before processing it
Query generation & optimization
Producer Consumer
Shapes
42. ShEx and SHACL
2013 RDF Validation Workshop
Conclusions of the workshop:
There is a need for a higher-level, concise language for RDF validation
ShEx initially proposed (v 1.0)
2014 W3C Data Shapes WG chartered
2017 SHACL accepted as W3C recommendation
2017 ShEx 2.0 released as Community group draft
2019 ShEx adopted by Wikidata
43. Short intro to ShEx
ShEx (Shape Expressions Language)
Concise and human-readable language for RDF validation & description
Syntax similar to SPARQL, Turtle
Semantics inspired by regular expressions & RelaxNG
2 syntaxes: Compact and RDF/JSON-LD
Official info: http://shex.io
Semantics: http://shex.io/shex-semantics/, primer: http://shex.io/shex-primer
44. Simple example
Nodes conforming to <User> shape must:
• Be IRIs
• Have exactly one schema:name with a value of type xsd:string
• Have zero or more schema:knows whose values conform to <User>
prefix schema: <http://schema.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
<User> IRI {
  schema:name xsd:string ;
  schema:knows @<User> *
}
(Prefix declarations work as in Turtle/SPARQL)
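For illustration, here is hypothetical Turtle data checked against that shape (the `:` prefix and all node names are assumptions, not from the talk):

```turtle
@prefix :       <http://example.org/> .
@prefix schema: <http://schema.org/> .

# :alice conforms to <User>: an IRI with exactly one schema:name
# and schema:knows links pointing at other conforming users.
:alice schema:name "Alice" ;
       schema:knows :bob .

:bob   schema:name "Bob" .   # zero schema:knows values is fine ("*")

# :carol would FAIL validation: two schema:name values,
# where the shape requires exactly one.
:carol schema:name "Carol", "Caroline" .
```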
51. Some challenges and perspectives
Theoretical foundations of ShEx/SHACL
Generating shapes from data
Validation Usability
RDF Stream validation
Schema ecosystems
Wikidata
Solid
52. Theoretical foundations of ShEx/SHACL
Conversion between ShEx and SHACL
SHaclEX library converts subsets of both
Challenges
Recursion and negation
Performance and algorithmic complexity
Detect useful subsets of the languages
Convert to SPARQL
[Diagram: ShEx shapes ↔ SHACL shapes, via schema/data mapping]
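As an illustrative example of the "convert to SPARQL" idea, a cardinality constraint such as "exactly one schema:name" can be approximated by a query that searches for violations:

```sparql
PREFIX schema: <http://schema.org/>

# True iff some node has more than one schema:name,
# i.e. iff the "exactly one" constraint is violated.
ASK {
  ?node schema:name ?n1, ?n2 .
  FILTER (?n1 != ?n2)
}
```

Only subsets of ShEx/SHACL admit such translations; recursion and negation (noted above) are the hard cases.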
53. Generating Shapes from Data
Useful use case in practice
Knowledge Graph summarization
Some prototypes:
ShExer, RDFShape, ShapeArchitect
Try it with RDFShape:
https://tinyurl.com/y8pjcbyf
[Diagram: RDF data → infer → Shapes. Example: shape expression generated for wd:Q51613194]
55. RDF Stream validation
Validation of RDF streams
Challenges:
Incremental validation
Named graphs
Addition/removal of triples
[Diagram: sensor stream → RDF data + Shapes → Validator → stream of validation results]
56. Schema ecosystems: Wikidata
In May 2019, Wikidata announced its adoption of ShEx
New namespace for schemas
Example: https://www.wikidata.org/wiki/EntitySchema:E2
It opens lots of opportunities/challenges
Schema evolution and comparison
57. Schema ecosystems: Solid project
SOLID (SOcial Linked Data): Promoted by Tim Berners-Lee
Goal: Re-decentralize the Web
Separate data from apps
Give users more control over their data
Internally using linked data & RDF
Shapes needed for interoperability
[Diagram: a data pod exposing Shapes A, B and C to Apps 1, 2 and 3]
"...I just can’t stop thinking about shapes.", Ruben Verborgh
https://ruben.verborgh.org/blog/2019/06/17/shaping-linked-data-apps/
58. Conclusions
Explicit schemas (shapes) can improve linked data quality
2 languages proposed: ShEx/SHACL
Towards an ecosystem of shapes for data portals
New challenges and opportunities
59. More info
About ShEx
ShEx by Example (slides):
https://figshare.com/articles/ShExByExample_pptx/6291464
ShEx chapter from Validating RDF data book:
http://book.validatingrdf.com/bookHtml010.html
About SHACL
SHACL by example (slides):
https://figshare.com/articles/SHACL_by_example/6449645
SHACL chapter from Validating RDF data book:
http://book.validatingrdf.com/bookHtml011.html
Comparing ShEx and SHACL
http://book.validatingrdf.com/bookHtml013.html