Legislative document content extraction based
on Semantic Web technologies
A use case about processing the History of the Law in Chile
Francisco Cifuentes Silva
Library of Congress, Chile
PhD Student
WESO research group
Jose Emilio Labra Gayo
WESO research group
University of Oviedo, Spain
Chilean Library of Congress
In Spanish: BCN (Biblioteca del Congreso Nacional de Chile)
Political powers: Executive, Judiciary, Legislative
Independent body inside the Legislative power
Advises the parliament and provides services to citizens
http://www.bcn.cl
2 projects at the Library of Congress (BCN)
History of the Law
Parliamentary work
History of the Law (LeyChile)
Collect all documents generated during the legislative process of a law
Phases:
An initiative starts life as a draft bill
Subject to debates
Validity time (it is published)
Modifications, additions,...
Derogation
Goal:
Capture the spirit of the law
Traceability
https://www.bcn.cl/historiadelaley
Parliamentary work
Collect all legislative activity by each Member of Parliament
Retrieve all interventions made
Parliamentary motion
Session journal
Commission report
Ordered and categorised
https://www.bcn.cl/laborparlamentaria/
Both projects adopted semantic technologies
Some initial reasons:
Semantic technologies considered one pillar of strategic plan (in 2014)
Innovative action to generate new products
Improve interoperability mechanisms
Sem. Web aligned well with open & public data
Which semantic technologies?
Text mining and content enrichment
Entity extraction
Topic identification
Automatic markup
Classification
Machine readable info
XML & URIs
RDF
Ontologies
Linked Open Data
Workflow pipelines
3 main steps
Automatic XML Marker
RDF & Linked data generation
Content delivery
Linked Open Data, Query DB
Workflow overview
National library
Legislative documents
• Paper (requires OCR)
• Text documents
Automatic XML marker
SVN repository
Akoma-Ntoso
XML editor & tools
Publishing (RDF extraction from Akoma-Ntoso)
Services layer
Content portals
Automatic XML marker
Source: Text. Target: XML following Akoma-Ntoso
4 phases:
Text → Named Entity Recognizer → (Entity, Type) → Mediator (with Legal Knowledge Base) → (Entity, Type, URI) → Structural marker → Internal XML representation → Converter → AKN XML
1. Named Entity Recognizer
Detection of entities & types of entities
Web service implementing the Stanford NER with a CRF classifier
Evaluation in production: detects 97% of entities
Type          Some examples                                          # of entities
Person        Salvador Allende, Sebastián Piñera                     5,139
Organization  Ministerio de Salud, SERNATUR                          2,848
Location      Valparaíso, Santiago de Chile                          1,251
Document      Ley 20.000, Diario de sesión nº 12                     732,497
Role          Senador, Diputado, Alcalde                             428
Events        Nacimiento de Eduardo Frei, Sesión Nº 23               14,389
Law           Boletín 11536-04, Prohíbe fumar en espacios cerrados   12,737
Dates         27 de febrero de 2010, el próximo año, ...             20,632
Text → Named Entity Recognizer → (Entity, Type)
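The recognizer's output can be pictured as a list of typed mentions. The sketch below is not the Stanford NER/CRF service described on this slide; it is a toy pattern-based stand-in that only illustrates the shape of that output for two of the entity types listed above. The patterns, the `Entity` type, and the `recognize` function are all hypothetical.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str   # surface form found in the document
    type: str   # entity type, e.g. "Document" or "Law"
    start: int  # character offset where the mention begins


# Hypothetical patterns, for illustration only. The production system
# uses a trained CRF classifier, not regular expressions.
PATTERNS = {
    "Document": re.compile(r"Ley\s+\d{1,3}(?:\.\d{3})*"),
    "Law": re.compile(r"Boletín\s+\d+-\d+"),
}


def recognize(text: str) -> List[Entity]:
    """Return the entity mentions found in text, ordered by position."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(match.group(0), entity_type, match.start()))
    return sorted(found, key=lambda e: e.start)


sample = "El Boletín 11536-04 modifica la Ley 20.000."
entities = recognize(sample)
```

Each `Entity` carries the mention and its type, which is the input the next phase (the Mediator) needs to look up a URI.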
2. Mediator
Entity linking and disambiguation
Text similarity algorithms
Based on Apache Lucene
In-house development
- Use of context information to narrow the list of candidates
- Custom filters and association heuristics
- Specialized web services
(Entity, Type) → Mediator (with Legal Knowledge Base) → (Entity, Type, URI)
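A minimal sketch of the linking step, under stated assumptions: candidates are first narrowed by entity type (standing in for the context filters), then ranked by string similarity. `difflib.SequenceMatcher` is used here as a simple substitute for the Lucene-based similarity of the in-house system, and the knowledge base entries and URIs are made-up placeholders.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical knowledge base: (label, type, URI). The URIs are placeholders.
KNOWLEDGE_BASE = [
    ("Salvador Allende Gossens", "Person", "http://example.org/persona/1"),
    ("Sebastián Piñera Echenique", "Person", "http://example.org/persona/2"),
    ("Ministerio de Salud", "Organization", "http://example.org/organismo/1"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(mention: str, entity_type: str, threshold: float = 0.5) -> Optional[str]:
    """Return the URI of the best candidate, or None if nothing is close enough."""
    # Context filter: only candidates of the recognized type are considered.
    candidates = [(label, uri) for label, kind, uri in KNOWLEDGE_BASE
                  if kind == entity_type]
    if not candidates:
        return None
    best_label, best_uri = max(candidates, key=lambda c: similarity(mention, c[0]))
    return best_uri if similarity(mention, best_label) >= threshold else None
```

Filtering by type before scoring keeps the candidate list short, which is the same idea as using context to narrow candidates, only in a much cruder form.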
3. Structural marker
Detect structures in the text
Titles, subtitles, paragraphs, sections,...
Special structure for debates: participation
Regular expressions + custom rules
(Entity, Type, URI) → Structural marker → Internal XML representation
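The "regular expressions + custom rules" approach might look roughly like the sketch below. The three rules are hypothetical examples (a title, an article, and a debate participation of the form "El señor NAME.-"); the actual rule set is not given in the slides.

```python
import re
from typing import List, Tuple

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("title", re.compile(r"^TÍTULO\s+[IVXLC]+\b")),
    ("article", re.compile(r"^Artículo\s+\d+")),
    # A debate intervention: "El señor NAME.-" or "La señora NAME.-"
    ("participation", re.compile(r"^(?:El señor|La señora)\s+[^.]+\.-")),
]


def mark_structure(text: str) -> List[Tuple[str, str]]:
    """Label each non-empty line with a structure type; default is 'paragraph'."""
    marked = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label = next((name for name, rx in RULES if rx.match(line)), "paragraph")
        marked.append((label, line))
    return marked
```

Keeping the rules in an ordered list makes it easy to add a custom rule ahead of a more general one.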
4. XML converter to Akoma-Ntoso
Programmatic approach
Internal XML representation similar to DOM
Each node converted to text in AKN-XML
Internal XML representation → Converter → AKN XML
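As an illustration of the programmatic approach, the sketch below models a DOM-like tree in which every node knows how to write itself out as XML text. The `Node` class is hypothetical, and the example uses only the `speech` and `p` element names; a valid Akoma-Ntoso document also needs the namespace, wrapper elements, and metadata, all omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize this node and its children, depth first."""
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self.attrs.items())
        inner = escape(self.text) + "".join(c.to_xml() for c in self.children)
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"
```

For example, `Node("speech", {"by": "#allende"}, children=[Node("p", text="Pido la palabra.")]).to_xml()` yields a `speech` element wrapping one paragraph.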
Human editing of AKN documents
Quality assurance by human analysts
They review the generated XML documents
2 editors:
Ad-hoc XML editor
Commercial editor: LegisPro (Xcential)
Linked data generation
The pilot project (2011) carefully defined a stable URI model
URIs have been maintained since then
URIs = IDs in the whole system
URIs are dereferenceable
Content negotiation
Custom linked data browser
Documentation (in Spanish)
http://datos.bcn.cl/es/documentacion
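Content negotiation for dereferenceable URIs can be sketched as follows. This is an assumption-laden illustration, not the BCN server logic: the media-type table and both function names are hypothetical, wildcards such as `*/*` simply fall back to the default, and only the `.xml` suffix is taken from a URL shown in this deck.

```python
# Hypothetical mapping from media type to a representation suffix.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "application/xml": ".xml",
}


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Pick the supported media type with the highest q-value in Accept."""
    best_type, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in REPRESENTATIONS and q > best_q:
            best_type, best_q = media_type, q
    return best_type or default


def representation_url(resource_uri: str, accept_header: str) -> str:
    """URL of the document that a request for resource_uri would be sent to."""
    return resource_uri.rstrip("/") + REPRESENTATIONS[negotiate(accept_header)]
```

The resource URI stays stable as the identifier, while the format-specific URL is derived from it per request.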
AKN2RDF
RDF extraction from Akoma-Ntoso XML
● Custom-made converter (XSL discarded for perceived complexity)
● Each XML tag implemented in one Class
● Extracted data saved into multiple databases (Relational and RDF)
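The "one class per XML tag" design can be sketched as a registry of handlers, each emitting triples for its own element. Everything here is illustrative: the `EX` vocabulary is a placeholder rather than the BCN ontology, XML namespaces are ignored, and triples are plain tuples instead of being written to a database.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/onto#"  # hypothetical vocabulary, not the BCN ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TagHandler:
    """Base class: one subclass per XML tag."""
    tag = ""

    def triples(self, element: ET.Element, doc_uri: str) -> Iterator[Triple]:
        raise NotImplementedError


class SpeechHandler(TagHandler):
    tag = "speech"

    def triples(self, element, doc_uri):
        speech_uri = f"{doc_uri}#{element.get('eId', 'speech')}"
        yield (speech_uri, RDF_TYPE, EX + "Speech")
        yield (speech_uri, EX + "partOf", doc_uri)
        if element.get("by"):
            yield (speech_uri, EX + "speaker", element.get("by"))


HANDLERS = {h.tag: h for h in (SpeechHandler(),)}


def extract(xml_text: str, doc_uri: str) -> List[Triple]:
    """Walk the tree and let the handler registered for each tag emit triples."""
    result = []
    for element in ET.fromstring(xml_text).iter():
        handler = HANDLERS.get(element.tag)
        if handler:
            result.extend(handler.triples(element, doc_uri))
    return result
```

Supporting a new tag then means adding one handler class and registering it, without touching the tree walk.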
Linked data generation
Source: AKN XML documents
Linked data browser (WESO-DESH)
Target: RDF data
http://datos.bcn.cl/recurso/cl/documento/579095/
http://datos.bcn.cl/recurso/cl/documento/579095.xml
SPARQL endpoint
RDF triples are published through a public SPARQL endpoint
Number of norms by municipality
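A query such as "number of norms by municipality" could be sent to a SPARQL endpoint along the lines below. The class and property names in the query are hypothetical placeholders (the real vocabulary is described in the BCN documentation), and the endpoint URL is an example value, not the actual address. The request follows the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical vocabulary: the class and property names are placeholders.
QUERY = """
PREFIX ex: <http://example.org/onto#>
SELECT ?municipality (COUNT(?norm) AS ?norms)
WHERE {
  ?norm a ex:Norm ;
        ex:mentionsMunicipality ?municipality .
}
GROUP BY ?municipality
ORDER BY DESC(?norms)
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request("http://example.org/sparql", QUERY)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a real endpoint; that call is left out so the sketch stays offline.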
Content delivery
Web portals using Open Source Technologies
CMS (Typo3)
Python/Java
Varnish
Apache Lucene
REST Web service layers which connect to RDF triplestore and DB
Data exports to PDF, Doc and XML formats
URIs of parliamentary profiles = URIs in triplestore
History of the Law portal
https://www.bcn.cl/historiadelaley
Links to Members of Parliament
Each article has a link
Different versions of a law
Compare different versions
Parliamentary Work
https://www.bcn.cl/laborparlamentaria
Show participation of each Member of Parliament
Some experimental visualizations
Relationships between laws
Historical Parliament
Parliamentary genealogy (family relationships)
Regions mentioned in laws (legislative hackathon)
Links between laws
Historical parliament
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/
Parliamentary genealogy
http://datos.bcn.cl/visualizaciones/genealogia-parlamentaria/consulta.jsp
Regions mentioned by law
Result of a legislative hackathon
http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html
In 2010 there was an earthquake in the Biobío region
Some statistics
24,368 documents (Nov. 2018)
Number of RDF triples: 28 million
According to Google Analytics
Average browsing time: 2 min 26 s
Visits received: 331,481 (Nov. 2016-2017), 476,241 (Nov. 2016-2017)
And some findings...
Question: why are there some valleys?
Dictatorship time
Session attendance by year
RDF triples generated by year
Some lessons learnt
RDF granularity & inference trade-off
RDF statements + inference (high running times...queries that didn't terminate)
A priori inferred triples added to triple store (high response times for large docs)
Small subset of RDF triples (structural parts of docs and metadata)
Performance problems in XML editor browsing long docs (>1000pages)
Low SPARQL endpoint usage by external apps
If we could start again, I would recommend ShEx
Personal note: this kind of data portal led to my interest in ShEx
Conclusions & future projects
Well-designed URIs can act as a perfect glue for interoperability
Automatic workflow pipelines help long-term survival of LD-based projects
SPARQL endpoint since 2011
Future projects on top of existing ones
National Budget as Linked data
Diana Project: Members of Parliament linked to social network analysis
New portal: User customization & recommender systems
End of presentation
Acknowledgements:
David Vilches, Eridan Otto, Christian Sifaqui

 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

Legislative document content extraction based on Semantic Web technologies

  • 1. Legislative document content extraction based on Semantic Web technologies. A use case about processing the History of the Law at Chile. Francisco Cifuentes Silva (Library of Congress, Chile; PhD student, WESO research group) and Jose Emilio Labra Gayo (WESO research group, University of Oviedo, Spain).
  • 2. Chilean Library of Congress. In Spanish: BCN (Biblioteca del Congreso Nacional de Chile). Political powers: Executive, Judiciary, Legislative. Independent body inside the Legislative power. Advises the parliament and provides services to citizens. http://www.bcn.cl
  • 3. Two projects at the Library of Congress (BCN): History of the Law; Parliamentary work.
  • 4. History of the Law (LeyChile). Collects all documents generated during the legislative process of a law. Phases: an initiative starts life as a draft bill; it is subject to debates; validity period (it is published); modifications, additions, ...; derogation. Goals: capture the spirit of the law; traceability. https://www.bcn.cl/historiadelaley
  • 5. Parliamentary work. Collect all legislative activity by each Member of Parliament. Retrieve all interventions made; parliamentary motion; session journal; commission report. Ordered and categorised. https://www.bcn.cl/laborparlamentaria/
  • 6. Both projects adopted semantic technologies. Some initial reasons: semantic technologies were considered one pillar of the strategic plan (in 2014); innovative action to generate new products; improve interoperability mechanisms; the Semantic Web aligned well with open and public data.
  • 7. Which semantic technologies? Text mining and content enrichment: entity extraction, topic identification, automatic markup, classification. Machine-readable information: XML & URIs, RDF, ontologies, Linked Open Data.
  • 8. Workflow pipelines, 3 main steps: automatic XML marker; RDF & Linked Data generation; content delivery.
  • 9. Workflow overview (diagram). Components: National library; legislative documents (paper, which requires OCR, and text documents); automatic XML marker; SVN repository; Akoma-Ntoso XML editor & tools; publishing (RDF extraction from Akoma-Ntoso); Linked Open Data; query DB; services layer; content portals.
  • 10. Automatic XML marker. Source: text. Target: XML following Akoma-Ntoso.
  • 11. Automatic XML marker: 4 phases (diagram). Text → Named Entity Recognizer → text, entity, type → Mediator (with Legal Knowledge Base) → text, entity, type, URI → Structural marker → internal XML representation → Converter → AKN XML.
  • 12. 1. Named Entity Recognizer. Detection of entities and types of entities. Web service implementing the Stanford NER with a CRF classifier. Evaluation in production: detects 97% of entities. Diagram: text → Named Entity Recognizer → text, entity, type.
      Type          Some examples                                           # of entities
      Person        Salvador Allende, Sebastián Piñera                      5,139
      Organization  Ministerio de Salud, SERNATUR                           2,848
      Location      Valparaíso, Santiago de Chile                           1,251
      Document      Ley 20.000, Diario de sesión nº 12                      732,497
      Role          Senador, Diputado, Alcalde                              428
      Events        Nacimiento de Eduardo Frei, Sesión Nº 23                14,389
      Law           Boletín 11536-04, Prohíbe fumar en espacios cerrados    12,737
      Dates         27 de febrero de 2010, el próximo año, ...              20,632
  • 13. 2. Mediator. Entity linking and disambiguation. Text similarity algorithms based on Apache Lucene. In-house development: use of context information to narrow the list of candidates; custom filters and association heuristics; specialized web services. Diagram: text, entity, type → Mediator (with Legal Knowledge Base) → text, entity, type, URI.
  • 14. 3. Structural marker. Detects structures in the text: titles, subtitles, paragraphs, sections, ... Special structure for debates: participation. Regular expressions + custom rules. Diagram: text, entity, type, URI → Structural marker → internal XML representation.
  • 15. 4. XML converter to Akoma-Ntoso. Programmatic approach. Internal XML representation similar to DOM. Each node is converted to text in AKN-XML. Diagram: internal XML representation → Converter → AKN XML.
  • 16. Human editing of AKN documents. Quality assurance by human analysts, who review the generated XML documents. 2 editors: an ad-hoc XML editor and a commercial editor, LegisPro (Xcential).
  • 17. Linked data generation. The pilot project (2011) carefully defined a stable URI model. URIs have been maintained since then. URIs = IDs in the whole system. URIs are dereferenceable. Content negotiation. Custom linked data browser. Documentation (in Spanish): http://datos.bcn.cl/es/documentacion
  • 18. AKN2RDF: RDF extraction from Akoma-Ntoso XML. Custom-made converter (XSL discarded for perceived complexity). Each XML tag implemented in one class. Extracted data saved into multiple databases (relational and RDF).
  • 19. Linked data generation. Source: AKN XML documents. Target: RDF data. Linked data browser (WESO-DESH). Example URIs: http://datos.bcn.cl/recurso/cl/documento/579095/ http://datos.bcn.cl/recurso/cl/documento/579095.xml
  • 20. SPARQL endpoint. RDF triples are published through a public SPARQL endpoint. Example query: number of norms by municipality.
  • 21. Content delivery. Web portals using open source technologies: CMS (Typo3), Python/Java, Varnish, Apache Lucene. REST web service layers that connect to the RDF triplestore and DB. Data exports to PDF, Doc and XML formats. URIs of parliamentary profiles = URIs in triplestore.
  • 22. History of the Law portal (https://www.bcn.cl/historiadelaley): links to Members of Parliament; each article has a link; different versions of a law.
  • 23. History of the Law portal (https://www.bcn.cl/historiadelaley): compare different versions.
  • 25. Some experimental visualizations: relationships between laws; historical Parliament; parliamentary genealogy (family relationships); regions mentioned in laws (legislative hackathon).
  • 29. Regions mentioned by law. Result of a legislative hackathon: http://datos.bcn.cl/global-legislative-hackathon-2016/Hackaton/www/html/master.html In 2010 there was an earthquake in the BioBio region.
  • 30. Some statistics. 24,368 documents (Nov. 2018). Number of RDF triples: 28 million. According to Google Analytics: average browsing time 2 min 26 s; visits received: 331,481 (Nov. 2016-2017), 476,241 (Nov. 2016-2017).
  • 31. And some findings... Charts: session attendance by year; RDF triples generated by year. Question: why are there some valleys? Dictatorship time.
  • 32. Some lessons learnt. RDF granularity & inference trade-off: RDF statements + inference (high running times, queries that didn't terminate); a priori inferred triples added to the triple store (high response times for large docs); small subset of RDF triples (structural parts of docs and metadata). Performance problems in the XML editor when browsing long docs (>1000 pages). Low SPARQL endpoint usage by external apps. If we could start again, I would recommend ShEx. Personal note: this kind of data portal led to my interest in ShEx.
  • 33. Conclusions & future projects. Well-designed URIs can act as a perfect glue for interoperability. Automatic workflow pipelines help long-term survival of LD-based projects. SPARQL endpoint since 2011. Future projects on top of existing ones: National Budget as Linked Data; Diana Project (Members of Parliament linked to social network analysis); new portal (user customization & recommender systems).
  • 34. End of presentation. Acknowledgements: David Vilches, Eridan Otto, Christian Sifaqui
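
Slide 13 describes the Mediator as an entity linker that uses text similarity plus context to narrow a list of candidates. The following is a minimal sketch of that idea, not the BCN implementation: it swaps Apache Lucene for Python's standard `difflib`, and the knowledge base entries, the example.org URIs, the function names, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: (label, entity type, URI). The URIs are made up
# for illustration and do not come from the BCN dataset.
KNOWLEDGE_BASE = [
    ("Salvador Allende", "Person", "http://example.org/persona/1"),
    ("Sebastián Piñera", "Person", "http://example.org/persona/2"),
    ("Ministerio de Salud", "Organization", "http://example.org/organismo/1"),
    ("Valparaíso", "Location", "http://example.org/lugar/1"),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(mention: str, entity_type: str, threshold: float = 0.8):
    """Return the URI of the best candidate of the same type, or None."""
    # Context filter: keep only candidates whose type matches the NER output.
    candidates = [(label, uri) for label, etype, uri in KNOWLEDGE_BASE
                  if etype == entity_type]
    if not candidates:
        return None
    score, uri = max((similarity(mention, label), uri) for label, uri in candidates)
    # Refuse to link when even the best candidate is too dissimilar.
    return uri if score >= threshold else None
```

Filtering by entity type before scoring is one simple way to use context to shrink the candidate list; a production system would likely add further signals, such as the date of the session in which the mention occurs.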
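
Slide 14 says the structural marker relies on regular expressions plus custom rules, with a special structure for participations in debates. A rough sketch of such a rule table could look like the following; the patterns, the labels, and the `mark_structure` function are hypothetical and far simpler than what real session journals would need.

```python
import re

# Illustrative rules only: (label, pattern), tried in order.
RULES = [
    ("title", re.compile(r"^(TÍTULO|TITULO)\s+[IVXLC]+\b", re.IGNORECASE)),
    ("article", re.compile(r"^Artículo\s+\d+", re.IGNORECASE)),
    # Debate turn, assumed to look like "El señor NAME (Role).- text".
    ("participation", re.compile(r"^(El señor|La señora)\s+[^.]+\.-")),
]


def mark_structure(text: str):
    """Label each non-empty line with the first matching rule, else 'paragraph'."""
    marked = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind = next((name for name, rx in RULES if rx.match(line)), "paragraph")
        marked.append((kind, line))
    return marked
```

Keeping the rules in an ordered list makes it easy to add a document-specific rule ahead of the generic ones without touching the matching loop.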
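
Slide 15 describes the converter as a programmatic walk over a DOM-like internal representation in which each node is converted to text. The sketch below illustrates that pattern under stated assumptions: the `Node` class is hypothetical, and the element and attribute names in the usage example (`section`, `heading`, `p`, `eId`) are chosen for illustration rather than taken from the BCN converter.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    """Minimal DOM-like node: a tag, text, attributes, and child nodes."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize this node and its subtree to an XML string."""
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self.attrs.items())
        # Escape character data so that '&' and '<' in the source text stay valid XML.
        inner = escape(self.text) + "".join(c.to_xml() for c in self.children)
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"
```

For example, `Node("section", attrs={"eId": "sec_1"}, children=[Node("heading", "Título I")]).to_xml()` yields a `section` element wrapping a `heading` element.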
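
Slide 17 states that the URIs are dereferenceable and support content negotiation. From a client's point of view, that could be exercised roughly as follows. The resource URI is the one shown on slide 19; the media types the server actually honours are an assumption of this sketch, and `build_request` and `fetch` are illustrative helper names.

```python
from urllib.request import Request, urlopen

# Resource URI taken from the slides; supported media types are assumed.
DOCUMENT_URI = "http://datos.bcn.cl/recurso/cl/documento/579095"


def build_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build a GET request that asks for a specific representation."""
    return Request(uri, headers={"Accept": media_type})


def fetch(uri: str, media_type: str = "application/rdf+xml") -> bytes:
    """Dereference the URI and return the response body (performs a network call)."""
    with urlopen(build_request(uri, media_type), timeout=30) as response:
        return response.read()
```

The same URI can then be requested with different `Accept` values, for instance `text/html` for the linked data browser view, if the server offers it.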
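
Slide 18 mentions that AKN2RDF is a custom converter in which each XML tag is implemented in one class. One way to read that design is a registry of per-tag handlers that emit triples, sketched below. Everything here is hypothetical: the vocabulary under example.org, the handler classes, and the simplification of carrying the entity URI directly in `refersTo`. The sketch also ignores XML namespaces, which real Akoma-Ntoso documents use.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary; the ontology used by BCN is not shown in the slides.
EX = "http://example.org/vocab#"


class TagHandler:
    """Base class: one subclass per XML tag."""
    tag = None

    def triples(self, doc_uri, element):
        raise NotImplementedError


class HeadingHandler(TagHandler):
    tag = "heading"

    def triples(self, doc_uri, element):
        yield (doc_uri, EX + "title", element.text or "")


class PersonHandler(TagHandler):
    tag = "person"

    def triples(self, doc_uri, element):
        # Assumes the entity URI was attached to the element in an earlier step.
        yield (doc_uri, EX + "mentions", element.get("refersTo", ""))


# Registry mapping each tag name to an instance of its handler class.
HANDLERS = {cls.tag: cls() for cls in (HeadingHandler, PersonHandler)}


def extract_triples(doc_uri, xml_text):
    """Walk the tree in document order and let each tag's handler emit triples."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.iter():
        handler = HANDLERS.get(element.tag)
        if handler:
            result.extend(handler.triples(doc_uri, element))
    return result
```

Tags without a registered handler are skipped, which matches the lesson on slide 32 about keeping only a small subset of triples (structural parts and metadata).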