SlideShare a Scribd company logo
1 of 24
1Fundación CTIC, Departamento de I+D+i
CTIC Foundation, R & D Department
University of Oviedo, Spain
Searching over Public
Administration Legal Documents
using Ontologies
Diego Berrueta Luis PoloJose E. Labra
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Motivation
General Aim
Features of public administration documents
Lots of documents are published everyday
Concrete domains: Law, administration, etc.
Poor effectiveness of syntactic approaches
Our approach:
We applied Semantic web tools and ontologies in this domain
A search tool with ontology based query expansion
General aim:
Bridge the gap between public administration and citizens
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Motivation
BOPA (1)
BOPA (Boletín Oficial del Principado de Asturias)
Official Journal of the Principality of Asturias, Spain
Asturias:
Northern region of Spain
Approx. 1,1 million people
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Motivation
BOPA (2)
The BOPA contains every announcement related
with Asturias public administration
Laws, resolutions, public auctions, etc.
General interest to every citizen
Specialized jargon
The previous system:
PDF and plain HTML
Simple syntactic Search tool
Chronological and by title
Some statistics:
14.314 articles in 23.100 pages (2005)
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Motivation
Some restrictions
We could not interfere with the established process
We were allowed to work only with the generated data
Adding tags to each document manually = not practical
Not enough personal and lots of documents
Error prone
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology based search
Ontologies
We defined ontologies for our problem domain
A team of 3 domain experts was employed
Ontologies were developed in a modular way
One upper legal & administrative Ontology
Several domain ontologies
We defined micro-ontologies for several contexts
Micro-ontologies import the upper ontology
Examples: public-employment, subventions, etc.
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology based search
Synsets
Each concept has a synonym set (synset) associated
In practice, we use 2 types of synsets:
Input synset: Terms used in common language
Allow to find a concept from a query term
Output synset: Legal and administrative terms
Allow to find a document that contains a concept
Example:
Concept: Holiday
Input synsets: “holiday”, “vacation”, “break”
Output synsets: “holiday” “vacation”
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology Based Search
Search Process overview
Query: “holidays of janitors”
Ontology
Terms: holiday janitor
Match terms
with concepts
Results
BOPA
4. Apply enriched query
---------
------
-------
------
2. Spread
activation
Word Weight
janitor 1.0
holiday 1.0
vacation 1.0
staff 0.5
collective agreement 0.5
work day 0.5
legal contract 0.75
3. Obtain list of words
and weights
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology Based Search
Step 1. Match terms to concepts
Query sentence is parsed to obtain terms
Remove non-content words like “the”, “but”, etc.
Remove suffixes of words
Terms are matched against input synsets of ontology
concepts
A context (micro-ontology) is determined for the query
If several concepts belong to different contexts, we take the
common one
In case of several choices, the user can select one
Search can be restricted to that context
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology Based Search
Step 2. Spreading activation
We use the spreading activation algorithm
It returns a weighted list of related concepts
wij = weight of relation between nodes i and j
∆ = set of activation nodes
Initialized to initial query concepts
Implemented as a queue ordered by activation value
Θ = set of output concepts
Nmin = Minimum activation level
while ∆ ≠ ∅ and Nk > Nmin
extract a node nk from ∆
add nk to Θ
For each node ni such that wki > 0
Ni = Ni + wki Nk
Add ni to ∆
endWhile
return Θ Spreading activation algorithm (simplified)
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology Based Search
Step 2. Spreading Activation
Example of Spread activation
holiday
collective
agreement work day
legal
contract
janitor
staff
0.5
0.5
0.5
0.5
0.5
0.5
1
1
collective
agreement work day
staff
0.5
0.5
0.5
legal
contract
0.75 Concept Weight
janitor 1.0
holiday 1.0
staff 0.5
collective agreement 0.5
work day 0.5
legal contract 0.75
Output concepts
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology Based Search
Step 3. Obtain words from concepts
For each concept, we obtain the representative words
The words are obtained from the output synsets
Concept Weight
janitor 1.0
holiday 1.0
staff 0.5
collective agreement 0.5
work day 0.5
legal contract 0.75
Output concepts Word list
Word Weight
janitor 1.0
holiday 1.0
vacation 1.0
staff 0.5
collective agreement 0.5
work day 0.5
legal contract 0.75
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Ontology Based Search
Step 4. Apply syntactic search
Syntactic Search
We used Apache Lucene search engine
Search expression:
Search results are ordered according to relevance level
“janitor”^1 “holiday”^1 “vacation”^1 “staff”^.5
(“collective agreement”~1)^.5 (“work day”~1)^.5
(“legal contract”~1)^.75
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Other Services
Syntactic search
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Other Services
Semantic search
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Other Services
Alert subscription service
Query subscription
Users can subscribe to a query
A notification can be send when new results for that query
appear
Email and SMS alerts can be send to registered users
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Other Services
Concept Browser
User interface developed in SVG
Navigational representation of the Ontology concepts
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Other Services
Simple Explanation module
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
System Development
Architecture and Methodology
J2EE application
Data access tier
XML Native Database: XIndice
Relational database
Syntactical indexes managed by Lucene
Original web server where BOPA is still published
Business tier
Several vertical subsystems (low coupling)
Search engines, syntactical analysis, ontology processing, etc.
Functionality exported through web services
Main User interface developed with the Struts framework
Development methodology
We used some agile development techniques
Pair programming
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
System Development
Recovering information from HTML
A web crawler fetches HTML pages
The pages are not valid HTML
JTidy transforms bad formed HTML to well formed
XHTML
A chain of XSLT transformations allows to extract
information in XML format
We use a custom XML vocabulary
Static HTML data is completed with database data
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
System Development
Application integration
Interoperability has been a design goal
All the functionality can be accessed by web services
5 clients have been implemented
Standard Web application front end
VoiceXML interface
Desktop .Net client
Desktop Python client
Java client using SWT libraries for Eclipse
We export a RSS channel which is updated daily with
the contents of new bulletins
Bulletin contents are exported in HTML, PDF and RDF
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
System Development
Evaluation of search results
At this moment, we are logging user activity and we
collect different statistics
How many times the first result is selected, etc....
Developed framework for testing quality of results
Difficulties to define good search results
We apply it mainly for regression tests
We are planning to do medium scale tests with real
users in short term
Motivation Ontology based search Other services System development Conclusions
CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain
Conclusions
Conclusions and Future work
Practical application using a knowledge based approach
The prototype has been adopted by the Public Administration
department
It will be part of a production environment
Future work
Extend the search to the whole public administration site
Practical evaluation of the approach
Usability, search results, etc.
Extend Context-Theory
Allow more flexibility in context definition
Relationship between contexts and spread-activated nodes
Allow users feedback (tags)
Ontology development and reusability
Incorporate other ontologies (DOLCE, SUMO, etc.)
Develop a general legal and administrative ontology
Digitization project of past documents
BOPA started to be published before 1900
Motivation Ontology based search Other services System development
24Fundación CTIC, Departamento de I+D+i
The end

More Related Content

Similar to Searching over Public Administration Legal Documents using Ontologies

The Brazilian Innovation Portal:a virtual space for collaboration between ind...
The Brazilian Innovation Portal:a virtual space for collaboration between ind...The Brazilian Innovation Portal:a virtual space for collaboration between ind...
The Brazilian Innovation Portal:a virtual space for collaboration between ind...Roberto C. S. Pacheco
 
Tenix Engineering Conference 06 V3
Tenix Engineering Conference 06 V3Tenix Engineering Conference 06 V3
Tenix Engineering Conference 06 V3futureshocked
 
Tenix Engineering Conference 06 Web Version
Tenix Engineering Conference 06 Web VersionTenix Engineering Conference 06 Web Version
Tenix Engineering Conference 06 Web Versionfutureshocked
 
Building and Communicating Evidence of Effectiveness in OER through Collectiv...
Building and Communicating Evidence of Effectiveness in OER through Collectiv...Building and Communicating Evidence of Effectiveness in OER through Collectiv...
Building and Communicating Evidence of Effectiveness in OER through Collectiv...Robert Farrow
 
What Does Openness Mean to the Web Manager?
What Does Openness Mean to the Web Manager?What Does Openness Mean to the Web Manager?
What Does Openness Mean to the Web Manager?lisbk
 
Learning analytics inside government
Learning analytics inside governmentLearning analytics inside government
Learning analytics inside governmentmwebbjisc
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovationopen_phacts
 
How Semantic Technology Helps Researchers
How Semantic Technology Helps ResearchersHow Semantic Technology Helps Researchers
How Semantic Technology Helps ResearchersDarrell W. Gunter
 
Author's workflow and the role of open access
Author's workflow and the role of open accessAuthor's workflow and the role of open access
Author's workflow and the role of open accessPaola Gargiulo
 
Open Source Basics
Open Source BasicsOpen Source Basics
Open Source BasicsRoss Gardler
 
Open Government Data (approaches, concerns and barriers, lessons learned)
Open Government Data (approaches, concerns and barriers, lessons learned)Open Government Data (approaches, concerns and barriers, lessons learned)
Open Government Data (approaches, concerns and barriers, lessons learned)Open Data @ CTIC
 
COMPUTER BASICS
 COMPUTER BASICS COMPUTER BASICS
COMPUTER BASICSRajat More
 
DC10 Walter Ganz - keynote - The challenge of testing innovative services
DC10 Walter Ganz - keynote - The challenge of testing innovative servicesDC10 Walter Ganz - keynote - The challenge of testing innovative services
DC10 Walter Ganz - keynote - The challenge of testing innovative servicesJaak Vlasveld
 
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...EOSC-hub project
 
Toward CERIF-ScienTI Cooperation and Interoperability
Toward CERIF-ScienTI Cooperation and InteroperabilityToward CERIF-ScienTI Cooperation and Interoperability
Toward CERIF-ScienTI Cooperation and InteroperabilityRoberto C. S. Pacheco
 
THOR Ambassador Webinar
THOR Ambassador WebinarTHOR Ambassador Webinar
THOR Ambassador WebinarMaaike Duine
 

Similar to Searching over Public Administration Legal Documents using Ontologies (20)

The Brazilian Innovation Portal:a virtual space for collaboration between ind...
The Brazilian Innovation Portal:a virtual space for collaboration between ind...The Brazilian Innovation Portal:a virtual space for collaboration between ind...
The Brazilian Innovation Portal:a virtual space for collaboration between ind...
 
Scientific Publication Retrieval in Linked Data
Scientific Publication Retrieval in Linked DataScientific Publication Retrieval in Linked Data
Scientific Publication Retrieval in Linked Data
 
Tenix Engineering Conference 06 V3
Tenix Engineering Conference 06 V3Tenix Engineering Conference 06 V3
Tenix Engineering Conference 06 V3
 
Tenix Engineering Conference 06 Web Version
Tenix Engineering Conference 06 Web VersionTenix Engineering Conference 06 Web Version
Tenix Engineering Conference 06 Web Version
 
Building and Communicating Evidence of Effectiveness in OER through Collectiv...
Building and Communicating Evidence of Effectiveness in OER through Collectiv...Building and Communicating Evidence of Effectiveness in OER through Collectiv...
Building and Communicating Evidence of Effectiveness in OER through Collectiv...
 
What Does Openness Mean to the Web Manager?
What Does Openness Mean to the Web Manager?What Does Openness Mean to the Web Manager?
What Does Openness Mean to the Web Manager?
 
Learning analytics inside government
Learning analytics inside governmentLearning analytics inside government
Learning analytics inside government
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 
How Semantic Technology Helps Researchers
How Semantic Technology Helps ResearchersHow Semantic Technology Helps Researchers
How Semantic Technology Helps Researchers
 
Ethical solutions services
Ethical solutions servicesEthical solutions services
Ethical solutions services
 
Author's workflow and the role of open access
Author's workflow and the role of open accessAuthor's workflow and the role of open access
Author's workflow and the role of open access
 
Open Source Basics
Open Source BasicsOpen Source Basics
Open Source Basics
 
Lio Presentation 12 08
Lio Presentation 12 08Lio Presentation 12 08
Lio Presentation 12 08
 
Open Government Data (approaches, concerns and barriers, lessons learned)
Open Government Data (approaches, concerns and barriers, lessons learned)Open Government Data (approaches, concerns and barriers, lessons learned)
Open Government Data (approaches, concerns and barriers, lessons learned)
 
COMPUTER BASICS
 COMPUTER BASICS COMPUTER BASICS
COMPUTER BASICS
 
DC10 Walter Ganz - keynote - The challenge of testing innovative services
DC10 Walter Ganz - keynote - The challenge of testing innovative servicesDC10 Walter Ganz - keynote - The challenge of testing innovative services
DC10 Walter Ganz - keynote - The challenge of testing innovative services
 
UCIAD overview
UCIAD overviewUCIAD overview
UCIAD overview
 
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
 
Toward CERIF-ScienTI Cooperation and Interoperability
Toward CERIF-ScienTI Cooperation and InteroperabilityToward CERIF-ScienTI Cooperation and Interoperability
Toward CERIF-ScienTI Cooperation and Interoperability
 
THOR Ambassador Webinar
THOR Ambassador WebinarTHOR Ambassador Webinar
THOR Ambassador Webinar
 

More from Jose Emilio Labra Gayo

Introducción a la investigación/doctorado
Introducción a la investigación/doctoradoIntroducción a la investigación/doctorado
Introducción a la investigación/doctoradoJose Emilio Labra Gayo
 
Challenges and applications of RDF shapes
Challenges and applications of RDF shapesChallenges and applications of RDF shapes
Challenges and applications of RDF shapesJose Emilio Labra Gayo
 
Legislative data portals and linked data quality
Legislative data portals and linked data qualityLegislative data portals and linked data quality
Legislative data portals and linked data qualityJose Emilio Labra Gayo
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesJose Emilio Labra Gayo
 
Legislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologiesLegislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologiesJose Emilio Labra Gayo
 
Como publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazadosComo publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazadosJose Emilio Labra Gayo
 
Arquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el ServidorArquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el ServidorJose Emilio Labra Gayo
 

More from Jose Emilio Labra Gayo (20)

Publicaciones de investigación
Publicaciones de investigaciónPublicaciones de investigación
Publicaciones de investigación
 
Introducción a la investigación/doctorado
Introducción a la investigación/doctoradoIntroducción a la investigación/doctorado
Introducción a la investigación/doctorado
 
Challenges and applications of RDF shapes
Challenges and applications of RDF shapesChallenges and applications of RDF shapes
Challenges and applications of RDF shapes
 
Legislative data portals and linked data quality
Legislative data portals and linked data qualityLegislative data portals and linked data quality
Legislative data portals and linked data quality
 
Validating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectivesValidating RDF data: Challenges and perspectives
Validating RDF data: Challenges and perspectives
 
Wikidata
WikidataWikidata
Wikidata
 
Legislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologiesLegislative document content extraction based on Semantic Web technologies
Legislative document content extraction based on Semantic Web technologies
 
ShEx by Example
ShEx by ExampleShEx by Example
ShEx by Example
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Introducción a la Web Semántica
Introducción a la Web SemánticaIntroducción a la Web Semántica
Introducción a la Web Semántica
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
2017 Tendencias en informática
2017 Tendencias en informática2017 Tendencias en informática
2017 Tendencias en informática
 
RDF, linked data and semantic web
RDF, linked data and semantic webRDF, linked data and semantic web
RDF, linked data and semantic web
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
19 javascript servidor
19 javascript servidor19 javascript servidor
19 javascript servidor
 
Como publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazadosComo publicar datos: hacia los datos abiertos enlazados
Como publicar datos: hacia los datos abiertos enlazados
 
16 Alternativas XML
16 Alternativas XML16 Alternativas XML
16 Alternativas XML
 
XSLT
XSLTXSLT
XSLT
 
XPath
XPathXPath
XPath
 
Arquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el ServidorArquitectura de la Web y Computación en el Servidor
Arquitectura de la Web y Computación en el Servidor
 

Recently uploaded

General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Recently uploaded (20)

General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Searching over Public Administration Legal Documents using Ontologies

  • 1. 1Fundación CTIC, Departamento de I+D+i CTIC Foundation, R & D Department University of Oviedo, Spain Searching over Public Administration Legal Documents using Ontologies Diego Berrueta Luis PoloJose E. Labra
  • 2. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Motivation General Aim Features of public administration documents Lots of documents are published everyday Concrete domains: Law, administration, etc. Poor effectiveness of syntactic approaches Our approach: We applied Semantic web tools and ontologies in this domain A search tool with ontology based query expansion General aim: Bridge the gap between public administration and citizens Motivation Ontology based search Other services System development Conclusions
  • 3. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Motivation BOPA (1) BOPA (Boletín Oficial del Principado de Asturias) Official Journal of the Principality of Asturias, Spain Asturias: Northern region of Spain Approx. 1,1 million people Motivation Ontology based search Other services System development Conclusions
  • 4. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Motivation BOPA (2) The BOPA contains every announcement related with Asturias public administration Laws, resolutions, public auctions, etc. General interest to every citizen Specialized jargon The previous system: PDF and plain HTML Simple syntactic Search tool Chronological and by title Some statistics: 14.314 articles in 23.100 pages (2005) Motivation Ontology based search Other services System development Conclusions
  • 5. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Motivation Some restrictions We could not interfere with the established process We were allowed to work only with the generated data Adding tags to each document manually = not practical Not enough personal and lots of documents Error prone Motivation Ontology based search Other services System development Conclusions
  • 6. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology based search Ontologies We defined ontologies for our problem domain A team of 3 domain experts was employed Ontologies were developed in a modular way One upper legal & administrative Ontology Several domain ontologies We defined micro-ontologies for several contexts Micro-ontologies import the upper ontology Examples: public-employment, subventions, etc. Motivation Ontology based search Other services System development Conclusions
  • 7. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology based search Synsets Each concept has a synonym set (synset) associated In practice, we use 2 types of synsets: Input synset: Terms used in common language Allow to find a concept from a query term Output synset: Legal and administrative terms Allow to find a document that contains a concept Example: Concept: Holiday Input synsets: “holiday”, “vacation”, “break” Output synsets: “holiday” “vacation” Motivation Ontology based search Other services System development Conclusions
  • 8. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology Based Search Search Process overview Query: “holidays of janitors” Ontology Terms: holiday janitor Match terms with concepts Results BOPA 4. Apply enriched query --------- ------ ------- ------ 2. Spread activation Word Weight janitor 1.0 holiday 1.0 vacation 1.0 staff 0.5 collective agreement 0.5 work day 0.5 legal contract 0.75 3. Obtain list of words and weights Motivation Ontology based search Other services System development Conclusions
  • 9. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology Based Search Step 1. Match terms to concepts Query sentence is parsed to obtain terms Remove non-content words like “the”, “but”, etc. Remove suffixes of words Terms are matched against input synsets of ontology concepts A context (micro-ontology) is determined for the query If several concepts belong to different contexts, we take the common one In case of several choices, the user can select one Search can be restricted to that context Motivation Ontology based search Other services System development Conclusions
  • 10. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology Based Search Step 2. Spreading activation We use the spreading activation algorithm It returns a weighted list of related concepts wij = weight of relation between nodes i and j ∆ = set of activation nodes Initialized to initial query concepts Implemented as a queue ordered by activation value Θ = set of output concepts Nmin = Minimum activation level while ∆ ≠ ∅ and Nk > Nmin extract a node nk from ∆ add nk to Θ For each node ni such that wki > 0 Ni = Ni + wki Nk Add ni to ∆ endWhile return Θ Spreading activation algorithm (simplified) Motivation Ontology based search Other services System development Conclusions
  • 11. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology Based Search Step 2. Spreading Activation Example of Spread activation holiday collective agreement work day legal contract janitor staff 0.5 0.5 0.5 0.5 0.5 0.5 1 1 collective agreement work day staff 0.5 0.5 0.5 legal contract 0.75 Concept Weight janitor 1.0 holiday 1.0 staff 0.5 collective agreement 0.5 work day 0.5 legal contract 0.75 Output concepts Motivation Ontology based search Other services System development Conclusions
  • 12. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology Based Search Step 3. Obtain words from concepts For each concept, we obtain the representative words The words are obtained from the output synsets Concept Weight janitor 1.0 holiday 1.0 staff 0.5 collective agreement 0.5 work day 0.5 legal contract 0.75 Output concepts Word list Word Weight janitor 1.0 holiday 1.0 vacation 1.0 staff 0.5 collective agreement 0.5 work day 0.5 legal contract 0.75 Motivation Ontology based search Other services System development Conclusions
  • 13. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Ontology Based Search Step 4. Apply syntactic search Syntactic Search We used Apache Lucene search engine Search expression: Search results are ordered according to relevance level “janitor”^1 “holiday”^1 “vacation”^1 “staff”^.5 (“collective agreement”~1)^.5 (“work day”~1)^.5 (“legal contract”~1)^.75 Motivation Ontology based search Other services System development Conclusions
  • 14. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Other Services Syntactic search Motivation Ontology based search Other services System development Conclusions
  • 15. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Other Services Semantic search Motivation Ontology based search Other services System development Conclusions
  • 16. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Other Services Alert subscription service Query subscription Users can subscribe to a query A notification can be send when new results for that query appear Email and SMS alerts can be send to registered users Motivation Ontology based search Other services System development Conclusions
  • 17. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Other Services Concept Browser User interface developed in SVG Navigational representation of the Ontology concepts Motivation Ontology based search Other services System development Conclusions
  • 18. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Other Services Simple Explanation module Motivation Ontology based search Other services System development Conclusions
  • 19. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain System Development Architecture and Methodology J2EE application Data access tier XML Native Database: XIndice Relational database Syntactical indexes managed by Lucene Original web server where BOPA is still published Business tier Several vertical subsystems (low coupling) Search engines, syntactical analysis, ontology processing, etc. Functionality exported through web services Main User interface developed with the Struts framework Development methodology We used some agile development techniques Pair programming Motivation Ontology based search Other services System development Conclusions
  • 20. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain System Development Recovering information from HTML A web crawler fetches HTML pages The pages are not valid HTML JTidy transforms bad formed HTML to well formed XHTML A chain of XSLT transformations allows to extract information in XML format We use a custom XML vocabulary Static HTML data is completed with database data Motivation Ontology based search Other services System development Conclusions
  • 21. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain System Development Application integration Interoperability has been a design goal All the functionality can be accessed by web services 5 clients have been implemented Standard Web application front end VoiceXML interface Desktop .Net client Desktop Python client Java client using SWT libraries for Eclipse We export a RSS channel which is updated daily with the contents of new bulletins Bulletin contents are exported in HTML, PDF and RDF Motivation Ontology based search Other services System development Conclusions
  • 22. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain System Development Evaluation of search results At this moment, we are logging user activity and we collect different statistics How many times the first result is selected, etc.... Developed framework for testing quality of results Difficulties to define good search results We apply it mainly for regression tests We are planning to do medium scale tests with real users in short term Motivation Ontology based search Other services System development Conclusions
  • 23. CTIC FoundationCTIC Foundation University of Oviedo, SpainUniversity of Oviedo, Spain Conclusions Conclusions and Future work Practical application using a knowledge based approach The prototype has been adopted by the Public Administration department It will be part of a production environment Future work Extend the search to the whole public administration site Practical evaluation of the approach Usability, search results, etc. Extend Context-Theory Allow more flexibility in context definition Relationship between contexts and spread-activated nodes Allow users feedback (tags) Ontology development and reusability Incorporate other ontologies (DOLCE, SUMO, etc.) Develop a general legal and administrative ontology Digitization project of past documents BOPA started to be published before 1900 Motivation Ontology based search Other services System development

Editor's Notes

  1. La ontología está diseñada para permitir su escalabilidad, reusabilidad, mantenimiento y un correcto funcionamiento de las técnicas de activación de conceptos. ontología administrativa: ontología de dominio: “comunidad de propietarios”. El diseño es modular. La ontología de dominio está construida a partir de varias microontologías. Animación 2 asociado conocimiento lingüístico a la ontología. Construimos un vocabulario controlado de forma que podamos asociar a nuestra ontología un conjunto de sinóminos. Parecido a wordnet pero con una semántica propia.