BUILDING THE LEGAL KNOWLEDGE GRAPH
FOR SMART COMPLIANCE SERVICES IN
MULTILINGUAL EUROPE
http://lynx-project.eu/
Lynx - Compliance made easy
Legal Knowledge Graph for Multilingual Compliance Services
Webinar: Lynx Services Platform (LySP) - Part 2: The Services
18/02/2021, 10.30am-11.30am CET
Agenda
• Introduction & the Lynx project - 5’
Martin Kaltenböck (Co-Founder and CFO of Semantic Web Company, SWC)
• Lynx Services Platform: The Services - Introduction - 5’
Artem Revenko (Director of Research & Innovation, Semantic Web Company)
• Lynx Services Platform: The Services in Detail - 40’
María Navas-Loro (Ontology Engineering Group, Artificial Intelligence Department, UPM), Julian Moreno
(Researcher at DFKI), Ruben Martinez (Manager Customer Service, Tilde), Pablo Calleja (Ontology Engineering
Group, Artificial Intelligence Department, UPM), Ilan Kernerman (CEO at Lexicala by K Dictionaries), Christian
Sageder (CEO at Cybly).
• Questions & Answers - 10’
The Lynx project
ICT14-2016-2017 (IA) Innovation action
Pillar: Industrial Leadership
Work Programme Year: H2020-2016-2017
Work Programme Part: Information and Communication Technologies
Topic: Big Data PPP: cross-sectorial and cross-lingual data integration and
experimentation
Duration: 40 months
Start date: 1st December 2017
Estimated Project Cost: €3,638,065.00
Requested EU Contribution: €2,959,247.52
Project Officer: Johan BODENKAMP/Pierre-Paul SONDAG
Our Aim
Our Mission
Smart services
to better manage
compliance
LKG of
European legal
and regulatory
open data
Multilingual and
multi-jurisdictional
data
Lynx Services
16 Services:
• 7 Enrichment (5 Annotation + 2 Conversion)
• 4 Search and Information Retrieval
• 2 Vocabulary
• 3 Platform
https://lynx-project.eu/doc/api/
Annotation Services
Temporal Expression Service (1)
Finds the following types of expressions:
• DATE: April, 23/05, in 1998.
• TIME: At 2 o’clock, 5pm.
• SET: every Thursday, twice a month.
• DURATION: two days and a half, three years.
• INTERVALS (ongoing): From 3rd of April to 6th May.
Temporal Expression Service (2)
Once the previous expressions are found, they are normalized.
(...) In 1998 (1) it increased exponentially; that summer (2) (...)
(1) → 1998
(2) → summer of 1998 (1998-SU)
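The normalization step above can be sketched in a few lines. This is a minimal illustration, not the service's implementation: the regular expressions, the `normalize` function and the rule "resolve a season against the most recent explicit year" are assumptions made for this example; only the `1998-SU` output format comes from the slide.

```python
import re
from typing import List, Tuple

# Season codes in the style of the "1998-SU" value shown above (assumed mapping).
SEASON_CODES = {"spring": "SP", "summer": "SU", "autumn": "FA", "fall": "FA", "winter": "WI"}

YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")
SEASON_RE = re.compile(r"\bthat (spring|summer|autumn|fall|winter)\b", re.IGNORECASE)


def normalize(text: str) -> List[Tuple[str, str]]:
    """Return (expression, normalized value) pairs in order of appearance."""
    matches = [(m.start(), "year", m) for m in YEAR_RE.finditer(text)]
    matches += [(m.start(), "season", m) for m in SEASON_RE.finditer(text)]
    matches.sort(key=lambda item: item[0])

    results = []
    last_year = None  # most recent explicit year, used as the anchor
    for _, kind, m in matches:
        if kind == "year":
            last_year = m.group(1)
            results.append((m.group(0), last_year))
        elif last_year is not None:
            # A relative season is resolved against the anchor year.
            code = SEASON_CODES[m.group(1).lower()]
            results.append((m.group(0), f"{last_year}-{code}"))
    return results


if __name__ == "__main__":
    print(normalize("In 1998 it increased exponentially; that summer it fell."))
```

A relative expression with no preceding year is simply skipped here; a real system would need a document creation date or another anchor.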
Temporal Expression Service (3)
Languages covered:
● Legal-focused rule-based approaches:
○ Spanish
○ English
○ German
● Standard external tool for:
○ Italian
○ Dutch
For more information, please check:
https://www.youtube.com/watch?v=6-CwPal2ArE
Named Entity Recognition Service
• Four model families:
• General Domain:
• Statistical language models (EN, DE)
• BERT based Neural Networks (EN, DE, ES)
• Legal Domain (DE):
• Conditional Random Fields (CRF)
• Bidirectional Long Short-Term Memory Neural Networks (BiLSTM)
• Corpus: German court decisions
• 67,000 sentences and 54,000 entities
• 7 coarse-grained classes and 19 fine-grained classes
German Named Entity Corpus
Geolocation Service
● Three approaches:
• Statistical language model
• Trained with a specific German and English corpus
• 17 fine-grained classes
• Dictionary based approach
• Spanish dictionary of companies
• Rule-based approach
• Regular expressions for recognition of addresses
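As a rough illustration of the rule-based approach, the sketch below recognizes one common German address layout (street, house number, postcode, city). The pattern, the list of street suffixes and the `find_addresses` function are assumptions for this example; the actual rules used by the service are not shown on the slide.

```python
import re
from typing import Dict, List

# Hypothetical pattern for addresses such as "Musterstraße 12, 10115 Berlin".
ADDRESS_RE = re.compile(
    r"\b(?P<street>[A-ZÄÖÜ][\w.\-]*?(?:straße|strasse|str\.|weg|platz|allee))"
    r"\s+(?P<number>\d+[a-zA-Z]?)"
    r",\s*(?P<postcode>\d{5})"
    r"\s+(?P<city>[A-ZÄÖÜ][\w\-]+)"
)


def find_addresses(text: str) -> List[Dict[str, str]]:
    """Return one dict per recognized address, with its named components."""
    return [m.groupdict() for m in ADDRESS_RE.finditer(text)]


if __name__ == "__main__":
    print(find_addresses("Die Kanzlei sitzt in der Musterstraße 12, 10115 Berlin."))
```

Patterns like this are precise but brittle: multi-word street names or other national formats would each need their own rule.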
English and German Geo. Corpus
Rule-based approach
Entity Linking
Link a target (“Jaguar”) in a context to
the correct entity in a knowledge base.
Assumption: All senses of the target are
present in the knowledge base.
Usually suitable for large knowledge
bases, for example DBpedia, WordNet.
Relax assumption -> decide if a target
should be linked to some entity in
knowledge base.
Suitable for smaller enterprise
knowledge graphs.
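The relaxed assumption can be illustrated with a toy linker that may return no entity at all. Everything here is hypothetical: the two-entry knowledge base, the keyword-overlap score and the `min_overlap` threshold stand in for whatever model the service actually uses.

```python
import re
from typing import Dict, Optional, Set

# Toy knowledge base: entity label -> keywords describing that sense (assumed data).
KB: Dict[str, Set[str]] = {
    "Jaguar (animal)": {"cat", "animal", "jungle", "predator", "species"},
    "Jaguar (car maker)": {"car", "vehicle", "engine", "manufacturer", "british"},
}


def tokens(text: str) -> Set[str]:
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link(context: str, kb: Dict[str, Set[str]], min_overlap: int = 1) -> Optional[str]:
    """Return the best-matching entity, or None if no sense fits the context."""
    ctx = tokens(context)
    best, best_score = None, 0
    for entity, keywords in kb.items():
        score = len(ctx & keywords)
        if score > best_score:
            best, best_score = entity, score
    # Relaxed assumption: the target may have a sense that is missing from the KB.
    return best if best_score >= min_overlap else None


if __name__ == "__main__":
    print(link("The Jaguar has a powerful engine and is a fast car", KB))
    print(link("Jaguar released a new album", KB))
```

Under the strict assumption the second call would be forced onto one of the two known senses; with the threshold it is left unlinked.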
Conversion Services
Machine Translation Service
Language challenges in the digital environment
Machine Translation - Benefits
1 Internal & External multilingual communication
Improve the organization's communication culture, from your internal team to
speaking the language of the customer
2 Increase translation productivity by 35%
Provide immediate human-like translations, facilitate processes of large
volume text translation
3 Enter new markets
Scale your business, move content quickly and enter new markets as fast as
possible while reducing the time and capital spent on projects
Machine Translation - in Lynx
- External service: used directly from an up-to-date cloud
platform with Neural MT technology & terminology capabilities,
with regular technological updates
- Source Document, Text and Annotation translation
- Use case specific - contracts, labor law, renewable energy
(trained on Lynx partners' documents and identified sources)
Extractive Summarization Service
● Selection of relevant sentences
• TF-IDF
• Encode documents and calculate weights
for sentences using TF-IDF
• Centroids and composability of word
embeddings
• Extract keywords and concepts
• Composing embeddings
• Create the document's centroid
• Project sentences into the embedding space
• Relevance scores (distance to centroid)
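A minimal sketch of the TF-IDF variant of sentence selection follows. It assumes each sentence is treated as its own "document" when computing inverse document frequency and that a sentence's weight is the average TF-IDF of its tokens; the function names and these choices are illustrative, not taken from the service.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_scores(sentences: List[str]) -> List[float]:
    """Average TF-IDF weight per sentence (each sentence counts as one document)."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    df = Counter()
    for toks in token_lists:
        df.update(set(toks))  # document frequency: sentences containing the term
    scores = []
    for toks in token_lists:
        tf = Counter(toks)
        total = sum(count * math.log(n / df[term]) for term, count in tf.items())
        scores.append(total / len(toks) if toks else 0.0)
    return scores


def summarize(sentences: List[str], k: int = 1) -> List[str]:
    """Keep the k highest-scoring sentences, in their original order."""
    scores = tfidf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "The contract is signed.",
        "The contract is valid.",
        "Maternity leave lasts sixteen weeks.",
    ]
    print(summarize(doc, k=1))
```

The centroid variant replaces the TF-IDF weight with the distance between a sentence embedding and the document centroid, but the selection step stays the same.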
Abstractive Summarization Service
● Based on Neural Networks
and Transformer encoders
Search and Information Retrieval
Cross-Lingual Search (1)
• Full-text search in multilingual corpora
• APIs for
• Add / Delete Lynx Documents to the search index; a Lynx Document
Part is its own document in the index
• Search documents / parts
• Possibility of complex search queries
• AND, OR, NOT, MUST, NEAR, (), phrases
• Filters for metadata
• Search term will be translated to the language of the corpora
based on the targeted jurisdiction
Cross-Lingual Search (2)
Example: Maternity leave Spain AND Austria
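Pipeline: detect language → detect jurisdiction(s) to query + language → create an AST (abstract syntax tree) → translate and expand query → query index
Services involved:
● Annotation query: GEO, NER, EL
● Translation and expansion: Translation, Dictionary, Terminology
Intermediate results for the example:
● Detected language: English
● Detected jurisdictions: Austria, Spain
● Jurisdictions to query: Austria, Spain, EU
● Query languages: German, Spanish, English
● Translated terms: permiso por maternidad, Karenzzeit, Maternity leave
Resulting query: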
(licencia de maternidad AND metadata.jurisdiction:ES) OR
(Karenzzeit AND metadata.jurisdiction:AT) OR
(maternity leave AND metadata.jurisdiction:EU) OR
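The "create an AST, then translate and expand" steps can be sketched as follows. The node classes and the `expand` helper are hypothetical; only the `metadata.jurisdiction:<code>` filter syntax and the example terms are taken from the query above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Node = Union["Term", "And", "Or"]


@dataclass
class Term:
    text: str

    def render(self) -> str:
        return self.text


@dataclass
class And:
    left: "Node"
    right: "Node"

    def render(self) -> str:
        return f"({self.left.render()} AND {self.right.render()})"


@dataclass
class Or:
    children: List["Node"]

    def render(self) -> str:
        return " OR ".join(child.render() for child in self.children)


def expand(translations: Dict[str, str]) -> Or:
    """Build one (term AND jurisdiction filter) clause per target jurisdiction."""
    return Or([
        And(Term(term), Term(f"metadata.jurisdiction:{code}"))
        for code, term in translations.items()
    ])


if __name__ == "__main__":
    ast = expand({
        "ES": "licencia de maternidad",
        "AT": "Karenzzeit",
        "EU": "maternity leave",
    })
    print(ast.render())
```

Working on a tree rather than on the raw string lets each term be translated for its own jurisdiction before the final query string is rendered.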
Search and Information Retrieval (1)
http://lkg.lynx-project.eu/
• Web Portal & RESTful API
• Relies on an Elasticsearch DCM
• Manages parts of documents as
independent documents
Search and Information Retrieval (2)
Parameters of search query
• words/terms
• collection
• jurisdiction
• language
• part of another document
• rows
• ...
Evaluation
• Gold standard created by Cuatrecasas
• Spanish Workers' Statute document
• 152 questions (en/es) with answers
(sections)
• Achieved >85% accuracy
• Experimentation with:
• stems, synonyms, term extraction
Vocabulary Services
Dictionary Services
Domain-independent lexical data
• formats: XML, JSON, JSON-LD
• endpoints: SPARQL, Lexicala API
• languages:
Dutch | English | German | Spanish
Dictionary Services: Entry Components
• headwords and expressions, inflections and variants
• phonetic transcription (IPA) and alternative script
• part of speech, grammatical gender and number
• subcategorization and valency
• definitions, sense indication and disambiguation
• examples of usage
• synonyms, antonyms, domains, context
• range of application, register, sentiment, geo usage
• translations
Dictionary Services: Sample Entry
Dictionary Services: RDF Pipeline (1)
• data modelled with OntoLex, adhering to lexicog module
• XML → JSON → JSON-LD conversion pipeline
• incremental approach
• mapping XML paths to corresponding Linked Data
elements
• URI naming strategy established
• implementation of the model
• validation
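The XML → JSON → JSON-LD steps above can be sketched on a single entry. The XML element names, the `http://example.org/LexiconEN/` base URI and both function names are invented for illustration; the OntoLex and SKOS terms (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:writtenRep`, `ontolex:sense`, `skos:definition`) are real vocabulary, but the lexicog module is not modelled in this sketch.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical source entry; the real dictionary XML schema is not shown in the slides.
SAMPLE_XML = """
<Entry id="bow-n">
  <Headword>bow</Headword>
  <Sense id="bow-n-1"><Definition>a weapon for shooting arrows</Definition></Sense>
</Entry>
"""


def xml_to_json(xml_text: str) -> dict:
    """Step 1: map XML paths to plain JSON keys."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "headword": root.findtext("Headword"),
        "senses": [
            {"id": s.get("id"), "definition": s.findtext("Definition")}
            for s in root.findall("Sense")
        ],
    }


def json_to_jsonld(entry: dict, base: str = "http://example.org/LexiconEN/") -> dict:
    """Step 2: attach URIs and OntoLex terms (the URI naming scheme is an assumption)."""
    return {
        "@context": {
            "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
            "skos": "http://www.w3.org/2004/02/skos/core#",
        },
        "@id": base + entry["id"],
        "@type": "ontolex:LexicalEntry",
        "ontolex:canonicalForm": {
            "ontolex:writtenRep": {"@value": entry["headword"], "@language": "en"}
        },
        "ontolex:sense": [
            {
                "@id": base + s["id"],
                "@type": "ontolex:LexicalSense",
                "skos:definition": s["definition"],
            }
            for s in entry["senses"]
        ],
    }


if __name__ == "__main__":
    print(json.dumps(json_to_jsonld(xml_to_json(SAMPLE_XML)), indent=2))
```

Keeping the intermediate JSON separate from the JSON-LD step matches the incremental approach: each XML path can be mapped and validated before Linked Data terms are attached.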
Dictionary Services: RDF Pipeline (2)
Dictionary Services: Sample Query
• A response to querying all lexical
senses linked to the RDF entry
:LexiconEN/bow-n, gathering the
information originating from the
different homographs as well as from
other resources in which bow is given
as a translation.
• The query currently results in 56
possible senses in different
languages of bow as an English noun
across the Global series.
Terminology Service
● Corpus-based terminologies per pilot:
Labour Law, Contracts, Industrial Standards
● Multilingual, disambiguated knowledge
retrieved from the LLOD
● Languages covered:
○ Dutch
○ English
○ German
○ Spanish
Available at: http://lkg.lynx-project.eu/kos
Lynx Webinar Series
• Webinar 1: Lynx overall introduction
When: 10.12.2020, 10.30am CET (1 hour)
Recording: https://youtube.com/playlist?list=PLxa__IZYjIaiGbl3a-PyK3DqNhhMdnnHv
• Webinar 2: 3 Business Cases on top of the Lynx Legal Knowledge Graph
When: 14.1.2021, 10.30am CET
Recording: https://youtube.com/playlist?list=PLxa__IZYjIaiDL2O22ureD_nLmtgRq9LB
• Webinar 3: The Lynx Services Platform (LySP) - Part 1: Overview
When: 11/02/2021, 11.30am CET
Recording: https://youtube.com/playlist?list=PLxa__IZYjIahhiSXoJbVyxv_iAliExH5e
• Webinar 4: The Lynx Services Platform (LySP) - Part 2: The Services
When: 18/02/2021, 10.30am CET
Recording: https://youtube.com/playlist?list=PLxa__IZYjIaiv5MeV7uZsujv-MOi6SE-a
CONTACTS
CONSORTIUM
Please raise your
questions now….
http://lynx-project.eu/
