A Novel Method and Architecture for Law Processing, Utilising High Performance Computing Infrastructures
Yannis Charalabidis, University of the Aegean, Greece
Michalis Loutsaris, University of the Aegean, Greece
Yannis Charalabidis, University of the Aegean, Greece – yannisx@aegean.gr
Michalis Loutsaris, University of the Aegean, Greece – mloutsaris@aegean.gr
Samos, July 2019
Presentation Structure
• The Manylaws Processing Flow & Outputs: a novel method for extracting data, relations and meaning from the law
• Explanation of the most important processing steps
• The Manylaws Architecture, which enables parallel processing on High Performance Computing infrastructures
The size of the problem (or why we need HPC)
The information to be acquired, exclusively over the internet and primarily through web services where available, includes:
• All legal artefacts published by the European Parliament, the European Commission and the EU Council (EUR-Lex, EUDOR)
• All legal artefacts published by the 28 national parliaments as national laws, in English and/or other languages
• News published in EU member states concerning legal events (e.g. law publication, draft law deliberation, EU directive publication)
• Other administration-generated content (e.g. local communications, regulations)
• Other citizen-generated relevant content (e.g. blogs, newsletters, social media posts)
We estimate that this database will contain more than 1 trillion words in 21 languages, corresponding to about 10 million “volumes” of classical books, with a further 5,000 such “volumes” to be added for study every day.
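The estimate above can be unpacked with simple arithmetic. The sketch below uses only the figures on the slide, plus one clearly hypothetical processing rate (10,000 words per core-second, not taken from the slides), to indicate why a single machine is not enough.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class CorpusEstimate:
    total_words: float        # estimated size of the initial corpus, in words
    volumes: float            # the same corpus expressed as book "volumes"
    daily_new_volumes: float  # volumes added every day

    @property
    def words_per_volume(self) -> float:
        return self.total_words / self.volumes

    @property
    def daily_new_words(self) -> float:
        return self.daily_new_volumes * self.words_per_volume

    def growth_ratio(self, days: int = 365) -> float:
        """New words added over `days`, as a fraction of the initial corpus."""
        return self.daily_new_words * days / self.total_words

    def single_core_days(self, words_per_core_second: float) -> float:
        """Days one core would need to process the initial corpus once."""
        return self.total_words / words_per_core_second / SECONDS_PER_DAY


# Figures from the slide: >1 trillion words, ~10 million volumes, 5,000 volumes/day.
est = CorpusEstimate(total_words=1e12, volumes=1e7, daily_new_volumes=5_000)
print(f"{est.words_per_volume:,.0f} words per volume")   # 100,000 words per volume
print(f"{est.daily_new_words:,.0f} new words per day")   # 500,000,000 new words per day
# HYPOTHETICAL throughput of 10,000 words per core-second (not from the slides):
print(f"{est.single_core_days(10_000):,.0f} core-days")  # 1,157 core-days
```

With these figures each “volume” holds about 100,000 words, and the daily additions alone amount to roughly 18% of the initial corpus per year; under the assumed rate, a single pass over the corpus would take more than three core-years.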
Legal Text Mining – The Manylaws Outputs
• Law Acquisition (bulk download; retrieval via API or crawler)
• Law Preprocessing (RapidMiner trigger; PDF-to-text conversion)
• Metadata Extraction (title, number, year and topic of the law, plus 10 further attributes from the law source)
• Law Decomposition (extraction of sections, parts, chapters, articles, paragraphs, sub-paragraphs, clauses and sentences)
• Law Correlation (extraction of law numbers, presidential decree identifiers, ministerial decree identifiers, constitution article numbers, circular identifiers, regulation identifiers, act of legislative content identifiers and directive numbers)
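The decomposition and correlation steps can be illustrated with regular expressions. The following is a minimal sketch, assuming English-language headings of the form `Article N` and citations such as `Law 4412/2016` or `Directive 2014/24/EU`; the patterns, function names and sample text are hypothetical and are not the Manylaws rules, which would differ per language and legal system.

```python
import re
from collections import defaultdict

# HYPOTHETICAL English-language patterns, for illustration only.
ARTICLE_HEADING = re.compile(r"^Article[ \t]+(\d+)[ \t]*$", re.MULTILINE)

REFERENCE_PATTERNS = {
    "law": re.compile(r"\bLaw\s+(\d+/\d{4})\b"),
    "presidential_decree": re.compile(r"\bPresidential\s+Decree\s+(\d+/\d{4})\b"),
    "directive": re.compile(r"\bDirective\s+(\d{2,4}/\d+/(?:EU|EC|EEC))\b"),
    "constitution_article": re.compile(r"\bArticle\s+(\d+)\s+of\s+the\s+Constitution\b"),
}


def decompose_articles(text: str) -> dict[int, str]:
    """Split a law's text into {article number: article body}."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = {}
    for i, m in enumerate(matches):
        # An article body runs until the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles[int(m.group(1))] = text[m.end():end].strip()
    return articles


def extract_references(text: str) -> dict[str, list[str]]:
    """Collect cited identifiers by type, without duplicates, in order of appearance."""
    found = defaultdict(list)
    for kind, pattern in REFERENCE_PATTERNS.items():
        for m in pattern.finditer(text):
            if m.group(1) not in found[kind]:
                found[kind].append(m.group(1))
    return dict(found)


sample = """Article 1
This law transposes Directive 2014/24/EU and amends Law 4412/2016.
Article 2
Presidential Decree 80/2016 is repealed, in line with Article 16 of the Constitution."""

articles = decompose_articles(sample)
print(sorted(articles))                   # [1, 2]
print(extract_references(articles[2]))
# {'presidential_decree': ['80/2016'], 'constitution_article': ['16']}
```

Because the heading pattern only matches a line consisting solely of `Article N`, inline mentions such as "Article 16 of the Constitution" are treated as citations rather than as new articles.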
Processing flow: Law Acquisition -> Law Preprocessing -> Metadata Extraction -> Law Decomposition -> Law Correlation
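The five steps form a linear pipeline in which each stage enriches the record produced by the previous one, while distinct laws can be processed independently of each other. A minimal sketch of that structure follows, assuming trivial placeholder stages (all names are hypothetical); the thread pool only illustrates the data-parallel shape, whereas an HPC deployment would distribute the work across processes or nodes.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable

Record = dict  # one law as it moves through the pipeline


# HYPOTHETICAL placeholder stages; each returns an enriched copy of the record.
def acquire(source: str) -> Record:
    return {"raw": source}  # stands in for a bulk download, API call or crawler fetch


def preprocess(rec: Record) -> Record:
    return {**rec, "text": re.sub(r"[ \t]+", " ", rec["raw"]).strip()}


def extract_metadata(rec: Record) -> Record:
    lines = rec["text"].splitlines()
    return {**rec, "title": lines[0] if lines else ""}


def decompose(rec: Record) -> Record:
    parts = re.split(r"^Article[ \t]+\d+[ \t]*$", rec["text"], flags=re.MULTILINE)
    return {**rec, "articles": [p.strip() for p in parts[1:]]}


def correlate(rec: Record) -> Record:
    return {**rec, "cited_laws": re.findall(r"\bLaw (\d+/\d{4})\b", rec["text"])}


STAGES: list[Callable[[Record], Record]] = [preprocess, extract_metadata, decompose, correlate]


def process_law(source: str) -> Record:
    """Run one law through every stage, in the order of the processing flow."""
    return reduce(lambda rec, stage: stage(rec), STAGES, acquire(source))


def process_corpus(sources: list[str], workers: int = 4) -> list[Record]:
    # Laws are independent, so the corpus can be mapped in parallel (order is preserved).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_law, sources))


docs = ["Law on public procurement\nArticle 1\nAmends Law 4412/2016.\nArticle 2\nEntry into force."]
results = process_corpus(docs)
print(results[0]["title"])       # Law on public procurement
print(results[0]["cited_laws"])  # ['4412/2016']
```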
Metadata Extraction (3/3) – Other
Other metadata are extracted in two ways:
1. Extraction of PDF file metadata using Python (e.g. Author, Creation Date)
2. Extraction of PDF metadata using RapidMiner (e.g. number of pages, file size)
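The slides do not name the Python library used for PDF metadata. Whichever library reads the PDF Info dictionary, entries such as `/CreationDate` arrive as PDF date strings and need normalising. A minimal stdlib-only sketch, assuming the raw Info entries are already available as a dictionary; the output field names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# PDF date strings (ISO 32000-1, section 7.9.4) have the form D:YYYYMMDDHHmmSSOHH'mm',
# where every field after the year is optional and O is '+', '-' or 'Z'.
PDF_DATE = re.compile(
    r"^(?:D:)?(?P<Y>\d{4})(?P<m>\d{2})?(?P<d>\d{2})?(?P<H>\d{2})?(?P<M>\d{2})?(?P<S>\d{2})?"
    r"(?:(?P<tz>[+\-Z])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Convert a PDF CreationDate/ModDate string to a datetime; None if malformed."""
    m = PDF_DATE.match(value.strip())
    if not m:
        return None
    g = m.groupdict()
    try:
        tz = None
        if g["tz"] == "Z":
            tz = timezone.utc
        elif g["tz"] in ("+", "-"):
            offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
            tz = timezone(offset if g["tz"] == "+" else -offset)
        return datetime(int(g["Y"]), int(g["m"] or 1), int(g["d"] or 1),
                        int(g["H"] or 0), int(g["M"] or 0), int(g["S"] or 0), tzinfo=tz)
    except ValueError:  # e.g. month 13 or an out-of-range UTC offset
        return None


def normalise_info(info: dict) -> dict:
    """Map raw PDF Info-dictionary entries to record fields (field names are hypothetical)."""
    created = info.get("/CreationDate")
    return {
        "author": info.get("/Author"),
        "creation_date": parse_pdf_date(str(created)) if created else None,
    }


record = normalise_info({"/Author": "Example Author", "/CreationDate": "D:20190715123000+03'00'"})
print(record["creation_date"].isoformat())  # 2019-07-15T12:30:00+03:00
```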
Output Data
Converting a JSON file to an XML file is a straightforward procedure. The outputs are stored as follows:
MongoDB -> stores the JSON documents
Relational DB -> stores the tables
File repository -> stores the XML files
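A minimal sketch of the JSON-to-XML conversion using Python's standard `xml.etree.ElementTree`, assuming the JSON keys are valid XML element names; the root tag `law` and the `item` tag for list entries are hypothetical choices, not the Manylaws schema.

```python
import json
import xml.etree.ElementTree as ET


def _append(parent: ET.Element, value) -> None:
    """Recursively attach a decoded JSON value under `parent`."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Assumes every JSON key is already a valid XML element name.
            _append(ET.SubElement(parent, key), child)
    elif isinstance(value, list):
        for item in value:
            _append(ET.SubElement(parent, "item"), item)  # hypothetical tag for list entries
    elif isinstance(value, bool):
        parent.text = "true" if value else "false"  # keep the JSON spelling of booleans
    elif value is not None:
        parent.text = str(value)


def json_to_xml(json_text: str, root_tag: str = "law") -> str:
    root = ET.Element(root_tag)
    _append(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


doc = '{"title": "Law 4412/2016", "year": 2016, "articles": [{"number": 1}, {"number": 2}]}'
print(json_to_xml(doc))
# <law><title>Law 4412/2016</title><year>2016</year><articles><item><number>1</number></item><item><number>2</number></item></articles></law>
```

The same JSON document can then be stored unchanged in MongoDB, while the generated XML string is written to the file repository.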