Enriching Neural Networks with Legal Knowledge

AimeLaw at ALQAC 2021: Enriching Neural Network Models with Legal-Domain Knowledge
Nguyen Manh Duc Tuan
Toyo University
November 12, 2020
Ngo Quang Huy
Aimesoft JSC, Vietnam
The 13th IEEE International Conference on
Knowledge and Systems Engineering (KSE 2021)
Nguyen Anh Duong
Pham Quang Nhat Minh

Table of contents
■ Introduction
■ Methods
■ Experiments & Results
■ Conclusion

Overview of Our Approaches
■ We use traditional information retrieval models, pre-trained language models, and legal-domain knowledge
■ We propose a data augmentation method and a text matching method for Task 2: Legal Textual Entailment
◻ Both are based on analysing the structural characteristics of legal documents
■ First prize in Task 2 (72.16% accuracy); ranked second in Task 1 (80.61% F2) and Task 3 (64.77% accuracy)

Main Findings
■ Task 1 - Legal document retrieval:
◻ Combining a lexical matching model with a supporting model (BERT + CNN, Domain Invariant) improves the accuracy of document retrieval
■ Task 2 - Legal textual entailment:
◻ Augmenting the training data with samples generated from law articles helps tackle the data shortage problem
◻ Using only the part of an article that is most relevant to the input query improves the accuracy of legal textual entailment

Task 1: Legal Document Retrieval
■ Task components:
◻ Questions
◻ Set of law articles
■ Objective: automatically retrieve the law articles relevant to the input question
■ Following Nguyen et al., we combine two models:
◻ Lexical matching model (BM25)
◻ Supporting model
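The combination of a lexical score with a supporting-model score can be sketched as follows. This is a minimal illustration, not the authors' implementation: the BM25 parameters (`k1`, `b`), the max-normalization, the mixing weight `alpha`, and the supporting-model scores are all assumptions, since the slides do not state how the two scores are merged.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def combine(bm25, support, alpha=0.5):
    """Weighted sum of max-normalized BM25 scores and supporting scores (assumed rule)."""
    top = max(bm25) or 1.0  # avoid division by zero when nothing matches
    return [alpha * (s / top) + (1 - alpha) * p for s, p in zip(bm25, support)]

articles = [
    "civil transaction contract unilateral legal act".split(),
    "prohibited acts in educational institutions smoking".split(),
    "marriage registration conditions".split(),
]
question = "smoking is prohibited in educational institutions".split()

lexical = bm25_scores(question, articles)
support = [0.1, 0.8, 0.2]  # hypothetical relevance probabilities from a supporting model
final = combine(lexical, support)
best = max(range(len(articles)), key=final.__getitem__)
```

Here `best` is the index of the top-ranked article; in practice the top-k articles, or all articles above a threshold, would be returned.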

Proposed Approaches
■ Supporting models should complement hard lexical matching (BM25)
◻ A supporting model captures features that are distinct from those captured by lexical matching
■ We propose two supporting models:
◻ Domain Invariant
◻ Deep CNN

Domain Invariant
■ Three main components:
◻ Feature Extractor
◻ Domain Classifier (predicts the ID of the law)
◻ Classifier (relevant or not)
■ Training objective:
◻ Learn features that carry no discriminative information about the domain
◻ Keep the information that is meaningful for the classification task
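The Task 1 results label this model DANN, which suggests domain-adversarial training. Under that assumption, the two training objectives can be written as the saddle-point problem below. The notation (feature extractor G_f, label classifier G_y, domain classifier G_d, losses L_y and L_d, trade-off weight λ) follows the common DANN formulation and is not taken from the slides.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \sum_{i} L_y\bigl(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\bigr)
  - \lambda \sum_{i} L_d\bigl(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\bigr)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d)
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Minimizing over the feature extractor pushes the domain loss up (features become uninformative about the law ID d_i), while the relevance loss on y_i keeps the features useful for classification.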

Deep CNN
● Use BERT to encode the candidate article and the question
● Use various CNN layers to extract higher-level representations
● Concatenate the final representations of the article and the question

Task 2: Legal Textual Entailment
■ Input: question/statement & its relevant articles
■ Output: Yes/No
■ Example:
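Statement: Chỉ những hành vi pháp lý đơn phương làm thay đổi quyền, nghĩa vụ dân sự mới được coi là giao dịch dân sự. (Only unilateral legal acts that change civil rights and obligations are considered civil transactions.)
Relevant articles:
Giao dịch dân sự (Civil transactions)
Giao dịch dân sự là hợp đồng hoặc hành vi pháp lý đơn phương làm phát sinh, thay đổi hoặc chấm dứt quyền, nghĩa vụ dân sự. (A civil transaction is a contract or a unilateral legal act that gives rise to, changes, or terminates civil rights and obligations.)
⇒ No (the statement is false according to the content of the relevant articles)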

Proposed Methods
Three main components
■ Data Augmentation
■ Text Matching
■ Fine-tuning BERT

Data Augmentation
■ Use the structural features of a Vietnamese law article to generate positive instances:
◻ concatenate the consequence part of each clause with every condition that follows it
◻ rewrite clauses that do not include any point
■ Use the BM25 model from Task 1 to generate negative samples
⇒ In total, we obtain 4237 training samples.
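The positive-instance generation can be illustrated with a short sketch. It assumes a clause has already been split into its shared lead-in (the consequence part) and its points (the conditions); the function name `augment_clause` and the sample clause text are invented for illustration and are not quoted from any statute or from the authors' code.

```python
def augment_clause(lead, points):
    """Join the shared lead-in of a clause with each of its points."""
    lead = lead.rstrip(" :")
    # Strip list punctuation from each point and close the sentence with a period.
    return [f"{lead} {p.strip().rstrip(';.')}." for p in points]

# Hypothetical clause, written only to show the mechanics.
clause_lead = "A civil transaction is void when:"
clause_points = [
    "it violates a prohibition of law;",
    "it is established falsely to conceal another transaction.",
]

# Every generated statement is entailed by its source article, so it is labeled "Yes".
positives = [(s, "Yes") for s in augment_clause(clause_lead, clause_points)]
```

Each generated statement would then be paired with its source article as a positive training sample.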

Examples of Generated Questions

Text Matching
Question: Hút thuốc là hành vi bị nghiêm cấm trong cơ sở giáo dục. (Smoking is a prohibited act in educational institutions.)
Article: Các hành vi bị nghiêm cấm trong cơ sở giáo dục (Prohibited acts in educational institutions)
1. Xúc phạm nhân phẩm, danh dự, xâm phạm thân thể nhà giáo, cán bộ, người lao động của cơ sở giáo dục và người học. (1. Infringing on dignity and honor, infringing upon the body of teachers, officials and employees of educational institutions and learners.)
2. Xuyên tạc nội dung giáo dục. (2. Misrepresenting educational content.)
3. Gian lận trong học tập, kiểm tra, thi, tuyển sinh. (3. Cheating in study, test, exam, enrollment.)
4. Hút thuốc; uống rượu, bia; gây rối an ninh, trật tự. (4. Smoking; drinking alcohol or beer; disrupting security and order.)
...
Matching scores shown in the figure: 0.3, 0.2, 0.6
Matched part of the article:
Các hành vi bị nghiêm cấm trong cơ sở giáo dục (Prohibited acts in educational institutions)
4. Hút thuốc; uống rượu, bia; gây rối an ninh, trật tự. (4. Smoking; drinking alcohol or beer; disrupting security and order.)
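Selecting the most relevant clause can be sketched as below, using the English glosses of the example. The similarity function is an assumption: the slides do not say how the scores are computed, so this sketch uses bag-of-words cosine similarity and, as a further assumption, drops question tokens that already occur in the article title, since the title is shared by every clause. The scores it produces are not the ones shown in the figure.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"\w+", text.lower())

def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def match_clause(question, title, clauses):
    """Return the index of the best-matching clause and all clause scores."""
    title_tokens = set(tokens(title))
    # Assumption: title words do not help to discriminate between clauses.
    q = [t for t in tokens(question) if t not in title_tokens]
    scores = [cosine(q, tokens(c)) for c in clauses]
    best = max(range(len(clauses)), key=scores.__getitem__)
    return best, scores

title = "Prohibited acts in educational institutions"
clauses = [
    "Infringing on dignity and honor, infringing upon the body of teachers, "
    "officials and employees of educational institutions and learners.",
    "Misrepresenting educational content.",
    "Cheating in study, test, exam, enrollment.",
    "Smoking; drinking alcohol or beer; disrupting security and order.",
]
question = "Smoking is a prohibited act in educational institutions."

best, scores = match_clause(question, title, clauses)
matched_text = f"{title}. {clauses[best]}"  # title plus the selected clause
```

The matched text, rather than the full article, is what would be paired with the question in the entailment step.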

Example of Text Matching Result

Fine-tuning BERT
Legal entailment as sentence pair classification
■ Pair the question with the matched clauses
■ Insert [CLS] and [SEP]
■ Concatenate the vectors of the last 4 hidden states ⇒ embedding vector of the sequence pair
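The input construction and the concatenation step can be sketched without any deep learning library. Two points are assumptions here: that the vector taken from each layer is the one at the [CLS] position, and that the encoder exposes one hidden-state matrix per layer. The toy dimensions (13 layers, 5 tokens, hidden size 3) are chosen only to keep the example small.

```python
def build_pair(question, clause):
    """Format a question/clause pair the way BERT expects a sentence pair."""
    return f"[CLS] {question} [SEP] {clause} [SEP]"

def concat_last4(hidden_states, position=0):
    """Concatenate the vectors at `position` from the last four layers.

    hidden_states: list over layers, each a list of per-token vectors.
    """
    vec = []
    for layer in hidden_states[-4:]:
        vec.extend(layer[position])
    return vec

pair = build_pair("question text", "matched clause text")

# Dummy encoder output: layer l holds the value l everywhere, so the result is easy to read.
num_layers, seq_len, hidden = 13, 5, 3
hidden_states = [[[float(l)] * hidden for _ in range(seq_len)] for l in range(num_layers)]

pair_embedding = concat_last4(hidden_states)  # length 4 * hidden
```

With a hidden size of 768, the same operation would yield a 3072-dimensional pair embedding that feeds the Yes/No classification layer.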

Task 3: Legal Question Answering
■ Input: question/statement
■ Output: Yes/No
■ Example:
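Statement: Chỉ những hành vi pháp lý đơn phương làm thay đổi quyền, nghĩa vụ dân sự mới được coi là giao dịch dân sự. (Only unilateral legal acts that change civil rights and obligations are considered civil transactions.)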
⇒ No

Our Approach
■ Combine Task 1 and Task 2, with a slight difference in the legal textual entailment model.
■ Pipeline:
◻ Legal Document Retrieval: Legal Query + Law Article Data → Relevant Articles
◻ Legal Textual Entailment: Legal Query + Relevant Articles → Yes/No
■ If at least one relevant article entails the legal query, then the legal query is TRUE
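The decision rule amounts to a logical OR over the retrieved articles. A minimal sketch, with `retrieve` and `entails` as hypothetical stand-ins for the Task 1 and Task 2 models:

```python
def answer_query(query, retrieve, entails):
    """Return True if at least one retrieved article entails the query."""
    return any(entails(query, article) for article in retrieve(query))

# Hypothetical stand-ins for the retrieval and entailment models.
def toy_retrieve(query):
    return ["article A", "article B"]

def toy_entails(query, article):
    return article == "article B"

result = answer_query("some legal statement", toy_retrieve, toy_entails)
```

Note that under this rule a query with no retrieved articles is answered as false.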

Experiments and Results: Task 1
Run              F2       Rank
(1) Only BM25    78.42%   #7
(2) BM25+DANN    80.61%   #2
(3) BM25+CNN     80.61%   #2
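The overview slide reports Task 1 in terms of F2, which weights recall more heavily than precision. A minimal sketch of the measure for a single query follows; the article IDs are made up, and how the per-query values are averaged over the test set is not stated in the slides.

```python
def f_beta(retrieved, relevant, beta=2.0):
    """F-beta score of a retrieved set against the gold relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical query: two articles retrieved, one of them is the single relevant article.
score = f_beta(["A1", "A2"], ["A1"])  # precision 0.5, recall 1.0
```

Because recall dominates, retrieving one extra irrelevant article costs less than missing a relevant one.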

Experiments and Results: Task 2
■ Divided the augmented data into training and development subsets
◻ 3813 samples for training, 424 samples for validation
■ Extra experiment: used the whole data set for training
◻ Obtained 72.16% accuracy on the private test set
Run                          Accuracy   Rank
(1) BERT, lr = 2e-5          68.89%     #1
(2) BERT, lr = 1e-4          67.61%     #3
(3) Domain Invariant Model   43%        #8

Experiments and Results: Task 3
■ Trained the model on 80% of the original training data
■ Max length: 512 at training, 256 at inference
Run                                Accuracy   Rank
(1) BM25 + Text Matching           64.77%     #2
(2) BM25, Domain Invariant Model   61.36%     #4
(3) BM25, Deep CNN                 61.36%     #4

Conclusion
■ Our systems are based on:
◻ Traditional approaches (BM25, cosine similarity, tf-idf)
◻ Deep learning models (pre-trained language models)
◻ Legal-domain-knowledge-based data augmentation
techniques
■ Our proposed data augmentation and text matching methods can be applied to other legal text processing tasks in languages other than Vietnamese.

Thank you very much for listening!
