AimeLaw at ALQAC 2021: Enriching Neural Network Models with Legal-Domain Knowledge
Nguyen Manh Duc Tuan (Toyo University)
Ngo Quang Huy (Aimesoft JSC, Vietnam)
Nguyen Anh Duong (Aimesoft JSC, Vietnam)
Pham Quang Nhat Minh (Aimesoft JSC, Vietnam)
The 13th IEEE International Conference on Knowledge and Systems Engineering (KSE 2021)
November 12, 2020
Table of contents
■ Introduction
■ Methods
■ Experiments & Results
■ Conclusion
Overview of Our Approaches
■ Used traditional Information Retrieval models, pre-trained language models, and legal-domain knowledge
■ Proposed a data augmentation method and a text matching method for Task 2 (Legal Textual Entailment)
◻ Based on an analysis of the structural characteristics of legal documents
■ First prize in Task 2 (72.16%); ranked second in Task 1 (80.61% F2) and Task 3 (64.77% accuracy)
Main Findings
■ Task 1 - Legal document retrieval:
◻ Combining the lexical matching model with a supporting model (BERT + CNN, Domain Invariant) improves the accuracy of document retrieval
■ Task 2 - Legal textual entailment:
◻ Augmenting the training data with samples generated from law articles helps tackle the data shortage problem
◻ Using the part of an article most relevant to the input query improves the accuracy of legal textual entailment
Task 1: Legal Document Retrieval
■ Task components:
◻ Questions
◻ Set of law articles
■ Objective: automatically retrieve the law articles relevant to the input question
■ Following Nguyen et al., we combine two models
◻ Lexical matching model (BM25)
◻ Supporting model
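The combination of the two models can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not state how the scores are merged, so the weighted sum, the weight `alpha`, the min-max normalization, the BM25 parameters `k1` and `b`, and the hard-coded `supporting` scores are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def min_max(values):
    """Rescale scores to [0, 1] so the two models are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combine(lexical, supporting, alpha=0.7):
    """Weighted sum of normalized scores (assumed combination rule)."""
    lex, sup = min_max(lexical), min_max(supporting)
    return [alpha * l + (1 - alpha) * s for l, s in zip(lex, sup)]

articles = [
    "smoking is prohibited in educational institutions".split(),
    "a civil transaction is a contract or unilateral legal act".split(),
    "cheating in exams is prohibited".split(),
]
question = "is smoking prohibited in schools".split()
lexical = bm25_scores(question, articles)
supporting = [0.9, 0.1, 0.4]  # hypothetical supporting-model scores
final = combine(lexical, supporting)
best = max(range(len(articles)), key=final.__getitem__)
```

In this toy run both models favor the first article, so `best` is 0. The supporting model matters in cases where BM25 misses an article that shares few surface words with the question.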
Proposed Approaches
■ Models that can complement the hard lexical matching (BM25)
◻ The supporting model captures features that are distinct from those captured by lexical matching
■ Two proposed supporting models:
◻ Domain Invariant
◻ Deep CNN
Domain Invariant
■ Three main components:
◻ Feature Extractor
◻ Domain Classifier (ID of the law)
◻ Classifier (relevant or not)
■ Training objective:
◻ Extracted features carry no discriminative information about the domain
◻ Extracted features keep meaningful information for the classification task
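The slides do not give a formula for this objective. Assuming the Domain Invariant model follows the standard domain-adversarial (DANN) formulation, which the "BM25+DANN" run name suggests, the two goals can be written as a single saddle-point problem. The symbols are assumptions: \theta_f, \theta_y and \theta_d are the parameters of the feature extractor, the relevance classifier and the domain classifier, and \lambda is a trade-off weight.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \mathcal{L}_y(\theta_f, \theta_y) - \lambda\, \mathcal{L}_d(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_y)
  = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d
  = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Minimizing \mathcal{L}_y keeps the features useful for the relevance decision. The feature extractor also maximizes the domain loss \mathcal{L}_d, which removes information that would reveal which law an article comes from.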
Deep CNN
● Use BERT to encode the candidate article and the question
● Use various CNN layers to extract higher-level representations
● Concatenate the final representations of the article and the question
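A toy, pure-Python sketch of the data flow described above. It rests on several assumptions: random vectors stand in for BERT token embeddings, a single convolution layer with filter widths 2, 3 and 4 stands in for the "various CNN layers", and ReLU with max-over-time pooling is an assumed choice. It shows only the shapes involved, not the authors' architecture.

```python
import random

def conv1d_max_pool(token_vectors, filters):
    """Slide each filter over the token sequence, apply ReLU, then max-pool over time.

    token_vectors: list of T vectors of size D (stand-ins for BERT embeddings).
    filters: list of filters, each a list of `width` weight vectors of size D.
    Returns one pooled value per filter.
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU output is never below zero
        for start in range(len(token_vectors) - width + 1):
            window = token_vectors[start:start + width]
            activation = sum(
                w * x
                for weights, vec in zip(filt, window)
                for w, x in zip(weights, vec)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim = 8
question_vecs = random_matrix(5, dim, rng)   # 5 question tokens
article_vecs = random_matrix(12, dim, rng)   # 12 article tokens
# Two filters for each assumed width (2, 3, 4): six filters in total.
filters = [random_matrix(width, dim, rng) for width in (2, 3, 4) for _ in range(2)]

question_repr = conv1d_max_pool(question_vecs, filters)
article_repr = conv1d_max_pool(article_vecs, filters)
pair_repr = article_repr + question_repr  # concatenation passed to a classifier
```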
Task 2: Legal Textual Entailment
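■ Input: question/statement and its relevant articles
■ Output: Yes/No
■ Example:
Statement: Chỉ những hành vi pháp lý đơn phương làm thay đổi quyền, nghĩa vụ dân sự mới được coi là giao dịch dân sự. (Only unilateral legal acts that change civil rights and obligations are considered civil transactions.)
Relevant articles:
Giao dịch dân sự (Civil transactions)
Giao dịch dân sự là hợp đồng hoặc hành vi pháp lý đơn phương làm phát sinh, thay đổi hoặc chấm dứt quyền, nghĩa vụ dân sự. (A civil transaction is a contract or a unilateral legal act that gives rise to, changes, or terminates civil rights and obligations.)
⇒ No (the statement is false according to the content of the legal articles)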
Proposed Methods
Three main components
■ Data Augmentation
■ Text Matching
■ Fine-tuning BERT
Data Augmentation
■ Generate positive instances from the structural features of a Vietnamese law article:
◻ concatenate the consequence part of each clause with every condition that follows the consequence
◻ rewrite clauses that do not include any point
■ Generate negative samples using the BM25 model from Task 1
⇒ In total, 4237 training samples are obtained.
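The augmentation steps above could be sketched as follows. This is an illustration under assumptions, not the authors' code. The clause representation (a consequence string plus a list of condition strings), the article ids, and the rule for negatives (pair the statement with the top-ranked article other than its source) are all hypothetical. Clauses without points are kept unchanged here because the slides do not describe the rewrite rule. `overlap_rank` is a word-overlap stand-in for BM25.

```python
def augment_clause(consequence, conditions):
    """One positive statement per condition; a clause with no conditions is kept whole."""
    consequence = consequence.strip().rstrip(":")
    if not conditions:
        return [consequence]
    return [f"{consequence}: {c.strip().rstrip(';.')}" for c in conditions]

def build_samples(articles, rank):
    """articles: {article_id: [(consequence, [conditions]), ...]}
    rank: callable(statement) -> article ids ordered by relevance (e.g. BM25).
    Returns (statement, article_id, label) triples; label 1 = positive, 0 = negative."""
    samples = []
    for article_id, clauses in articles.items():
        for consequence, conditions in clauses:
            for statement in augment_clause(consequence, conditions):
                samples.append((statement, article_id, 1))
                # Assumed negative rule: best-ranked article that is not the source.
                negative = next((a for a in rank(statement) if a != article_id), None)
                if negative is not None:
                    samples.append((statement, negative, 0))
    return samples

# Hypothetical toy corpus with made-up article ids.
articles = {
    "article_a": [(
        "Prohibited acts in educational institutions",
        ["Misrepresenting educational content;",
         "Cheating in study, tests, exams, and enrollment;"],
    )],
    "article_b": [(
        "A civil transaction is a contract or a unilateral legal act.",
        [],
    )],
}

def overlap_rank(statement):
    """Word-overlap ranking used here only as a stand-in for BM25."""
    words = set(statement.lower().split())
    def score(article_id):
        text = " ".join(c + " " + " ".join(p) for c, p in articles[article_id])
        return len(words & set(text.lower().split()))
    return sorted(articles, key=score, reverse=True)

samples = build_samples(articles, overlap_rank)
```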
Examples of Generated Questions
Text Matching
Query: Hút thuốc là hành vi bị nghiêm cấm trong cơ sở giáo dục. (Smoking is a prohibited act in educational institutions.)
Article: Các hành vi bị nghiêm cấm trong cơ sở giáo dục (Prohibited acts in educational institutions)
1. Xúc phạm nhân phẩm, danh dự, xâm phạm thân thể nhà giáo, cán bộ, người lao động của cơ sở giáo dục và người học. (1. Infringing on the dignity, honor, or body of teachers, officials, and employees of educational institutions and of learners.)
2. Xuyên tạc nội dung giáo dục. (2. Misrepresenting educational content.)
3. Gian lận trong học tập, kiểm tra, thi, tuyển sinh. (3. Cheating in study, tests, exams, and enrollment.)
4. Hút thuốc; uống rượu, bia; gây rối an ninh, trật tự. (4. Smoking; drinking alcohol or beer; disrupting security and order.)
...
Matching scores shown in the figure: 0.3, 0.2, 0.6
Matched text:
Các hành vi bị nghiêm cấm trong cơ sở giáo dục (Prohibited acts in educational institutions)
4. Hút thuốc; uống rượu, bia; gây rối an ninh, trật tự. (4. Smoking; drinking alcohol or beer; disrupting security and order.)
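A minimal sketch of the matching step illustrated above: score every clause of the article against the query and keep the best one. The slides do not specify the scoring function. TF-IDF with cosine similarity is an assumption, chosen because the conclusion lists both among the traditional approaches used. The clause texts are simplified English stand-ins, not the original Vietnamese.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(token_lists):
    """Sparse TF-IDF vectors (dicts) with smoothed inverse document frequency."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def best_clause(query, clauses):
    """Return the index of the highest-scoring clause and all clause scores."""
    docs = [tokenize(c) for c in clauses] + [tokenize(query)]
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    return max(range(len(clauses)), key=scores.__getitem__), scores

clauses = [
    "Infringing on the dignity and honor of teachers, staff, and learners.",
    "Misrepresenting the content of teaching.",
    "Cheating during study, tests, exams, and enrollment.",
    "Smoking; drinking alcohol or beer; disrupting security and order.",
]
query = "Smoking is a prohibited act in educational institutions."
index, scores = best_clause(query, clauses)
matched = clauses[index]
```

Only the matched clause, together with the article title, would then be paired with the query for the entailment model.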
Example of Text Matching Result
Fine-tuning BERT
Legal entailment as sentence-pair classification
■ Pair the question with the matched clauses
■ Insert [CLS] and [SEP]
■ Concatenate the vectors of the last 4 hidden states ⇒ embedding vector of the sequence pair
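The input formatting and the concatenation step can be sketched without a deep learning library. This is a hedged illustration: the function names are hypothetical, the hidden states are dummy numbers rather than real BERT outputs, and taking the vector at the [CLS] position from each of the last four layers is an assumption, since the slide does not say which token's vectors are concatenated.

```python
def build_pair_input(question_tokens, clause_tokens, max_length=512):
    """Format a pair as [CLS] question [SEP] clause [SEP] with segment ids."""
    budget = max_length - 3  # room for [CLS] and the two [SEP] tokens
    clause_tokens = clause_tokens[:max(0, budget - len(question_tokens))]
    question_tokens = question_tokens[:budget]
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + clause_tokens + ["[SEP]"]
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(clause_tokens) + 1)
    return tokens, segment_ids

def concat_last_four(hidden_states, position=0):
    """Concatenate the vectors at `position` taken from the last four layers.

    hidden_states: list of layers, each a list of per-token vectors.
    """
    return [x for layer in hidden_states[-4:] for x in layer[position]]

tokens, segment_ids = build_pair_input(
    ["smoking", "is", "prohibited"],
    ["smoking", ";", "drinking"],
)

# Dummy hidden states: 13 layers (embeddings + 12 blocks), hidden size 4.
hidden_states = [[[float(layer)] * 4 for _ in tokens] for layer in range(13)]
pair_vector = concat_last_four(hidden_states)  # 4 layers x 4 dims = 16 values
```

With a real BERT-base model the same concatenation would give a 4 x 768 = 3072-dimensional pair embedding.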
Task 3: Legal Question Answering
■ Input: question/statement
■ Output: Yes/No
■ Example:
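Statement: Chỉ những hành vi pháp lý đơn phương làm thay đổi quyền, nghĩa vụ dân sự mới được coi là giao dịch dân sự. (Only unilateral legal acts that change civil rights and obligations are considered civil transactions.)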
⇒ No
Our Approach
■ Combine Task 1 and Task 2, with a slight difference in the legal textual entailment model.
Pipeline: Legal Query + Law Article Data → Legal Document Retrieval → Relevant Articles; Relevant Articles + Legal Query → Legal Textual Entailment
If at least one relevant article entails the legal query, then the legal query is TRUE.
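The decision rule of this pipeline fits in a few lines. In the sketch below, `retrieve` and `entails` are placeholders for the Task 1 and Task 2 models. The toy versions use word overlap only so the example runs, and `top_k` is an assumed parameter.

```python
def answer_query(query, retrieve, entails, top_k=2):
    """Return True if at least one retrieved article entails the query."""
    candidates = retrieve(query)[:top_k]
    return any(entails(query, article) for article in candidates)

LAW_ARTICLES = [
    "smoking is prohibited in educational institutions",
    "a civil transaction is a contract or a unilateral legal act",
]

def _words(text):
    return set(text.lower().split())

def toy_retrieve(query):
    """Stand-in for Task 1: rank articles by word overlap with the query."""
    return sorted(LAW_ARTICLES, key=lambda a: len(_words(a) & _words(query)), reverse=True)

def toy_entails(query, article):
    """Stand-in for Task 2: 'entailed' if every query word occurs in the article."""
    return _words(query) <= _words(article)

yes = answer_query("smoking is prohibited", toy_retrieve, toy_entails)
no = answer_query("cheating is prohibited", toy_retrieve, toy_entails)
```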
Experiments and Results: Task 1
Run              F2       Rank
(1) Only BM25    78.42%   #7
(2) BM25+DANN    80.61%   #2
(3) BM25+CNN     80.61%   #2
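The overview reports the Task 1 result as an F2 score, a measure that weights recall more heavily than precision. A small sketch of the per-query computation follows. The article ids are hypothetical, and how scores are averaged across queries is not stated in the slides.

```python
def f_beta(retrieved, relevant, beta=2.0):
    """F-beta score for one query; beta=2 gives F2, which favors recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Two articles returned, one of them correct: precision 0.5, recall 1.0.
score = f_beta(["article_1", "article_2"], ["article_1"])
```

Here `score` is 5/6 (about 0.83), while the balanced F1 for the same query would be 2/3. Returning an extra candidate article costs less under F2 than missing a relevant one.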
Experiments and Results: Task 2
■ Divided the augmented data into training and development subsets
◻ 3813 samples for training, 424 samples for validation
■ Extra experiment: used the whole data set for training
◻ Obtained 72.16% accuracy on the private test set
Run                          Accuracy   Rank
(1) BERT, lr = 2e-5          68.89%     #1
(2) BERT, lr = 1e-4          67.61%     #3
(3) Domain Invariant Model   43%        #8
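The 3813/424 split is consistent with holding out roughly 10% of the 4237 augmented samples. A sketch of such a split, where the 10% fraction, the shuffling, and the seed are assumptions not stated in the slides:

```python
import random

def split_train_dev(samples, dev_fraction=0.1, seed=13):
    """Shuffle the samples and hold out a fraction of them for development."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    dev_size = round(len(shuffled) * dev_fraction)
    return shuffled[dev_size:], shuffled[:dev_size]

# Integers stand in for the 4237 augmented samples.
train, dev = split_train_dev(range(4237))
```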
Experiments and Results: Task 3
■ Train the model on 80% of the original training data
■ Maximum sequence length: 512 during training, 256 at inference
Run                                Accuracy   Rank
(1) BM25 + Text Matching           64.77%     #2
(2) BM25, Domain Invariant Model   61.36%     #4
(3) BM25, Deep CNN                 61.36%     #4
Conclusion
■ Our systems are based on:
◻ Traditional approaches (BM25, cosine similarity, tf-idf)
◻ Deep learning models (pre-trained language models)
◻ Legal-domain-knowledge-based data augmentation
techniques
■ Our proposed data augmentation and text matching methods can be applied to other legal text processing tasks in languages other than Vietnamese.
Thank you very much for listening!

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 

Recently uploaded (20)

Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 

AimeLaw at ALQAC 2021: Enriching Neural Network Models with Legal-Domain Knowledge

  • 1. AimeLaw at ALQAC 2021: Enriching Neural Network Models with Legal-Domain Knowledge
       Nguyen Manh Duc Tuan (Toyo University)
       Ngo Quang Huy (Aimesoft JSC, Vietnam)
       Nguyen Anh Duong (Aimesoft JSC, Vietnam)
       Pham Quang Nhat Minh (Aimesoft JSC, Vietnam)
       The 13th IEEE International Conference on Knowledge and Systems Engineering (KSE 2021)
       November 12, 2020
  • 2. Table of contents
       ■ Introduction
       ■ Methods
       ■ Experiments & Results
       ■ Conclusion
  • 3. Overview of Our Approaches
       ■ Used traditional Information Retrieval models, pre-trained language models, and legal-domain knowledge
       ■ Proposed a data augmentation method and a text matching method for Task 2: Legal Textual Entailment
         ◻ Based on analysing structural characteristics of legal documents
       ■ First prize in Task 2 (72.16%); ranked second in Task 1 (80.61% F2) and Task 3 (64.77% accuracy)
  • 4. Main Findings
       ■ Task 1 - Legal document retrieval:
         ◻ Combining a lexical matching model with a supporting model (BERT + CNN, Domain Invariant) improves the accuracy of document retrieval
       ■ Task 2 - Legal textual entailment:
         ◻ Augmenting the training data with samples generated from law articles helps tackle the data shortage problem
         ◻ Using the part of an article most relevant to the input query improves the accuracy of legal textual entailment
  • 5. Task 1: Legal Document Retrieval
       ■ Task components:
         ◻ Questions
         ◻ Set of law articles
       ■ Objective: automatically retrieve the law articles relevant to the input question
       ■ Following Nguyen et al., we combine two models:
         ◻ Lexical matching model (BM25)
         ◻ Supporting model
  • 6. Proposed Approaches
       ■ Models that can complement hard lexical matching (BM25)
         ◻ A supporting model captures features distinct from those captured by lexical matching
       ■ Two proposed supporting models:
         ◻ Domain Invariant
         ◻ Deep CNN
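The slides do not say how the BM25 score and the supporting-model score are combined. The following is a minimal sketch, assuming a weighted sum of min-max-normalized scores; the function names, the weight `alpha`, and the supporting scores are hypothetical placeholders, and the BM25 variant shown uses a common always-positive idf smoothing rather than anything confirmed by the authors.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def combine(lexical, supporting, alpha=0.5):
    """Hypothetical fusion rule: weighted sum of normalized scores."""
    return [alpha * l + (1 - alpha) * s
            for l, s in zip(min_max(lexical), min_max(supporting))]


# Toy data (not from the ALQAC corpus).
articles = [
    ["smoking", "is", "prohibited", "in", "schools"],
    ["civil", "transactions", "are", "contracts"],
    ["cheating", "in", "exams", "is", "prohibited"],
]
query = ["smoking", "prohibited", "schools"]
lexical = bm25_scores(query, articles)
supporting = [0.8, 0.1, 0.4]  # placeholder outputs of a supporting model
combined = combine(lexical, supporting)
```

Normalizing before mixing keeps either score from dominating only because of its scale; in practice `alpha` would be tuned on held-out questions.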
  • 7. Domain Invariant
       ■ Three main components:
         ◻ Feature Extractor
         ◻ Domain Classifier (Id of the law)
         ◻ Classifier (relevant or not)
       ■ Training objective:
         ◻ No discriminative information about the domain
         ◻ Keeping meaningful information for the classification task
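The results slide labels this run "BM25+DANN", which suggests a domain-adversarial setup. Assuming the standard DANN formulation (the slides do not spell it out), the two training goals can be written as a single saddle-point objective. The symbols are assumptions introduced here: G_f is the feature extractor, G_y the relevance classifier, G_d the domain classifier, L_y and L_d their losses, d_i the id of the law, and λ a trade-off weight.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \sum_i L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big)
  - \lambda \sum_i L_d\big(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\big)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Under this reading, the domain classifier is trained to predict the law id, while the feature extractor is trained to make that prediction hard and the relevance prediction easy.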
  • 8. Deep CNN
       ● Use BERT to encode the candidate article and the question
       ● Use several CNN layers to extract higher-level representations
       ● Concatenate the final representations of the article and the question
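  • 9. Task 2: Legal Textual Entailment
       ■ Input: question/statement & its relevant articles
       ■ Output: Yes/No
       ■ Example:
         Statement: Chỉ những hành vi pháp lý đơn phương làm thay đổi quyền, nghĩa vụ dân sự mới được coi là giao dịch dân sự. (Only unilateral legal acts that change civil rights and obligations are considered civil transactions.)
         Relevant articles: Giao dịch dân sự (Civil transactions)
         Giao dịch dân sự là hợp đồng hoặc hành vi pháp lý đơn phương làm phát sinh, thay đổi hoặc chấm dứt quyền, nghĩa vụ dân sự. (A civil transaction is a contract or a unilateral legal act that gives rise to, changes, or terminates civil rights and obligations.)
         ⇒ No (The statement is false based on the content of the legal articles)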
  • 10. Proposed Methods
        Three main components:
        ■ Data Augmentation
        ■ Text Matching
        ■ Fine-tuning BERT
  • 11. Data Augmentation
        ■ Use the structural features of a Vietnamese law article to generate positive instances:
          ◻ Concatenate the consequence part of each clause with every condition that follows it
          ◻ Rewrite clauses that do not include any point
        ■ Use BM25 from Task 1 to generate negative samples
        ⇒ In total, 4237 training samples are obtained.
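The positive-instance rule above can be sketched as follows. This is an illustration under assumptions: the clause representation (a dict with a `consequence` string and a list of `conditions`), the function name, and the English toy clauses are all hypothetical, and clauses without points are kept unchanged because the slides do not describe how they are rewritten.

```python
def positive_instances(clauses):
    """Generate positive statements from structured clauses.

    Each clause is a dict with a 'consequence' lead-in and a list of
    'conditions' (the points that follow it). This layout is an assumed
    representation, not the authors' data format.
    """
    statements = []
    for clause in clauses:
        lead = clause["consequence"].rstrip(": ")
        if clause["conditions"]:
            # One statement per (consequence, condition) pair.
            for point in clause["conditions"]:
                statements.append(f"{lead} {point.rstrip(';.')}.")
        else:
            # Clause without points: kept as-is in this sketch.
            statements.append(lead if lead.endswith(".") else lead + ".")
    return statements


# Toy clauses written for illustration (not quoted from any statute).
clauses = [
    {
        "consequence": "A civil transaction is invalid if:",
        "conditions": [
            "it violates a prohibition of law;",
            "it is contrary to social ethics.",
        ],
    },
    {
        "consequence": "A civil transaction may be expressed verbally or in writing.",
        "conditions": [],
    },
]
positives = positive_instances(clauses)
```

Each generated statement would be paired with its source article and labeled as entailed; negative pairs would come from articles that BM25 ranks highly but that are not the source.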
  • 13. Text Matching
        Query: Hút thuốc là hành vi bị nghiêm cấm trong cơ sở giáo dục. (Smoking is a prohibited act in educational institutions)
        Article: Các hành vi bị nghiêm cấm trong cơ sở giáo dục (Prohibited acts in educational institutions)
          1. Xúc phạm nhân phẩm, danh dự, xâm phạm thân thể nhà giáo, cán bộ, người lao động của cơ sở giáo dục và người học. (1. Infringing on dignity and honor, infringing upon the body of teachers, officials and employees of educational institutions and learners.)
          2. Xuyên tạc nội dung giáo dục. (2. Misrepresenting of educational content.)
          3. Gian lận trong học tập, kiểm tra, thi, tuyển sinh. (3. Cheating in study, test, exam, enrollment.)
          4. Hút thuốc; uống rượu, bia; gây rối an ninh, trật tự. (4. Smoking; drinking beer; disrupting security and order.)
          ...
        Scores shown on the slide: 0.3, 0.2, 0.6
        Matched text: Các hành vi bị nghiêm cấm trong cơ sở giáo dục (Prohibited acts in educational institutions) 4. Hút thuốc; uống rượu, bia; gây rối an ninh, trật tự. (4. Smoking; drinking beer; disrupting security and order.)
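The slide shows the outcome of text matching (the article title plus the best clause) but not the scoring function. As a rough sketch, the selection can be reproduced with idf-weighted query-term coverage, which is a stand-in chosen here and not the authors' confirmed method. The function names are hypothetical, the data are the English glosses from the slide, and the scores it produces are not the 0.3/0.2/0.6 values shown there.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; \\w also matches accented Vietnamese letters."""
    return re.findall(r"\w+", text.lower())


def match_clause(query, title, clauses):
    """Pick the clause whose 'title + clause' text best covers the query.

    Score = sum of smoothed idf weights of the query terms found in the
    candidate. This scoring rule is an assumption made for illustration.
    """
    candidates = [set(tokenize(f"{title} {c}")) for c in clauses]
    n = len(candidates)
    df = Counter(t for cand in candidates for t in cand)
    q_terms = set(tokenize(query))
    scores = [
        sum(math.log((1 + n) / (1 + df[t])) + 1 for t in q_terms & cand)
        for cand in candidates
    ]
    best = max(range(n), key=scores.__getitem__)
    return f"{title} {clauses[best]}", scores


title = "Prohibited acts in educational institutions"
clauses = [
    "1. Infringing on dignity and honor, infringing upon the body of teachers, "
    "officials and employees of educational institutions and learners.",
    "2. Misrepresenting of educational content.",
    "3. Cheating in study, test, exam, enrollment.",
    "4. Smoking; drinking beer; disrupting security and order.",
]
query = "Smoking is a prohibited act in educational institutions"
matched, scores = match_clause(query, title, clauses)
```

Because every candidate includes the title, terms shared with the title cancel out and the clause-specific term ("smoking") decides the match. Real Vietnamese text would additionally need word segmentation, which this sketch omits.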
  • 14. Example of Text Matching Result
  • 15. Fine-tuning BERT
        Legal entailment as sentence-pair classification:
        ■ Pair the question with the matched clauses
        ■ Insert [CLS] and [SEP]
        ■ Concatenate the vectors of the last 4 hidden states ⇒ embedding vector of the sequence pair
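The input construction and the pooling step can be illustrated without any deep learning library. The sketch below assumes that the vector taken from each of the last four layers is the one at the [CLS] position, which the slide does not state; `build_pair`, `pair_embedding`, and the toy hidden states are hypothetical.

```python
def build_pair(question_tokens, clause_tokens):
    """BERT-style sentence-pair input: [CLS] question [SEP] clause [SEP]."""
    return ["[CLS]", *question_tokens, "[SEP]", *clause_tokens, "[SEP]"]


def pair_embedding(hidden_states, cls_index=0, last_n=4):
    """Concatenate the [CLS] vectors of the last `last_n` layers.

    hidden_states[layer][token] is the vector of one token at one layer.
    Using the [CLS] position is an assumption of this sketch.
    """
    embedding = []
    for layer in hidden_states[-last_n:]:
        embedding.extend(layer[cls_index])
    return embedding


tokens = build_pair(["smoking", "is", "prohibited"], ["smoking", ";", "drinking"])

# Toy hidden states: 6 layers, one 2-dimensional vector per token.
hidden_states = [
    [[float(layer), float(layer) + 0.5] for _ in tokens]
    for layer in range(6)
]
embedding = pair_embedding(hidden_states)
```

With a real encoder of hidden size H, the resulting vector has 4 × H dimensions and would feed the Yes/No classification layer.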
  • 16. Task 3: Legal Question Answering
        ■ Input: question/statement
        ■ Output: Yes/No
        ■ Example:
          Statement: Chỉ những hành vi pháp lý đơn phương làm thay đổi quyền, nghĩa vụ dân sự mới được coi là giao dịch dân sự. (Only unilateral legal acts that change civil rights and obligations are considered civil transactions.)
          ⇒ No
  • 17. Our Approach
        ■ Combine Task 1 and Task 2, with a slight difference in the legal textual entailment model.
        Pipeline: Legal Query → Legal Document Retrieval (over Law Article Data) → Relevant Articles → Legal Textual Entailment (with the Legal Query)
        If at least one relevant article entails the legal query, the legal query is TRUE.
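The decision rule on this slide amounts to an "any" over the retrieved articles. A minimal sketch, assuming the retriever and the entailment model are available as callables; `retrieve`, `entails`, `top_k`, and the toy stand-ins below are hypothetical and only show the control flow.

```python
def answer_query(query, retrieve, entails, top_k=3):
    """Return True if at least one retrieved article entails the query.

    retrieve(query) -> ranked list of articles (Task 1 model)
    entails(query, article) -> bool (Task 2 model)
    """
    articles = retrieve(query)[:top_k]
    return any(entails(query, article) for article in articles)


# Toy stand-ins for the two models.
corpus = [
    "Smoking is prohibited in educational institutions.",
    "A civil transaction is a contract or a unilateral legal act.",
]


def toy_retrieve(query):
    # Rank articles by the number of words shared with the query.
    words = set(query.lower().split())
    return sorted(corpus, key=lambda a: -len(words & set(a.lower().split())))


def toy_entails(query, article):
    # Placeholder rule: exact match after lowercasing.
    return query.lower() == article.lower()


yes = answer_query("Smoking is prohibited in educational institutions.",
                   toy_retrieve, toy_entails)
no = answer_query("Only unilateral acts are civil transactions.",
                  toy_retrieve, toy_entails)
```

A consequence of this rule is that retrieval errors propagate: if no entailing article is retrieved, the answer defaults to No.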
  • 18. Experiments and Results: Task 1
        Run               Accuracy   Rank
        (1) Only BM25     78.42%     #7
        (2) BM25+DANN     80.61%     #2
        (3) BM25+CNN      80.61%     #2
  • 19. Experiments and Results: Task 2
        ■ Divided the augmented data into training and development subsets
          ◻ 3813 samples for training, 424 samples for validation
        ■ Extra experiment: used the whole data set for training
          ◻ Obtained 72.16% accuracy on the private test set
        Run                        Accuracy   Rank
        (1) BERT, lr = 2e-5        68.89%     #1
        (2) BERT, lr = 1e-4        67.61%     #3
        (3) Domain Variant Model   43%        #8
  • 20. Experiments and Results: Task 3
        ■ Train the model on 80% of the original training data
        ■ Max length: 256 at inference phase, 512 at training
        Run                              Accuracy   Rank
        (1) BM25 + Text Matching         64.77%     #2
        (2) BM25, Domain Variant Model   61.36%     #4
        (3) BM25, Deep CNN               61.36%     #4
  • 21. Conclusion
        ■ Our systems are based on:
          ◻ Traditional approaches (BM25, cosine similarity, tf-idf)
          ◻ Deep learning models (pre-trained language models)
          ◻ Legal-domain-knowledge-based data augmentation techniques
        ■ Our proposed data augmentation and text matching methods can be applied to other legal text processing tasks in languages other than Vietnamese.
  • 22. Thank you very much for listening!