GENERALIZED LANGUAGE MODELS FOR CASE LAW RETRIEVAL - Big Data Expo 2019

Legal Information Retrieval with Generalized Language Models
Julien ROSSI, Evangelos KANOULAS
September 19, 2019
Big Data Expo

Who we are
Prof. Dr. Evangelos Kanoulas
Professor at Amsterdam Business School, UvA
Professor at Institute of Informatics, UvA
Researcher in Information Retrieval, Conversational Agents
Julien Rossi
Lecturer at Amsterdam Business School, UvA
PhD Candidate, Legal Text Analytics
MBA Big Data, ABS
MSc Computer Sciences

Agenda
• Task, Dataset, Evaluation
• Model, Setup, Results
• Lessons Learned

COLIEE 2019
COLIEE stands for Competition on Legal Information Extraction/Entailment. The competition started in 2014 as a collaboration between the University of Alberta (Canada) and the National Institute of Informatics (Japan).
It is a testing ground for applying text analytics to legal documents and tasks.
We have 2 Research Questions:
• RQ1: Can we deal with long documents?
• RQ2: Can we improve retrieval with limited data?

Task 1 - Legal Case Retrieval Task
• Collection of Canadian Supreme Court Judgments
• Single Topic: Immigration & Citizenship
• Search for Relevant Documents within a collection
• Query is a Legal Case
• Noticed cases are relevant to the query case
• Relevance is binary, and no justification is given for the labels

• Labeled Dataset contains 285 query cases
• Each query case comes with a collection of 200 candidate cases
• In total 10000 unique documents
• Unlabeled Dataset contains 61 unknown query cases
• All cases involving Immigration and Citizenship

[Figure: Cumulative Distribution of Document Size. x-axis: Number of Tokens per Document (0 to 10000); y-axis: Proportion of Documents (0.0 to 1.0)]
• Cumulative distribution of the number of tokens per document
• Up to 12000 tokens in a document
• Median around 2500 tokens
• We can address RQ1 and RQ2
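
The figure is an empirical cumulative distribution, and the same calculation shows how few documents would fit within BERT's 512-token limit. Below is a minimal sketch; the token counts in `doc_lengths` are hypothetical values chosen for illustration, not figures from the COLIEE collection.

```python
from bisect import bisect_right
from statistics import median


def empirical_cdf(lengths, threshold):
    """Proportion of documents with at most `threshold` tokens."""
    ordered = sorted(lengths)
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical token counts, NOT taken from the COLIEE collection.
doc_lengths = [300, 900, 2500, 2600, 4000, 12000]

median_length = median(doc_lengths)
share_fitting_512 = empirical_cdf(doc_lengths, 512)
```

With a median far above 512 tokens, most documents need shortening before they can be fed to the model, which is the motivation for RQ1.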

Task 3 - Statute Law Retrieval
• Search for Civil Code articles relevant to assess the validity of a legal statement
• Based on Japanese Bar Exam
• Query is a statement
• Relevant articles explain the point made in the query
• The legislation is the entire Japanese Civil Code, about 1000 articles in English
• The labeled Dataset contains 650 queries
• We can address RQ2

Evaluation
• We use Recall and Precision on the ranked list of retrieved documents
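
As an illustration of these measures, here is a minimal sketch of precision and recall at a cutoff k, plus average precision (the per-query quantity that is averaged into MAP in the results tables). The function names and the toy ranking are assumptions made for this example.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank holding a relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy example: hypothetical candidate ids, two of which are relevant.
ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
```

Here `precision_at_k(ranked, relevant, 2)` and `recall_at_k(ranked, relevant, 2)` are both 0.5, since one of the top two documents is relevant and one of the two relevant documents has been found.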

Model, Workflow
• Binary classifier for pairwise relevance trained on labeled dataset
• Derive a ranker from it
• Predict relevance on unlabeled dataset
• BERT Implementation with google-bert [1], pytorch-transformers [2], fast-bert [3] and apex [4]
• LTR Implementation with TensorFlow Ranking [5]
[1] https://github.com/google-research/bert
[2] https://github.com/huggingface/pytorch-transformers
[3] https://github.com/kaushaltrivedi/fast-bert
[4] https://github.com/NVIDIA/apex
[5] https://github.com/tensorflow/ranking

BERT: A Language Model
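[Figure: masked language modeling. Input tokens: pursuant to article 41.3 , the [MASK] can defer (...). For the [MASK] position, the model outputs scores (0.12, -0.3, 1.45, 0.001, ...) over vocabulary words (truck, apple, be, defender, beyond, the, dropped, ...)]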
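
To make the prediction step concrete, the sketch below turns per-word scores for the [MASK] position into probabilities with a softmax and picks the most likely word. The scores in `mask_scores` are hypothetical; the slide does not say which score belongs to which word.

```python
import math


def softmax(scores):
    """Convert a word -> score mapping into a word -> probability mapping."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical scores for the [MASK] position (illustration only).
mask_scores = {"truck": 0.1, "apple": -0.3, "defender": 1.4, "dropped": 0.0}

probabilities = softmax(mask_scores)
prediction = max(probabilities, key=probabilities.get)
```

In a real model the softmax runs over the whole WordPiece vocabulary rather than over four words.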

Pairwise Relevance Classifier
• Solve the sequence length limitation (512 WordPiece tokens) by summarizing the English part of documents
• Summarization based on TextRank (Barrios et al., 2016), implemented in gensim [6]
• Fine-tuning of a BERT (Devlin et al., 2018) model followed by an MLP
• This model is named Fine-Tuned in results
[6] https://radimrehurek.com/gensim/
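
The idea of extractive summarization under a length budget can be sketched with the standard library alone. This is a simplified TextRank using the word-overlap similarity of the original TextRank formulation, not the gensim implementation used in this work (which follows Barrios et al.). The sentence splitter, the whitespace token count (instead of WordPiece) and all function names are assumptions of this sketch.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (assumption of this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by the log-lengths of both sentences."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, max_tokens):
    """Keep the best-scored sentences that fit the budget, in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    kept, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        length = len(sentences[i].split())  # whitespace tokens, not WordPiece
        if used + length <= max_tokens:
            kept.append(i)
            used += length
    return " ".join(sentences[i] for i in sorted(kept))
```

In the pairwise setting, the summaries of the query case and of a candidate case must together fit within the 512-token limit, so the budget per document is smaller than 512.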

Pairwise Relevance Classifier with Summarization
[Figure: ranked list of candidate cases with predicted relevance scores: 1. 0.94, 2. 0.91, 3. ...]
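
Turning the pairwise classifier into a ranker amounts to sorting the candidates of a query by their predicted probability of relevance. The sketch below assumes those probabilities are already computed; the candidate ids, the scores and the 0.5 decision threshold are hypothetical.

```python
def rank_candidates(scores, threshold=0.5):
    """Sort candidates by predicted relevance and flag those above a threshold.

    `scores` maps a candidate id to the classifier's probability that the
    candidate is relevant to the query case.
    """
    ranking = sorted(scores, key=scores.get, reverse=True)
    predicted_relevant = [c for c in ranking if scores[c] >= threshold]
    return ranking, predicted_relevant


# Hypothetical classifier outputs for one query case.
scores = {"case_055": 0.12, "case_017": 0.94, "case_102": 0.91}
ranking, predicted_relevant = rank_candidates(scores)
```

The ranking feeds the rank-based metrics (P@k, R@k, MAP), while the thresholded list is what a submission of predicted relevant cases would contain.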

Ranker with Learning to Rank
• Generate Features from the Fine-Tuned BERT
• Use these features as input to a Learning to Rank model
• These models are named LTR in results
• Training material limited to 285 lists
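
In a listwise setting, each query case with its candidates forms a single training example, which is why only 285 lists are available. The slides do not state which ranking loss was used; the sketch below shows softmax cross-entropy, one listwise loss of the kind offered by Learning to Rank libraries, as an assumed example.

```python
import math


def softmax_cross_entropy(scores, labels):
    """Listwise loss for one query: low when relevant items get high scores.

    `scores` are the ranker's outputs for each candidate in the list and
    `labels` the binary relevance of each candidate.
    """
    n_relevant = sum(labels)
    if n_relevant == 0:
        return 0.0
    top = max(scores)  # log-sum-exp with max subtraction for stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return -sum(y / n_relevant * (s - log_z) for s, y in zip(scores, labels))


labels = [1, 0, 0]
good_loss = softmax_cross_entropy([2.0, 0.0, 0.0], labels)  # relevant item on top
bad_loss = softmax_cross_entropy([0.0, 2.0, 0.0], labels)   # relevant item below
```

The loss depends on the whole list at once, unlike the pairwise classifier, which scores each query-candidate pair independently.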

Ranker with Learning to Rank
[Figure: ranked list produced by the LTR model, with scores: 1. 0.21, 2. 0.14, 3. ...]

In-Domain Pre-Training
• Starting from a pre-trained BERT model
• Running additional iterations of pre-training tasks
• Using in-domain texts
  • Canadian court decisions for Task 1
  • Japanese Supreme Court rulings in English for Task 3
• For pre-training only, 100k iterations, around 24 hours on 1 GPU
• This model is named Pre-Trained in results
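
One of BERT's pre-training tasks is masked language modeling: about 15% of the token positions are selected, and of those 80% are replaced by [MASK], 10% by a random token and 10% are left unchanged (Devlin et al., 2018). The sketch below builds such training inputs from in-domain text. It covers only the masking step; the function name, the toy vocabulary and the whitespace tokenization are assumptions.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, targets) for masked language modeling.

    targets[i] holds the original token at selected positions, else None.
    """
    rng = random.Random(seed)
    inputs, targets = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"           # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, targets


# Toy in-domain text, tokenized on whitespace for illustration.
tokens = "the court may defer the hearing".split() * 20
inputs, targets = mask_tokens(tokens, vocab=["court", "hearing", "visa"])
```

Because the targets come from the text itself, no relevance labels are needed, which is what makes this step usable when labeled data is limited.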

In-Domain Pre-Training
[Figure: Pre-Training and Fine-Tuning stages]

Results - Case Law Retrieval
System          R@10  P@10  R@1   P@1   P@R   MAP
BM25 Summaries  0.14  0.07  0.02  0.11  0.11  0.14
BM25            0.76  0.36  0.32  0.70  0.68  0.73
PERFECT         0.97  0.50  0.41  1.00  1.00  1.00
Fine-Tuned      0.75  0.34  0.31  0.80  0.63  0.70
LTR 01          0.74  0.34  0.32  0.85  0.63  0.69
LTR 02          0.75  0.35  0.32  0.84  0.65  0.70
Pre-Trained     0.81  0.39  0.34  0.90  0.73  0.79
Table 1: Summary of metrics for all systems

Results - Statute Law Retrieval
System               R@5     R@10    R@30    MAP     P@1
UB3 - COLIEE Winner  0.7978  0.8539  0.9551  0.7988  n/a
Fine-Tuned           0.9010  0.9203  0.9686  0.8246  0.7971
Pre-Trained          0.8913  0.9130  0.9444  0.8321  0.8261
Table 2: Summary of metrics for all systems

Results
Back to our 2 Research Questions:
• RQ1: Can we deal with long documents?
  • Summarized texts as input to a neural language model gave retrieval performance on par with or higher than the baselines
• RQ2: Can we improve retrieval with limited data?
  • Additional pre-training improves retrieval performance on small datasets
  • The uniformity of the legal language at hand (court decisions in English) allows for quick training

Critical Review of BERT
• BERT is a language model: it learns the language presented at pre-training
• BERT is strong on syntactic and semantic tasks
  • "Open Sesame (...)", Lin et al., June 2019
  • "What does BERT look at? (...)", Clark et al., June 2019
  • "BERT rediscovers the Classical NLP Pipeline", Tenney et al., May 2019
• It is well suited to tasks operating at text level, less so to tasks operating at higher levels of language understanding
• Pre-training on texts similar to those of the downstream task has been shown to add language knowledge

• Ongoing discussion about Attention and Explanation
  • "Attention is not Explanation", Jain and Wallace, May 2019
  • "Is Attention Interpretable?", Serrano and Smith, June 2019
  • "Attention is not not Explanation", Wiegreffe and Pinter, August 2019
• This is common to all systems based on Transformers: OpenAI GPT, GPT-2, Transformer-XL, XLNet, XLM, etc.
• Going through attention weights with bertviz [7], it seems the model focuses more on word similarities than on semantics
[7] https://github.com/jessevig/bertviz

• The "Clever Hans" [8] effect: mistaking surface correlations for deep knowledge
  • "Probing Neural Network (...)", Niven and Kao, August 2019
• In the age of AI, "Correlation is not causation" becomes "Good results are not knowledge acquisition"
• Focus on datasets with diverse texts that call on similar knowledge
[8] https://thegradient.pub/nlps-clever-hans-moment-has-arrived/

Take-Home
• Stay aware of the dataset's limitations
• Get to know what the model actually learns versus how it performs
• Learn through unsupervised pre-training
• Lots of orthogonal ways forward:
  • More data for pre-training
  • New pre-training tasks
  • New model architecture
  • Different heuristic for summarization
  • More annotation on relevance assessment

Work with us?
We are interested in collaborations with industry, in the framework of research projects.
We cover many domains:
• Information Extraction from Contracts
• Summarization of Legal Documents
• Information Retrieval, Search for Regulation
We need access to relevant Data