1. Computer Science and Engineering
Combining Different
Summarization Techniques for
Legal Text
Filippo Galgani
Paul Compton
Achim Hoffmann
School of Computer Science and Engineering
Faculty of Engineering
University of New South Wales (Australia)
AustLII Research Seminars
2. Automatic Summarization
• Automatically create a shorter version of one or more texts
• Purpose: fight information overload
• Judge whether a document is relevant to a topic of interest
• Advantages: dynamic summaries
• Applications: news, scientific articles, emails, social media streams, websites, speech
5. Automatic Summarization
• Automatic summarization supports:
  • locating documents of interest within large collections
  • manual curation
  • helping lay users access legal texts
• Legal texts are a challenging domain, in need of automatic language processing techniques
• Case reports: long and often unstructured documents
6. CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend "period" for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
7. Corpus
• Training: 2816 FCA cases (2007-2009) with given catchphrases
  • plus citations (10.4 related cases on average)
• Test: 1074 FCA cases (2006) with given catchphrases
  • plus citations (11.15 related cases on average)
8. Pre-processing
• Catchphrase extraction (regular expressions): 8 catchphrases on average (1.24 first-level), 22755 phrases in total (19251 + 3504):
  • 16566 different phrases
  • 15359 (66.1%) occur only once
• NLTK: sentence splitter, tokenizer, stopword filtering, stemming, POS tagging (sketch below)
• On average (training):
  • 221 sentences per document (622366 in total)
  • 7478 words per document (21 million in total), 34 per sentence (unfiltered)
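The pre-processing stage can be sketched in a few lines of NLTK. This is a minimal illustration of the listed steps (sentence splitting, tokenization, stopword filtering, stemming, POS tagging), not the authors' actual code; it assumes the usual NLTK resources have been downloaded.

    # Minimal NLTK pre-processing sketch (illustrative, not the authors' code).
    # Requires: nltk.download('punkt'), nltk.download('stopwords'),
    #           nltk.download('averaged_perceptron_tagger')
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    def preprocess(text):
        stop = set(stopwords.words('english'))
        stemmer = PorterStemmer()
        processed = []
        for sent in nltk.sent_tokenize(text):                  # sentence splitting
            tagged = nltk.pos_tag(nltk.word_tokenize(sent))    # tokenize + POS tag
            # stopword filtering + stemming, keeping the POS of each kept term
            processed.append([(stemmer.stem(w.lower()), pos)
                              for w, pos in tagged
                              if w.isalpha() and w.lower() not in stop])
        return processed                                       # one term list per sentence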
9. [Figure: a target document (CASE 2008 FCA XX) and its target catchphrases. The decision text contains references to legislation ("under s SSS of the AAAAA Act") and citations to other cases ("2006 FCA YY", "2002 HCA JJ").]
12. [Figure: system overview. A corpus of cases (each with decision text and given catchphrases, e.g. CASE 2008 FCA XX, plus citing cases such as CASE 2010 FCA XX) feeds linguistic pre-processing and term-statistics computation; citation analysis links related cases; sentence scoring then selects candidate sentences from the target document, which are evaluated against the target catchphrases with Rouge.]
13. Methods (1)

Using our database of catchphrases and documents, we score candidate terms for catchphrase extraction in a given document; then we score sentences based on these identified terms. Note that in the computation of all the methods, terms are stemmed and stopword filtered. (The threshold-based variants, Thresfreq and Thresnocc, are described on the next slide.)

Fcfound is the ratio between how many times (that is, in how many documents) a term appears both in the catchphrases and in the text of the case, and how many times it appears in the text:

    Fcfound(t) = NDocs_text&catchp(t) / NDocs_text(t)

The Fcfound score of a term does not depend on the document: it is computed using our database of catchphrases from the whole corpus (2816 cases). Intuition: how likely the term is to appear in the catchphrases if it is in the text.

Fcfoundfreq is the previous Fcfound score, multiplied by the number of occurrences of the term in the document:

    Fcfoundfreq(t) = Fcfound(t) · NOccur(t, doc)

Intuition: the same, weighted by frequency.

Freqmedia is the ratio between the number of occurrences of the term in the present document and the average number of occurrences of the term in the collection:

    Freqmedia(t) = NOccur(t, doc) / AVG_alldoc(NOccur(t))

Intuition: how often the term occurs with respect to the average.

TFIDF is the standard TFIDF measure:

    TFIDF(t) = Freq(t, doc) · log(NDocs_tot / NDocs(t))

where NDocs_tot is the total number of documents in the collection and NDocs(t) is the number of documents that contain the term t. Intuition: frequency * inverse document frequency.
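A compact sketch of these four statistics follows. The corpus representation (one pair of stemmed, stopword-filtered term lists per case: text terms and catchphrase terms, with the target document as a term Counter) is an assumption for illustration, not the authors' data structures.

    # Sketch of the four term statistics defined above (illustrative assumptions).
    import math
    from collections import Counter

    def corpus_counts(corpus):
        # corpus: list of (text_terms, catchphrase_terms) per case
        n_text = Counter()      # NDocs_text(t): docs whose text contains t
        n_both = Counter()      # NDocs_text&catchp(t): t in both text and catchphrases
        occ_total = Counter()   # total occurrences of t over the whole collection
        for text_terms, catchp_terms in corpus:
            occ_total.update(text_terms)
            catchp_set = set(catchp_terms)
            for t in set(text_terms):
                n_text[t] += 1
                if t in catchp_set:
                    n_both[t] += 1
        return n_text, n_both, occ_total

    def fcfound(t, n_text, n_both):
        return n_both[t] / n_text[t] if n_text[t] else 0.0

    def fcfound_freq(t, doc_counts, n_text, n_both):
        return fcfound(t, n_text, n_both) * doc_counts[t]      # · NOccur(t, doc)

    def freq_media(t, doc_counts, occ_total, n_docs):
        avg = occ_total[t] / n_docs                            # AVG_alldoc(NOccur(t))
        return doc_counts[t] / avg if avg else 0.0

    def tfidf(t, doc_counts, n_text, n_docs):
        return doc_counts[t] * math.log(n_docs / n_text[t]) if n_text[t] else 0.0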
14. Methods (2)

Threshold: select the best threshold on the number of occurrences (or frequency) of a term in the text, above which the term appears in catchphrases (computed from the corpus).

MyScore (see the sketch below):
- collect all the sentences (S) that contain any of the 10 most frequent terms in the document
- score all the terms t in S by their frequency and the ratio NOccur(t, S) / NOccur(t, doc)
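A sketch of MyScore under the same assumptions (a document as a list of sentence term lists). The slide does not say how frequency and the ratio are combined, so the product below is one plausible reading, not the authors' definition.

    from collections import Counter

    def my_score(doc_sentences):
        doc_counts = Counter(t for s in doc_sentences for t in s)
        top10 = {t for t, _ in doc_counts.most_common(10)}
        # S: sentences containing any of the 10 most frequent terms
        S = [s for s in doc_sentences if top10 & set(s)]
        s_counts = Counter(t for s in S for t in s)
        # frequency combined (here: multiplied) with NOccur(t, S) / NOccur(t, doc)
        return {t: s_counts[t] * s_counts[t] / doc_counts[t] for t in s_counts}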
15. [Figure: the same system-overview pipeline as slide 12: corpus, linguistic pre-processing, statistics computation, citation analysis, sentence scoring, and Rouge evaluation of candidate sentences against the target catchphrases.]
16. Evaluation: Rouge

Automatic quantification of similarity to a reference summary:
• ROUGE-1: count common tokens
• ROUGE-SU: count common skip-bigrams: in-order pairs of words, allowing for gaps
• ROUGE-W: longest common subsequence, with rewards for consecutive matches
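The first two variants can be sketched directly from their definitions. This is a simplified illustration, not the official ROUGE toolkit (for instance, ROUGE-SU also counts unigrams, omitted here).

    from collections import Counter
    from itertools import combinations

    def rouge1_overlap(candidate, reference):
        # ROUGE-1: count common tokens (multiset intersection)
        return sum((Counter(candidate) & Counter(reference)).values())

    def skip_bigrams(tokens):
        # all in-order pairs of words, allowing for gaps
        return set(combinations(tokens, 2))

    def rouge_su_overlap(candidate, reference):
        # ROUGE-SU style: common skip-bigrams (unigram part omitted)
        return len(skip_bigrams(candidate) & skip_bigrams(reference))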
17. Sentences:
1 The Tribunalʼs review was conducted under s 500
of the Act.
2 It did set the scene, though, for the applicantʼs
apprehended bias challenge.
3 The first is whether the Tribunalʼs decision was
vitiated by a reasonable apprehension of bias.
4 The first respondent pay the applicantʼs costs of the
application.
5 This was relevant to the issue of whether the
applicant did not pass the character test.
6 The decision of the Tribunal be set aside.
Catchphrases
1 reasonable apprehension of bias
2 Tribunal member issued listening device warrant
directed at applicant seven months prior to appeal
hearing before same Tribunal member regarding
refusal of visa
4 denial of procedural fairness
5 decision to issue warrant required forming a view
about a possible criminal offence by applicant
6 applicant refused visa under s 501 of the Migration
Act 1958 (Cth)
7 incompatible functions performed in the
circumstances
8 need to preserve integrity of Tribunalʼs procedures
9 fair minded lay observer could reasonably entertain
an apprehension of bias
10 Tribunalʼs decision to be set aside
Evaluation

If the Rouge score (computed on the catchphrase) is higher than a threshold, the catchphrase-sentence pair is considered a match. For example, if we have a 10-word catchphrase and a 15-word candidate sentence with 6 words in common, we consider this a match using Rouge-1 with threshold 0.5, but not with a threshold of 0.7 (which would require at least 7/10 words from the catchphrase to appear in the sentence). For a single document, we can compute precision and recall for a set of extracted sentences as:

    Recall = MatchedCatchphrases / TotalCatchphrases
    Precision = MatchedSentences / ExtractedSentences

The recall is the number of catchphrases matched by at least one extracted sentence, divided by the total number of catchphrases; the precision is the number of extracted sentences which match at least one catchphrase, divided by the number of extracted sentences. This evaluation procedure lets us measure the performance of the extraction methods.

In the example above, two catchphrases are matched (one pair with 4/4 tokens and 5/6 skip-bigrams in common, another with 3/3 tokens and 3/3 skip-bigrams), giving Recall = 2/10 and Precision = 2/6.
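The matching procedure can be sketched as follows; rouge1_recall (the fraction of catchphrase tokens found in the sentence, as in the 7/10-words example above) is a simplification of the thresholded Rouge score.

    def rouge1_recall(catchphrase, sentence):
        # fraction of (unique) catchphrase tokens appearing in the sentence
        cp = set(catchphrase)
        return len(cp & set(sentence)) / len(cp)

    def evaluate(extracted, catchphrases, threshold=0.5):
        # extracted, catchphrases: non-empty lists of token lists
        matched_cp = sum(1 for cp in catchphrases
                         if any(rouge1_recall(cp, s) >= threshold for s in extracted))
        matched_sent = sum(1 for s in extracted
                           if any(rouge1_recall(cp, s) >= threshold for cp in catchphrases))
        return (matched_sent / len(extracted),     # Precision
                matched_cp / len(catchphrases))    # Recall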
20. Methods - citations
• Creation of a citation corpus: for each document, collect all citphrases (catchphrases of citing/cited cases) and citances (sentences from other documents that cite the case), if any
• Citances or citphrases used directly as candidate catchphrases (CsOnly, CpOnly):
  • ranking by "centrality": similarity based on Rouge scores (sketch below)
  • HITS: hub and authority scores
• Citances or citphrases used to extract sentences from the target document (CsSent, CpSent):
  • sentences ranked by average similarity with the citation text
  • HITS: hub and authority scores
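A sketch of the "centrality" ranking: each candidate citphrase or citance is scored by its average similarity to the other candidates. The slide specifies Rouge-based similarity; the token-set Jaccard below is a stand-in for illustration only.

    def centrality_rank(candidates):
        # candidates: list of token lists (citphrases or citances)
        def sim(a, b):
            a, b = set(a), set(b)
            return len(a & b) / max(len(a | b), 1)   # Jaccard stand-in for Rouge
        n = len(candidates)
        scores = [sum(sim(c, o) for j, o in enumerate(candidates) if j != i)
                  / max(n - 1, 1)
                  for i, c in enumerate(candidates)]
        # most "central" candidate (highest average similarity) first
        return sorted(range(n), key=lambda i: scores[i], reverse=True)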
21. Citphrases and catchphrases

Citphrases:
1 federal court of australia act 1976 (cth), s. 23
2 federal court
3 whether inherent jurisdiction or implied incidental power or express discretionary power
4 corporations
5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors

Catchphrases:
corporations
no prior approval under s 477(2B) of Corporations Act 2001 (Cth)
implied incidental powers of Court
prior to approve agreement
Federal Court
implied incidental power
22. Sentences, citphrases and catchphrases

Sentences:
1 The Court may in the exercise of its implied incidental power and its power under s 23 of the Federal Court of Australia Act 1976 (Cth) (the Federal Court Act), approve the Agreement.
2 For that reason the prior approval of the Court, a resolution of creditors or the approval of a committee of inspectors was required under s 477(2B) of the Corporations Act 2001 (Cth) (the Act) before the Agreement was entered into.

Citphrases:
1 federal court of australia act 1976 (cth), s. 23
2 federal court
3 whether inherent jurisdiction or implied incidental power or express discretionary power
4 corporations
5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors

Catchphrases:
corporations
no prior approval under s 477(2B) of Corporations Act 2001 (Cth)
implied incidental powers of Court
prior to approve agreement
Federal Court
implied incidental power
26. CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
27. Table 1. The 20 most frequent labels
Label Counts
PRACTICE AND PROCEDURE 661
MIGRATION 518
CORPORATIONS 295
ADMINISTRATIVE LAW 235
COSTS 170
TRADE PRACTICES 161
INDUSTRIAL LAW 93
BANKRUPTCY 86
NATIVE TITLE 79
TAXATION 73
INTELLECTUAL PROPERTY 61
EVIDENCE 56
CONTRACT 46
CORPORATIONS LAW 36
INCOME TAX 34
COPYRIGHT 27
PROCEDURE 24
CONTRACTS 24
MIGRATION LAW 24
EQUITY 23
... ...
28. Attributes
• CitCase (sketch below):
  • how many times the label is given in related cases (cited or citing)
  • how many times the label is given in related cases, as a percentage of the total
  • the rank of the label in related cases (i.e. whether it is the first or second most frequent)
• CitLegis:
  • how many times the legislation occurs with the label, vs how many times it occurs without
  • the number of acts of legislation which satisfy the previous condition

E.g.: RANK("native title") = 0
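A sketch of the CitCase attributes for one candidate label; related_labels (the multiset of labels of all cited/citing cases) is an assumed input representation, not the authors' data structure.

    from collections import Counter

    def citcase_attributes(label, related_labels):
        counts = Counter(related_labels)
        total = sum(counts.values())
        ranking = [l for l, _ in counts.most_common()]
        return {
            'count':   counts[label],                            # times given in related cases
            'percent': counts[label] / total if total else 0.0,  # as percent of the total
            'rank':    ranking.index(label) if counts[label] else None,  # 0 = most frequent
        }

For example, RANK("native title") = 0 means "native title" is the most frequent label among the related cases.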
29. Attributes - terms
• TF: how many times the term(s) occur in the document
• TFIDF: the TFIDF rank of the term in the document
• CitSen: how many times the term occurs in sentences about the target case
• CitCp: how many times the term occurs in all the catchphrases of other documents that cite or are cited by the target case
• CitAct: how many times the term occurs in the titles of the acts cited by the target case

E.g.: TF("native title") = 1
30. Table 2: Examples of statistics for one condition

Rule: Tf(native title) = 1.0 and Rank(native title) = 0 → label = native title
1: Matches = 54/57 = 0.947, new = 27/28 = 0.964
2: Total 'native title' = 79, matched = 54/79 = 0.683
3: Errors: 'aborigines': 2, 'aboriginals': 1, 'costs': 1
4: Probability of random improvement = 4.18e-80

Row 1: the rule matches 57 cases, of which 54 (94.7%) are correct (have the 'native title' label). Of these, only 28 (27 correct) were not matched by the rules already in the KB.
Row 2: the total number of cases with the label 'native title' is 79, so this rule covers 54/79 = 68.3%. If the conclusion were generic, this row would give a list of the labels posted by the rule.
Row 3: the cases which are labelled incorrectly have labels 'aborigines' (twice), 'aboriginals' (once) and 'costs' (once) (one of the cases has two labels).
Row 4: the probability that the improvement given by the rule is random is of the order of 10^-80.

Overall, the rules obtain 1320 labels, of which 1185 (90%) are correct. The reason why no more rules were inserted in the knowledge base is that after a while it is difficult …
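Rows 1 and 2 of the table can be computed from three sets of case ids, as sketched below; the set representation is an illustrative assumption, and the random-improvement probability of row 4 is omitted.

    def rule_stats(rule_matches, kb_matches, label_cases):
        # rule_matches: cases matched by the new rule
        # kb_matches:   cases already matched by rules in the KB
        # label_cases:  cases that actually carry the target label
        correct = rule_matches & label_cases
        new = rule_matches - kb_matches              # not matched by existing rules
        return {
            'matches':  (len(correct), len(rule_matches)),   # e.g. 54/57
            'new':      (len(new & label_cases), len(new)),  # e.g. 27/28
            'coverage': (len(correct), len(label_cases)),    # e.g. 54/79
        }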
32. Approaches
• Knowledge Acquisition: 27 rules manually created
• CitLeg: take the label most common in cited/citing cases, plus those strongly associated with the cited legislation
• Machine learning with a bag-of-words representation (Naive Bayes, SVM; sketch below)
• Machine learning with the identified features (Naive Bayes, SVM)
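The bag-of-words baselines can be sketched with scikit-learn; the vectorizer settings and default hyperparameters below are illustrative, not the authors' configuration (the slides do not say which toolkit was used).

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    def train_bow_classifiers(train_texts, train_labels):
        nb = make_pipeline(CountVectorizer(stop_words='english'), MultinomialNB())
        svm = make_pipeline(CountVectorizer(stop_words='english'), LinearSVC())
        nb.fit(train_texts, train_labels)
        svm.fit(train_texts, train_labels)
        return nb, svm    # predict with nb.predict(texts), svm.predict(texts)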
33. Training set  Labels  Precision  Recall  F
KB                0.73    0.820      0.484   0.609
CitLeg            1.44    0.562      0.631   0.595
KB+CitLeg         1.07    0.694      0.598   0.643
KB+NN             1.14    0.675      0.621   0.647

Test set          Labels  Precision  Recall  F
KB                0.70    0.770      0.459   0.575
CitLeg            1.32    0.555      0.600   0.577
KB+CitLeg         1.03    0.667      0.578   0.620
KB+NN             1.16    0.631      0.629   0.630
34. Training set  Labels  Precision  Recall  F
KB                0.73    0.820      0.484   0.609
CitLeg            1.44    0.562      0.631   0.595
KB+CitLeg         1.07    0.694      0.598   0.643
KB+NN             1.14    0.675      0.621   0.647
NB bow            1.00    0.462      0.371   0.411
SVM bow           1.00    0.257      0.207   0.229
NB features       1.00    0.610      0.490   0.543
SVM features      1.00    0.997      0.801   0.889

Test set          Labels  Precision  Recall  F
KB                0.70    0.770      0.459   0.575
CitLeg            1.32    0.555      0.600   0.577
KB+CitLeg         1.03    0.667      0.578   0.620
KB+NN             1.16    0.631      0.629   0.630
NB bow            1.00    0.492      0.421   0.454
SVM bow           1.00    0.314      0.269   0.290
NB features       1.00    0.194      0.166   0.179
SVM features      1.00    0.259      0.222   0.239
36. CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
37. Attributes (terms) - 1
• A sentence must contain at least n terms (within a distance of m), given:
  • TF: the number of occurrences of the term in this document
  • AvgOcc: the average number of occurrences of the term in the corpus
  • DF: the number of documents in which the term appears at least once, divided by the total number of documents
  • TFIDF: the rank of the term in the document by TFIDF
  • CpOcc: how many times the term occurs in the set of all the known catchphrases present in the corpus
  • FcFound: the ratio between how many times (that is, in how many documents) the term appears both in the catchphrases and in the text of the case, and how many times in the text
38. Attributes (terms) - 2
• A sentence must contain at least n terms (within a distance of m), given:
  • CitSen: how many times the term occurs in all the sentences (from other documents) that cite the target case
  • CitCp: how many times the term occurs in all the catchphrases of other documents that cite or are cited by the target case
  • CitLeg: how many times the term occurs in the section titles of the legislation cited by the target case
  • POS: the part of speech of the term
  • PrpNoun: whether the term is a proper noun
  • Legal: a set of legal terms extracted from (Olsson, 1999)
  • Cue: a set of cue words statistically extracted
39. Attributes (sentence) - 1
• A sentence must satisfy:
  • HasCitCase: contains at least n citations
    • optionally considering only those cited at least m times
  • HasCitLaw: contains at least n references to legislation
    • optionally considering only those cited at least m times
  • PhraseCit: contains n terms that must occur in one citphrase
    • optionally considering only those cited at least m times
  • PhraseLaw: contains n terms that must occur in the title of one cited section
    • optionally considering only those cited at least m times
40. Attributes (sentence) - 2
• A sentence must satisfy:
  • Length: minimum/maximum length
  • Position: among the first n or last m sentences
• Looking for one or more specific term(s):
  • for which we can specify all term constraints (see the sketch below)
  • e.g. 'cost', with tf(cost) 10 and citcp(cost) 5
• Looking for a sequence of words in the same order
  • e.g. 'native title exist|claim'
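One way to picture such a rule is as a conjunction of attribute constraints that at least n terms of a sentence must satisfy. The encoding below is an illustrative assumption: the slides do not show the actual rule representation, and the comparison operators in the printed rules (slides 42-43) are not legible, so both lower and upper bounds are allowed.

    def term_condition(min_n, **bounds):
        # bounds: attribute name -> (low, high); None means unconstrained
        def term_ok(attrs):                      # attrs: dict of term attributes;
            return all((low is None or attrs[a] >= low) and   # assumes every named
                       (high is None or attrs[a] <= high)     # attribute is present
                       for a, (low, high) in bounds.items())
        def rule(sentence_terms):                # list of per-term attribute dicts
            return sum(term_ok(t) for t in sentence_terms) >= min_n
        return rule

    # e.g. "SENTENCE contains at least 2 terms with CpOcc ... and FcFound ..."
    cond = term_condition(2, CpOcc=(1, None), FcFound=(0.1, None))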
41. Knowledge Acquisition
• A user creates rules by looking at examples (sentences and catchphrases)
• The interface shows the catchphrases, plus:
  • frequency/statistical information
  • citation information
  • linguistic information
  • mined patterns
• As attributes are selected, the user is guided by:
  • examples of other correct/incorrect matches
  • statistics on rule performance
42. Example

Sentence:
As might have been expected, the bill of lading contains a "Himalaya" clause in the widest terms which is usual in such transactions.

Catchphrase:
goods transported under bill of lading incorporating Himalaya clause

Rule:
SENTENCE contains at least 2 terms with CpOcc 1 and FcFound 0.1 and CitCp 1 and TFIDF 4 and AvgOcc 1
SENTENCE also contains at least 2 terms with CpOcc 20 and FcFound 0.02 and CitCp 1 and isLegal and TFIDF 16
43. Example

Rule:
SENTENCE contains at least 2 terms with CpOcc 1 and FcFound 0.1 and CitCp 1 and TFIDF 4 and AvgOcc 1
SENTENCE also contains at least 2 terms with CpOcc 20 and FcFound 0.02 and CitCp 1 and isLegal and TFIDF 16

Matches 347/429 sentences (p = 0.81) in 339 files
Catchphrases covered: 331
prob(r_i) = 10e-19

Sentence:
That is to say, the Tribunal had to determine whether the applicant was, by reason of his war-caused incapacity alone, prevented from continuing to undertake remunerative work that he had been undertaking.

Catchphrases:
- whether applicantʼs war-caused incapacity alone prevented him from continuing to undertake remunerative work he had been undertaking
- whether Tribunal took wrong approach to determining what was remunerative work that applicant was prevented from continuing to undertake
46. Results

Training set  Precision  Recall  F
KB            0.785      0.410   0.538
Citations     0.701      0.513   0.592
KB+CIT        0.744      0.576   0.650
LexRank       0.508      0.392   0.443
Random        0.267      0.223   0.243

Test set      Precision  Recall  F
KB            0.738      0.387   0.507
Citations     0.684      0.507   0.582
KB+CIT        0.702      0.568   0.628
LexRank       0.501      0.442   0.470
Random        0.263      0.249   0.255
47. Conclusion
• A corpus of 2816 cases with citation information, and an approximate evaluation to compare methods
• Different kinds of techniques are combined using rules
• Manual evaluation