Computer Science and Engineering
Combining Different Summarization Techniques for Legal Text
Filippo Galgani
Paul Compton
Achim Hoffmann
School of Computer Science and Engineering
Faculty of Engineering
University of New South Wales (Australia)
AustLII Research Seminars
Automatic Summarization
• Automatically create a shorter version of one or more texts
• Purpose: fight information overload
• Judge if a document is relevant to a topic of interest
• Advantages: Dynamic summaries
• Applications: news, scientific articles, emails,
social media streams, websites, speech
• Common approach:
  • content selection (sentence extraction):
    • topic models
    • graph-based methods
    • supervised techniques
    • ...
• Automatic summarization supports:
  • locating documents of interest in large collections
  • manual curation
  • helping lay users access legal text
• Legal text is a challenging domain that needs automatic language processing techniques
• Case reports: long and often unstructured documents
CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend "period" for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
Corpus
• Training: 2816 FCA cases (2007-2009) with given catchphrases
  • Plus citations (10.4 related cases on average)
• Test: 1074 FCA cases (2006) with given catchphrases
  • Plus citations (11.15 related cases on average)
Pre-processing
• Catchphrase extraction (regular expression): 8 catchphrases on average (1.24 first level), total of 22755 phrases (19251+3504):
  • 16566 different phrases
  • 15359 (66.1%) occur only once
• NLTK: sentence splitter, tokenizer, stopword filtering, stemming, POS tagging
• On average (training):
  • 221 sentences (total 622366)
  • 7478 words per document (total 21 million), 34 per sentence (not filtered)
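The regular-expression step for catchphrase extraction could look roughly like the sketch below. The separator pattern, the function name `split_catchphrases`, and the idea that first-level catchphrases are the all-uppercase ones are assumptions made for illustration; the slides do not give the actual expression.

```python
import re
from typing import List

# Assumed separator: one or more en dashes or hyphens with whitespace on both sides.
# Hyphens inside words (e.g. "court-appointed") are left alone.
SEPARATOR = re.compile(r"\s+[–-]+\s+")

def split_catchphrases(block: str) -> List[str]:
    """Split a catchphrase block into individual phrases (hypothetical helper)."""
    text = " ".join(block.split())  # undo line wrapping
    return [part.strip() for part in SEPARATOR.split(text) if part.strip()]

block = """CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months"""

phrases = split_catchphrases(block)
# Assumption: first-level catchphrases are the ones written in capitals.
first_level = [p for p in phrases if p.isupper()]
```

The same pattern also splits blocks that use doubled dashes as separators.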
[Figure: the target document (CASE 2008 FCA XX) with its target catchphrases. Its decision text cites legislation (s SSS of the AAAAA Act) and earlier cases (2006 FCA YY, 2002 HCA JJ).]
[Figure: the target document (CASE 2008 FCA XX, with its target catchphrases) cites two cited documents (CASE 2002 HCA JJ and CASE 2006 FCA YY), whose catchphrases are the citphrases (cited catchphrases). It also cites legislation (AAAAA Act, sect SSS), which contributes its legislation title.]
[Figure: the citing documents (CASE 2011 FCA ZZ, CASE 2009 FCA WW, CASE 2010 FCACF KK) cite the target document (CASE 2008 FCA XX). They provide citances and citphrases (citing catchphrases). The target document in turn cites the cited documents (CASE 2002 HCA JJ, CASE 2006 FCA YY), which provide citphrases (cited catchphrases), and cited legislation (AAAAA Act, sect SSS), which provides its legislation title.]
[Figure: system overview. Components: corpus, linguistic pre-processing, statistics computation, term statistics, citation analysis, sentence scoring, candidate sentences from the target document, and evaluation against the target catchphrases with Rouge.]
Methods (1)
Each method identifies terms using the database of catchphrases and documents; sentences are then scored based on the identified terms. In all methods, terms are stemmed and stopword filtered.
• Fcfound: how likely a term is to appear in the catchphrases if it appears in the text. It is the ratio between the number of documents in which the term appears both in the catchphrases and in the text of the case, and the number of documents in which it appears in the text:
$$\mathrm{Fcfound}(t) = \frac{\mathrm{NDocs}_{\mathrm{text\,\&\,catchp.}}(t)}{\mathrm{NDocs}_{\mathrm{text}}(t)}$$
  The Fcfound score of a term does not depend on the document; it is computed using the database of catchphrases from the whole training corpus (2816 cases).
• Fcfoundfreq: the same, weighted by frequency. It is the Fcfound score multiplied by the number of occurrences of the term in the document:
$$\mathrm{Fcfoundfreq}(t) = \mathrm{Fcfound}(t) \cdot \mathrm{NOccur}(t, doc)$$
• Freqmedia: how often the term occurs with respect to the average. It is the ratio between the number of occurrences of the term in the present document and the average number of occurrences of the term in the collection:
$$\mathrm{Freqmedia}(t) = \frac{\mathrm{NOccur}(t, doc)}{\mathrm{AVG}_{\mathrm{all\,doc}}(\mathrm{NOccur}(t))}$$
• TFIDF: the standard measure, frequency times inverse document frequency:
$$\mathrm{TFIDF}(t) = \mathrm{Freq}(t, doc) \cdot \log\left(\frac{\mathrm{NDocs}_{\mathrm{tot}}}{\mathrm{NDocs}(t)}\right)$$
  where NDocs_tot is the total number of documents in the collection, and NDocs(t) is the number of documents that contain the term t.
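As a rough illustration of the four term scores on the Methods (1) slide, the sketch below computes them on a tiny invented corpus. The data, the function name `term_scores`, and the use of the raw occurrence count as Freq(t, doc) are assumptions; the documents are taken to be already stemmed and stopword filtered.

```python
import math

def term_scores(term, doc_tokens, corpus):
    """Score one term for one document.

    corpus is a list of (text_tokens, catchphrase_tokens) pairs, one per case.
    """
    n_text = sum(1 for text, _ in corpus if term in text)
    n_both = sum(1 for text, cp in corpus if term in text and term in cp)
    n_occur = doc_tokens.count(term)
    avg_occur = sum(text.count(term) for text, _ in corpus) / len(corpus)

    fcfound = n_both / n_text if n_text else 0.0
    return {
        "fcfound": fcfound,
        "fcfoundfreq": fcfound * n_occur,
        "freqmedia": n_occur / avg_occur if avg_occur else 0.0,
        # Assumption: Freq(t, doc) is the raw occurrence count.
        "tfidf": n_occur * math.log(len(corpus) / n_text) if n_text else 0.0,
    }

# Invented toy corpus: (tokens of the case text, tokens of its catchphrases).
corpus = [
    (["costs", "appeal", "costs", "tribunal"], ["costs"]),
    (["appeal", "tribunal", "visa"], ["visa"]),
    (["costs", "visa", "visa", "appeal"], ["visa", "appeal"]),
    (["tribunal", "bias", "appeal"], ["bias"]),
]

scores = term_scores("costs", corpus[0][0], corpus)
```

Here "costs" appears in the text of two cases and in the catchphrases of one of them, so its Fcfound is 0.5; the other three scores depend on how often it occurs in the first document.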
Methods (2)
Threshold: select the best threshold on number of
occurrences (or frequency) in the text, above which the term
appears in catchphrases (computed from the corpus)
MyScore:
- collect all the sentences (S) that contain any of the 10 most
frequent terms in the document
- score all the terms t in S by their frequency and ratio
NOccur(t,S)/NOccur(t,doc)
Evaluation: Rouge
Automatic quantification of similarity to a reference summary
• ROUGE-1: count common tokens
• ROUGE-SU: count common skip-bigrams (in-order pairs of words, allowing for gaps)
• ROUGE-W: longest common subsequence, with rewards for consecutive matches
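The token and skip-bigram counts can be sketched in a few lines. This is a simplified illustration, not the official ROUGE toolkit: it counts overlaps relative to the reference only, and ROUGE-W is left out. The two token lists are hand-filtered versions of a catchphrase and a sentence, which is an assumption about what the pre-processing leaves.

```python
from collections import Counter
from itertools import combinations

def rouge1_overlap(reference, candidate):
    """Return (common tokens, reference tokens), counting repeated tokens at most as often as they occur."""
    ref, cand = Counter(reference), Counter(candidate)
    common = sum(min(count, cand[token]) for token, count in ref.items())
    return common, len(reference)

def skip_bigrams(tokens):
    """All in-order pairs of tokens, with any gap between them."""
    return set(combinations(tokens, 2))

def skip_bigram_overlap(reference, candidate):
    """Return (common skip-bigrams, reference skip-bigrams)."""
    ref, cand = skip_bigrams(reference), skip_bigrams(candidate)
    return len(ref & cand), len(ref)

# Hand-filtered tokens (assumed): "Tribunal's decision to be set aside"
# against "The decision of the Tribunal be set aside."
catchphrase = ["tribunal", "decision", "set", "aside"]
sentence = ["decision", "tribunal", "set", "aside"]

token_match = rouge1_overlap(catchphrase, sentence)            # (4, 4)
skip_bigram_match = skip_bigram_overlap(catchphrase, sentence)  # (5, 6)
```

All four tokens are shared, but swapping "tribunal" and "decision" breaks one of the six in-order pairs, which is why the skip-bigram count is sensitive to word order while the token count is not.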
Evaluation
If the Rouge score (on the catchphrase) is higher than a threshold, the catchphrase-sentence pair is considered a match. For example, given a 10-word catchphrase and a 15-word candidate sentence with 6 words in common, we consider this a match using Rouge-1 with threshold 0.5, but not with a threshold of 0.7 (which requires at least 7/10 words from the catchphrase to appear in the sentence). For a single document, we can compute precision and recall for a set of extracted sentences as:
$$\mathrm{Recall} = \frac{\mathrm{MatchedCatchphrases}}{\mathrm{TotalCatchphrases}} \qquad \mathrm{Precision} = \frac{\mathrm{MatchedSentences}}{\mathrm{ExtractedSentences}}$$
The recall is the number of catchphrases matched by at least one extracted sentence, divided by the total number of catchphrases; the precision is the number of extracted sentences which match at least one catchphrase, divided by the number of extracted sentences.
Sentences:
1 The Tribunal's review was conducted under s 500 of the Act.
2 It did set the scene, though, for the applicant's apprehended bias challenge.
3 The first is whether the Tribunal's decision was vitiated by a reasonable apprehension of bias.
4 The first respondent pay the applicant's costs of the application.
5 This was relevant to the issue of whether the applicant did not pass the character test.
6 The decision of the Tribunal be set aside.
Catchphrases:
1 reasonable apprehension of bias
2 Tribunal member issued listening device warrant directed at applicant seven months prior to appeal hearing before same Tribunal member regarding refusal of visa
4 denial of procedural fairness
5 decision to issue warrant required forming a view about a possible criminal offence by applicant
6 applicant refused visa under s 501 of the Migration Act 1958 (Cth)
7 incompatible functions performed in the circumstances
8 need to preserve integrity of Tribunal's procedures
9 fair minded lay observer could reasonably entertain an apprehension of bias
10 Tribunal's decision to be set aside
Matches:
• 4/4 tokens, 5/6 skip-bigrams
• 3/3 tokens, 3/3 skip-bigrams
Recall = 2/10
Precision = 2/6
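The matching rule and the per-document precision and recall can be sketched as follows. The function names are hypothetical, the score is a plain Rouge-1 style fraction of catchphrase tokens found in the sentence, and a score equal to the threshold is treated as a match (an assumption consistent with "at least 7/10 words" for a threshold of 0.7).

```python
def rouge1_on_catchphrase(catchphrase, sentence):
    """Fraction of catchphrase tokens that also appear in the sentence."""
    sentence_tokens = set(sentence)
    return sum(1 for token in catchphrase if token in sentence_tokens) / len(catchphrase)

def precision_recall(catchphrases, sentences, threshold):
    """Precision and recall of extracted sentences against the given catchphrases."""
    matched_catchphrases, matched_sentences = set(), set()
    for i, catchphrase in enumerate(catchphrases):
        for j, sentence in enumerate(sentences):
            if rouge1_on_catchphrase(catchphrase, sentence) >= threshold:
                matched_catchphrases.add(i)
                matched_sentences.add(j)
    precision = len(matched_sentences) / len(sentences) if sentences else 0.0
    recall = len(matched_catchphrases) / len(catchphrases) if catchphrases else 0.0
    return precision, recall

# Invented tokens reproducing the example: a 10-word catchphrase and a
# 15-word sentence sharing 6 words, plus one unrelated extracted sentence.
catchphrase = ["c%d" % i for i in range(10)]
sentence = catchphrase[:6] + ["s%d" % i for i in range(9)]
unrelated = ["x", "y"]

loose = precision_recall([catchphrase], [sentence, unrelated], threshold=0.5)
strict = precision_recall([catchphrase], [sentence, unrelated], threshold=0.7)
```

With threshold 0.5 the pair matches (6/10 = 0.6), giving precision 1/2 and recall 1/1; with threshold 0.7 nothing matches and both values drop to zero.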
Results on training set (precision and recall)
Results on test set (precision and recall)
Methods: citations
• Creation of the citation corpus: for each document, collect all citphrases and citances (if any)
• Citances or citphrases used as candidate catchphrases (CsOnly, CpOnly):
  • ranking by "centrality", with similarity based on Rouge scores
  • HITS: hubs and authorities scores
• Citances or citphrases used to extract sentences from the target document (CsSent, CpSent):
  • sentences ranked by average similarity with the citation text
  • HITS: hubs and authorities scores
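Ranking sentences by their average similarity with the citation text might look like the sketch below. The similarity function (fraction of citation tokens found in the sentence), the function names, and the toy tokens are all assumptions; the slides only say that similarity is based on Rouge scores, and the HITS variant is not shown.

```python
from typing import List, Tuple

def coverage(citation: List[str], sentence: List[str]) -> float:
    """Assumed similarity: fraction of citation tokens that appear in the sentence."""
    sentence_tokens = set(sentence)
    return sum(1 for token in citation if token in sentence_tokens) / len(citation)

def rank_sentences(sentences: List[List[str]], citations: List[List[str]]) -> List[Tuple[int, float]]:
    """Return (sentence index, average similarity) pairs, best first."""
    scored = []
    for index, sentence in enumerate(sentences):
        average = sum(coverage(c, sentence) for c in citations) / len(citations)
        scored.append((index, average))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented, already filtered tokens standing in for citphrases or citances.
citations = [
    ["implied", "incidental", "power"],
    ["federal", "court"],
    ["corporations"],
]
sentences = [
    ["court", "may", "exercise", "implied", "incidental", "power", "federal", "court", "approve", "agreement"],
    ["prior", "approval", "court", "required", "corporations", "act"],
    ["liquidator", "entered", "agreement"],
]

ranking = rank_sentences(sentences, citations)
```

The top-ranked sentences would then be extracted as the summary candidates; the number to keep is a separate choice not covered here.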
Catchphrases:
• corporations
• no prior approval under s 477(2B) of Corporations Act 2001 (Cth)
• implied incidental powers of Court
• prior to approve agreement
Citphrases:
1 federal court of australia act 1976 (cth), s. 23
2 federal court
3 whether inherent jurisdiction or implied incidental power or express discretionary power
4 corporations
5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors
Common terms: Federal Court; implied incidental power
Catchphrases:
• corporations
• no prior approval under s 477(2B) of Corporations Act 2001 (Cth)
• implied incidental powers of Court
• prior to approve agreement
Citphrases:
1 federal court of australia act 1976 (cth), s. 23
2 federal court
3 whether inherent jurisdiction or implied incidental power or express discretionary power
4 corporations
5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors
Sentences:
1 The Court may in the exercise of its implied incidental power and its power under s 23 of the Federal Court of Australia Act 1976 (Cth) (the Federal Court Act), approve the Agreement.
2 For that reason the prior approval of the Court, a resolution of creditors or the approval of a committee of inspectors was required under s 477(2B) of the Corporations Act 2001 (Cth) (the Act) before the Agreement was entered into.
Common terms: Federal Court; implied incidental power
Precision and Recall of different methods
23
[Figure: system overview. Components shown: Corpus (cases such as CASE 2008 FCA XX, each with Catchphrases and Decision text); Term Statistics; Frequency Information; Citation Analysis; Linguistic Processing; Candidate Sentences; Target Catchphrases; Evaluation (Rouge); Knowledge Acquisition; Rule Base; Testing; Target Document (CASE 2010 FCA XX).]
R0: If (True) then followed/applied
  except R1: If (Pattern1) then distinguished
    Pattern1: ( {Case} | {Refcase} ) (({Token})[0,8] | {Split} ) {BE} ({Token})[0,4] {Token.string=="different"}
    except R1e1: If (Pattern2) then followed/applied
      Pattern2: {Pattern1match.contains NEGATION}
    if not R2: If (Pattern2) then distinguished
      Pattern2: ( {Case} | {Refcase} ) ({Token})[0,8] {HAVE} {Token.string=="no"} ({Token})[0,4] {Token.string=="application"}
      if not
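The except / if not structure of the rules above can be evaluated recursively. The following is a minimal Python sketch, assuming the tree behaves like a ripple-down rule tree: a rule that fires hands its conclusion to its "except" child, and a rule that does not fire passes control to its "if not" sibling. The class `Rule`, the function `classify` and the keyword checks are hypothetical stand-ins for the token patterns on the slide, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    condition: Callable[[str], bool]
    conclusion: str
    except_rule: Optional["Rule"] = None  # tried when this rule fires
    if_not_rule: Optional["Rule"] = None  # tried when this rule does not fire


def classify(rule: Optional[Rule], sentence: str, default: Optional[str] = None) -> Optional[str]:
    """Return the conclusion of the last rule that fired along the path."""
    if rule is None:
        return default
    if rule.condition(sentence):
        # The rule fires: its conclusion holds unless an exception overrides it.
        return classify(rule.except_rule, sentence, rule.conclusion)
    # The rule does not fire: keep the inherited conclusion and try the sibling.
    return classify(rule.if_not_rule, sentence, default)


# Hypothetical keyword checks standing in for Pattern1, NEGATION and R2's pattern.
def mentions_different(s: str) -> bool:
    return "different" in s.lower()


def has_negation(s: str) -> bool:
    return any(word in ("not", "no") for word in s.lower().split())


def has_no_application(s: str) -> bool:
    return "no application" in s.lower()


r2 = Rule("R2", has_no_application, "distinguished")
r1e1 = Rule("R1e1", has_negation, "followed/applied")
r1 = Rule("R1", mentions_different, "distinguished", except_rule=r1e1, if_not_rule=r2)
r0 = Rule("R0", lambda s: True, "followed/applied", except_rule=r1)

print(classify(r0, "That case is different on its facts"))
print(classify(r0, "That case is not different from the present one"))
```

Adding a rule in this scheme only attaches a new node at the point where an existing conclusion was wrong, so earlier rules are never edited.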
24
High Level
Catchphrases
25
CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
26
Table 1. The 20 most frequent labels
Label Counts
PRACTICE AND PROCEDURE 661
MIGRATION 518
CORPORATIONS 295
ADMINISTRATIVE LAW 235
COSTS 170
TRADE PRACTICES 161
INDUSTRIAL LAW 93
BANKRUPTCY 86
NATIVE TITLE 79
TAXATION 73
INTELLECTUAL PROPERTY 61
EVIDENCE 56
CONTRACT 46
CORPORATIONS LAW 36
INCOME TAX 34
COPYRIGHT 27
PROCEDURE 24
CONTRACTS 24
MIGRATION LAW 24
EQUITY 23
... ...
27
Attributes
• CitCase:
• How many times the label is given in related cases (cited
or citing)
• How many times the label is given in related cases, as a
percentage of the total
• The rank of the label in related cases (e.g. whether it is the
first or second most frequent)
• CitLegis:
• How many times the legislation occurs with the label vs.
how many times it occurs without
• The number of pieces of legislation that satisfy the previous
condition
E.g.: RANK(“native title”)=0
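The CitCase attributes can be illustrated with a small Python sketch. It assumes that "the total" means the total number of label assignments in the related cases and that rank 0 denotes the most frequent label (consistent with RANK("native title")=0 above). The function name `citcase_attributes` and the toy data are hypothetical.

```python
from collections import Counter


def citcase_attributes(label, related_case_labels):
    """Count, share and rank of `label` among the labels of related cases.

    related_case_labels: one list of labels per cited or citing case.
    """
    counts = Counter(l for labels in related_case_labels for l in labels)
    total = sum(counts.values())
    count = counts[label]
    share = count / total if total else 0.0
    # Rank 0 is the most frequent label (assumed convention).
    ranked = [l for l, _ in counts.most_common()]
    rank = ranked.index(label) if count else None
    return {"count": count, "share": share, "rank": rank}


related = [
    ["native title"],
    ["native title", "costs"],
    ["practice and procedure"],
]
print(citcase_attributes("native title", related))
```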
28
Attributes - terms
• TF: how many times the term(s) occur in the document
• TFIDF: TF-IDF rank of the term in the document
• CitSen: how many times the term occurs in sentences
about the target case
• CitCp: how many times the term occurs in all the
catchphrases of other documents that cite or are cited by
the target case
• CitAct: how many times the term occurs in the titles of the
acts cited by the target case
E.g.: TF(“native title”)=1
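The TF and TFIDF attributes can be sketched as follows. This is a simplified illustration, assuming single-word terms, an inverse document frequency of log(N/df), and rank 0 for the highest-scoring term; the slides do not state the exact weighting, and the function name `tf_and_tfidf_rank` is hypothetical.

```python
import math
from collections import Counter


def tf_and_tfidf_rank(doc_tokens, corpus_tokens):
    """Return (term frequencies, TF-IDF rank per term) for one document."""
    n_docs = len(corpus_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in corpus_tokens for t in set(doc))
    tf = Counter(doc_tokens)
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t]}
    # Highest score first; ties broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return tf, {t: i for i, t in enumerate(ordered)}


corpus = [
    ["native", "title", "claim", "native"],
    ["costs", "claim"],
    ["costs", "appeal", "claim"],
]
tf, rank = tf_and_tfidf_rank(corpus[0], corpus)
print(tf["native"], rank)
```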
29
Table 2: Examples of statistics for one condition.
Tf(native title)=1.0 and Rank(native title)=0 → label=native title
1: Matches= 54/57=0.947 new= 27/28=0.964
2: Total ‘native title’ = 79 matched= 54/79=0.683
3: Errors: ‘aborigines’: 2, ‘aboriginals’: 1, ‘costs’: 1
4: Probability Random improvement= 4.18e-80
Row 1: The rule matches 57 cases, of which 54 (94.7%) are correct (have the
“native title” label). Of these, only 28 (27 correct) were not matched by the
rules already in the KB.
Row 2: The total number of cases with the label native title is 79 (so this rule
covers 54/79=68.3%). If the conclusion were generic, this row would give a list
of the labels posted by the rule.
Row 3: The three cases that are labelled incorrectly have the labels ‘aborigines’
(twice), ‘aboriginals’ (once) and ‘costs’ (once); one of the cases has two labels.
Row 4: The probability that the improvement given by the rule is random is
4.18e-80.
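The ratios in rows 1 and 2 can be reproduced with a few lines of Python. This sketch covers only the precision and coverage figures; the significance value in row 4 is not recomputed because the slides do not say which test was used. The function and key names are hypothetical.

```python
def rule_statistics(matched, correct, new_matched, new_correct, label_total):
    """Precision of a rule, precision on newly matched cases, and label coverage."""
    return {
        "precision": correct / matched,              # row 1: 54/57
        "new_precision": new_correct / new_matched,  # row 1: 27/28
        "coverage": correct / label_total,           # row 2: 54/79
    }


stats = rule_statistics(matched=57, correct=54, new_matched=28, new_correct=27, label_total=79)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```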
obtain 1320 labels, of which 1185 (90%) are correct. The reason why no more
rules were inserted in the knowledge base is that after a while it is difficult
30
Training set
31
Approaches
• Knowledge Acquisition: 27 rules manually created
• CitLeg: take the label most common in cited/citing
cases + those strongly associated with the legislation
• Machine Learning with bag of words representation
(Naive Bayes, SVM)
• Machine Learning with the identified features (Naive
Bayes, SVM)
32
Training set   Labels  Precision  Recall  F
KB             0.73    0.820      0.484   0.609
CitLeg         1.44    0.562      0.631   0.595
KB+CitLeg      1.07    0.694      0.598   0.643
KB+NN          1.14    0.675      0.621   0.647

Test set       Labels  Precision  Recall  F
KB             0.70    0.770      0.459   0.575
CitLeg         1.32    0.555      0.600   0.577
KB+CitLeg      1.03    0.667      0.578   0.620
KB+NN          1.16    0.631      0.629   0.630
33
Training set   Labels  Precision  Recall  F
KB             0.73    0.820      0.484   0.609
CitLeg         1.44    0.562      0.631   0.595
KB+CitLeg      1.07    0.694      0.598   0.643
KB+NN          1.14    0.675      0.621   0.647
NB bow         1.00    0.462      0.371   0.411
SVM bow        1.00    0.257      0.207   0.229
NB features    1.00    0.610      0.490   0.543
SVM features   1.00    0.997      0.801   0.889

Test set       Labels  Precision  Recall  F
KB             0.70    0.770      0.459   0.575
CitLeg         1.32    0.555      0.600   0.577
KB+CitLeg      1.03    0.667      0.578   0.620
KB+NN          1.16    0.631      0.629   0.630
NB bow         1.00    0.492      0.421   0.454
SVM bow        1.00    0.314      0.269   0.290
NB features    1.00    0.194      0.166   0.179
SVM features   1.00    0.259      0.222   0.239
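The F column in these tables is numerically consistent with the harmonic mean of precision and recall, F = 2PR / (P + R). The slides do not define F explicitly, so the short Python check below treats that formula as an assumption.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# KB on the training set: precision 0.820, recall 0.484.
print(round(f_measure(0.820, 0.484), 3))
```

The same function applied to the KB+NN test-set row (0.631, 0.629) gives about 0.630, matching the table.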
34
Second Level
Catchphrases
35
CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
36
Attributes(terms)-1
• Sentence must contain at least n terms (within a distance m),
given:
• TF: the number of occurrences of the term in this document.
• AvgOcc: the average number of occurrences of the term in
the corpus
• DF: computed as the number of documents in which the term
appears at least once, divided by the total number of documents.
• TFIDF: computed as the rank of the term in the document
• CpOcc: how many times the term occurs in the set of all the
known catchphrases present in the corpus.
• The FcFound score: the ratio between how many times (that is, in
how many documents) the term appears both in the
catchphrases and in the text of the case, and how many times in
the text
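The corpus-level term attributes listed above can be sketched in Python as follows. The sketch assumes single-word terms and that AvgOcc is averaged over all documents in the corpus (the slides do not say whether documents without the term are included). The function name `term_attributes` and the toy corpus are hypothetical.

```python
def term_attributes(term, doc_index, texts, catchphrases):
    """Compute TF, DF, AvgOcc, CpOcc and FcFound for one term.

    texts[i] and catchphrases[i] are token lists for case i.
    """
    n_docs = len(texts)
    tf = texts[doc_index].count(term)
    docs_with_term = [i for i, tokens in enumerate(texts) if term in tokens]
    df = len(docs_with_term) / n_docs
    # Assumption: average over every document, including those without the term.
    avg_occ = sum(tokens.count(term) for tokens in texts) / n_docs
    cp_occ = sum(cp.count(term) for cp in catchphrases)
    # Documents where the term is in both the text and the catchphrases,
    # relative to documents where it is in the text.
    in_both = sum(1 for i in docs_with_term if term in catchphrases[i])
    fc_found = in_both / len(docs_with_term) if docs_with_term else 0.0
    return {"TF": tf, "DF": df, "AvgOcc": avg_occ, "CpOcc": cp_occ, "FcFound": fc_found}


texts = [
    ["bill", "lading", "clause", "clause"],
    ["clause", "costs"],
    ["costs", "appeal"],
]
catchphrases = [["himalaya", "clause"], ["costs"], ["costs"]]
attrs = term_attributes("clause", 0, texts, catchphrases)
print(attrs)
```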
37
Attributes(terms)-2
• Sentence must contain at least n terms (within a distance m),
given:
• CitSen: how many times the term occurs in all the sentences
(from other documents) that cite the target case.
• CitCp: how many times the term occurs in all the catchphrases
of other documents that cite or are cited by the target case.
• CitLeg: how many times the term occurs in the section titles
of the legislation cited by the target case.
• POS: the part of speech of the term
• PrpNoun: if the term is a proper noun
• Legal: set of legal terms extracted from (Olsson, 1999)
• Cue: set of cue words statistically extracted
38
Attributes(sentence)-1
• Sentence must satisfy:
• HasCitCase: contains at least n citations
• optionally considering only those cited at least m times
• HasCitLaw: contains at least n references to legislation
• optionally considering only those cited at least m times
• PhraseCit: contains n terms that must occur in one citphrase
• optionally considering only those cited at least m times
• PhraseLaw: contains n terms that must occur in the title of
one cited section
• optionally considering only those cited at least m times
39
Attributes(sentence)-2
• Sentence must satisfy:
• Length: minimum/maximum length
• Position: first n or last m sentences
• Looking for one or more specific term(s):
• for which we can specify all term constraints
• e.g. cost, with tf(cost)  10 and citcp(cost)  5
• Looking for a sequence of words in the same order
• (e.g. ‘native title exist|claim’)
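One way to read the "at least n terms (within a distance m)" constraint is as a window test over the positions of qualifying tokens. The Python sketch below assumes that interpretation (n qualifying tokens inside some window of m consecutive tokens); the per-term test is passed in as a predicate, so no particular thresholds are assumed. The function name `has_n_terms_within` is hypothetical.

```python
def has_n_terms_within(tokens, qualifies, n, m):
    """True if some window of m consecutive tokens holds at least n qualifying tokens.

    qualifies: predicate on a token, e.g. a conjunction of attribute constraints.
    Assumes n >= 1.
    """
    positions = [i for i, tok in enumerate(tokens) if qualifies(tok)]
    for start in range(len(positions) - n + 1):
        # Span covered by n consecutive qualifying tokens.
        if positions[start + n - 1] - positions[start] < m:
            return True
    return False


tokens = "the bill of lading contains a himalaya clause".split()
interesting = {"bill", "lading", "himalaya", "clause"}  # stand-in for attribute tests
print(has_n_terms_within(tokens, lambda t: t in interesting, n=2, m=3))
```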
40
Knowledge Acquisition
• A user creates rules by looking at examples (sentences and
catchphrases)
• Interface shows the catchphrases, and
• frequency/statistical information
• citations information
• linguistic information
• mined patterns
• As attributes are selected, the user is guided by:
• examples of other correct/incorrect matches
• statistics on rule performance
41
Example
Sentence: As might have been expected, the bill of lading contains
a “Himalaya” clause in the widest terms which is usual in
such transactions.
Catchphrase: goods transported under bill of lading incorporating
Himalaya clause
Rule: SENTENCE contains at least 2 terms with CpOcc  1
and FcFound  0.1 and CitCp  1 and TFIDF  4 and
AvgOcc  1
SENTENCE also contains at least 2 terms with CpOcc 
20 and FcFound  0.02 and CitCp  1 and isLegal and
TFIDF  16
42
Example
Sentence: That is to say, the Tribunal had to determine whether the
applicant was, by reason of his war-caused incapacity
alone, prevented from continuing to undertake
remunerative work that he had been undertaking.
Catchphrases:
- whether applicant's war-caused incapacity alone prevented him from
continuing to undertake remunerative work he had been undertaking
- whether Tribunal took wrong approach to determining what was
remunerative work that applicant was prevented from continuing to
undertake
Rule: SENTENCE contains at least 2 terms with CpOcc  1
and FcFound  0.1 and CitCp  1 and TFIDF  4 and
AvgOcc  1
SENTENCE also contains at least 2 terms with CpOcc 
20 and FcFound  0.02 and CitCp  1 and isLegal and
TFIDF  16
Matches 347/429 sentences (p=0.81)
in 339 files
Catchphrases covered: 331
prob(r_i)=10e-19
43
Performance of the KB as rules are added: training set
44
Performance of the KB as rules are added: test set
45
Results
Training set   Precision  Recall  F
KB             0.785      0.410   0.538
Citations      0.701      0.513   0.592
KB+CIT         0.744      0.576   0.650
LexRank        0.508      0.392   0.443
Random         0.267      0.223   0.243

Test set       Precision  Recall  F
KB             0.738      0.387   0.507
Citations      0.684      0.507   0.582
KB+CIT         0.702      0.568   0.628
LexRank        0.501      0.442   0.470
Random         0.263      0.249   0.255
46
Conclusion
• Corpus of 2816 cases with citation information, an
approximate evaluation to compare methods
• Different kinds of techniques are combined using rules
• Manual Evaluation
47
Questions?
galganif@cse.unsw.edu.au
48
49

Combining Different Summarization Techniques for Legal Text

  • 1. Combining Different Summarization Techniques for Legal Text. Filippo Galgani, Paul Compton, Achim Hoffmann. School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales (Australia). AustLII Research Seminars
  • 2. Automatic Summarization • Automatically create a shorter version of one or more texts • Purpose: fight information overload • Judge if a document is relevant to a topic of interest • Advantages: Dynamic summaries • Applications: news, scientific articles, emails, social media streams, websites, speech
  • 4. Automatic Summarization • Common approach: • content selection (sentence extraction): • topic models • graph-based methods • supervised techniques • ...
  • 5. Automatic Summarization • Automatic summarization supports: • locating the documents of interest in large collections • manual curation • helping lay users access legal text • Legal texts: a challenging domain that needs automatic language processing techniques • Case reports: long and often unstructured documents
  • 6. CORPORATIONS – winding up – court-appointed liquidators – entry into agreement – able to subsist more than three months – no prior approval under s 477(2B) of Corporations Act 2001 (Cth) – application to extend "period" for approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable – power of Court under s 479(3) to direct liquidator – liquidator directed to act on agreement as though approved – implied incidental powers of Court – prior to approve agreement – power under s 1322(4)(a) to declare entry into agreement and agreement not invalid COSTS –– proper approach to admiralty and commercial litigation –– goods transported under bill of lading incorporating Himalaya clause –– shipper and consignee sued ship owner and stevedore for damage to cargo –– stevedore successful in obtaining consent orders on motion dismissing proceedings against it based on Himalaya clause –– stevedore not furnishing critical evidence or information until after motion filed –– whether stevedore should have its costs –– importance of parties cooperating to identify the real issues in dispute –– duty to resolve uncontentious issues at an early stage of litigation –– stevedore awarded 75% of its costs of the proceedings
  • 7. Corpus • Training: 2816 FCA cases (2007, 2008, 2009) with given catchphrases • Plus citations (10.4 related cases on average) • Test: 1074 FCA cases (2006) with given catchphrases • Plus citations (11.15 related cases on average)
  • 8. Pre-processing • Catchphrase extraction (regular expression): 8 catchphrases on average (1.24 first level), total of 22755 phrases (19251+3504): • 16566 different phrases • 15359 (66.1%) occur only once • NLTK: sentence splitter, tokenizer, stopword filtering, stemming, POS tagging • On average (training): • 221 sentences per document (total 622366) • 7478 words per document (total 21 million), 34 per sentence (not filtered)
  • 9. [Diagram: the target document, CASE 2008 FCA XX, shown with its catchphrases and decision text. The decision text refers to legislation ("under s SSS of the AAAAA Act") and to earlier cases (2006 FCA YY, 2002 HCA JJ). Labels: Target Document, Target Catchphrases]
  • 10. [Diagram: the target document, CASE 2008 FCA XX, cites two earlier cases (CASE 2002 HCA JJ and CASE 2006 FCA YY, each with its own catchphrases and decision text) and a piece of legislation (AAAAA Act, sect SSS, with its title and subsections (1) to (4)). Labels: Target Document, Target Catchphrases, Cited Documents, Citphrases (cited catchphrases), Cited Legislation, Legislation Title, Cite]
  • 11. [Diagram: the citation neighbourhood of the target document, CASE 2008 FCA XX. Later cases (CASE 2011 FCA ZZ, CASE 2009 FCA WW, CASE 2010 FCACF KK) cite the target document; the target document in turn cites earlier cases (CASE 2002 HCA JJ, CASE 2006 FCA YY) and legislation (AAAAA Act, sect SSS). Labels: Citing Documents, Citances, Citphrases (citing catchphrases), Target Document, Target Catchphrases, Cited Documents, Citphrases (cited catchphrases), Cited Legislation, Legislation Title, Cite]
  • 12. [Diagram: processing pipeline. A corpus of cases (each with catchphrases and decision text) feeds term statistics; a target document (CASE 2010 FCA XX) yields candidate sentences, which are compared with the target catchphrases. Labels: Corpus, Term Statistics, Target Document, Candidate Sentences, Target Catchphrases, Linguistic pre-processing, Statistics computation, Citation analysis, Sentence scoring, Evaluation, Rouge]
  • 13. Methods (1). Terms are identified using our database of catchphrases and documents; sentences are then scored based on these identified terms. In the computation of all the methods, text is stemmed and stopword filtered.
      • Fcfound (how likely a term is to appear in the catchphrases if it appears in the text): the ratio between how many times (that is, in how many documents) a term appears both in the catchphrases and in the text of the case, and how many times it appears in the text: $\mathrm{Fcfound}(t) = \frac{NDocs_{text \& catchp.}(t)}{NDocs_{text}(t)}$. The Fcfound score of a term does not depend on the document: it is computed using our database of catchphrases from all the corpus (2816 cases).
      • Fcfoundfreq (the same, weighted by frequency): the previous Fcfound score, multiplied by the number of occurrences of the term in the document: $\mathrm{Fcfoundfreq}(t) = \mathrm{Fcfound}(t) \cdot NOccur(t, doc)$
      • Freqmedia (how often the term occurs with respect to the average): the ratio between the number of occurrences of the term in the present document and the average number of occurrences of the term in the collection: $\mathrm{Freqmedia}(t) = \frac{NOccur(t, doc)}{AVG_{alldoc}(NOccur(t))}$
      • TFIDF (frequency * inverse document frequency): the standard TFIDF measure: $\mathrm{TFIDF}(t) = Freq(t, doc) \cdot \log \frac{NDocs_{tot}}{NDocs(t)}$, where $NDocs_{tot}$ is the total number of documents in the collection, and $NDocs(t)$ is the number of documents that contain the term t.
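The four term scores on slide 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `term_scores`, the corpus layout (a list of `(text_tokens, catchphrase_tokens)` pairs), the use of raw counts for Freq(t, doc), the natural logarithm, and averaging over all documents in the collection are all choices made here for the example.

```python
import math

def term_scores(term, doc_tokens, corpus):
    """Compute the slide-13 scores for one term in one document.

    corpus: non-empty list of (text_tokens, catchphrase_tokens) pairs
    (hypothetical layout; tokens assumed already stemmed and stopword filtered).
    """
    if not corpus:
        raise ValueError("corpus must not be empty")
    n_docs = len(corpus)
    # Number of documents whose text contains the term.
    n_text = sum(1 for text, _ in corpus if term in text)
    # Number of documents with the term in both the text and the catchphrases.
    n_both = sum(1 for text, catchp in corpus if term in text and term in catchp)
    # Occurrences in the present document, and average over the collection
    # (assumption: average taken over all documents).
    n_occur = doc_tokens.count(term)
    avg_occur = sum(text.count(term) for text, _ in corpus) / n_docs

    fcfound = n_both / n_text if n_text else 0.0
    return {
        "Fcfound": fcfound,
        "Fcfoundfreq": fcfound * n_occur,
        "Freqmedia": n_occur / avg_occur if avg_occur else 0.0,
        # Assumption: Freq(t, doc) is the raw count; log is the natural log.
        "TFIDF": n_occur * math.log(n_docs / n_text) if n_text else 0.0,
    }

# Toy data, invented for illustration only.
corpus = [
    (["court", "costs", "appeal"], ["costs"]),
    (["court", "appeal", "appeal"], ["appeal"]),
    (["court", "costs"], ["court"]),
]
doc = ["costs", "costs", "court"]
scores = term_scores("costs", doc, corpus)
```

In the toy data "costs" appears in the text of two cases and in the catchphrases of one of them, so its Fcfound is 1/2 regardless of which document is being scored.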
  • 14. Methods (2) • Threshold: select the best threshold on the number of occurrences (or frequency) in the text, above which the term appears in catchphrases (computed from the corpus) • MyScore: • collect all the sentences (S) that contain any of the 10 most frequent terms in the document • score all the terms t in S by their frequency and the ratio NOccur(t,S)/NOccur(t,doc)
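A minimal sketch of the MyScore selection step follows. The slide does not say how frequency and ratio are combined into one score, so this example simply returns both values per term; the function name `myscore_terms` and the input layout (one token list per sentence) are assumptions made for illustration.

```python
from collections import Counter

def myscore_terms(sentences, top_k=10):
    """sentences: list of token lists for one document (hypothetical layout).

    Returns {term: (NOccur(t, S), NOccur(t, S) / NOccur(t, doc))} where S is
    the set of sentences containing any of the top_k most frequent terms.
    """
    doc_counts = Counter(tok for sent in sentences for tok in sent)
    top_terms = {term for term, _ in doc_counts.most_common(top_k)}
    # S: sentences that contain at least one of the most frequent terms.
    selected = [sent for sent in sentences if top_terms.intersection(sent)]
    sel_counts = Counter(tok for sent in selected for tok in sent)
    return {t: (n, n / doc_counts[t]) for t, n in sel_counts.items()}

# Toy document, invented for illustration only.
sentences = [
    ["bias", "tribunal"],
    ["bias", "visa"],
    ["bias", "costs"],
    ["tribunal", "costs"],
]
term_info = myscore_terms(sentences, top_k=1)
```

With `top_k=1` the only frequent term is "bias", so S is the first three sentences; "tribunal" occurs once in S and twice in the document, giving a ratio of 0.5.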
  • 15. [Diagram: the processing pipeline of slide 12, shown again. Labels: Corpus, Term Statistics, Target Document, Candidate Sentences, Target Catchphrases, Linguistic pre-processing, Statistics computation, Citation analysis, Sentence scoring, Evaluation, Rouge]
  • 16. Evaluation: Rouge • Automatic quantification of similarity to a reference summary • ROUGE-1: count common tokens • ROUGE-SU: count common skip-bigrams: in-order pairs of words, allowing for gaps • ROUGE-W: longest common subsequence, with rewards for consecutive matches
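The token and skip-bigram counts behind ROUGE-1 and ROUGE-SU can be illustrated as follows. This is a simplified sketch, not the official ROUGE toolkit: it works on sets rather than clipped counts, it computes recall only, it leaves out the unigram part that ROUGE-SU normally adds to the skip-bigrams, and it does not cover ROUGE-W. The function names and the toy token lists are assumptions for the example.

```python
from itertools import combinations

def skip_bigrams(tokens):
    """All in-order pairs of tokens, with any gap allowed between them."""
    return set(combinations(tokens, 2))

def rouge1_recall(reference, candidate):
    """Fraction of distinct reference tokens that also occur in the candidate."""
    ref = set(reference)
    return len(ref & set(candidate)) / len(ref) if ref else 0.0

def skip_bigram_recall(reference, candidate):
    """Fraction of reference skip-bigrams that also occur in the candidate."""
    ref = skip_bigrams(reference)
    return len(ref & skip_bigrams(candidate)) / len(ref) if ref else 0.0

# Toy token lists (assumed already lower-cased, stemmed and stopword filtered).
reference = ["tribunal", "decision", "set", "aside"]
candidate = ["decision", "tribunal", "set", "aside"]

tokens_matched = rouge1_recall(reference, candidate)      # 4 of 4 tokens
pairs_matched = skip_bigram_recall(reference, candidate)  # 5 of 6 skip-bigrams
```

All four reference tokens are found, but the pair (tribunal, decision) appears in the opposite order in the candidate, so only five of the six reference skip-bigrams match.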
  • 17. Evaluation
      Sentences:
      1 The Tribunal's review was conducted under s 500 of the Act.
      2 It did set the scene, though, for the applicant's apprehended bias challenge.
      3 The first is whether the Tribunal's decision was vitiated by a reasonable apprehension of bias.
      4 The first respondent pay the applicant's costs of the application.
      5 This was relevant to the issue of whether the applicant did not pass the character test.
      6 The decision of the Tribunal be set aside.
      Catchphrases:
      1 reasonable apprehension of bias
      2 Tribunal member issued listening device warrant directed at applicant seven months prior to appeal hearing before same Tribunal member regarding refusal of visa
      4 denial of procedural fairness
      5 decision to issue warrant required forming a view about a possible criminal offence by applicant
      6 applicant refused visa under s 501 of the Migration Act 1958 (Cth)
      7 incompatible functions performed in the circumstances
      8 need to preserve integrity of Tribunal's procedures
      9 fair minded lay observer could reasonably entertain an apprehension of bias
      10 Tribunal's decision to be set aside
      Match annotations on the slide: 4/4 Tokens, 5/6 Skip-Bigrams; 3/3 Tokens, 3/3 Skip-Bigrams. Result: Recall = 2/10, Precision = 2/6.
      Text excerpt on the slide: [...] (on the catchphrase) is higher than a threshold, the catchphrase-sentence pair is considered a match. For example, if we have a 10-word catchphrase and a 15-word candidate sentence, and they have 6 words in common, we consider this a match using Rouge-1 with threshold 0.5, but not a match with a threshold of 0.7 (requiring at least 7/10 words from the catchphrase to appear in the sentence). For a single document, we can compute precision and recall for a set of extracted sentences as: $\mathrm{Recall} = \frac{MatchedCatchphrases}{TotalCatchphrases}$, $\mathrm{Precision} = \frac{MatchedSentences}{ExtractedSentences}$. The recall is the number of catchphrases matched by at least one extracted sentence, divided by the total number of catchphrases; the precision is the number of extracted sentences which match at least one catchphrase, divided by the number of extracted sentences. This evaluation procedure lets us measure the perfo[...]
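The matching rule and the precision and recall definitions from slide 17 can be sketched like this. It assumes a set-based ROUGE-1 recall on the catchphrase as the similarity and treats "at least the threshold" as a match (consistent with the 7/10 example); the function names and the toy data are hypothetical and not taken from the slides.

```python
def rouge1_recall(catchphrase, sentence):
    """Fraction of distinct catchphrase tokens that appear in the sentence."""
    ref = set(catchphrase)
    return len(ref & set(sentence)) / len(ref) if ref else 0.0

def precision_recall(catchphrases, sentences, threshold=0.5):
    """Precision and recall of extracted sentences against catchphrases.

    A catchphrase-sentence pair is a match when the similarity reaches the
    threshold (assumption: >=, as in the 'at least 7/10 words' example).
    """
    matched_catchphrases = set()
    matched_sentences = set()
    for i, catchphrase in enumerate(catchphrases):
        for j, sentence in enumerate(sentences):
            if rouge1_recall(catchphrase, sentence) >= threshold:
                matched_catchphrases.add(i)
                matched_sentences.add(j)
    recall = len(matched_catchphrases) / len(catchphrases) if catchphrases else 0.0
    precision = len(matched_sentences) / len(sentences) if sentences else 0.0
    return precision, recall

# Toy data (tokens assumed already stemmed and stopword filtered).
catchphrases = [
    ["reasonable", "apprehension", "bias"],
    ["denial", "procedural", "fairness"],
]
sentences = [
    ["tribunal", "decision", "vitiated", "reasonable", "apprehension", "bias"],
    ["respondent", "pay", "applicant", "costs"],
]
precision, recall = precision_recall(catchphrases, sentences, threshold=0.5)
```

Here one of the two catchphrases is matched by one of the two extracted sentences, so both precision and recall are 0.5. Raising the threshold makes matching stricter and can only lower both values.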
  • 18. Results on training set (precision and recall)
  • 19. Results on test set (precision and recall)
  • 20. Methods: citations • Creation of citation corpus: for each document, collect all citphrases and citances (if any) • Citances or citphrases used as candidate catchphrases (CsOnly, CpOnly): • ranking by "centrality", with similarity based on Rouge scores • HITS: hubs and authorities scores • Citances or citphrases used to extract sentences from the target document (CsSent, CpSent): • sentences ranked by average similarity with citation text • HITS: hubs and authorities scores
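The "sentences ranked by average similarity with citation text" idea (CsSent, CpSent) can be sketched as below. The slides only say that similarity is based on Rouge scores, so the particular similarity used here (set-based ROUGE-1 recall of the citation text within the sentence), the function names, and the toy data are assumptions for illustration; the HITS variant is not shown.

```python
def overlap(citation, sentence):
    """ROUGE-1-style recall: share of distinct citation tokens found in the sentence."""
    ref = set(citation)
    return len(ref & set(sentence)) / len(ref) if ref else 0.0

def rank_sentences(sentences, citation_texts):
    """Sort target-document sentences by average similarity to the citation texts."""
    def average_similarity(sentence):
        if not citation_texts:
            return 0.0
        total = sum(overlap(citation, sentence) for citation in citation_texts)
        return total / len(citation_texts)
    # sorted() is stable, so ties keep their original document order.
    return sorted(sentences, key=average_similarity, reverse=True)

# Toy data (tokens assumed already stemmed and stopword filtered).
sentences = [
    ["costs", "awarded"],
    ["court", "approve", "agreement"],
]
citation_texts = [
    ["court", "agreement"],
    ["approve", "agreement", "liquidator"],
]
ranked = rank_sentences(sentences, citation_texts)
```

The second sentence shares tokens with both citation texts and is ranked first; the top-ranked sentences would then be extracted as the summary candidates.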
  • 21. [Example slide comparing catchphrases and citphrases.]
      Numbered phrases:
      1 federal court of australia act 1976 (cth), s. 23
      2 federal court
      3 whether inherent jurisdiction or implied incidental power or express discretionary power
      4 corporations
      5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
      6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors
      Unnumbered phrases: corporations; no prior approval under s 477(2B) of Corporations Act 2001 (Cth); implied incidental powers of Court; prior to approve agreement; Federal Court; implied incidental power
      Slide labels: Catchphrases, Citphrases
  • 22. [Example slide comparing sentences, catchphrases and citphrases.]
      Sentences:
      1 The Court may in the exercise of its implied incidental power and its power under s 23 of the Federal Court of Australia Act 1976 (Cth) (the Federal Court Act), approve the Agreement.
      2 For that reason the prior approval of the Court, a resolution of creditors or the approval of a committee of inspectors was required under s 477(2B) of the Corporations Act 2001 (Cth) (the Act) before the Agreement was entered into.
      Numbered phrases:
      1 federal court of australia act 1976 (cth), s. 23
      2 federal court
      3 whether inherent jurisdiction or implied incidental power or express discretionary power
      4 corporations
      5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
      6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors
      Unnumbered phrases: corporations; no prior approval under s 477(2B) of Corporations Act 2001 (Cth); implied incidental powers of Court; prior to approve agreement; Federal Court; implied incidental power
      Slide labels: Catchphrases, Citphrases, Sentences
  • 23. Precision and Recall of different methods
  • 24. [Diagram: system overview. Labels: Corpus (cases such as CASE 2008 FCA XX, each with Catchphrases and a Decision text); Term Statistics; Frequency Information; Citation Analysis; Linguistic Processing; Candidate Sentences; Target Catchphrases; Evaluation; Rule Base; Knowledge Acquisition; Testing; Target Document (CASE 2010 FCA XX); Rouge]
      Rule Base example:
      R0: If (True) then followed/applied
        except R1: If (Pattern1) then distinguished
            Pattern1: ( {Case} | {Refcase} ) (({Token})[0,8] | {Split} ) {BE} ({Token})[0,4] {Token.string=="different"}
          except R1e1: If (Pattern2) then followed/applied
              Pattern2: {Pattern1match.contains NEGATION}
          if not R2: If (Pattern2) then distinguished
              Pattern2: ( {Case} | {Refcase} ) ({Token})[0,8] {HAVE} {Token.string=="no"} ({Token})[0,4] {Token.string=="application"}
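The except / if not layout of the example rules can be read as a small rule tree in which a rule that fires may be overridden by its exception, and a rule that does not fire hands over to its alternative. The following is a minimal sketch of that reading, not the authors' implementation: the `Rule` class, the `classify` function and the keyword tests standing in for the token patterns are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    condition: Callable[[str], bool]
    conclusion: str
    except_rule: Optional["Rule"] = None  # tried when this rule fires
    if_not_rule: Optional["Rule"] = None  # tried when this rule does not fire


def classify(rule: Optional[Rule], sentence: str) -> Optional[str]:
    """Return the conclusion of the last rule that fired along the path."""
    conclusion = None
    while rule is not None:
        if rule.condition(sentence):
            conclusion = rule.conclusion
            rule = rule.except_rule
        else:
            rule = rule.if_not_rule
    return conclusion


# Crude keyword tests replace the token patterns (assumption for illustration).
r2 = Rule("R2", lambda s: "no application" in s.lower(), "distinguished")
r1e1 = Rule("R1e1", lambda s: " not " in s.lower(), "followed/applied")
r1 = Rule("R1", lambda s: "different" in s.lower(), "distinguished",
          except_rule=r1e1, if_not_rule=r2)
r0 = Rule("R0", lambda s: True, "followed/applied", except_rule=r1)

print(classify(r0, "Smith v Jones is quite different"))
print(classify(r0, "Smith v Jones is not materially different"))
```

Adding a rule in this scheme only attaches a new node where an existing rule gave a wrong conclusion, so earlier rules are left unchanged.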
  • 26. CORPORATIONS – winding up – court-appointed liquidators – entry into agreement – able to subsist more than three months – no prior approval under s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable – power of Court under s 479(3) to direct liquidator – liquidator directed to act on agreement as though approved – implied incidental powers of Court – prior to approve agreement – power under s 1322(4)(a) to declare entry into agreement and agreement not invalid
      COSTS –– proper approach to admiralty and commercial litigation –– goods transported under bill of lading incorporating Himalaya clause –– shipper and consignee sued ship owner and stevedore for damage to cargo –– stevedore successful in obtaining consent orders on motion dismissing proceedings against it based on Himalaya clause –– stevedore not furnishing critical evidence or information until after motion filed –– whether stevedore should have its costs –– importance of parties cooperating to identify the real issues in dispute –– duty to resolve uncontentious issues at an early stage of litigation –– stevedore awarded 75% of its costs of the proceedings
  • 27. Table 1. The 20 most frequent labels
      Label                     Counts
      PRACTICE AND PROCEDURE    661
      MIGRATION                 518
      CORPORATIONS              295
      ADMINISTRATIVE LAW        235
      COSTS                     170
      TRADE PRACTICES           161
      INDUSTRIAL LAW            93
      BANKRUPTCY                86
      NATIVE TITLE              79
      TAXATION                  73
      INTELLECTUAL PROPERTY     61
      EVIDENCE                  56
      CONTRACT                  46
      CORPORATIONS LAW          36
      INCOME TAX                34
      COPYRIGHT                 27
      PROCEDURE                 24
      CONTRACTS                 24
      MIGRATION LAW             24
      EQUITY                    23
      ...                       ...
  • 28. Attributes
      • CitCase:
          • How many times the label is given in related cases (cited or citing)
          • How many times the label is given in related cases, as a percentage of the total
          • The rank of the label in related cases (i.e. whether it is the first or second most frequent)
      • CitLegis:
          • How many times the legislation occurs with the label, versus how many times it occurs without
          • The number of legislation items that satisfy the previous condition
      E.g.: RANK(“native title”)=0
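The CitCase attributes can be illustrated with a small counting sketch. This is an assumption-laden illustration rather than the authors' code: the function name `citcase_attributes` and the input layout are hypothetical, "total" is taken to mean all label occurrences in the related cases, and rank 0 is taken to mean the most frequent label, in line with the RANK(“native title”)=0 example.

```python
from collections import Counter


def citcase_attributes(label, related_case_labels):
    """related_case_labels: one list of labels per cited or citing case."""
    counts = Counter(l for labels in related_case_labels for l in labels)
    total = sum(counts.values())
    # most_common() sorts by decreasing count; rank 0 is the most frequent label.
    ranking = [l for l, _ in counts.most_common()]
    return {
        "count": counts[label],
        "percent": 100.0 * counts[label] / total if total else 0.0,
        "rank": ranking.index(label) if label in ranking else None,
    }


# Hypothetical labels of three related cases.
related = [["native title"], ["native title", "costs"], ["evidence"]]
print(citcase_attributes("native title", related))
```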
  • 29. Attributes - terms
      • TF: how many times the term(s) occur in the document
      • TFIDF: tfidf rank of the term in the document
      • CitSen: how many times the term occurs in sentences about the target case
      • CitCp: how many times the term occurs in all the catchphrases of other documents that cite or are cited by the target case
      • CitAct: how many times the term occurs in the titles of the acts cited by the target case
      E.g.: TF(“native title”)=1
  • 30. Table 2: Examples of statistics for one condition.
      Tf(native title)=1.0 and Rank(native title)=0 − label=native title
      1: Matches= 54/57=0.947 new= 27/28=0.964
      2: Total ‘native title’ = 79 matched= 54/79=0.683
      3: Errors: ‘aborigines’: 2, ‘aboriginals’: 1, ‘costs’: 1
      4: Probability Random improvement= 4.18e-80
      Row 1: The rule matches 57 cases, of which 54 (94.7%) are correct (have the “native title” label). Of these, only 28 (27 correct) were not matched by the rules already in the KB.
      Row 2: The total number of cases with the label native title is 79 (so this rule covers 54/79=68.3%). If the conclusion were generic, this row would give a list of labels posted by the rule.
      Row 3: The three cases which are labelled incorrectly have labels ‘aborigines’ (twice), ‘aboriginals’ (once) and ‘costs’ (once) (one of the cases has two labels).
      Row 4: The probability that the improvement given by the rule is random is 4.18e-80.
      ... obtain 1320 labels, of which 1185 (90%) are correct. The reason why no more rules were inserted in the knowledge base is that after a while it is difficult ...
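The ratios in rows 1 and 2 of Table 2 follow from three sets of case identifiers: the cases the rule matches, the cases already matched by the existing KB, and the cases that carry the target label. The sketch below recomputes them on synthetic identifiers chosen to reproduce the slide's counts; the function name and the identifiers are hypothetical, and the random-improvement probability in row 4 is not computed because the slide does not say how it is obtained.

```python
def rule_statistics(matched_ids, already_covered_ids, gold_ids):
    """Precision, precision on newly covered cases, and coverage of a rule."""
    matched, gold = set(matched_ids), set(gold_ids)
    new = matched - set(already_covered_ids)
    correct = matched & gold
    return {
        "matched": len(matched),
        "correct": len(correct),
        "precision": len(correct) / len(matched),
        "new_matched": len(new),
        "new_correct": len(new & gold),
        "new_precision": len(new & gold) / len(new),
        "coverage": len(correct) / len(gold),
    }


# Synthetic ids: 79 cases carry the label, the rule matches 54 of them plus 3 others,
# and 29 of the matched cases were already covered by earlier rules.
gold = range(79)
matched = list(range(54)) + [100, 101, 102]
already_covered = list(range(27)) + [100, 101]
stats = rule_statistics(matched, already_covered, gold)
print(stats)
```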
  • 32. Approaches
      • Knowledge Acquisition: 27 rules manually created
      • CitLeg: take the label most common in cited/citing cases + those strongly associated with the legislation
      • Machine Learning with bag of words representation (Naive Bayes, SVM)
      • Machine Learning with the identified features (Naive Bayes, SVM)
  • 33. Training set
      Method       Labels   Precision   Recall   F
      KB           0.73     0.820       0.484    0.609
      CitLeg       1.44     0.562       0.631    0.595
      KB+CitLeg    1.07     0.694       0.598    0.643
      KB+NN        1.14     0.675       0.621    0.647
      Test set
      Method       Labels   Precision   Recall   F
      KB           0.70     0.770       0.459    0.575
      CitLeg       1.32     0.555       0.600    0.577
      KB+CitLeg    1.03     0.667       0.578    0.620
      KB+NN        1.16     0.631       0.629    0.630
  • 34. Training set
      Method         Labels   Precision   Recall   F
      KB             0.73     0.820       0.484    0.609
      CitLeg         1.44     0.562       0.631    0.595
      KB+CitLeg      1.07     0.694       0.598    0.643
      KB+NN          1.14     0.675       0.621    0.647
      NB bow         1.00     0.462       0.371    0.411
      SVM bow        1.00     0.257       0.207    0.229
      NB features    1.00     0.610       0.490    0.543
      SVM features   1.00     0.997       0.801    0.889
      Test set
      Method         Labels   Precision   Recall   F
      KB             0.70     0.770       0.459    0.575
      CitLeg         1.32     0.555       0.600    0.577
      KB+CitLeg      1.03     0.667       0.578    0.620
      KB+NN          1.16     0.631       0.629    0.630
      NB bow         1.00     0.492       0.421    0.454
      SVM bow        1.00     0.314       0.269    0.290
      NB features    1.00     0.194       0.166    0.179
      SVM features   1.00     0.259       0.222    0.239
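The F column in these tables is consistent with the balanced F-measure, the harmonic mean of precision and recall, up to rounding of the published inputs. A minimal sketch for recomputing it from a precision and recall pair follows; the function name is hypothetical and the formula is the standard one, not something stated on the slides.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# KB on the training set and on the test set, using the table's precision and recall.
print(f_measure(0.820, 0.484))
print(f_measure(0.770, 0.459))
```

Because the inputs are already rounded to three decimals, a recomputed value can differ from the published one in the last digit.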
  • 36. CORPORATIONS – winding up – court-appointed liquidators – entry into agreement – able to subsist more than three months – no prior approval under s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable – power of Court under s 479(3) to direct liquidator – liquidator directed to act on agreement as though approved – implied incidental powers of Court – prior to approve agreement – power under s 1322(4)(a) to declare entry into agreement and agreement not invalid
      COSTS –– proper approach to admiralty and commercial litigation –– goods transported under bill of lading incorporating Himalaya clause –– shipper and consignee sued ship owner and stevedore for damage to cargo –– stevedore successful in obtaining consent orders on motion dismissing proceedings against it based on Himalaya clause –– stevedore not furnishing critical evidence or information until after motion filed –– whether stevedore should have its costs –– importance of parties cooperating to identify the real issues in dispute –– duty to resolve uncontentious issues at an early stage of litigation –– stevedore awarded 75% of its costs of the proceedings
  • 37. Attributes(terms)-1
      • Sentence must contain at least n terms (within a distance m), given:
          • TF: the number of occurrences of the term in this document.
          • AvgOcc: the average number of occurrences of the term in the corpus.
          • DF: the number of documents in which the term appears at least once, divided by the total number of documents.
          • TFIDF: the rank of the term in the document.
          • CpOcc: how many times the term occurs in the set of all the known catchphrases present in the corpus.
          • FcFound: the ratio between the number of documents in which the term appears both in the catchphrases and in the text of the case, and the number of documents in which it appears in the text.
  • 38. Attributes(terms)-2
      • Sentence must contain at least n terms (within a distance m), given:
          • CitSen: how many times the term occurs in all the sentences (from other documents) that cite the target case.
          • CitCp: how many times the term occurs in all the catchphrases of other documents that cite or are cited by the target case.
          • CitLeg: how many times the term occurs in the section titles of the legislation cited by the target case.
          • POS: the part of speech of the term
          • PrpNoun: whether the term is a proper noun
          • Legal: set of legal terms extracted from (Olsson, 1999)
          • Cue: set of cue words statistically extracted
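The corpus-level term attributes (TF, AvgOcc, DF, CpOcc, FcFound) can be sketched on a toy corpus. This is only an illustration under stated assumptions: terms are single lower-cased tokens, AvgOcc is averaged over all documents, the three documents are invented, and `term_attributes` is a hypothetical name rather than part of the authors' system.

```python
def term_attributes(term, corpus):
    """corpus: list of dicts with token lists under 'text' and 'catchphrases'."""
    n_docs = len(corpus)
    occurrences = [doc["text"].count(term) for doc in corpus]  # TF per document
    docs_with_term = sum(1 for o in occurrences if o > 0)
    # Documents where the term is in the text and also in that case's catchphrases.
    docs_in_both = sum(
        1 for o, doc in zip(occurrences, corpus)
        if o > 0 and term in doc["catchphrases"]
    )
    return {
        "TF": occurrences,
        "AvgOcc": sum(occurrences) / n_docs,  # assumption: mean over all documents
        "DF": docs_with_term / n_docs,
        "CpOcc": sum(doc["catchphrases"].count(term) for doc in corpus),
        "FcFound": docs_in_both / docs_with_term if docs_with_term else 0.0,
    }


# Invented toy corpus.
corpus = [
    {"text": "the native title claim was heard and the title was upheld".split(),
     "catchphrases": "native title claim".split()},
    {"text": "costs were awarded and the title deed was lost".split(),
     "catchphrases": "costs".split()},
    {"text": "the appeal on costs was dismissed".split(),
     "catchphrases": "appeal costs".split()},
]
attrs = term_attributes("title", corpus)
print(attrs)
```

A high FcFound means that when the term shows up in a case text it also tends to show up in that case's catchphrases, which is what makes it useful for picking candidate sentences.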
  • 39. Attributes(sentence)-1
      • Sentence must satisfy:
          • HasCitCase: contains at least n citations
              • optionally considering only those cited at least m times
          • HasCitLaw: contains at least n references to legislation
              • optionally considering only those cited at least m times
          • PhraseCit: contains n terms that must occur in one citphrase
              • optionally considering only those cited at least m times
          • PhraseLaw: contains n terms that must occur in the title of one cited section
              • optionally considering only those cited at least m times
  • 40. Attributes(sentence)-2
      • Sentence must satisfy:
          • Length: minimum/maximum length
          • Position: first n or last m sentences
          • Looking for one or more specific term(s):
              • for which we can specify all term constraints
              • e.g. cost, with tf(cost)10 and citcp(cost)5
          • Looking for a sequence of words in the same order
              • (e.g. ‘native title exist|claim’)
  • 41. Knowledge Acquisition
      • A user creates rules by looking at examples (sentences and catchphrases)
      • The interface shows the catchphrases, and
          • frequency/statistical information
          • citation information
          • linguistic information
          • mined patterns
      • As attributes are selected, the user is guided by:
          • examples of other correct/incorrect matches
          • statistics on rule performance
  • 42. Example
      Sentence: As might have been expected, the bill of lading contains a “Himalaya” clause in the widest terms which is usual in such transactions.
      Catchphrase: goods transported under bill of lading incorporating Himalaya clause
      Rule:
          SENTENCE contains at least 2 terms with CpOcc 1 and FcFound 0.1 and CitCp 1 and TFIDF 4 and AvgOcc 1
          SENTENCE also contains at least 2 terms with CpOcc 20 and FcFound 0.02 and CitCp 1 and isLegal and TFIDF 16
  • 43. Example
      Sentence: That is to say, the Tribunal had to determine whether the applicant was, by reason of his war-caused incapacity alone, prevented from continuing to undertake remunerative work that he had been undertaking.
      Catchphrases:
          - whether applicant's war-caused incapacity alone prevented him from continuing to undertake remunerative work he had been undertaking
          - whether Tribunal took wrong approach to determining what was remunerative work that applicant was prevented from continuing to undertake
      Rule:
          SENTENCE contains at least 2 terms with CpOcc 1 and FcFound 0.1 and CitCp 1 and TFIDF 4 and AvgOcc 1
          SENTENCE also contains at least 2 terms with CpOcc 20 and FcFound 0.02 and CitCp 1 and isLegal and TFIDF 16
      Rule statistics: Matches 347/429 sentences (p=0.81) in 339 files; catchphrases covered: 331; prob(r_i)=10e-19
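A condition of the form "SENTENCE contains at least 2 terms with ..." can be sketched as a count of the sentence's terms whose attributes pass every constraint. The comparison operators did not survive in this transcript, so the "greater than or equal" tests below are an assumption made purely for illustration, as are the function name, the attribute values and the reduced set of constraints.

```python
def sentence_matches(sentence_terms, term_features, constraints, min_terms=2):
    """True if at least min_terms distinct terms satisfy every constraint."""
    satisfied = 0
    for term in set(sentence_terms):
        feats = term_features.get(term)
        if feats is None:
            continue  # no statistics available for this term
        if all(test(feats[name]) for name, test in constraints.items()):
            satisfied += 1
    return satisfied >= min_terms


# Assumed operators and invented attribute values (not taken from the corpus).
constraints = {
    "CpOcc": lambda v: v >= 1,
    "FcFound": lambda v: v >= 0.1,
}
term_features = {
    "himalaya": {"CpOcc": 3, "FcFound": 0.4},
    "clause": {"CpOcc": 12, "FcFound": 0.2},
    "expected": {"CpOcc": 0, "FcFound": 0.0},
}
sentence = ["bill", "lading", "himalaya", "clause", "expected"]
print(sentence_matches(sentence, term_features, constraints))

# Precision of a rule as reported on the slide: 347 correct out of 429 matched sentences.
print(round(347 / 429, 2))
```

A rule with two such conditions, as in the examples, would require both calls to return true for the same sentence.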
  • 44. Performance of the KB as rules are added: training set
  • 45. Performance of the KB as rules are added: test set
  • 46. Results
      Training set
      Method      Precision   Recall   F
      KB          0.785       0.410    0.538
      Citations   0.701       0.513    0.592
      KB+CIT      0.744       0.576    0.650
      LexRank     0.508       0.392    0.443
      Random      0.267       0.223    0.243
      Test set
      Method      Precision   Recall   F
      KB          0.738       0.387    0.507
      Citations   0.684       0.507    0.582
      KB+CIT      0.702       0.568    0.628
      LexRank     0.501       0.442    0.470
      Random      0.263       0.249    0.255
  • 47. Conclusion
      • Corpus of 2816 cases with citation information, an approximate evaluation to compare methods
      • Different kinds of techniques are combined using rules
      • Manual Evaluation