1. Computer Science and Engineering
Combining Different
Summarization Techniques for
Legal Text
Filippo Galgani
Paul Compton
Achim Hoffmann
School of Computer Science and Engineering
Faculty of Engineering
University of New South Wales (Australia)
AustLII Research Seminars
2. Automatic Summarization
• Automatically create a shorter version of one or more texts
• Purpose: fight information overload
• Judge whether a document is relevant to a topic of interest
• Advantages: dynamic summaries
• Applications: news, scientific articles, emails, social media streams, websites, speech
5. Automatic Summarization
• Automatic summarization supports:
  • locating documents of interest within large collections
  • manual curation
  • helping lay users access legal texts
• Legal texts are a challenging domain, in need of automatic language processing techniques
• Case reports: long and often unstructured documents
6. CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend "period" for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
7. Corpus
• Training: 2816 FCA cases (2007-2009) with given catchphrases
  • plus citations (10.4 related cases on average)
• Test: 1074 FCA cases (2006) with given catchphrases
  • plus citations (11.15 related cases on average)
8. Pre-processing
• Catchphrase extraction (regular expressions): 8 catchphrases on average (1.24 first-level), 22755 phrases in total (19251 + 3504):
  • 16566 different phrases
  • 15359 (66.1%) occur only once
• NLTK: sentence splitter, tokenizer, stopword filtering, stemming, POS tagging (sketch below)
• On average (training):
  • 221 sentences per document (622366 in total)
  • 7478 words per document (21 million in total), 34 per sentence (unfiltered)
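The pre-processing stage can be sketched in a few lines of NLTK. This is a minimal illustration of the listed steps (sentence splitting, tokenization, stopword filtering, stemming, POS tagging), not the authors' actual code; it assumes the usual NLTK resources have been downloaded.

    # Minimal NLTK pre-processing sketch (illustrative, not the authors' code).
    # Requires: nltk.download('punkt'), nltk.download('stopwords'),
    #           nltk.download('averaged_perceptron_tagger')
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    def preprocess(text):
        stop = set(stopwords.words('english'))
        stemmer = PorterStemmer()
        processed = []
        for sent in nltk.sent_tokenize(text):                  # sentence splitting
            tagged = nltk.pos_tag(nltk.word_tokenize(sent))    # tokenize + POS tag
            # stopword filtering + stemming, keeping the POS of each kept term
            processed.append([(stemmer.stem(w.lower()), pos)
                              for w, pos in tagged
                              if w.isalpha() and w.lower() not in stop])
        return processed                                       # one term list per sentence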
9. [Figure: a target document (CASE 2008 FCA XX) and its target catchphrases. The decision text contains references to legislation ("under s SSS of the AAAAA Act") and citations to other cases ("2006 FCA YY", "2002 HCA JJ").]
12. [Figure: system overview. A corpus of cases (each with decision text and given catchphrases, e.g. CASE 2008 FCA XX, plus citing cases such as CASE 2010 FCA XX) feeds linguistic pre-processing and term-statistics computation; citation analysis links related cases; sentence scoring then selects candidate sentences from the target document, which are evaluated against the target catchphrases with Rouge.]
13. Methods (1)

Using our database of catchphrases and documents, we score candidate terms for catchphrase extraction in a given document; then we score sentences based on these identified terms. Note that in the computation of all the methods, terms are stemmed and stopword filtered. (The threshold-based variants, Thresfreq and Thresnocc, are described on the next slide.)

Fcfound is the ratio between how many times (that is, in how many documents) a term appears both in the catchphrases and in the text of the case, and how many times it appears in the text:

    Fcfound(t) = NDocs_text&catchp(t) / NDocs_text(t)

The Fcfound score of a term does not depend on the document: it is computed using our database of catchphrases from the whole corpus (2816 cases). Intuition: how likely the term is to appear in the catchphrases if it is in the text.

Fcfoundfreq is the previous Fcfound score, multiplied by the number of occurrences of the term in the document:

    Fcfoundfreq(t) = Fcfound(t) · NOccur(t, doc)

Intuition: the same, weighted by frequency.

Freqmedia is the ratio between the number of occurrences of the term in the present document and the average number of occurrences of the term in the collection:

    Freqmedia(t) = NOccur(t, doc) / AVG_alldoc(NOccur(t))

Intuition: how often the term occurs with respect to the average.

TFIDF is the standard TFIDF measure:

    TFIDF(t) = Freq(t, doc) · log(NDocs_tot / NDocs(t))

where NDocs_tot is the total number of documents in the collection and NDocs(t) is the number of documents that contain the term t. Intuition: frequency * inverse document frequency.
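A compact sketch of these four statistics follows. The corpus representation (one pair of stemmed, stopword-filtered term lists per case: text terms and catchphrase terms, with the target document as a term Counter) is an assumption for illustration, not the authors' data structures.

    # Sketch of the four term statistics defined above (illustrative assumptions).
    import math
    from collections import Counter

    def corpus_counts(corpus):
        # corpus: list of (text_terms, catchphrase_terms) per case
        n_text = Counter()      # NDocs_text(t): docs whose text contains t
        n_both = Counter()      # NDocs_text&catchp(t): t in both text and catchphrases
        occ_total = Counter()   # total occurrences of t over the whole collection
        for text_terms, catchp_terms in corpus:
            occ_total.update(text_terms)
            catchp_set = set(catchp_terms)
            for t in set(text_terms):
                n_text[t] += 1
                if t in catchp_set:
                    n_both[t] += 1
        return n_text, n_both, occ_total

    def fcfound(t, n_text, n_both):
        return n_both[t] / n_text[t] if n_text[t] else 0.0

    def fcfound_freq(t, doc_counts, n_text, n_both):
        return fcfound(t, n_text, n_both) * doc_counts[t]      # · NOccur(t, doc)

    def freq_media(t, doc_counts, occ_total, n_docs):
        avg = occ_total[t] / n_docs                            # AVG_alldoc(NOccur(t))
        return doc_counts[t] / avg if avg else 0.0

    def tfidf(t, doc_counts, n_text, n_docs):
        return doc_counts[t] * math.log(n_docs / n_text[t]) if n_text[t] else 0.0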
14. Methods (2)

Threshold: select the best threshold on the number of occurrences (or frequency) of a term in the text, above which the term appears in catchphrases (computed from the corpus).

MyScore (see the sketch below):
- collect all the sentences (S) that contain any of the 10 most frequent terms in the document
- score all the terms t in S by their frequency and the ratio NOccur(t, S) / NOccur(t, doc)
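A sketch of MyScore under the same assumptions (a document as a list of sentence term lists). The slide does not say how frequency and the ratio are combined, so the product below is one plausible reading, not the authors' definition.

    from collections import Counter

    def my_score(doc_sentences):
        doc_counts = Counter(t for s in doc_sentences for t in s)
        top10 = {t for t, _ in doc_counts.most_common(10)}
        # S: sentences containing any of the 10 most frequent terms
        S = [s for s in doc_sentences if top10 & set(s)]
        s_counts = Counter(t for s in S for t in s)
        # frequency combined (here: multiplied) with NOccur(t, S) / NOccur(t, doc)
        return {t: s_counts[t] * s_counts[t] / doc_counts[t] for t in s_counts}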
15. [Figure: the same system-overview pipeline as slide 12: corpus, linguistic pre-processing, statistics computation, citation analysis, sentence scoring, and Rouge evaluation of candidate sentences against the target catchphrases.]
16. Evaluation: Rouge

Automatic quantification of similarity to a reference summary:
• ROUGE-1: count common tokens
• ROUGE-SU: count common skip-bigrams: in-order pairs of words, allowing for gaps
• ROUGE-W: longest common subsequence, with rewards for consecutive matches
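The first two variants can be sketched directly from their definitions. This is a simplified illustration, not the official ROUGE toolkit (for instance, ROUGE-SU also counts unigrams, omitted here).

    from collections import Counter
    from itertools import combinations

    def rouge1_overlap(candidate, reference):
        # ROUGE-1: count common tokens (multiset intersection)
        return sum((Counter(candidate) & Counter(reference)).values())

    def skip_bigrams(tokens):
        # all in-order pairs of words, allowing for gaps
        return set(combinations(tokens, 2))

    def rouge_su_overlap(candidate, reference):
        # ROUGE-SU style: common skip-bigrams (unigram part omitted)
        return len(skip_bigrams(candidate) & skip_bigrams(reference))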
17. Sentences:
1 The Tribunalʼs review was conducted under s 500
of the Act.
2 It did set the scene, though, for the applicantʼs
apprehended bias challenge.
3 The first is whether the Tribunalʼs decision was
vitiated by a reasonable apprehension of bias.
4 The first respondent pay the applicantʼs costs of the
application.
5 This was relevant to the issue of whether the
applicant did not pass the character test.
6 The decision of the Tribunal be set aside.
Catchphrases
1 reasonable apprehension of bias
2 Tribunal member issued listening device warrant
directed at applicant seven months prior to appeal
hearing before same Tribunal member regarding
refusal of visa
4 denial of procedural fairness
5 decision to issue warrant required forming a view
about a possible criminal offence by applicant
6 applicant refused visa under s 501 of the Migration
Act 1958 (Cth)
7 incompatible functions performed in the
circumstances
8 need to preserve integrity of Tribunalʼs procedures
9 fair minded lay observer could reasonably entertain
an apprehension of bias
10 Tribunalʼs decision to be set aside
Evaluation

If the Rouge score (computed on the catchphrase) is higher than a threshold, the catchphrase-sentence pair is considered a match. For example, if we have a 10-word catchphrase and a 15-word candidate sentence with 6 words in common, we consider this a match using Rouge-1 with threshold 0.5, but not with a threshold of 0.7 (which would require at least 7/10 words from the catchphrase to appear in the sentence). For a single document, we can compute precision and recall for a set of extracted sentences as:

    Recall = MatchedCatchphrases / TotalCatchphrases
    Precision = MatchedSentences / ExtractedSentences

The recall is the number of catchphrases matched by at least one extracted sentence, divided by the total number of catchphrases; the precision is the number of extracted sentences which match at least one catchphrase, divided by the number of extracted sentences. This evaluation procedure lets us measure the performance of the extraction methods.

In the example above, two catchphrases are matched (one pair with 4/4 tokens and 5/6 skip-bigrams in common, another with 3/3 tokens and 3/3 skip-bigrams), giving Recall = 2/10 and Precision = 2/6.
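The matching procedure can be sketched as follows; rouge1_recall (the fraction of catchphrase tokens found in the sentence, as in the 7/10-words example above) is a simplification of the thresholded Rouge score.

    def rouge1_recall(catchphrase, sentence):
        # fraction of (unique) catchphrase tokens appearing in the sentence
        cp = set(catchphrase)
        return len(cp & set(sentence)) / len(cp)

    def evaluate(extracted, catchphrases, threshold=0.5):
        # extracted, catchphrases: non-empty lists of token lists
        matched_cp = sum(1 for cp in catchphrases
                         if any(rouge1_recall(cp, s) >= threshold for s in extracted))
        matched_sent = sum(1 for s in extracted
                           if any(rouge1_recall(cp, s) >= threshold for cp in catchphrases))
        return (matched_sent / len(extracted),     # Precision
                matched_cp / len(catchphrases))    # Recall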
20. Methods - citations
• Creation of a citation corpus: for each document, collect all citphrases (catchphrases of citing/cited cases) and citances (sentences from other documents that cite the case), if any
• Citances or citphrases used directly as candidate catchphrases (CsOnly, CpOnly):
  • ranking by "centrality": similarity based on Rouge scores (sketch below)
  • HITS: hub and authority scores
• Citances or citphrases used to extract sentences from the target document (CsSent, CpSent):
  • sentences ranked by average similarity with the citation text
  • HITS: hub and authority scores
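A sketch of the "centrality" ranking: each candidate citphrase or citance is scored by its average similarity to the other candidates. The slide specifies Rouge-based similarity; the token-set Jaccard below is a stand-in for illustration only.

    def centrality_rank(candidates):
        # candidates: list of token lists (citphrases or citances)
        def sim(a, b):
            a, b = set(a), set(b)
            return len(a & b) / max(len(a | b), 1)   # Jaccard stand-in for Rouge
        n = len(candidates)
        scores = [sum(sim(c, o) for j, o in enumerate(candidates) if j != i)
                  / max(n - 1, 1)
                  for i, c in enumerate(candidates)]
        # most "central" candidate (highest average similarity) first
        return sorted(range(n), key=lambda i: scores[i], reverse=True)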
21. Citphrases and catchphrases

Citphrases:
1 federal court of australia act 1976 (cth), s. 23
2 federal court
3 whether inherent jurisdiction or implied incidental power or express discretionary power
4 corporations
5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors

Catchphrases:
corporations
no prior approval under s 477(2B) of Corporations Act 2001 (Cth)
implied incidental powers of Court
prior to approve agreement
Federal Court
implied incidental power
22. Sentences, citphrases and catchphrases

Sentences:
1 The Court may in the exercise of its implied incidental power and its power under s 23 of the Federal Court of Australia Act 1976 (Cth) (the Federal Court Act), approve the Agreement.
2 For that reason the prior approval of the Court, a resolution of creditors or the approval of a committee of inspectors was required under s 477(2B) of the Corporations Act 2001 (Cth) (the Act) before the Agreement was entered into.

Citphrases:
1 federal court of australia act 1976 (cth), s. 23
2 federal court
3 whether inherent jurisdiction or implied incidental power or express discretionary power
4 corporations
5 failure by liquidator to obtain approval to enter litigation funding agreement under s 477(2b) of the corporations act 2001 (cth)
6 application for leave nunc pro tunc pursuant to corporations act 2001 (cth), s 477(2b) approving liquidators' entry into costs agreement with firm of solicitors

Catchphrases:
corporations
no prior approval under s 477(2B) of Corporations Act 2001 (Cth)
implied incidental powers of Court
prior to approve agreement
Federal Court
implied incidental power
26. CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
27. Table 1. The 20 most frequent labels
Label Counts
PRACTICE AND PROCEDURE 661
MIGRATION 518
CORPORATIONS 295
ADMINISTRATIVE LAW 235
COSTS 170
TRADE PRACTICES 161
INDUSTRIAL LAW 93
BANKRUPTCY 86
NATIVE TITLE 79
TAXATION 73
INTELLECTUAL PROPERTY 61
EVIDENCE 56
CONTRACT 46
CORPORATIONS LAW 36
INCOME TAX 34
COPYRIGHT 27
PROCEDURE 24
CONTRACTS 24
MIGRATION LAW 24
EQUITY 23
... ...
28. Attributes
• CitCase (sketch below):
  • how many times the label is given in related cases (cited or citing)
  • how many times the label is given in related cases, as a percentage of the total
  • the rank of the label in related cases (i.e. whether it is the first or second most frequent)
• CitLegis:
  • how many times the legislation occurs with the label, vs how many times it occurs without
  • the number of acts of legislation which satisfy the previous condition

E.g.: RANK("native title") = 0
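A sketch of the CitCase attributes for one candidate label; related_labels (the multiset of labels of all cited/citing cases) is an assumed input representation, not the authors' data structure.

    from collections import Counter

    def citcase_attributes(label, related_labels):
        counts = Counter(related_labels)
        total = sum(counts.values())
        ranking = [l for l, _ in counts.most_common()]
        return {
            'count':   counts[label],                            # times given in related cases
            'percent': counts[label] / total if total else 0.0,  # as percent of the total
            'rank':    ranking.index(label) if counts[label] else None,  # 0 = most frequent
        }

For example, RANK("native title") = 0 means "native title" is the most frequent label among the related cases.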
29. Attributes - terms
• TF: how many times the term(s) occur in the document
• TFIDF: the TFIDF rank of the term in the document
• CitSen: how many times the term occurs in sentences about the target case
• CitCp: how many times the term occurs in all the catchphrases of other documents that cite or are cited by the target case
• CitAct: how many times the term occurs in the titles of the acts cited by the target case

E.g.: TF("native title") = 1
30. Table 2: Examples of statistics for one condition

Rule: Tf(native title) = 1.0 and Rank(native title) = 0 → label = native title
1: Matches = 54/57 = 0.947, new = 27/28 = 0.964
2: Total 'native title' = 79, matched = 54/79 = 0.683
3: Errors: 'aborigines': 2, 'aboriginals': 1, 'costs': 1
4: Probability of random improvement = 4.18e-80

Row 1: the rule matches 57 cases, of which 54 (94.7%) are correct (have the 'native title' label). Of these, only 28 (27 correct) were not matched by the rules already in the KB.
Row 2: the total number of cases with the label 'native title' is 79, so this rule covers 54/79 = 68.3%. If the conclusion were generic, this row would give a list of the labels posted by the rule.
Row 3: the cases which are labelled incorrectly have labels 'aborigines' (twice), 'aboriginals' (once) and 'costs' (once) (one of the cases has two labels).
Row 4: the probability that the improvement given by the rule is random is of the order of 10^-80.

Overall, the rules obtain 1320 labels, of which 1185 (90%) are correct. The reason why no more rules were inserted in the knowledge base is that after a while it is difficult …
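Rows 1 and 2 of the table can be computed from three sets of case ids, as sketched below; the set representation is an illustrative assumption, and the random-improvement probability of row 4 is omitted.

    def rule_stats(rule_matches, kb_matches, label_cases):
        # rule_matches: cases matched by the new rule
        # kb_matches:   cases already matched by rules in the KB
        # label_cases:  cases that actually carry the target label
        correct = rule_matches & label_cases
        new = rule_matches - kb_matches              # not matched by existing rules
        return {
            'matches':  (len(correct), len(rule_matches)),   # e.g. 54/57
            'new':      (len(new & label_cases), len(new)),  # e.g. 27/28
            'coverage': (len(correct), len(label_cases)),    # e.g. 54/79
        }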
32. Approaches
• Knowledge Acquisition: 27 rules manually created
• CitLeg: take the label most common in cited/citing cases, plus those strongly associated with the cited legislation
• Machine learning with a bag-of-words representation (Naive Bayes, SVM; sketch below)
• Machine learning with the identified features (Naive Bayes, SVM)
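The bag-of-words baselines can be sketched with scikit-learn; the vectorizer settings and default hyperparameters below are illustrative, not the authors' configuration (the slides do not say which toolkit was used).

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    def train_bow_classifiers(train_texts, train_labels):
        nb = make_pipeline(CountVectorizer(stop_words='english'), MultinomialNB())
        svm = make_pipeline(CountVectorizer(stop_words='english'), LinearSVC())
        nb.fit(train_texts, train_labels)
        svm.fit(train_texts, train_labels)
        return nb, svm    # predict with nb.predict(texts), svm.predict(texts)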
33. Training set  Labels  Precision  Recall  F
KB                0.73    0.820      0.484   0.609
CitLeg            1.44    0.562      0.631   0.595
KB+CitLeg         1.07    0.694      0.598   0.643
KB+NN             1.14    0.675      0.621   0.647

Test set          Labels  Precision  Recall  F
KB                0.70    0.770      0.459   0.575
CitLeg            1.32    0.555      0.600   0.577
KB+CitLeg         1.03    0.667      0.578   0.620
KB+NN             1.16    0.631      0.629   0.630
34. Training set  Labels  Precision  Recall  F
KB                0.73    0.820      0.484   0.609
CitLeg            1.44    0.562      0.631   0.595
KB+CitLeg         1.07    0.694      0.598   0.643
KB+NN             1.14    0.675      0.621   0.647
NB bow            1.00    0.462      0.371   0.411
SVM bow           1.00    0.257      0.207   0.229
NB features       1.00    0.610      0.490   0.543
SVM features      1.00    0.997      0.801   0.889

Test set          Labels  Precision  Recall  F
KB                0.70    0.770      0.459   0.575
CitLeg            1.32    0.555      0.600   0.577
KB+CitLeg         1.03    0.667      0.578   0.620
KB+NN             1.16    0.631      0.629   0.630
NB bow            1.00    0.492      0.421   0.454
SVM bow           1.00    0.314      0.269   0.290
NB features       1.00    0.194      0.166   0.179
SVM features      1.00    0.259      0.222   0.239
36. CORPORATIONS – winding up – court-appointed liquidators – entry into
agreement – able to subsist more than three months – no prior approval under
s 477(2B) of Corporations Act 2001 (Cth) – application to extend period for
approval under s 1322(4)(d) – no relevant period – s 1322(4)(d) not applicable
– power of Court under s 479(3) to direct liquidator – liquidator directed to act
on agreement as though approved – implied incidental powers of Court – prior
to approve agreement – power under s 1322(4)(a) to declare entry into
agreement and agreement not invalid
COSTS –– proper approach to admiralty and commercial litigation –– goods
transported under bill of lading incorporating Himalaya clause –– shipper and
consignee sued ship owner and stevedore for damage to cargo –– stevedore
successful in obtaining consent orders on motion dismissing proceedings
against it based on Himalaya clause –– stevedore not furnishing critical
evidence or information until after motion filed –– whether stevedore should
have its costs –– importance of parties cooperating to identify the real issues in
dispute –– duty to resolve uncontentious issues at an early stage of litigation
–– stevedore awarded 75% of its costs of the proceedings
37. Attributes (terms) - 1
• A sentence must contain at least n terms (within a distance of m), given:
  • TF: the number of occurrences of the term in this document
  • AvgOcc: the average number of occurrences of the term in the corpus
  • DF: the number of documents in which the term appears at least once, divided by the total number of documents
  • TFIDF: the rank of the term in the document by TFIDF
  • CpOcc: how many times the term occurs in the set of all the known catchphrases present in the corpus
  • FcFound: the ratio between how many times (that is, in how many documents) the term appears both in the catchphrases and in the text of the case, and how many times in the text
38. Attributes (terms) - 2
• A sentence must contain at least n terms (within a distance of m), given:
  • CitSen: how many times the term occurs in all the sentences (from other documents) that cite the target case
  • CitCp: how many times the term occurs in all the catchphrases of other documents that cite or are cited by the target case
  • CitLeg: how many times the term occurs in the section titles of the legislation cited by the target case
  • POS: the part of speech of the term
  • PrpNoun: whether the term is a proper noun
  • Legal: a set of legal terms extracted from (Olsson, 1999)
  • Cue: a set of cue words statistically extracted
39. Attributes (sentence) - 1
• A sentence must satisfy:
  • HasCitCase: contains at least n citations
    • optionally considering only those cited at least m times
  • HasCitLaw: contains at least n references to legislation
    • optionally considering only those cited at least m times
  • PhraseCit: contains n terms that must occur in one citphrase
    • optionally considering only those cited at least m times
  • PhraseLaw: contains n terms that must occur in the title of one cited section
    • optionally considering only those cited at least m times
40. Attributes (sentence) - 2
• A sentence must satisfy:
  • Length: minimum/maximum length
  • Position: among the first n or last m sentences
• Looking for one or more specific term(s):
  • for which we can specify all term constraints (see the sketch below)
  • e.g. 'cost', with tf(cost) 10 and citcp(cost) 5
• Looking for a sequence of words in the same order
  • e.g. 'native title exist|claim'
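One way to picture such a rule is as a conjunction of attribute constraints that at least n terms of a sentence must satisfy. The encoding below is an illustrative assumption: the slides do not show the actual rule representation, and the comparison operators in the printed rules (slides 42-43) are not legible, so both lower and upper bounds are allowed.

    def term_condition(min_n, **bounds):
        # bounds: attribute name -> (low, high); None means unconstrained
        def term_ok(attrs):                      # attrs: dict of term attributes;
            return all((low is None or attrs[a] >= low) and   # assumes every named
                       (high is None or attrs[a] <= high)     # attribute is present
                       for a, (low, high) in bounds.items())
        def rule(sentence_terms):                # list of per-term attribute dicts
            return sum(term_ok(t) for t in sentence_terms) >= min_n
        return rule

    # e.g. "SENTENCE contains at least 2 terms with CpOcc ... and FcFound ..."
    cond = term_condition(2, CpOcc=(1, None), FcFound=(0.1, None))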
41. Knowledge Acquisition
• A user creates rules by looking at examples (sentences and catchphrases)
• The interface shows the catchphrases, plus:
  • frequency/statistical information
  • citation information
  • linguistic information
  • mined patterns
• As attributes are selected, the user is guided by:
  • examples of other correct/incorrect matches
  • statistics on rule performance
42. Example

Sentence:
As might have been expected, the bill of lading contains a "Himalaya" clause in the widest terms which is usual in such transactions.

Catchphrase:
goods transported under bill of lading incorporating Himalaya clause

Rule:
SENTENCE contains at least 2 terms with CpOcc 1 and FcFound 0.1 and CitCp 1 and TFIDF 4 and AvgOcc 1
SENTENCE also contains at least 2 terms with CpOcc 20 and FcFound 0.02 and CitCp 1 and isLegal and TFIDF 16
43. Example

Rule:
SENTENCE contains at least 2 terms with CpOcc 1 and FcFound 0.1 and CitCp 1 and TFIDF 4 and AvgOcc 1
SENTENCE also contains at least 2 terms with CpOcc 20 and FcFound 0.02 and CitCp 1 and isLegal and TFIDF 16

Matches 347/429 sentences (p = 0.81) in 339 files
Catchphrases covered: 331
prob(r_i) = 10e-19

Sentence:
That is to say, the Tribunal had to determine whether the applicant was, by reason of his war-caused incapacity alone, prevented from continuing to undertake remunerative work that he had been undertaking.

Catchphrases:
- whether applicantʼs war-caused incapacity alone prevented him from continuing to undertake remunerative work he had been undertaking
- whether Tribunal took wrong approach to determining what was remunerative work that applicant was prevented from continuing to undertake
46. Results

Training set  Precision  Recall  F
KB            0.785      0.410   0.538
Citations     0.701      0.513   0.592
KB+CIT        0.744      0.576   0.650
LexRank       0.508      0.392   0.443
Random        0.267      0.223   0.243

Test set      Precision  Recall  F
KB            0.738      0.387   0.507
Citations     0.684      0.507   0.582
KB+CIT        0.702      0.568   0.628
LexRank       0.501      0.442   0.470
Random        0.263      0.249   0.255
47. Conclusion
• A corpus of 2816 cases with citation information, and an approximate evaluation to compare methods
• Different kinds of techniques are combined using rules
• Manual evaluation