On Semi-Supervised Learning
Of Legal Semantics
L. Thorne McCarty
Rutgers University
Three Papers
● 1998: Structured Casenotes: How Publishers Can Add Value to
Public Domain Legal Materials on the World Wide Web.
● 2007: Deep Semantic Interpretations of Legal Texts.
● 2015: How to Ground a Language for Legal Discourse in a
Prototypical Perceptual Semantics.
And a Proposal:
A research strategy to produce a computational summary
of a legal case, which can be scaled up to a realistic legal
corpus.
The Challenge
A structured casenote is a computational summary of the
procedural history of a case along with the substantive legal
conclusions articulated at each stage of the process. It would play
the same role in the legal information systems of the 21st century
that West Headnotes and Key Numbers have played in the 20th
century.
From my 1998 paper:
Why focus on procedural history?
Think about the traditional “brief” that students are
taught to write in their first year of law school:
The traditional case brief focuses on the procedural context first:
Who is suing whom, and for what? What is the plaintiff's legal
theory? What facts does the plaintiff allege to support this theory?
How does the defendant respond? How does the trial court
dispose of the case? What is the basis of the appeal? What
issues of law are presented to the appellate court? How does the
appellate court resolve these issues, and with what justification?
Within this procedural framework, we would represent
the substantive issues at stake in the decision.
● For the computational summary, we need an expressive
Knowledge Representation (KR) language.
● How can we build a database of structured casenotes at the
appropriate scale?
● Fully automated processing of legal texts?
● Semi-automated, with a human editor in the loop?
● For either approach, we need a Natural Language (NL)
technology that can handle the complexity of legal cases.
● But in 1998, neither the NL nor the KR technology was
sufficiently advanced.
Two Steps Toward a Solution:
ICAIL '07
Contributions:
● Showed that a “state-of-the-art statistical parser ... can handle
even the complex syntactic constructions of an appellate court
judge.”
● Showed that the “semantic interpretation of the full text of a
judicial opinion can be computed automatically from the output
of the parser.”
Technical specifications:
● Quasi-Logical Form (QLF).
● Definite Clause Grammar (DCG).
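A minimal sketch of the DCG-to-QLF idea (hypothetical rules and
lexicon, not the grammar from the paper), runnable in standard Prolog:

    % Toy DCG that threads a quasi-logical form through the parse,
    % using the term(lex, var, list) pattern shown below.
    s(sterm(Verb, _E, [Subj])) --> np(Subj), vp(Verb).
    np(nterm(Noun, _X, []))    --> [the], [Noun], { noun(Noun) }.
    vp(Verb)                   --> [Verb], { verb(Verb) }.
    noun(court).   noun(petitioner).
    verb(ruled).   verb(contends).

    % ?- phrase(s(QLF), [the, court, ruled]).
    % QLF = sterm(ruled, _E, [nterm(court, _X, [])])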
She has also brought this ADA suit in which
she claims that her former employer, Policy
Management Systems Corporation,
discriminated against her on account of her
disability.
526 U.S. 795 (1999)
Terms:
term(lex, var, list)
...
“She has also brought this ADA suit ...”
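A hypothetical QLF for this fragment, in the same style (an
illustrative guess at the structure, not the system's actual output):

    % Illustrative guess: only the term(lex, var, list) pattern is
    % taken from the slides; the nesting here is hypothetical.
    example_qlf(
        sterm(brought, _E,
            [nterm(she, _X, []),
             nterm(suit, _S, [nterm('ADA', _A, [])])])).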
The petitioner contends that the regulatory
takings claim should not have been decided by
the jury and that the Court of Appeals adopted an
erroneous standard for regulatory takings liability.
526 U.S. 687 (1999)
sterm(decided,C,[_,_]) / [modal(should),negative,perfect,passive]
...
AND
sterm(adopted,J,[_,_])
...
The court ruled that sufficient evidence had
been presented to the jury from which it
reasonably could have decided each of these
questions in Del Monte Dunes' favor.
526 U.S. 687 (1999)
Semantics of 'WDT' and 'WHNP': W^nterm(which,W,[])
Semantics of 'IN': Obj^Subj^P^pterm(in,P,[Subj,Obj])
Unify: Obj = nterm(which,W,[])
       Term = pterm(in,P,[Subj,Obj])
Semantics of 'WHPP': W^Subj^P^pterm(in,P,[Subj,nterm(which,W,[])])
Semantics of 'S': E^sterm(claims,E,[_,_])
Unify: Term = pterm(in,P,[E,nterm(which,W,[])])
       Tense = [present]
Semantics of 'SBAR':
W^(E^(P^pterm(in,P,[E,nterm(which,W,[])]) & sterm(claims,E,[_,_]))/[present])
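In Prolog, each “Unify” step above is ordinary unification of a
^-abstraction with its argument; a minimal sketch:

    % Beta-reduction by unification: applying a ^-abstraction to an
    % argument just unifies the bound variable with that argument.
    beta(Arg^Body, Arg, Body).

    % ?- beta(Obj^Subj^P^pterm(in,P,[Subj,Obj]), nterm(which,W,[]), T).
    % T = Subj^P^pterm(in,P,[Subj,nterm(which,W,[])])
    % i.e., the first Unify step above, yielding the 'WHPP' body.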
● How accurate are these semantic interpretations?
● Unfortunately, we do not have the data to answer this
question.
● Consider a different strategy:
● Write hand-coded extraction patterns to map information
from the QLF interpretations into the format of a structured
casenote.
● Generalize these extraction patterns by the unsupervised
learning of the legal semantics implicit in a large set of
unannotated legal cases.
● The total system would thus be engaged in a form of
semi-supervised learning of legal semantics.
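As a sketch of what such an extraction pattern might look like
(predicate and field names here are hypothetical):

    % Hypothetical pattern: map a QLF of the form "Party contends
    % that Claim" into a structured-casenote field.
    extract(contention(Party, Claim),
            sterm(contends, _E, [nterm(Party, _X, []), Claim])).

    % ?- extract(Field, sterm(contends, e1,
    %        [nterm(petitioner, x1, []), sterm(adopted, j1, [_,_])])).
    % Field = contention(petitioner, sterm(adopted, j1, [_,_]))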
Two Steps Toward a Solution:
ICAIL '15
● New Article (less technical, more intuitive):
“How to Ground a Language for Legal Discourse in a
Prototypical Perceptual Semantics”
(An edited transcript of a presentation at the Legal Quanta
Symposium at Michigan State University College of Law on
October 29, 2015)
Forthcoming in 2016 Michigan State Law Review _____.
Includes links to my more technical papers.
● Prototype Coding:
● The basic idea is to represent a point in an n-dimensional
space by measuring its distance from a prototype in several
specified directions (see the sketch after this list).
● Furthermore, assuming that our initial space is Euclidean,
we want to select a prototype that lies at the origin of an
embedded, low-dimensional, nonlinear subspace, which is in
some sense “optimal”.
● The second point leads to a theory of
● Manifold Learning
● Deep Learning
● The theory has three components, drawn from:
Probability, Geometry, Logic.
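A minimal sketch of prototype coding in formulas (notation assumed
for illustration; only the idea of a radial distance plus specified
directions comes from the slide):

\[
  x \;=\; p \,+\, \rho\, u(\theta_1, \dots, \theta_{n-1}),
  \qquad \rho = d(x, p), \quad \|u\| = 1,
\]

where p is the prototype, ρ is the distance from p, and the unit
vector u encodes the specified directions.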
● The Probabilistic Model:
This is a diffusion process determined by a potential function,
U(x), and its gradient, ∇U(x), in an arbitrary n-dimensional
Euclidean space.
The invariant probability measure for the diffusion process is
proportional to e^(2U(x)), which means that ∇U(x) is proportional to
the gradient of the log of the stationary probability density.
● The Geometric Model:
This is a Riemannian manifold with a Riemannian metric, g_ij(x),
which we interpret as a measure of dissimilarity.
Using this dissimilarity metric, we can define a radial coordinate,
ρ, and the directional coordinates, θ1, θ2, ..., θn−1, in our original
n-dimensional space, and then compute an optimal nonlinear
k-dimensional subspace.
The radial coordinate is defined to follow the gradient vector,
∇U(x), and the directional coordinates are defined to be
orthogonal to ∇U(x).
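In formulas (a sketch, with notation assumed): the metric supplies
the dissimilarity line element, the radial coordinate curve follows
the gradient, and the directional coordinate curves are orthogonal
to it:

\[
  ds^2 = \sum_{i,j} g_{ij}(x)\, dx^i\, dx^j,
  \qquad
  \frac{dx}{d\rho} \;\parallel\; \nabla U(x),
  \qquad
  \frac{\partial x}{\partial \theta_k} \;\perp\; \nabla U(x).
\]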
[Figure: the image-processing pipeline. 7×7 patches are sampled from
60,000 images, giving 600,000 patches in 49 dimensions, each encoded
into 12 dimensions; 14×14 patches (48 dimensions, i.e., 4 × 12) are
then scanned and encoded into 12 dimensions, which are encoded once
more to produce a category label, e.g., Category: 4.]
● ∇U(x) is estimated from the data using the mean shift algorithm.
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of 600,000 patches.
[Figures: the 35 prototypes, and the principal axes and geodesic
coordinate curves (ρ, θ) for Prototypes 09, 27, and 30.]
● The Logical Language:
The proposed logical language is a categorical logic based on
the category of differential manifolds (Man), which is weaker
than a logic based on the category of sets (Set) or the category
of topological spaces (Top).
For an intuitive understanding of what this means, assume that
we have replaced the standard semantics of classical logic,
based on sets and their elements, with a semantics based on
manifolds and their points. The atomic formulas can then be
interpreted as prototypical clusters, and the geometric properties
of these clusters can be propagated throughout the rest of the
language.
The same strategy can be applied to the entirety of my
Language for Legal Discourse (LLD).
[Diagram: three layers, Logic over Geometry over Probability,
linked by constraints.]
● Logic is constrained by the geometry.
● The geometric model is constrained by the probabilistic model.
● The probability measure is constrained by the data.
Conjecture: The existence of these mutual constraints makes it
possible to learn the semantics of a complex knowledge
representation language.
● Why is this a “prototypical perceptual semantics”?
● It is a prototypical semantics because it is based on a
representation of prototypical clusters.
● It is a prototypical perceptual semantics because the primary
illustrations of the theory are drawn from the field of image
processing.
● Claim: If we can build a logical language on these
foundations, we will have a plausible account of how
human cognition could be grounded in human
perception.
Can We Learn
A Grounded Semantics
Without a Perceptual Ground?
● Two reasons to think this is possible:
● The theory of differential similarity is not really sensitive to
the precise details of the representations used at the lower
levels.
● There is increasing evidence that the semantics of lexical
items can be represented, approximately, as a vector in a
high-dimensional vector space, using only the information
available in the texts.
● Research Strategy:
● We initialize our model with a word embedding computed
from legal texts.
● We learn the higher level concepts in a legal domain by
applying the theory of differential similarity.
● Discussion?
