How to Ground A Language for Legal Discourse in a Prototypical Perceptual Semantics
L. Thorne McCarty
Rutgers University
Slides for my talk at the 15th International Conference on Artificial Intelligence and Law (ICAIL 2015), June 11, 2015.

The full ICAIL 2015 paper is available on ResearchGate at bit.ly/1qCnLJq.

1. How to Ground A Language for Legal Discourse In a Prototypical Perceptual Semantics
L. Thorne McCarty, Rutgers University

2. Background Papers
● “An Implementation of Eisner v. Macomber,” in ICAIL-'95.
  – Computational reconstruction of a 1920 corporate tax case.
  – Based on a theory of “prototypes and deformations.”
● “Some Arguments About Legal Arguments,” in ICAIL-'97.
  – Critical review of the literature.
  – Discussion of “The Correct Theory” in Section 5:
    “Legal reasoning is a form of theory construction... A judge rendering a decision in a case is constructing a theory of that case... If we are looking for a computational analogue of this phenomenon, the first field that comes to mind is machine learning...”

3. ICAIL-'97, Section 5:
… Most machine learning algorithms assume that concepts have “classical” definitions, with necessary and sufficient conditions, but legal concepts tend to be defined by prototypes. When you first look at prototype models [Smith and Medin, 1981], they seem to make the learning problem harder, rather than easier, since the space of possible concepts seems to be exponentially larger in these models than it is in the classical model. But empirically, this is not the case. Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory. ...
4. Summary
Contemporary trends in machine learning have now shed new light on the subject. In this paper, I will describe my:
● Recent work on “manifold learning”: “Clustering, Coding and the Concept of Similarity,” arXiv:1401.2411 [cs.LG] (10 Jan 2014).
● Work in progress on “deep learning” (forthcoming, 2015): “Differential Similarity in Higher Dimensional Spaces: Theory and Applications”; “Deep Learning with a Riemannian Dissimilarity Metric.”
Taken together, this work leads to a logical language grounded in a prototypical perceptual semantics, with implications for legal theory.

5. Prototype Coding
What is prototype coding?
● The basic idea is to represent a point in an n-dimensional space by measuring its distance from a prototype in several specified directions.
● Furthermore, we want to select a prototype that lies at the origin of an embedded, low-dimensional, nonlinear subspace, which is in some sense “optimal”.
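The basic idea can be sketched in the flat, linear case. The slides' actual construction uses geodesic coordinates on a nonlinear manifold; in the sketch below the prototype, the fixed orthonormal directions, and all names are hypothetical simplifications.

```python
# A minimal sketch of prototype coding in the flat, linear case. The
# prototype, the orthonormal directions, and the function names here are
# hypothetical; the real model uses geodesics on a nonlinear manifold.
def encode(x, prototype, directions):
    """Represent x by its distance from the prototype along each direction."""
    diff = [xi - pi for xi, pi in zip(x, prototype)]
    return [sum(d * a for d, a in zip(diff, axis)) for axis in directions]

def decode(code, prototype, directions):
    """Map a code back to a point on the embedded low-dimensional subspace."""
    point = list(prototype)
    for c, axis in zip(code, directions):
        point = [p + c * a for p, a in zip(point, axis)]
    return point

prototype = [1.0, 2.0, 0.0]
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # k = 2 directions in n = 3
code = encode([2.0, 4.0, 5.0], prototype, directions)
# code == [1.0, 2.0]; the component off the subspace is discarded
```

The point is that the code is low-dimensional: the third coordinate of the input does not survive the round trip through the k = 2 chart.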
6. Manifold Learning
S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent Classifier,” in NIPS 2011:
Three hypotheses:
1. ...
2. The (unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-manifolds of much lower dimensionality ... [citations omitted]
3. The manifold hypothesis for classification, according to which points of different classes are likely to concentrate along different sub-manifolds, separated by low density regions of the input space.
7. Manifold Learning
The Probabilistic Model:
Brownian motion with a drift term. More precisely, a diffusion process generated by the following differential operator:
  (1/2) Δ + (1/2) ∇U(x) · ∇
● The invariant probability measure is proportional to e^U(x).
● Thus ∇U(x) is the gradient of the log of the probability density.
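A rough numerical illustration, not from the talk: simulating such a diffusion with the Euler–Maruyama scheme and a hypothetical potential U(x) = −x² shows the samples concentrating according to e^U(x), a Gaussian with mean 0 and variance 1/2.

```python
# Euler–Maruyama simulation of dX = (1/2)∇U(X) dt + dW with the
# hypothetical potential U(x) = -x^2; the step size, seed, and starting
# point are arbitrary choices for illustration.
import math
import random

def grad_U(x):
    return -2.0 * x  # gradient of U(x) = -x^2

def simulate(steps=200_000, dt=0.01, seed=0):
    rng = random.Random(seed)
    x = 3.0                      # start far from the prototype at x = 0
    samples = []
    for _ in range(steps):
        x += 0.5 * grad_U(x) * dt + rng.gauss(0.0, math.sqrt(dt))
        samples.append(x)
    return samples

samples = simulate()
burned = samples[10_000:]        # discard the burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
# mean ≈ 0 and var ≈ 0.5, matching the invariant measure e^{-x^2}
```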
8. Manifold Learning
  U(x, y, z) ≈ O(x^6 + y^6 + z^6)
  ∇U(x, y, z) ≈ O(x^5 + y^5 + z^5)
9. Manifold Learning
The Geometric Model:
To implement the idea of prototype coding, we choose:
● A radial coordinate, ρ, which follows ∇U(x).
● The directional coordinates, θ1, θ2, ..., θn−1, orthogonal to ∇U(x).
But we actually want a lower-dimensional subspace, obtained by projecting our diffusion process onto a k−1 dimensional subset of the directional coordinates. The device we need is a Riemannian metric, g_ij(x), which we interpret as a measure of dissimilarity. Crucially, the dissimilarity metric should depend on the probability measure.

10. Manifold Learning
● Find a principal axis for the ρ coordinate.
● Choose the principal directions for the θ1, θ2, ..., θk−1 coordinates.
● To compute the coordinate curves, follow the geodesics of the Riemannian metric in each of the k−1 principal directions.
11. Manifold Learning
Prototypical Clusters
● Probability density is a mixture: e^U(x) ≈ p1 e^U1(x) + p2 e^U2(x).
● These two prototypical clusters are “exponentially” far apart.
It is natural to refer to this model as a theory of differential similarity.
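A one-dimensional sketch of such a mixture (hypothetical weights and Gaussian cluster components, not the slides' model) shows why the two clusters are "exponentially" far apart:

```python
# A density that is a mixture of two prototypical clusters,
# e^{U(x)} ≈ p1·e^{U1(x)} + p2·e^{U2(x)}, with hypothetical Gaussian
# components centered at the two prototypes.
import math

P1, P2 = 0.5, 0.5
PROTO1, PROTO2 = -3.0, 3.0       # the two prototypes

def component(x, proto):
    """Gaussian cluster density centered at a prototype."""
    return math.exp(-0.5 * (x - proto) ** 2) / math.sqrt(2 * math.pi)

def density(x):
    return P1 * component(x, PROTO1) + P2 * component(x, PROTO2)

def assign(x):
    """Assign x to the cluster with the larger posterior responsibility."""
    return 1 if P1 * component(x, PROTO1) >= P2 * component(x, PROTO2) else 2

# "Exponentially" far apart: at a prototype, the likelihood ratio between
# the two components is e^{distance^2 / 2}, here e^{18}.
ratio = component(PROTO1, PROTO1) / component(PROTO1, PROTO2)
```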
12. Deep Learning
S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent Classifier,” in NIPS 2011:
Three hypotheses:
1. The semi-supervised learning hypothesis, according to which learning aspects of the input distribution p(x) can improve models of the conditional distribution of the supervised target p(y|x) ... [citation omitted]. This hypothesis underlies not only the strict semi-supervised setting where one has many more unlabeled examples at his disposal than labeled ones, but also the successful unsupervised pretraining approach for learning deep architectures … [citations omitted].
2. ...
3. ...

13. Deep Learning
Historically, used as a benchmark for supervised learning:
● Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, 86(11): 2278-2324 (November, 1998).
We will treat it as a problem in unsupervised feature learning.
Standard Example: MNIST
● 28×28 pixels
● 60,000 training set images
● 10,000 test set images
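For concreteness, extracting 7×7 patches from a 28×28 image can be sketched as follows; the non-overlapping stride is an assumption, since the slides do not specify the sampling scheme.

```python
# Sketch (assumed details): scan 7×7 patches from a 28×28 MNIST-style
# image, flattening each patch into a 49-dimensional vector.
def patches(image, size=7, stride=7):
    """Return size×size patches, each flattened to size*size dimensions."""
    n = len(image)
    out = []
    for r in range(0, n - size + 1, stride):
        for c in range(0, n - size + 1, stride):
            out.append([image[r + i][c + j]
                        for i in range(size) for j in range(size)])
    return out

# A synthetic 28×28 "image" with pixel values in 0..255
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]
ps = patches(image)
# 16 non-overlapping patches, each a point in a 49-dimensional space
```

Scanning 60,000 such images with overlap, rather than one synthetic image, is what yields the 600,000 patches in the pipeline.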
14. Deep Learning
[Pipeline diagram: sample 7×7 patches from the 60,000 images → 600,000 patches (49 dimensions) → encode → 12 dimensions; scan 14×14 patches (48 dimensions) → encode → 12 dimensions; → encode → Category: 4]
15. Deep Learning
● ∇U(x) is estimated from the data using the mean shift algorithm.
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of 600,000 patches.
[Figure: 35 Prototypes]

16. Deep Learning
● ∇U(x) is estimated from the data using the mean shift algorithm.
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of 600,000 patches.
[Figure: 35 Prototypes]
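A minimal version of the mean shift step (hypothetical one-dimensional data and bandwidth; the slides apply it to the encoded patch vectors) illustrates how the modes, i.e. the prototypes where ∇U(x) = 0, are found:

```python
# Mean shift with a Gaussian kernel: each point is repeatedly moved to
# the kernel-weighted mean of the data until it reaches a mode. Data and
# bandwidth below are hypothetical.
import math

DATA = [-3.2, -3.0, -2.8, 2.9, 3.0, 3.1]   # two clusters in one dimension
BANDWIDTH = 1.0

def mean_shift(x, iters=100):
    """Iterate the kernel-weighted mean until x converges to a mode."""
    for _ in range(iters):
        weights = [math.exp(-0.5 * ((x - d) / BANDWIDTH) ** 2) for d in DATA]
        x = sum(w * d for w, d in zip(weights, DATA)) / sum(weights)
    return x

# Every data point converges to the mode of its own cluster, so the
# prototypical clusters partition the data set.
prototypes = sorted({round(mean_shift(d), 3) for d in DATA})
```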
17. Deep Learning
Geodesic Coordinates for Two Prototypes

18. Deep Learning
[Pipeline diagram: sample 7×7 patches from the 60,000 images → 600,000 patches (49 dimensions) → encode → 12 dimensions; scan 14×14 patches (48 dimensions) → encode → 12 dimensions; → encode → Category: 4]

19. Deep Learning
General Procedure:
● Construct the product manifold from the encoded values of the smaller patches.
● Construct a submanifold using the Riemannian dissimilarity metric.
[Diagram: 48 dimensions → encode → 12 dimensions → Category: 4]
20. The Logical Language
Rewrite the top four patches as a logical product, using the syntax of my Language for Legal Discourse (LLD):
21. The Logical Language
For this interpretation, we need a logical language based on category theory:
Define: Categorical Product
● In Man, this is the product manifold.
Define: Categorical Subobject
● In Man, this is a submanifold.

      objects                  morphisms
Set   abstract sets            arbitrary mappings
Top   topological spaces       continuous mappings
Man   differential manifolds   smooth mappings

22. The Logical Language
For this interpretation, we need a logical language based on category theory:
Define: Categorical Product
● In Man, this is the product manifold.
Define: Categorical Subobject
● In Man, this is a submanifold.

      objects                  morphisms             logic
Set   abstract sets            arbitrary mappings    classical
Top   topological spaces       continuous mappings   intuitionistic
Man   differential manifolds   smooth mappings       ????
23. The Logical Language
Sequent Calculus:
● Actor and Corporation are interpreted as differential manifolds.
● macomber and so are interpreted as points on these manifolds.
● Control is interpreted as a submanifold of the product manifold.
● A sequent is interpreted as a morphism.

24. The Logical Language
Structural Rule for cut:
Introduction and Elimination Rules for conjunction:
Horn Axioms:
This is sufficient for Horn clause logic programming.
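The Horn clause fragment can be sketched propositionally by forward chaining. The atoms below echo the macomber/so example but are hypothetical; this is not actual LLD syntax, and it ignores the manifold semantics entirely.

```python
# A propositional sketch of Horn clause inference by forward chaining:
# repeatedly fire rules (body -> head) whose bodies are satisfied, until
# no new facts are derivable. The "majority" predicate is invented here
# purely for illustration.
def forward_chain(facts, rules):
    """Close a set of facts under Horn rules of the form (body, head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

rules = [
    # control(macomber, so) :- shareholder(macomber, so), majority(macomber, so)
    (("shareholder(macomber, so)", "majority(macomber, so)"),
     "control(macomber, so)"),
]
facts = {"shareholder(macomber, so)", "majority(macomber, so)"}
result = forward_chain(facts, rules)
# result now contains "control(macomber, so)"
```

In the categorical semantics, each such derivation step would be interpreted as a composition of morphisms in Man rather than as set membership.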
25. The Logical Language
Novel Property:
A proof is a composition of morphisms in the category Man, i.e., it is a smooth mapping of differential manifolds.

26. The Logical Language
Novel Property:
A subspace is not always a submanifold.
● Implications for Gödel's Theorem?
● Implications for Learnability?
Note: If we are looking for a learnable knowledge representation language, we want it to be as restrictive as possible.
[Figure: a plot on x and y axes ranging from −1.0 to 1.0]
27. The Logical Language
Introduction and Elimination Rules for existential quantifiers:
Introduction and Elimination Rules for universal quantifiers:
Introduction and Elimination Rules for implication:
Axioms for simple embedded implications:

28. The Logical Language
Conclusion:
We have thus reconstructed, with a semantics grounded in the category of differential manifolds, Man, the full intuitionistic logic programming language in:
● “Clausal Intuitionistic Logic. I. Fixed-Point Semantics,” J. of Logic Programming, 5(1): 1-31 (1988).
● “Clausal Intuitionistic Logic. II. Tableau Proof Procedures,” J. of Logic Programming, 5(2): 93-132 (1988).
29. Defining the Ontology of LLD
From “A Language for Legal Discourse. I. Basic Features,” in ICAIL-'89:
● “There are many common sense categories underlying the representation of a legal problem domain: space, time, mass, action, permission, obligation, causation, purpose, intention, knowledge, belief, and so on. The idea is to select a small set of these common sense categories, ... and … develop a knowledge representation language that faithfully mirrors the structure of this set. The language should be formal: it should have a compositional syntax, a precise semantics and a well-defined inference mechanism. ...”

30. Defining the Ontology of LLD
● Count Terms and Mass Terms
● Events/Actions and Modalities Over Actions
  – “Permissions and Obligations,” IJCAI '83.
  – “Modalities Over Actions,” KR '94.
● Knowledge and Belief
  – S.N. Artemov, “The Logic of Justification,” Rev. of Symbolic Logic, 7(1): 1-36 (2008).
  – M. Fitting, “Reasoning with Justifications” (2009).
31. Toward a Theory of Coherence
[Diagram: Probability, Geometry, Logic]

32. Toward a Theory of Coherence
[Diagram: Probability, Geometry, Logic, within Artificial Intelligence]

33. Toward a Theory of Coherence
[Diagram adds: Stochastic Differential Geometry: Emery & Meyer (1989), Hsu (2002)]

34. Toward a Theory of Coherence
[Diagram adds: MacLane & Moerdijk, “Sheaves in Geometry and Logic” (1992)]
35. Toward a Theory of Coherence
[Diagram: Logic, Geometry, Probability]

36. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.

37. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.
● Geometric model is constrained by the probabilistic model.

38. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.
● Geometric model is constrained by the probabilistic model.
● Probability measure is constrained by the data.

39. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.
● Geometric model is constrained by the probabilistic model.
● Probability measure is constrained by the data.
Conjecture: The existence of these mutual constraints makes theory construction possible.
40. Toward a Theory of Coherence
ICAIL-'97, Section 5:
… Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory.

41. Toward a Theory of Coherence
ICAIL-'97, Section 5:
… Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory.
Q: Is the mystery now solved?
