How to Ground A Language for Legal Discourse in a Prototypical Perceptual Semantics
L. Thorne McCarty
Rutgers University
Slides for my talk at the 15th International Conference on Artificial Intelligence and Law (ICAIL 2015), June 11, 2015.

The full ICAIL 2015 paper is available on ResearchGate at bit.ly/1qCnLJq.

1. How to Ground A Language for Legal Discourse In a Prototypical Perceptual Semantics
L. Thorne McCarty, Rutgers University

2. Background Papers
● “An Implementation of Eisner v. Macomber,” in ICAIL-'95.
  – Computational reconstruction of a 1920 corporate tax case.
  – Based on a theory of “prototypes and deformations.”
● “Some Arguments About Legal Arguments,” in ICAIL-'97.
  – Critical review of the literature.
  – Discussion of “The Correct Theory” in Section 5:
    “Legal reasoning is a form of theory construction... A judge rendering a decision in a case is constructing a theory of that case... If we are looking for a computational analogue of this phenomenon, the first field that comes to mind is machine learning...”

3. ICAIL-'97, Section 5:
… Most machine learning algorithms assume that concepts have “classical” definitions, with necessary and sufficient conditions, but legal concepts tend to be defined by prototypes. When you first look at prototype models [Smith and Medin, 1981], they seem to make the learning problem harder, rather than easier, since the space of possible concepts seems to be exponentially larger in these models than it is in the classical model. But empirically, this is not the case. Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory. ...
4. Summary
Contemporary trends in machine learning have now shed new light on the subject. In this paper, I will describe my:
● Recent work on “manifold learning”: “Clustering, Coding and the Concept of Similarity,” arXiv:1401.2411 [cs.LG] (10 Jan 2014).
● Work in progress on “deep learning” (forthcoming, 2015): “Differential Similarity in Higher Dimensional Spaces: Theory and Applications”; “Deep Learning with a Riemannian Dissimilarity Metric.”
Taken together, this work leads to a logical language grounded in a prototypical perceptual semantics, with implications for legal theory.

5. Prototype Coding
What is prototype coding?
● The basic idea is to represent a point in an n-dimensional space by measuring its distance from a prototype in several specified directions.
● Furthermore, we want to select a prototype that lies at the origin of an embedded, low-dimensional, nonlinear subspace, which is in some sense “optimal”.
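The basic idea can be sketched in the flat, linear case. The slides' actual construction uses geodesic coordinates on a nonlinear manifold; in the sketch below the prototype, the fixed orthonormal directions, and all names are hypothetical simplifications.

```python
# A minimal sketch of prototype coding in the flat, linear case. The
# prototype, the orthonormal directions, and the function names here are
# hypothetical; the real model uses geodesics on a nonlinear manifold.
def encode(x, prototype, directions):
    """Represent x by its distance from the prototype along each direction."""
    diff = [xi - pi for xi, pi in zip(x, prototype)]
    return [sum(d * a for d, a in zip(diff, axis)) for axis in directions]

def decode(code, prototype, directions):
    """Map a code back to a point on the embedded low-dimensional subspace."""
    point = list(prototype)
    for c, axis in zip(code, directions):
        point = [p + c * a for p, a in zip(point, axis)]
    return point

prototype = [1.0, 2.0, 0.0]
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # k = 2 directions in n = 3
code = encode([2.0, 4.0, 5.0], prototype, directions)
# code == [1.0, 2.0]; the component off the subspace is discarded
```

The point is that the code is low-dimensional: the third coordinate of the input does not survive the round trip through the k = 2 chart.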
6. Manifold Learning
S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent Classifier,” in NIPS 2011:
Three hypotheses:
1. ...
2. The (unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-manifolds of much lower dimensionality ... [citations omitted]
3. The manifold hypothesis for classification, according to which points of different classes are likely to concentrate along different sub-manifolds, separated by low density regions of the input space.
7. Manifold Learning
The Probabilistic Model:
Brownian motion with a drift term. More precisely, a diffusion process generated by the following differential operator:
  (1/2) Δ + (1/2) ∇U(x) · ∇
● The invariant probability measure is proportional to e^U(x).
● Thus ∇U(x) is the gradient of the log of the probability density.
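A rough numerical illustration, not from the talk: simulating such a diffusion with the Euler–Maruyama scheme and a hypothetical potential U(x) = −x² shows the samples concentrating according to e^U(x), a Gaussian with mean 0 and variance 1/2.

```python
# Euler–Maruyama simulation of dX = (1/2)∇U(X) dt + dW with the
# hypothetical potential U(x) = -x^2; the step size, seed, and starting
# point are arbitrary choices for illustration.
import math
import random

def grad_U(x):
    return -2.0 * x  # gradient of U(x) = -x^2

def simulate(steps=200_000, dt=0.01, seed=0):
    rng = random.Random(seed)
    x = 3.0                      # start far from the prototype at x = 0
    samples = []
    for _ in range(steps):
        x += 0.5 * grad_U(x) * dt + rng.gauss(0.0, math.sqrt(dt))
        samples.append(x)
    return samples

samples = simulate()
burned = samples[10_000:]        # discard the burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
# mean ≈ 0 and var ≈ 0.5, matching the invariant measure e^{-x^2}
```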
8. Manifold Learning
  U(x, y, z) ≈ O(x^6 + y^6 + z^6)
  ∇U(x, y, z) ≈ O(x^5 + y^5 + z^5)
9. Manifold Learning
The Geometric Model:
To implement the idea of prototype coding, we choose:
● A radial coordinate, ρ, which follows ∇U(x).
● The directional coordinates, θ1, θ2, ..., θn−1, orthogonal to ∇U(x).
But we actually want a lower-dimensional subspace, obtained by projecting our diffusion process onto a k−1 dimensional subset of the directional coordinates. The device we need is a Riemannian metric, g_ij(x), which we interpret as a measure of dissimilarity. Crucially, the dissimilarity metric should depend on the probability measure.

10. Manifold Learning
● Find a principal axis for the ρ coordinate.
● Choose the principal directions for the θ1, θ2, ..., θk−1 coordinates.
● To compute the coordinate curves, follow the geodesics of the Riemannian metric in each of the k−1 principal directions.
11. Manifold Learning
Prototypical Clusters
● Probability density is a mixture: e^U(x) ≈ p1 e^U1(x) + p2 e^U2(x).
● These two prototypical clusters are “exponentially” far apart.
It is natural to refer to this model as a theory of differential similarity.
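A one-dimensional sketch of such a mixture (hypothetical weights and Gaussian cluster components, not the slides' model) shows why the two clusters are "exponentially" far apart:

```python
# A density that is a mixture of two prototypical clusters,
# e^{U(x)} ≈ p1·e^{U1(x)} + p2·e^{U2(x)}, with hypothetical Gaussian
# components centered at the two prototypes.
import math

P1, P2 = 0.5, 0.5
PROTO1, PROTO2 = -3.0, 3.0       # the two prototypes

def component(x, proto):
    """Gaussian cluster density centered at a prototype."""
    return math.exp(-0.5 * (x - proto) ** 2) / math.sqrt(2 * math.pi)

def density(x):
    return P1 * component(x, PROTO1) + P2 * component(x, PROTO2)

def assign(x):
    """Assign x to the cluster with the larger posterior responsibility."""
    return 1 if P1 * component(x, PROTO1) >= P2 * component(x, PROTO2) else 2

# "Exponentially" far apart: at a prototype, the likelihood ratio between
# the two components is e^{distance^2 / 2}, here e^{18}.
ratio = component(PROTO1, PROTO1) / component(PROTO1, PROTO2)
```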
12. Deep Learning
S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent Classifier,” in NIPS 2011:
Three hypotheses:
1. The semi-supervised learning hypothesis, according to which learning aspects of the input distribution p(x) can improve models of the conditional distribution of the supervised target p(y|x) ... [citation omitted]. This hypothesis underlies not only the strict semi-supervised setting where one has many more unlabeled examples at his disposal than labeled ones, but also the successful unsupervised pretraining approach for learning deep architectures … [citations omitted].
2. ...
3. ...

13. Deep Learning
Historically, used as a benchmark for supervised learning:
● Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, 86(11): 2278-2324 (November, 1998).
We will treat it as a problem in unsupervised feature learning.
Standard Example: MNIST
● 28×28 pixels
● 60,000 training set images
● 10,000 test set images
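For concreteness, extracting 7×7 patches from a 28×28 image can be sketched as follows; the non-overlapping stride is an assumption, since the slides do not specify the sampling scheme.

```python
# Sketch (assumed details): scan 7×7 patches from a 28×28 MNIST-style
# image, flattening each patch into a 49-dimensional vector.
def patches(image, size=7, stride=7):
    """Return size×size patches, each flattened to size*size dimensions."""
    n = len(image)
    out = []
    for r in range(0, n - size + 1, stride):
        for c in range(0, n - size + 1, stride):
            out.append([image[r + i][c + j]
                        for i in range(size) for j in range(size)])
    return out

# A synthetic 28×28 "image" with pixel values in 0..255
image = [[(r * 28 + c) % 256 for c in range(28)] for r in range(28)]
ps = patches(image)
# 16 non-overlapping patches, each a point in a 49-dimensional space
```

Scanning 60,000 such images with overlap, rather than one synthetic image, is what yields the 600,000 patches in the pipeline.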
14. Deep Learning
[Pipeline diagram: sample 7×7 patches from the 60,000 images → 600,000 patches (49 dimensions) → encode → 12 dimensions; scan 14×14 patches (48 dimensions) → encode → 12 dimensions; → encode → Category: 4]
15. Deep Learning
● ∇U(x) is estimated from the data using the mean shift algorithm.
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of 600,000 patches.
[Figure: 35 Prototypes]

16. Deep Learning
● ∇U(x) is estimated from the data using the mean shift algorithm.
● ∇U(x) = 0 at a prototype.
● The prototypical clusters partition the space of 600,000 patches.
[Figure: 35 Prototypes]
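A minimal version of the mean shift step (hypothetical one-dimensional data and bandwidth; the slides apply it to the encoded patch vectors) illustrates how the modes, i.e. the prototypes where ∇U(x) = 0, are found:

```python
# Mean shift with a Gaussian kernel: each point is repeatedly moved to
# the kernel-weighted mean of the data until it reaches a mode. Data and
# bandwidth below are hypothetical.
import math

DATA = [-3.2, -3.0, -2.8, 2.9, 3.0, 3.1]   # two clusters in one dimension
BANDWIDTH = 1.0

def mean_shift(x, iters=100):
    """Iterate the kernel-weighted mean until x converges to a mode."""
    for _ in range(iters):
        weights = [math.exp(-0.5 * ((x - d) / BANDWIDTH) ** 2) for d in DATA]
        x = sum(w * d for w, d in zip(weights, DATA)) / sum(weights)
    return x

# Every data point converges to the mode of its own cluster, so the
# prototypical clusters partition the data set.
prototypes = sorted({round(mean_shift(d), 3) for d in DATA})
```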
17. Deep Learning
Geodesic Coordinates for Two Prototypes

18. Deep Learning
[Pipeline diagram: sample 7×7 patches from the 60,000 images → 600,000 patches (49 dimensions) → encode → 12 dimensions; scan 14×14 patches (48 dimensions) → encode → 12 dimensions; → encode → Category: 4]

19. Deep Learning
General Procedure:
● Construct the product manifold from the encoded values of the smaller patches.
● Construct a submanifold using the Riemannian dissimilarity metric.
[Diagram: 48 dimensions → encode → 12 dimensions → Category: 4]
20. The Logical Language
Rewrite the top four patches as a logical product, using the syntax of my Language for Legal Discourse (LLD):
21. The Logical Language
For this interpretation, we need a logical language based on category theory:
Define: Categorical Product
● In Man, this is the product manifold.
Define: Categorical Subobject
● In Man, this is a submanifold.

      objects                  morphisms
Set   abstract sets            arbitrary mappings
Top   topological spaces       continuous mappings
Man   differential manifolds   smooth mappings

22. The Logical Language
For this interpretation, we need a logical language based on category theory:
Define: Categorical Product
● In Man, this is the product manifold.
Define: Categorical Subobject
● In Man, this is a submanifold.

      objects                  morphisms             logic
Set   abstract sets            arbitrary mappings    classical
Top   topological spaces       continuous mappings   intuitionistic
Man   differential manifolds   smooth mappings       ????
23. The Logical Language
Sequent Calculus:
● Actor and Corporation are interpreted as differential manifolds.
● macomber and so are interpreted as points on these manifolds.
● Control is interpreted as a submanifold of the product manifold.
● A sequent is interpreted as a morphism.

24. The Logical Language
Structural Rule for cut:
Introduction and Elimination Rules for conjunction:
Horn Axioms:
This is sufficient for Horn clause logic programming.
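The Horn clause fragment can be sketched propositionally by forward chaining. The atoms below echo the macomber/so example but are hypothetical; this is not actual LLD syntax, and it ignores the manifold semantics entirely.

```python
# A propositional sketch of Horn clause inference by forward chaining:
# repeatedly fire rules (body -> head) whose bodies are satisfied, until
# no new facts are derivable. The "majority" predicate is invented here
# purely for illustration.
def forward_chain(facts, rules):
    """Close a set of facts under Horn rules of the form (body, head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

rules = [
    # control(macomber, so) :- shareholder(macomber, so), majority(macomber, so)
    (("shareholder(macomber, so)", "majority(macomber, so)"),
     "control(macomber, so)"),
]
facts = {"shareholder(macomber, so)", "majority(macomber, so)"}
result = forward_chain(facts, rules)
# result now contains "control(macomber, so)"
```

In the categorical semantics, each such derivation step would be interpreted as a composition of morphisms in Man rather than as set membership.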
25. The Logical Language
Novel Property:
A proof is a composition of morphisms in the category Man, i.e., it is a smooth mapping of differential manifolds.

26. The Logical Language
Novel Property:
A subspace is not always a submanifold.
● Implications for Gödel's Theorem?
● Implications for Learnability?
Note: If we are looking for a learnable knowledge representation language, we want it to be as restrictive as possible.
[Figure: a plot on x and y axes ranging from −1.0 to 1.0]
27. The Logical Language
Introduction and Elimination Rules for existential quantifiers:
Introduction and Elimination Rules for universal quantifiers:
Introduction and Elimination Rules for implication:
Axioms for simple embedded implications:

28. The Logical Language
Conclusion:
We have thus reconstructed, with a semantics grounded in the category of differential manifolds, Man, the full intuitionistic logic programming language in:
● “Clausal Intuitionistic Logic. I. Fixed-Point Semantics,” J. of Logic Programming, 5(1): 1-31 (1988).
● “Clausal Intuitionistic Logic. II. Tableau Proof Procedures,” J. of Logic Programming, 5(2): 93-132 (1988).
29. Defining the Ontology of LLD
From “A Language for Legal Discourse. I. Basic Features,” in ICAIL-'89:
● “There are many common sense categories underlying the representation of a legal problem domain: space, time, mass, action, permission, obligation, causation, purpose, intention, knowledge, belief, and so on. The idea is to select a small set of these common sense categories, ... and … develop a knowledge representation language that faithfully mirrors the structure of this set. The language should be formal: it should have a compositional syntax, a precise semantics and a well-defined inference mechanism. ...”

30. Defining the Ontology of LLD
● Count Terms and Mass Terms
● Events/Actions and Modalities Over Actions
  – “Permissions and Obligations,” IJCAI '83.
  – “Modalities Over Actions,” KR '94.
● Knowledge and Belief
  – S.N. Artemov, “The Logic of Justification,” Rev. of Symbolic Logic, 7(1): 1-36 (2008).
  – M. Fitting, “Reasoning with Justifications” (2009).
31. Toward a Theory of Coherence
[Diagram: Probability, Geometry, Logic]

32. Toward a Theory of Coherence
[Diagram: Probability, Geometry, Logic, within Artificial Intelligence]

33. Toward a Theory of Coherence
[Diagram adds: Stochastic Differential Geometry: Emery & Meyer (1989), Hsu (2002)]

34. Toward a Theory of Coherence
[Diagram adds: MacLane & Moerdijk, “Sheaves in Geometry and Logic” (1992)]
35. Toward a Theory of Coherence
[Diagram: Logic, Geometry, Probability]

36. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.

37. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.
● Geometric model is constrained by the probabilistic model.

38. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.
● Geometric model is constrained by the probabilistic model.
● Probability measure is constrained by the data.

39. Toward a Theory of Coherence
Constraints:
● Logic is constrained by the geometry.
● Geometric model is constrained by the probabilistic model.
● Probability measure is constrained by the data.
Conjecture: The existence of these mutual constraints makes theory construction possible.
40. Toward a Theory of Coherence
ICAIL-'97, Section 5:
… Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory.

41. Toward a Theory of Coherence
ICAIL-'97, Section 5:
… Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory.
Q: Is the mystery now solved?
