The full ICAIL 2015 paper is available on ResearchGate at bit.ly/1qCnLJq.


- 1. How to Ground A Language for Legal Discourse In a Prototypical Perceptual Semantics L. Thorne McCarty Rutgers University
- 2. Background Papers ● “An Implementation of Eisner v. Macomber,” in ICAIL-'95. – Computational reconstruction of 1920 corporate tax case. – Based on a theory of “prototypes and deformations.” ● “Some Arguments About Legal Arguments,” in ICAIL-'97. – Critical review of the literature. – Discussion of “The Correct Theory” in Section 5: ● “Legal reasoning is a form of theory construction... A judge rendering a decision in a case is constructing a theory of that case... If we are looking for a computational analogue of this phenomenon, the first field that comes to mind is machine learning...”
- 3. ICAIL-'97, Section 5: … Most machine learning algorithms assume that concepts have “classical” definitions, with necessary and sufficient conditions, but legal concepts tend to be defined by prototypes. When you first look at prototype models [Smith and Medin, 1981], they seem to make the learning problem harder, rather than easier, since the space of possible concepts seems to be exponentially larger in these models than it is in the classical model. But empirically, this is not the case. Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory. ...
- 4. Summary Contemporary trends in machine learning have now shed new light on the subject. In this paper, I will describe my ● Recent work on “manifold learning”: “Clustering, Coding and the Concept of Similarity,” arXiv:1401.2411 [cs.LG] (10 Jan 2014). ● Work in progress on “deep learning” (forthcoming, 2015): “Differential Similarity in Higher Dimensional Spaces: Theory and Applications.” “Deep Learning with a Riemannian Dissimilarity Metric.” Taken together, this work leads to a logical language grounded in a prototypical perceptual semantics, with implications for legal theory.
- 5. Prototype Coding What is prototype coding? ● The basic idea is to represent a point in an n-dimensional space by measuring its distance from a prototype in several specified directions. ● Furthermore, we want to select a prototype that lies at the origin of an embedded, low-dimensional, nonlinear subspace, which is in some sense “optimal”.
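A minimal sketch of the prototype-coding idea described above: a point is coded by its offsets from a prototype along a few specified directions. The function name and the toy data are my own illustration, not from the slides.

```python
import numpy as np

def prototype_code(x, prototype, directions):
    """Code a point by its offsets from a prototype along given directions.

    x          : (n,) point in the ambient space
    prototype  : (n,) prototype at the origin of the local chart
    directions : (k, n) rows are unit vectors spanning the local subspace
    Returns the k-dimensional coordinate vector of x relative to the prototype.
    """
    return directions @ (x - prototype)

# Toy example: a prototype at the origin of a 2-D subspace embedded in R^3.
prototype = np.array([1.0, 1.0, 1.0])
directions = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
x = np.array([1.5, 2.0, 1.0])
print(prototype_code(x, prototype, directions))  # offsets [0.5, 1.0]
```

Selecting the "optimal" prototype and directions is the hard part; the slides develop that via the probabilistic and geometric models that follow.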
- 6. Manifold Learning S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent Classifier,” in NIPS 2011: Three hypotheses: 1. ... 2. The (unsupervised) manifold hypothesis, according to which real world data presented in high dimensional spaces is likely to concentrate in the vicinity of non-linear sub-manifolds of much lower dimensionality ... [citations omitted] 3. The manifold hypothesis for classification, according to which points of different classes are likely to concentrate along different sub-manifolds, separated by low density regions of the input space.
- 7. Manifold Learning The Probabilistic Model: Brownian motion with a drift term. More precisely, a diffusion process generated by a differential operator [shown on slide]. ● The invariant probability measure is proportional to e^{U(x)}. ● Thus ∇U(x) is the gradient of the log of the probability density.
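A runnable sketch of the probabilistic model, under the assumption stated on the slide that ∇U is the gradient of the log density: the Langevin diffusion dX = ½∇U(X) dt + dW has invariant density proportional to e^{U(x)}. Here U is the log density of a standard Gaussian, so the simulated samples should look standard normal. All names and step sizes are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    # Log-density gradient of a standard Gaussian: U(x) = -||x||^2 / 2 + const.
    return -x

def simulate(n_steps=20000, dt=0.01, burn_in=2000):
    """Euler-Maruyama simulation of dX = (1/2) grad U dt + dW.

    With U = log p, the invariant density of this diffusion is
    proportional to e^{U(x)}, i.e. p itself (here a standard Gaussian).
    """
    x = np.zeros(2)
    samples = []
    for t in range(n_steps):
        x = x + 0.5 * grad_U(x) * dt + np.sqrt(dt) * rng.standard_normal(2)
        if t >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

samples = simulate()
print(samples.mean(axis=0), samples.var(axis=0))  # near [0, 0] and [1, 1]
```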
- 8. Manifold Learning U(x, y, z) ≈ O(x^6, y^6, z^6); ∇U(x, y, z) ≈ O(x^5, y^5, z^5)
- 9. Manifold Learning The Geometric Model: To implement the idea of prototype coding, we choose: ● A radial coordinate, ρ, which follows ∇U(x). ● The directional coordinates, θ1, θ2, ..., θn−1, orthogonal to ∇U(x). But we actually want a lower-dimensional subspace, obtained by projecting our diffusion process onto a k−1 dimensional subset of the directional coordinates. The device we need is a Riemannian metric, g_ij(x), which we interpret as a measure of dissimilarity. Crucially, the dissimilarity metric should depend on the probability measure.
- 10. Manifold Learning ● Find a principal axis for the ρ coordinate. ● Choose the principal directions for the θ1 , θ2 ,..., θk –1 coordinates. ● To compute the coordinate curves, follow the geodesics of the Riemannian metric in each of the k−1 principal directions.
- 11. Manifold Learning Prototypical Clusters ● Probability density is a mixture: e^{U(x)} ≈ p1 e^{U1(x)} + p2 e^{U2(x)}. ● These two prototypical clusters are "exponentially" far apart. It is natural to refer to this model as a theory of differential similarity.
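A small numerical check of the "exponentially far apart" claim: in a two-component mixture e^{U(x)} ≈ p1 e^{U1(x)} + p2 e^{U2(x)}, the density midway between the two prototypes is exponentially smaller than the density at either prototype. The Gaussian wells and separation here are my own illustrative choices.

```python
import numpy as np

# Two prototypical clusters, each a Gaussian "well" e^{U_i(x)} around its prototype.
mu1, mu2 = np.array([0.0, 0.0]), np.array([6.0, 0.0])
p1, p2 = 0.5, 0.5

def density(x):
    # Mixture e^{U(x)} ~ p1 e^{U1(x)} + p2 e^{U2(x)} (unnormalized Gaussians)
    e1 = np.exp(-0.5 * np.sum((x - mu1) ** 2))
    e2 = np.exp(-0.5 * np.sum((x - mu2) ** 2))
    return p1 * e1 + p2 * e2

at_prototype = density(mu1)
midpoint = density((mu1 + mu2) / 2)
print(midpoint / at_prototype)  # exponentially small: low-density region between clusters
```

This is the low-density separation between sub-manifolds that the manifold hypothesis for classification (slide 6) relies on.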
- 12. Deep Learning S. Rifai, Y.N. Dauphin, P. Vincent, Y. Bengio, X. Muller, “The Manifold Tangent Classifier,” in NIPS 2011: Three hypotheses: 1. The semi-supervised learning hypothesis, according to which learning aspects of the input distribution p(x) can improve models of the conditional distribution of the supervised target p(y|x) ... [citation omitted]. This hypothesis underlies not only the strict semi-supervised setting where one has many more unlabeled examples at his disposal than labeled ones, but also the successful unsupervised pretraining approach for learning deep architectures … [citations omitted]. 2. ... 3. ...
- 13. Deep Learning Standard Example: MNIST ● 28×28 pixels ● 60,000 training set images ● 10,000 test set images. Historically, used as a benchmark for supervised learning: ● Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 86(11):2278-2324 (November, 1998). We will treat it as a problem in unsupervised feature learning.
- 14. Deep Learning [pipeline diagram] Sample 7×7 patches from the 60,000 images, yielding 600,000 patches (49 dimensions each), and encode each patch into 12 dimensions. Then scan 14×14 patches, encode 48 dimensions into 12 dimensions, and repeat, ending with a final 48-dimension to 12-dimension encoding. Category: 4.
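The first stage of the pipeline above, patch sampling plus a low-dimensional encoding, can be sketched as follows. Random arrays stand in for the MNIST images, the sizes are shrunk so the sketch runs quickly, and PCA stands in for the paper's prototype/geodesic encoder, so this illustrates only the data flow, not the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST: random 28x28 "images" (the real pipeline uses
# 60,000 images and 600,000 patches).
images = rng.random((100, 28, 28))

def sample_patches(images, patch=7, per_image=10):
    """Sample square patches and flatten them to vectors (7x7 -> 49 dims)."""
    out = []
    for img in images:
        for _ in range(per_image):
            i = rng.integers(0, img.shape[0] - patch + 1)
            j = rng.integers(0, img.shape[1] - patch + 1)
            out.append(img[i:i + patch, j:j + patch].ravel())
    return np.array(out)

patches = sample_patches(images)          # shape (1000, 49)
# Stand-in encoder: PCA to 12 dimensions via SVD (the paper's encoder is the
# prototype coding with the Riemannian dissimilarity metric, not PCA).
centered = patches - patches.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
codes = centered @ vt[:12].T              # shape (1000, 12)
print(patches.shape, codes.shape)
```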
- 15. Deep Learning ● ∇U(x) is estimated from the data using the mean shift algorithm. ● ∇U(x) = 0 at a prototype. ● The prototypical clusters partition the space of 600,000 patches. [figure: 35 Prototypes]
- 16. Deep Learning ● ∇U(x) is estimated from the data using the mean shift algorithm. ● ∇U(x) = 0 at a prototype. ● The prototypical clusters partition the space of 600,000 patches. [figure: 35 Prototypes]
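A minimal sketch of the mean shift step described on these slides: each iterate moves a point uphill along a kernel estimate of ∇U, so it converges to a density mode, i.e. a prototype where ∇U(x) = 0. The two-cluster toy data, bandwidth, and iteration count are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points drawn from two well-separated clusters; the modes found by mean
# shift play the role of the prototypes.
centers = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
data = np.vstack([rng.normal(c, 0.5, (200, 2)) for c in centers])

def mean_shift(x, data, bandwidth=1.0, n_iter=50):
    """Iterate the mean shift update until x settles at a mode of the KDE."""
    for _ in range(n_iter):
        w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        x = (w[:, None] * data).sum(axis=0) / w.sum()
    return x

# Every starting point should land near one of the two cluster prototypes,
# partitioning the data into prototypical clusters.
finals = np.array([mean_shift(p, data) for p in data[::50]])
print(np.round(finals, 2))
```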
- 17. Deep Learning Geodesic Coordinates for Two Prototypes
- 18. Deep Learning [pipeline diagram] Sample 7×7 patches from the 60,000 images, yielding 600,000 patches (49 dimensions each), and encode each patch into 12 dimensions. Then scan 14×14 patches, encode 48 dimensions into 12 dimensions, and repeat, ending with a final 48-dimension to 12-dimension encoding. Category: 4.
- 19. Deep Learning General Procedure: ● Construct the product manifold from the encoded values of the smaller patches. ● Construct a submanifold using the Riemannian dissimilarity metric. [diagram: encode 48 dimensions into 12 dimensions; Category: 4]
- 20. The Logical Language Rewrite the top four patches as a logical product, using the syntax of my Language for Legal Discourse (LLD). [formulas shown on slide]
- 21. The Logical Language For this interpretation, we need a logical language based on category theory: ● Define: Categorical Product. In Man, this is the product manifold. ● Define: Categorical Subobject. In Man, this is a submanifold.
  category | objects                | morphisms
  Set      | abstract sets          | arbitrary mappings
  Top      | topological spaces     | continuous mappings
  Man      | differential manifolds | smooth mappings
- 22. The Logical Language For this interpretation, we need a logical language based on category theory: ● Define: Categorical Product. In Man, this is the product manifold. ● Define: Categorical Subobject. In Man, this is a submanifold.
  category | objects                | morphisms           | logic
  Set      | abstract sets          | arbitrary mappings  | classical
  Top      | topological spaces     | continuous mappings | intuitionistic
  Man      | differential manifolds | smooth mappings     | ????
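For reference, the categorical product invoked here has the standard universal property (this is the textbook definition, not specific to the slides); in Man, A × B is the product manifold with the product smooth structure, and the projections are smooth:

```latex
% Universal property of the categorical product A x B: projections
% pi_1, pi_2 such that every pair of morphisms f: C -> A, g: C -> B
% factors uniquely through A x B.
\[
\begin{array}{c}
\pi_1 : A \times B \to A, \qquad \pi_2 : A \times B \to B \\[1ex]
\forall\, f : C \to A,\ g : C \to B \quad \exists!\ \langle f, g\rangle : C \to A \times B \\[1ex]
\pi_1 \circ \langle f, g\rangle = f, \qquad \pi_2 \circ \langle f, g\rangle = g
\end{array}
\]
```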
- 23. The Logical Language Sequent Calculus: ● Actor and Corporation are interpreted as differential manifolds. ● macomber and so are interpreted as points on these manifolds. ● Control is interpreted as a submanifold of the product manifold. ● A sequent is interpreted as a morphism.
- 24. The Logical Language Structural Rule for cut, Introduction and Elimination Rules for conjunction, and Horn Axioms [shown on slide]. This is sufficient for Horn clause logic programming.
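The rules named on this slide appear as images in the deck; the standard sequent-style formulations are as follows (my reconstruction of the usual rules; McCarty's exact presentation may differ):

```latex
% Cut, conjunction introduction/elimination, and Horn axioms.
\[
\frac{\Gamma \vdash A \qquad \Gamma, A \vdash B}{\Gamma \vdash B}\ (\mathrm{cut})
\]
\[
\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \wedge B}\ (\wedge I)
\qquad
\frac{\Gamma \vdash A \wedge B}{\Gamma \vdash A}\ \
\frac{\Gamma \vdash A \wedge B}{\Gamma \vdash B}\ (\wedge E)
\]
\[
A_1 \wedge \dots \wedge A_n \vdash A
\quad (\text{Horn axiom, with } A_1, \dots, A_n, A \text{ atomic})
\]
```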
- 25. The Logical Language Novel Property: A proof is a composition of morphisms in the category Man, i.e., it is a smooth mapping of differential manifolds.
- 26. The Logical Language Novel Property: A subspace is not always a submanifold. ● Implications for Gödel's Theorem? ● Implications for Learnability? Note: If we are looking for a learnable knowledge representation language, we want it to be as restrictive as possible. [plot: a subspace of the plane that is not a submanifold]
- 27. The Logical Language Introduction and Elimination Rules for existential quantifiers, universal quantifiers, and implication, plus Axioms for simple embedded implications [shown on slide].
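Again the rules themselves appear as images in the deck; the standard intuitionistic sequent-style rules are as follows (my reconstruction; McCarty's exact formulation, in particular the embedded-implication axioms, may differ):

```latex
% Quantifier and implication rules, intuitionistic sequent style.
\[
\frac{\Gamma \vdash A[t/x]}{\Gamma \vdash \exists x\, A}\ (\exists I)
\qquad
\frac{\Gamma \vdash \exists x\, A \qquad \Gamma, A[y/x] \vdash C}
     {\Gamma \vdash C}\ (\exists E)\ \ (y \text{ fresh})
\]
\[
\frac{\Gamma \vdash A[y/x]}{\Gamma \vdash \forall x\, A}\ (\forall I)\ \ (y \text{ fresh})
\qquad
\frac{\Gamma \vdash \forall x\, A}{\Gamma \vdash A[t/x]}\ (\forall E)
\]
\[
\frac{\Gamma, A \vdash B}{\Gamma \vdash A \supset B}\ (\supset I)
\qquad
\frac{\Gamma \vdash A \supset B \qquad \Gamma \vdash A}{\Gamma \vdash B}\ (\supset E)
\]
```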
- 28. The Logical Language Conclusion: We have thus reconstructed, with a semantics grounded in the category of differential manifolds, Man, the full intuitionistic logic programming language in: ● "Clausal Intuitionistic Logic. I. Fixed-Point Semantics," J. of Logic Programming, 5(1): 1-31 (1988). ● "Clausal Intuitionistic Logic. II. Tableau Proof Procedures," J. of Logic Programming, 5(2): 93-132 (1988).
- 29. Defining the Ontology of LLD From “A Language for Legal Discourse. I. Basic Features,” in ICAIL'89: ● “There are many common sense categories underlying the representation of a legal problem domain: space, time, mass, action, permission, obligation, causation, purpose, intention, knowledge, belief, and so on. The idea is to select a small set of these common sense categories, ... and … develop a knowledge representation language that faithfully mirrors the structure of this set. The language should be formal: it should have a compositional syntax, a precise semantics and a well-defined inference mechanism. ...”
- 30. Defining the Ontology of LLD ● Count Terms and Mass Terms ● Events/Actions and Modalities Over Actions ● “Permissions and Obligations,” IJCAI '83. ● “Modalities Over Actions,” KR '94. ● Knowledge and Belief ● S.N. Artemov, “The Logic of Justification,” Rev. of Symbolic Logic, 7(1): 1-36 (2008). ● M. Fitting, “Reasoning with Justifications” (2009).
- 31. Toward a Theory of Coherence [diagram: Probability, Geometry, Logic]
- 32. Toward a Theory of Coherence [diagram: Probability, Geometry, Logic, Artificial Intelligence]
- 33. Toward a Theory of Coherence [diagram, adding: Stochastic Differential Geometry: Emery & Meyer (1989), Hsu (2002)]
- 34. Toward a Theory of Coherence [diagram, adding: MacLane & Moerdijk, "Sheaves in Geometry and Logic" (1992)]
- 35. Toward a Theory of Coherence [diagram: Logic, Geometry, Probability]
- 36. Toward a Theory of Coherence [diagram] Constraints: Logic is constrained by the geometry.
- 37. Toward a Theory of Coherence [diagram] Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model.
- 38. Toward a Theory of Coherence [diagram] Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model. The probability measure is constrained by the data.
- 39. Toward a Theory of Coherence [diagram] Constraints: Logic is constrained by the geometry. The geometric model is constrained by the probabilistic model. The probability measure is constrained by the data. Conjecture: The existence of these mutual constraints makes theory construction possible.
- 40. Toward a Theory of Coherence ICAIL-'97, Section 5: … Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory.
- 41. Toward a Theory of Coherence ICAIL-'97, Section 5: … Somehow, the requirement that the exemplar of a concept must be “similar” to a prototype (a kind of “horizontal” constraint) seems to reinforce the requirement that the exemplar must be placed at some determinate level of the concept hierarchy (a kind of “vertical” constraint). How is this possible? This is one of the great mysteries of cognitive science. It is also one of the great mysteries of legal theory. Q: Is the mystery now solved?
