Not an axiom, and not always true - the most controversial of the "graphoid axioms" of conditional independence. New results from algebraic geometry viewed in the light of mathematical probability theory.
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
The Intersection Axiom of Conditional Probability
1. The intersection axiom
of conditional probability:
some “new” [?] results
Richard D. Gill
Mathematical Institute, University Leiden
This version: 20 March, 2019
The best-laid plans of mice and men often go awry
No matter how carefully a project is planned, something may still go wrong with it.
The saying is adapted from a line in “To a Mouse,” by Robert Burns:
“The best laid schemes o' mice an' men / Gang aft a-gley.”
(X ⊥⊥ Y ∣ Z) & (X ⊥⊥ Z ∣ Y) ⟹ X ⊥⊥ (Y, Z)
Presented at Algebraic Statistics seminar, Leiden, 27 February 2019
2. Algebraic Statistics
Seth Sullivant
North Carolina State University
E-mail address: smsulli2@ncsu.edu
2010 Mathematics Subject Classification. Primary 62-01, 14-01, 13P10,
13P15, 14M12, 14M25, 14P10, 14T05, 52B20, 60J10, 62F03, 62H17,
90C10, 92D15
Key words and phrases. algebraic statistics, graphical models, contingency
tables, conditional independence, phylogenetic models, design of
experiments, Gr¨obner bases, real algebraic geometry, exponential families,
exact test, maximum likelihood degree, Markov basis, disclosure limitation,
random graph models, model selection, identifiability
Abstract. Algebraic statistics uses tools from algebraic geometry, com-
mutative algebra, combinatorics, and their computational sides to ad-
dress problems in statistics and its applications. The starting point
for this connection is the observation that many statistical models are
semialgebraic sets. The algebra/statistics connection is now over twenty
years old– this book presents the first comprehensive and introductory
treatment of the subject. After background material in probability, al-
gebra, and statistics, the book covers a range of topics in algebraic
statistics including algebraic exponential families, likelihood inference,
Fisher’s exact test, bounds on entries of contingency tables, design of
experiments, identifiability of hidden variable models, phylogenetic mod-
els, and model selection. The book is suitable for both classroom use
and independent study, as it has numerous examples, references, and
over 150 exercises.
3. • The intersection axiom:
• “New” result:
where W:= f(Y) = g(Z) for some f, g
• In particular, we can take W = Law((Y, Z) | X)
• If f and g are trivial (constant) we obtain “axiom 5”
• Also “new”: Nontrivial f, g exist such that f(Y) = g(Z) a.e. iff A, B exist
with probabilities strictly between 0 and 1 s.t.
Call such a joint law decomposable
(X ⊥⊥ Y ∣ Z) & (X ⊥⊥ Z ∣ Y) ⟺ X ⊥⊥ (Y, Z) ∣ W
Pr(Y ∈ A & Z ∈ Bc) = 0 = Pr(Y ∈ Ac & Z ∈ B)
(X ⊥⊥ Y ∣ Z) & (X ⊥⊥ Z ∣ Y) ⟹ X ⊥⊥ (Y, Z)
4.
5.
6.
7. Comfort zones
• All variables have finite support (Algebraic Geometry)
• All variables have countable support
• All variables have continuous joint probability densities
(many applied statisticians)
• All densities are strictly positive
• All distributions are non-degenerate Gaussian
• All variables take values in Polish spaces (My favourite)
8. Please recall
• The joint probability distribution of X and Y can be disintegrated into the
marginal distribution of X and a family of conditional distributions of Y
given X = x
• The disintegration is unique up to almost everywhere equivalence
• Conditional independence of X and Y given Z is just ordinary
independence within each of the joint laws of X and Y conditional on Z = z
• For me, 0/0 = “undefined” and 0 x “undefined” = 0
• In other words: conditional distributions do exist even if we condition
on zero probability events; they just fail to be uniquely defined.
• The non-uniqueness is harmless
9. Some new notation
• I’ll denote by “law(X)” the probability distribution of X on 𝒳
• In the finite, discrete case, a “law” is just a vector of real numbers, non-negative,
adding to one.
• In the Polish case, the set of probability laws on a given Polish space is itself a
Polish space under an appropriate metric. One can moreover take convex
combinations.
• The family of conditional distributions of X given Y, (law(X | Y = y))y ∈ 𝒴 can be
thought of as a function of y ∈ 𝒴. The function in question is Borel measurable.
• As a function of the random variable Y, we can consider it as a random variable!!!!
• By Law(X | Y) I’ll denote that random variable, taking values in the space of
probability laws on 𝒳.
11. Proof of lemma, discrete case
Recall, X ⫫ Y | Z ⟺ p(x, y, z) = g(x, z) h(y, z)
Thus X ⫫ Y | L ⟺ we can factor p(x, y, l) this way
Given function p(x, y), pick any
Lemma: X ⫫ Y | Law(X | Y)
x ∈ 𝒳, y ∈ 𝒴, ℓ ∈ Δ|𝒳|−1
p(x, y, ℓ) = p(x, y) ⋅ 1{ℓ = p( ⋅ , y)/p(y)}
= ℓ(x)p(y)1{ℓ = p( ⋅ , y)/p(y)}
= Eval(ℓ, x) . p(y)1{ℓ = p( ⋅ , y)/p(y)}
Proof of lemma, Polish case
Similar, but different – we don’t assume existence of joint densities!
12. • X ⫫ Y | Z ⟹ Law(X | Y, Z) = Law(X | Z)
• X ⫫ Z | Y ⟹ Law(X | Y, Z) = Law(X | Y)
• So we have w(Y, Z) = g(Z) = f(Y) =: W for some functions
w, g, f
• By our lemma, X ⫫ (Y, Z) | Law(X | (Y, Z))
• We found functions g, f such that g(Z) = f(Y) and, with W:=
w(Y, Z) = g(Z) = f(Y), X ⫫ (Y, Z) | W
Proof of forwards implication
13. • Suppose X ⫫ (Y, Z) | W where W = g(Z) = f(Y) for some
functions g, f
• By axiom …, X ⫫ Y | (W, Z)
• So X ⫫ Y | (g(Z), Z)
• So X ⫫ Y | Z
• Similarly, X ⫫ Z | Y
Proof of reverse implication
14. Sullivant
• Uses primary decomposition of toric ideals to come up with a nice
parametrisation of the model “Axiom 5”
• Given: finite sets 𝒳, 𝒴, 𝒵, what is the set of all probability measures on
their product satisfying Axiom 5, and with p(y) > 0, p(z) > 0, for all y, z ?
• Answer: pick partitions of 𝒴, 𝒵 which are in 1-1 correspondence with one
another. Call one of them “𝒲”. Pick a positive probability distribution on
𝒲. Pick indecomposable probability distributions on the products of
corresponding partition elements of 𝒴 and 𝒵. Pick probability distributions
on 𝒳, also corresponding to the preceding, not necessarily all different
• Now put them together: in simulation terms: generate r.v. W = w∊ 𝒲.
Generate (Y,Z) given W =w and independently thereof generate X given W
= w.
15. Polish spaces
• Exactly same construction … just replace “partition” by a
Borel measurable map onto another Polish space
• “Corresponding partitions” … Borel measurable maps
onto same Polish space
16. Questions
• Does algebraic geometry provide any further “statistical”
insights?
• Can some of you join me to turn all these ideas into a nice
joint paper?
• Could there be a category theoretical meta-theorem?
17. Sullivant, book, ch. 4, esp. section 4.3.1
conditional distributions, Ann. Statist. 26 (1998), no. 1, 363–397.
DS04] Adrian Dobra and Seth Sullivant, A divide-and-conquer algorithm for gen-
erating Markov bases of multi-way tables, Comput. Statist. 19 (2004), no. 3,
347–366.
DS13] Ruth Davidson and Seth Sullivant, Polyhedral combinatorics of UPGMA
cones, Adv. in Appl. Math. 50 (2013), no. 2, 327–338. MR 3003350
DS14] Ruth Davidson and Seth Sullivant, Distance-based phylogenetic methods
around a polytomy, IEEE/ACM Transactions on Computational Biology
and Bioinformatics 11 (2014), 325–335.
DSS07] Mathias Drton, Bernd Sturmfels, and Seth Sullivant, Algebraic factor anal-
ysis: tetrads, pentads and beyond, Probab. Theory Related Fields 138
(2007), no. 3-4, 463–493.
DSS09] , Lectures on Algebraic Statistics, Oberwolfach Seminars, vol. 39,
Birkh¨auser Verlag, Basel, 2009. MR 2723140
DW16] Mathias Drton and Luca Weihs, Generic identifiability of linear structural
equation models by ancestor decomposition, Scand. J. Stat. 43 (2016), no. 4,
1035–1045. MR 3573674
Dwo06] Cynthia Dwork, Di↵erential privacy, Automata, languages and program-
ming. Part II, Lecture Notes in Comput. Sci., vol. 4052, Springer, Berlin,
2006, pp. 1–12. MR 2307219
DY10] Mathias Drton and Josephine Yu, On a parametrization of positive semi-
definite matrices with zeros, SIAM J. Matrix Anal. Appl. 31 (2010), no. 5,
2665–2680. MR 2740626 (2011j:15057)
EAM95] Robert J. Elliott, Lakhdar Aggoun, and John B. Moore, Hidden Markov
models, Applications of Mathematics (New York), vol. 29, Springer-Verlag,
Bibliography 463
Fel81] Joseph Felsenstein, Evolutionary trees from dna sequences: a maximum
likelihood approach, Journal of Molecular Evolution 17 (1981), 368–376.
Fel03] , Inferring Phylogenies, Sinauer Associates, Inc., Sunderland, 2003.
FG12] Shmuel Friedland and Elizabeth Gross, A proof of the set-theoretic version
of the salmon conjecture, J. Algebra 356 (2012), 374–379. MR 2891138
FHKT08] Conor Fahey, Serkan Ho¸sten, Nathan Krieger, and Leslie Timpe, Least
squares methods for equidistant tree reconstruction, arXiv:0808.3979, 2008.
Fin10] Audrey Finkler, Goodness of fit statistics for sparse contingency tables,
arXiv:1006.2620, 2010.
Fin11] Alex Fink, The binomial ideal of the intersection axiom for conditional
probabilities, J. Algebraic Combin. 33 (2011), no. 3, 455–463. MR 2772542
Fis66] Franklin M. Fisher, The identification problem in econometrics, McGraw-
Hill, 1966.
FKS16] Stefan Forcey, Logan Keefe, and William Sands, Facets of the balanced
minimal evolution polytope, J. Math. Biol. 73 (2016), no. 2, 447–468.
MR 3521111
FKS17] , Split-facets for balanced minimal evolution polytopes and the
permutoassociahedron, Bull. Math. Biol. 79 (2017), no. 5, 975–994.
MR 3634386
Nor98] J. R. Norris, Markov chains, Cambridge Series in Statistical and Proba-
bilistic Mathematics, vol. 2, Cambridge University Press, Cambridge, 1998,
Reprint of 1997 original. MR 1600720 (99c:60144)
Ohs10] Hidefumi Ohsugi, Normality of cut polytopes of graphs in a minor closed
property, Discrete Math. 310 (2010), no. 6-7, 1160–1166. MR 2579848
(2011e:52022)
OHT13] Mitsunori Ogawa, Hisayuki Hara, and Akimichi Takemura, Graver basis for
an undirected graph and its application to testing the beta model of random
graphs, Ann. Inst. Statist. Math. 65 (2013), no. 1, 191–212. MR 3011620
OS99] Shmuel Onn and Bernd Sturmfels, Cutting corners, Adv. in Appl. Math.
23 (1999), no. 1, 29–48. MR 1692984 (2000h:52019)
Pau00] Yves Pauplin, Direct calculation of a tree length using a distance matrix,
Journal of Molecular Evolution 51 (2000), 41–47.
Pea82] Judea Pearl, Reverend bayes on inference engines: A distributed hierarchi-
cal approach, Proceedings of the Second National Conference on Artificial
Intelligence. AAAI-82: Pittsburgh, PA., Menlo Park, California: AAAI
Press., 1982, pp. 133–136.
Pea09] , Causality, second ed., Cambridge University Press, Cambridge,
2009, Models, reasoning, and inference. MR 2548166 (2010i:68148)
Pet14] Jonas Peters, On the intersection property of conditional independence and
its application to causal discovery, Journal of Causal Inference 3 (2014),
97–108.
PRF10] Sonja Petrovi´c, Alessandro Rinaldo, and Stephen E. Fienberg, Algebraic
statistics for a directed random graph model with reciprocation, Algebraic
methods in statistics and probability II, Contemp. Math., vol. 516, Amer.
Math. Soc., Providence, RI, 2010, pp. 261–283. MR 2730754
References
19. The (semi-)graphoid axioms
of (conditional) independence
• Symmetry
• Decomposition
• Weak union
• Contraction
• Intersection ( X ⊥⊥ Y ∣ Z & X ⊥⊥ Z ∣ Y ) ⟹ X ⊥⊥ (Y, Z)
X ⊥⊥ Y ⟹ Y ⊥⊥ X
X ⊥⊥ (Y, Z) ⟹ X ⊥⊥ Y
X ⊥⊥ (Y, Z) ⟹ X ⊥⊥ Y ∣ Z
( X ⊥⊥ Z ∣ Y & X ⊥⊥ Y ) ⟹ X ⊥⊥ (Y, Z)
20. Comfort zones
All variables have:
• Finite outcome space [Nice for algebraic geometry]
• Countable outcome space
• Continuous joint density with respect to sigma-finite
product measures [?]
• Outcome spaces are Polish ❤
Other “convenience” assumptions: Strictly positive joint density
Multivariate normal also allows algebraic geometry approach