Keynote at WOMBAT 2019 (Monash University) https://www.monash.edu/business/wombat2019
Abstract:
Studying causal effects and structures is central to research in management, social science, economics, and other areas, yet typical analysis methods are designed for low-dimensional data. Classification & Regression Trees ("trees") and their variants are popular predictive tools used in many machine learning applications and predictive research, as they are powerful in high-dimensional predictive scenarios. Yet trees are not commonly used in causal-explanatory research. In this talk I will describe adaptations of trees that we developed for tackling two causal-explanatory issues: self-selection and confounder detection. For self-selection, we developed a novel tree-based approach that adjusts for observable self-selection bias in intervention studies, thereby creating a useful tool for analysis of observational impact studies, as well as for post-analysis of experimental data, that scales to big data. For tackling confounders, we repurpose trees for automated detection of potential Simpson's paradoxes in data with few or many potential confounding variables, even with very large samples. I'll also show insights revealed when applying these trees to applications in eGov, labor economics, and healthcare.
Repurposing Classification & Regression Trees for Causal Research with High-Dimensional Data
1. Repurposing Classification & Regression Trees
for Causal Research with High-Dimensional Data
Galit Shmueli 徐茉莉
Institute of Service Science
WOMBAT 2019
Monash University
2. We tackle 2 key issues in causal research:
Self-Selection
Identifying Confounders
3. A Tree-Based Approach
for Addressing Self-Selection
in Impact Studies with Big Data
Yahav, Shmueli & Mani (2016), A Tree-Based Approach for Addressing Self-Selection
in Impact Studies with Big Data, MIS Quarterly, vol 40 no 4, pp. 819-848.
With Inbal Yahav (Tel Aviv U) & Deepa Mani (Indian School of Business)
4. The Challenge in Impact Studies
• Individuals/firms self-select intervention
group/duration (quasi-experiment)
• Even in randomized experiments, some variables
might remain unbalanced in sample
How to identify and adjust for self-selection?
7. Common Approaches
for Addressing Self-Selection
Two steps:
1. Selection model: T = f(X)
2. Performance analysis on matched samples
Y = performance measure(s)
T = treatment
X = pre-intervention variables
Self-selection:
P(T|X) ≠ P(T)
• 2SLS modeling (Heckman correction) -- econometrics
• Propensity Score Approach (PS) -- statistics
8. Propensity Scores Approach
Step 1: Estimate selection model logit(T) = f(X)
to compute propensity scores P(T|X)
Step 2: Use scores to create matched samples
PSM = use matching algorithm
PSS = divide scores into bins
Step 3: Estimate effect (compare matched groups)
e.g., t-test or Y = b0 + b1 T + b2 X + b3 PS + e
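To make these three steps concrete, here is a minimal Python sketch of the classical PS workflow (not the talk's code); the synthetic data, the logistic selection model, and the 1-nearest-neighbor matching rule are illustrative assumptions.

```python
# Minimal propensity-score sketch (illustrative; not the talk's code).
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                      # pre-intervention variables
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # self-selection depends on X1
Y = 0.5 * T + X[:, 0] + rng.normal(size=n)       # outcome, true effect = 0.5

# Step 1: selection model logit(T) = f(X) -> propensity scores P(T=1|X)
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# Step 2 (PSM): match each treated record to the nearest control on PS
treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_control = control[idx.ravel()]

# Step 3: estimate the effect on the matched samples, e.g., via a t-test
t_stat, p_val = stats.ttest_ind(Y[treated], Y[matched_control])
print(f"matched effect: {Y[treated].mean() - Y[matched_control].mean():.3f} (p={p_val:.3f})")
```

Note how matching discards most controls; this is the "severe data loss" challenge listed on the next slide.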
9. Challenges of PS in Big Data
1. Matching leads to severe data loss
2. PS methods suffer from “data dredging”
3. No variable selection (what drives selection?)
4. Assumes constant intervention effect
5. Sequential process is computationally costly
6. Logistic model requires specifying exact form
of selection model
11. Proposed Method: Tree
Output: T (treat/control)
Inputs: X’s (income, education, family…)
Records in each terminal node
share same profile (X) and same
propensity score P(T=1|X)
12. Tree-Based Approach
Four steps (plus an optional fifth) — sketched in code below:
1. Run selection model: fit tree T = f(X)
2. Visualize tree; see unbalanced X's
3. Treat each terminal node as a sub-sample;
conduct terminal-node-level performance analysis
4. Present terminal-node analyses visually
5. [optional] Combine analyses from nodes with
homogeneous effects
Like PS, assumes observable self-selection
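Below is a minimal Python sketch of these steps using an off-the-shelf classification tree; the data-generating process and the tree settings (leaf size, depth) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the tree-based approach (illustrative settings and data).
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # self-selection depends on X1
Y = 0.5 * T + X[:, 0] + rng.normal(size=n)

# Step 1: fit the selection tree T = f(X); each terminal node is one covariate
# profile whose records share the same propensity score P(T=1|X).
tree = DecisionTreeClassifier(min_samples_leaf=200, max_depth=4).fit(X, T)

# Step 2: inspect the tree (its splits reveal unbalanced X's), e.g.:
# from sklearn.tree import export_text; print(export_text(tree))

# Steps 3-4: treat each terminal node as a sub-sample and run the
# performance analysis (here a t-test on Y) within each node.
leaves = tree.apply(X)
for leaf in np.unique(leaves):
    in_leaf = leaves == leaf
    y1, y0 = Y[in_leaf & (T == 1)], Y[in_leaf & (T == 0)]
    if len(y1) > 1 and len(y0) > 1:
        t_stat, p = stats.ttest_ind(y1, y0)
        print(f"node {leaf}: n={in_leaf.sum()}, effect={y1.mean() - y0.mean():.3f}, p={p:.3f}")
```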
13. Three Applications (MISQ 2016)
1. Impact of labor training on earnings
(Famous) randomized experiment by the US government
2. Impact of a new online passport service on
bribing, efficiency, …
Quasi-experiment by the Indian government
3. Impact of outsourcing contract pricing &
duration on financial performance
14. Study 1: Impact of training on financial gains
In the mid-1970s, a US government program
randomly assigned eligible candidates to a labor
training program
• Goal: increase future earnings
• LaLonde (1986) showed:
Groups statistically equal in terms of demographics
& pre-training earnings
ATE = $1,794 (p<0.004)
15. Tree on LaLonde's RCT data
If groups are completely
balanced, we expect…
Y = Earnings in 1978
T = Received NSW training (T = 1) or not (T = 0)
X = Demographic information and prior earnings
16. Tree reveals…
[Tree: root split on "High school degree" (no/yes)]

                      LaLonde's naïve          Tree approach
                      approach (experiment)    HS dropout (n=348)   HS degree (n=97)
Not trained (n=260)   $4,554                   $4,495               $4,855
Trained (n=185)       $6,349                   $5,649               $8,047
Training effect       $1,794 (p=0.004)         $1,154 (p=0.063)     $3,192 (p=0.015)

Tree overall effect: $1,598 (p=0.017)

1. Unbalanced variable (HS degree)
2. Heterogeneous effect
17. Labor training effect:
Observational control group
• LaLonde also compared with observational
control groups (PSID, CPS)
– experimental training group vs. observational control
– showed the training effect is not estimated correctly
with structural equations
• Dehejia & Wahba (1999, 2002) re-analyzed the CPS
control group (n=15,991) using PSM
– Effects in range [$1,122, $1,681], depending on settings
– "Best" setting effect: $1,360
– Uses only 119 control group members (out of 15,991)
18. Tree for observational control group reveals…
• Unemployed prior to training in 1974 (u74=0)
→ negative-effect outlier
• Eligibility issue: some profiles are rare in the
trained group but common in the control group
1. Unbalanced variables
2. Heterogeneous effect in u74
3. Outlier
4. Eligibility issue
19. Study 2:
Impact of eGov Initiative
(India)
Survey commissioned by Govt of India in 2006
• >9500 individuals who used passport services
• Representative sample of 13 Passport Offices
• “Quasi-experimental, non-equivalent groups design”
• Equal number of offline and online users, matched by
geography and demographics
21. [Charts: % bribe RPO, % use agent, % prefer online, % bribe police —
naive online/offline comparison vs. comparison by awareness (aware/unaware)
of electronic services provided by the Government of India; the awareness
split reveals a Simpson's Paradox]
1. Demographics properly balanced
2. Unbalanced variable (Awareness)
3. Heterogeneous effects on various y's
+ even Simpson's paradox
24. Scaling Up to Big Data
• We inflated the eGov dataset by bootstrap
• Up to 9,000,000 records and 360 variables
• 10 runs for each configuration; tree runtime: 20 sec
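A rough way to reproduce this kind of runtime check (sizes, variable counts, and leaf-size setting are illustrative, not the paper's configuration):

```python
# Rough timing sketch for bootstrap-inflated data (illustrative sizes).
import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
base_X = rng.normal(size=(10_000, 20))
base_T = rng.binomial(1, 0.5, size=10_000)

for n in (100_000, 1_000_000):
    idx = rng.integers(0, len(base_T), size=n)   # bootstrap inflation by resampling
    X, T = base_X[idx], base_T[idx]
    t0 = time.perf_counter()
    DecisionTreeClassifier(min_samples_leaf=500).fit(X, T)
    print(f"n={n:>9,}: {time.perf_counter() - t0:.1f} sec")
```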
25. Big Data Simulation

Design factor                 Binary intervention T = {0, 1}     Continuous intervention T ~ N
Sample sizes (n)              10K, 100K, 1M                      10K, 100K, 1M
#Pre-intervention vars (p)    4, 50 (+ interactions)             4, 50 (+ interactions)
Pre-intervention var types    Binary, Likert-scale, continuous   Binary, Likert-scale, continuous
Outcome variable types        Binary, continuous                 Binary, continuous
Selection models              #1: P(T=1) = logit(b0 + b1 x1 + … + bp xp)
                              #2: P(T=1) = logit(b0 + b1 x1 + … + bp xp + interactions)
Intervention effects          1. Homogeneous:                    1. Homogeneous:
                                 E(Y|T=0) = 0.5                     E(Y|T=0) = 0
                                 E(Y|T=1) = 0.7                     E(Y|T=1) = 1
                              2. Heterogeneous:                  2. Heterogeneous:
                                 E(Y|T=0) = 0.5                     E(Y|T=0) = 0
                                 E(Y|T=1, X1=0) = 0.7               E(Y|T=1, X1=0) = 1
                                 E(Y|T=1, X1=1) = 0.3               E(Y|T=1, X1=1) = -1
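For concreteness, a sketch of one cell of this design — selection model #1 with a binary intervention and a heterogeneous effect on a continuous outcome; the coefficients are illustrative assumptions:

```python
# One cell of the simulation design (illustrative coefficients).
import numpy as np

rng = np.random.default_rng(2)
n, p = 100_000, 4
X = rng.binomial(1, 0.5, size=(n, p))            # binary pre-intervention variables
b0, b = -0.5, np.array([0.8, -0.4, 0.3, 0.1])

# Selection model #1: P(T=1) = inverse-logit(b0 + b1 x1 + ... + bp xp)
pT = 1 / (1 + np.exp(-(b0 + X @ b)))
T = rng.binomial(1, pT)

# Heterogeneous effect on a continuous outcome:
# E(Y|T=0) = 0, E(Y|T=1, X1=0) = 1, E(Y|T=1, X1=1) = -1
Y = T * np.where(X[:, 0] == 0, 1.0, -1.0) + rng.normal(size=n)
```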
27. Big Data Scalability
Theoretical complexity:
• O(mn/p) for binary X
• O((m/p) n log(n)) for continuous X
[Figure: runtime as a function of sample size and dimension]
28. Scaling Trees Even Further
• "Big Data" in research vs. industry
• Industrial scaling
– Sequential trees: efficient data structures and access
(SPRINT, SLIQ, RainForest)
– Parallel computing (parallel SPRINT, ScalParC, SPARK,
PLANET): "as long as split metric can be computed on
subsets of the training data and later aggregated,
PLANET can be easily extended"
29. Tree Approach Benefits
1. Data-driven selection model
2. Scales up to Big Data
3. Fewer user choices (less data dredging)
4. Nuanced insights
• Detect unbalanced variables
• Detect heterogeneous effects that deviate from anticipated outcomes
5. Simple to communicate
6. Automatic variable selection
7. Missing values do not drop a record
8. Handles binary, multiple, and continuous interventions
9. Post-analysis of RCTs, quasi-experiments & observational studies
30. Tree Approach Limits
1. Assumes selection on observables
2. Needs sufficient data
3. Continuous variables can lead to large trees
4. Instability
[possible solution: use variable importance scores from a forest, as sketched below]
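A minimal sketch of that forest-based stability check (data, forest size, and leaf size are illustrative assumptions): averaging importance over many trees stabilizes which X's appear to drive selection, even when any single tree is unstable.

```python
# Sketch of the forest-based stability check (illustrative data and settings).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 10))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0] - 0.5 * X[:, 1])))

forest = RandomForestClassifier(n_estimators=200, min_samples_leaf=100).fit(X, T)
# Rank the selection drivers by averaged importance across the forest.
for i in np.argsort(forest.feature_importances_)[::-1][:3]:
    print(f"X{i + 1}: importance={forest.feature_importances_[i]:.3f}")
```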
31. Detecting
Simpson’s Paradox
in Big Data
Using Trees
Shmueli & Yahav (2017), The Forest or the Trees? Tackling Simpson’s Paradox with
Classification Trees, Production & Operations Management Journal, Forthcoming
With Inbal Yahav, Tel Aviv University
32. Simpson’s Paradox
The direction of a cause on an effect appears reversed when
examining aggregate vs. disaggregate of a sample (or population)
Simpson's Paradox is the reversal
of an association between two
variables after a third variable is
taken into account
Schield (1999)
The phenomenon whereby an event B
increases the probability of A in a given
population p, at the same time, decreases the
probability of A in every subpopulation of p.
Pearl (2009)
33. Death Sentence and Race
(Agresti, 1984)
Does defendant's race (X) affect the
chance of a death sentence (Y)?
The causal effect seems to reverse when
disaggregating by victim race (Z)
Causal explanation:
black murderers tend to kill blacks;
hence lower overall death-sentence rates
34. Goal: Does a dataset exhibit SP?
A = cause, C = confounder, E = effect
Cornfield et al.'s criterion:
compare P(E|C) − P(E|C′) against P(E|A) − P(E|A′)
"If Cornfield's minimum effect size is not reached,
[you] can assume no causality" — Schield, 1999
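A small Python sketch of this criterion on binary data (the helper name, the data, and the comparison rule — confounder gap at least as large as the cause gap — are illustrative assumptions based on the slide):

```python
# Illustrative check of Cornfield et al.'s criterion on 0/1 data:
# compare P(E|C) - P(E|C') against P(E|A) - P(E|A').
import numpy as np

def cornfield_check(A, C, E):
    """A = cause, C = candidate confounder, E = effect (all 0/1 arrays)."""
    gap_C = E[C == 1].mean() - E[C == 0].mean()   # P(E|C) - P(E|C')
    gap_A = E[A == 1].mean() - E[A == 0].mean()   # P(E|A) - P(E|A')
    return gap_C, gap_A, abs(gap_C) >= abs(gap_A)

# Hypothetical data where C drives both A and E (a classic confounding setup):
rng = np.random.default_rng(4)
C = rng.binomial(1, 0.5, size=10_000)
A = rng.binomial(1, np.where(C == 1, 0.8, 0.2))
E = rng.binomial(1, np.where(C == 1, 0.7, 0.3))
gap_C, gap_A, can_confound = cornfield_check(A, C, E)
print(f"P(E|C)-P(E|C')={gap_C:.2f}, P(E|A)-P(E|A')={gap_A:.2f}, "
      f"C passes Cornfield minimum: {can_confound}")
```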
35. Translate Cornfield's Criterion into a Tree
Y = outcome of interest
Tree predictors:
X = causal variable
Z = confounding variable(s)
#1 If cause → effect, then the cause should appear in the tree
#2 If Z is a confounder, then Z should appear in the tree
36. 5 potential tree structures
– single causal variable (X)
– single confounding variable (Z)
Which might exhibit Simpson's Paradox?
(Cornfield: compare P(E|C) − P(E|C′) against P(E|A) − P(E|A′))
37. Simpson’s Paradox on a Tree
#1
If cause -> effect, then cause
should appear in tree
#2
If Z is confounder, then
Z should appear in tree
#3
Z should appear before cause
(Cornfield criterion)
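A sketch of checking these three conditions by walking a fitted sklearn tree; the data are illustrative, and checking only the root split for condition #3 is a simplification of "Z appears before the cause":

```python
# Sketch: fit a tree for Y on (X, Z) and check the three conditions by
# inspecting the fitted structure (illustrative data and depth settings).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
Z = rng.binomial(1, 0.5, size=20_000)            # potential confounder
X = rng.binomial(1, np.where(Z == 1, 0.8, 0.2))  # cause, driven by Z
Y = rng.binomial(1, 0.3 + 0.4 * Z - 0.2 * X)     # effect

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=500)
tree.fit(np.column_stack([X, Z]), Y)             # feature 0 = X, feature 1 = Z

feats = tree.tree_.feature                       # split feature per node (-2 = leaf)
uses_X, uses_Z = 0 in feats, 1 in feats          # conditions #1 and #2
root_is_Z = feats[0] == 1                        # condition #3 (simplified: root only)
print(f"X in tree: {uses_X}, Z in tree: {uses_Z}, Z above X at root: {root_is_Z}")
```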
39. Accounting for Sampling Error
Logistic/linear regression:
Is the interaction X*Z significant?
No → no paradox
Yes → ?
Trees:
tree structure + significance of interaction
= conditional-inference tree
Tree splits based on statistical tests
(χ², F, permutation tests)
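The regression check is a one-liner with statsmodels; the data here are illustrative, and a significant interaction only flags a candidate paradox for further inspection:

```python
# Sketch of the regression check: is the X*Z interaction significant?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
Z = rng.binomial(1, 0.5, size=20_000)
X = rng.binomial(1, np.where(Z == 1, 0.8, 0.2))
Y = rng.binomial(1, 0.3 + 0.4 * Z - 0.2 * X)
df = pd.DataFrame({"Y": Y, "X": X, "Z": Z})

fit = smf.logit("Y ~ X * Z", data=df).fit(disp=0)
print(fit.pvalues["X:Z"])  # not significant -> no paradox; significant -> inspect further
```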
40. Tree Approach #2:
Conditional Inference Tree
(Hothorn et al., JCGS 2006)
Variable selection based on a statistical test (χ²)
• Recursive partitioning with early stopping
• Separate steps for variable selection and split search
• R packages party, partykit (function ctree)
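The real implementation is R's partykit::ctree; the Python sketch below only illustrates the core idea for binary variables — separate variable selection via a significance test (here a plain chi-square with Bonferroni adjustment, standing in for Hothorn et al.'s permutation framework) and early stopping when no variable is significant:

```python
# Simplified conditional-inference-style splitting (illustrative; not ctree).
import numpy as np
from scipy.stats import chi2_contingency

def ci_split(X, Y, alpha=0.05, depth=0, max_depth=3):
    m = X.shape[1]
    pvals = []
    for j in range(m):  # variable selection: test each X_j against Y
        table = np.array([[np.sum((X[:, j] == a) & (Y == b)) for b in (0, 1)]
                          for a in (0, 1)])
        pvals.append(chi2_contingency(table)[1] if table.min() > 0 else 1.0)
    j_best = int(np.argmin(pvals))
    if pvals[j_best] * m > alpha or depth >= max_depth:   # early stopping (Bonferroni)
        return
    print(f"{'  ' * depth}split on X{j_best + 1} (adj. p={min(pvals[j_best] * m, 1):.4f})")
    for v in (0, 1):                                      # split search: binary X -> trivial
        mask = X[:, j_best] == v
        if mask.sum() > 100:
            ci_split(X[mask], Y[mask], alpha, depth + 1, max_depth)

rng = np.random.default_rng(7)
X = rng.binomial(1, 0.5, size=(5000, 5))
Y = rng.binomial(1, 0.2 + 0.5 * X[:, 2])   # only X3 matters
ci_split(X, Y)
```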
44. Seatbelts and Injuries (Agresti 2012)
Does use of seat belts (X) reduce the chance of injury (Y)?
Z = passenger gender and accident location
n = 68,694 passengers involved in accidents in Maine
Potential paradox (by location)
How about logistic regression?
47. Multiple Potential Confounders (Z)
The challenge:
statistical significance of Simpson's paradox
≠
significance threshold of tree splits in a CI tree
[Figure: CI Tree vs. Full Tree]
Solution: X-Terminal Tree
48. Paradox Detection in Big Data
(Tree Approach #3):
X-Terminal Trees
X-Terminal Tree:
Grow tree only
until X-splits
49. Tree paths with terminal X nodes
can indicate…
• Full paradox, statistically significant
• Partial paradox, statistically significant
• Statistically insignificant paradox
• No paradox
Pivot table equivalence:
Filter by Z variables above terminal X node
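A pandas sketch of that pivot-table equivalence (column names and data are illustrative): within each Z-profile, compare the direction of the X → Y association to its overall direction; a sign flip flags a potential paradox, to be confirmed with a statistical test as in the CI tree.

```python
# Sketch of the pivot-table equivalence: compare overall vs. within-Z effects.
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
Z = rng.binomial(1, 0.5, size=20_000)
X = rng.binomial(1, np.where(Z == 1, 0.8, 0.2))
Y = rng.binomial(1, 0.3 + 0.4 * Z - 0.2 * X)
df = pd.DataFrame({"Y": Y, "X": X, "Z": Z})

overall = df.groupby("X")["Y"].mean().diff().iloc[-1]   # overall X -> Y direction
by_z = df.groupby(["Z", "X"])["Y"].mean().unstack()
sub_effects = by_z[1] - by_z[0]                         # X -> Y within each Z profile
print(f"overall effect: {overall:+.3f}")
print(sub_effects)
# A sign flip between the overall effect and a sub-effect flags a potential paradox.
```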
50. Impact of eGov Initiative
(India)
Survey commissioned by Govt of India in 2006
• >9500 individuals who used passport services
• Representative sample of 13 Passport Offices
• “Quasi-experimental, non-equivalent groups design”
• Equal number of offline and online users, matched by
geography and demographics
51. Y = police bribe (0/1)
X = online/offline
Z = {demographics; survey Qs}
[Tree: split p=.32 — paradox p=0.003; paradox p=0.16; no paradox]
52. Kidney Allocation in USA
(104,000 patients, 19 confounders)
Is the kidney allocation system racist?
Y = waiting time (days)
X = patient race
Z = {patient demographics, health, bio}
Type-4 tree, but no significant Simpson's paradox
detected!
53. Summary & Challenges
Summary:
• Full tree: eliminate non-type-4 trees
• Conditional-inference trees: for a single Z
• X-terminal trees: for multiple Zs
Benefits:
• More efficient than stepwise regression
• Tree structure more informative than interaction terms
• Extends to continuous Y and >2 subpopulations
Challenges:
• Greediness of tree
• A weak paradox, or one in a small subset of the data, can go undetected
• Highly correlated Z's might lead to a "wrong" Z choice
54. We tackle 2 key issues in causal research:
Self-Selection
Identifying Confounders
55. Analytics
Humanity
Responsibility
Galit Shmueli 徐茉莉
Institute of Service Science
Editor's Notes
Heckman correction builds selection model based on economic theory
Proof for Entropy
Blue: no paradox (same as overall direction); splitting stops when it reaches online/offline (X)
Orange: although the p-value is very large, we still have a split