Isabelle Guyon, President, ChaLearn at MLconf SF - 11/13/15

Network Reconstruction: The Contribution of Challenges in Machine Learning: Networks of influence are found at all levels of physical, biological, and societal systems: climate networks, gene networks, neural networks, and social networks are a few examples. These networks are not just descriptive of the “State of Nature”; they allow us to make predictions such as forecasting disruptive weather patterns, evaluating the possible effect of a drug, locating the focus of a neural seizure, and predicting the propagation of epidemics. This, in turn, allows us to devise adequate interventions or changes in policy to obtain desired outcomes: evacuate people before a region is hit by a hurricane, administer treatment, vaccinate, etc. But knowing the network structure is a prerequisite, and this structure may be very hard and costly to obtain with traditional means. For example, the medical community relies on clinical trials, which cost millions of dollars; the neuroscience community engages in connection tracing with electron microscopy, which takes years to establish the connectivity of 100 neurons (the brain contains billions).
This presentation will review recent progress in network reconstruction methods based solely on observational data. Great advances have recently been made using machine learning. We will analyze the results of several challenges we organized, which point us toward new, simple, and practical methodologies.


Slide 1: Causality and Graph Reconstruction. MLconf 2015. Isabelle Guyon, ChaLearn. http://chalearn.org/
Slide 2: Motivation. BIG data makes lots of BIG promises, but… will the promises be kept? [Chart: difficulty vs. value, contrasting classical statistics and machine learning.] What happened? (explicative power). What will happen? (forecasting power). How to make it happen? (decisional power). http://chalearn.org/
Slide 3: What is causal graph reconstruction? http://chalearn.org/
Slide 4: Problem setting. [Diagram: a network of variables A–J, labeled INPUT and OUTPUT.] http://chalearn.org/
Slide 5: Causal questions. What affects… your health? …climate changes? …the economy? (actions) http://chalearn.org/
Slide 6: Scientific method. http://chalearn.org/
Slide 7: Observe correlations. (Thanks to Jonas Peters for this example.) http://chalearn.org/
Slide 8: Hypothesize causal relationships. (Thanks to Jonas Peters for this example.) http://chalearn.org/
Slide 9: Hypothesize causal relationships. (Thanks to Jonas Peters for this example.) http://chalearn.org/
Slide 10: Hypothesize causal relationships. [Diagram: four candidate causal structures linking Chocolate and Nobel; which one?] http://chalearn.org/
Slide 11: Perform randomized controlled experiments. “Please test your researchers for ten years: randomly pick half of them and give them chocolate for dessert, and give apples to the other half. Then compare the number of Nobel prizes in the two populations.” http://chalearn.org/
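As an aside (not part of the slides), here is a minimal simulation of the randomized experiment just described; the sample size, the 1% Nobel base rate, and the absence of any true effect are arbitrary assumptions.

```python
# Illustrative sketch only: simulate the randomized experiment described on the
# slide. The base rate and sample size are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                    # hypothetical number of researchers
chocolate = rng.permutation(n) < n // 2     # random assignment: half chocolate, half apples
prizes = rng.random(n) < 0.01               # same 1% Nobel rate for everyone (no true effect)

print("Nobel rate, chocolate group:", prizes[chocolate].mean())
print("Nobel rate, apple group:    ", prizes[~chocolate].mean())
# Randomization breaks any link with confounders (e.g., national wealth), so a
# persistent difference between the two groups would indicate a causal effect.
```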
Slide 12: How far can we get to improve causal hypotheses… to minimize the need for experiments?
Slide 13: Landmark work.
• Pioneer work: Glymour, Scheines, Spirtes, Pearl (Turing Award, 2011), Rubin, in the US, since the 80’s.
• New wave: Hyvärinen, Schölkopf, Bühlmann in the EU.
• Nobel prizes in econometrics: Haavelmo (1989), Granger (2003), Sargent and Sims (2011).
• DARPA programs: Big Mechanisms (2014), upcoming program (Schwartz, program manager).
http://chalearn.org/
Slide 14: Game-changing work: causality challenges. Cause-Effect Pairs (2013), Neural Connectomics (2014), Causation and Prediction (2007), Pot-luck challenge (2008). http://chalearn.org/
Slide 15: To make a long story short… 1. Discovering dependencies: easiest = classical feature selection. Hard to beat! 2. Removing spurious dependencies: harder and “dangerous” because removing good features is more harmful than keeping bad ones. 3. Orienting dependencies: hardest. http://chalearn.org/
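To make steps 1 and 2 concrete, here is a minimal sketch (my own illustration, not the challenge's code): a plain correlation detects a dependency, and a partial correlation that controls for a common cause Z exposes it as spurious. The toy data and the linear-regression-based test are simplifying assumptions.

```python
# Illustrative sketch only (not from the slides): detect a dependency with plain
# correlation (step 1), then recognize it as spurious with a partial correlation
# that controls for a common cause (step 2). All names and numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=n)             # hidden common cause
A = Z + 0.5 * rng.normal(size=n)   # A <- Z
B = Z + 0.5 * rng.normal(size=n)   # B <- Z : A and B are dependent, but only via Z

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print("corr(A, B)     =", round(np.corrcoef(A, B)[0, 1], 2))  # strong: dependency detected
print("corr(A, B | Z) =", round(partial_corr(A, B, Z), 2))    # ~0: the link is spurious
```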
Slide 16: Cause-effect pair challenge (2013).
Initial impulse: Joris Mooij, Dominik Janzing, and Bernhard Schölkopf.
Examples of algorithms and data: Povilas Daniušis, Arthur Gretton, Patrik O. Hoyer, Dominik Janzing, Antti Kerminen, Joris Mooij, Jonas Peters, Bernhard Schölkopf, Shohei Shimizu, Oliver Stegle, Kun Zhang, and Jakob Zscheischler.
Datasets and result analysis: Isabelle Guyon, Mehreen Saeed, and Mikael Henaff, Sisi Ma, and Alexander Statnikov (NYU).
Website and sample code: Isabelle Guyon. Phase 1: Ben Hamner (Kaggle), https://www.kaggle.com/c/cause-effect-pairs. Phase 2: Ivan Judson, Christophe Poulain, Evelyne Viegas, Michael Zyskowski, https://www.codalab.org/competitions/1381.
Review, testing: Marc Boullé, Hugo Jair Escalante, Frederick Eberhardt, Seth Flaxman, Patrik Hoyer, Dominik Janzing, Richard Kennaway, Vincent Lemaire, Joris Mooij, Jonas Peters, Florin Popescu, Peter Spirtes, Ioannis Tsamardinos, Jianxin Yin, Kun Zhang.
http://chalearn.org/
Slide 17: Problem setting. [Diagram: a network of variables A–J, labeled INPUT and OUTPUT.] http://chalearn.org/
Slide 18: Problem setting. [Diagram: variables A–J; INPUT and OUTPUT, where the output is the answer to “A -> B ?” (0 / 1).] http://chalearn.org/
Slide 19: A → B? B = Temperature, A = log(Altitude). http://chalearn.org/
Slide 20: Fit the data in both directions, A → B and A ← B. Best fit: A → B. http://chalearn.org/
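The “best fit” comparison can be sketched as follows; this only illustrates the additive-noise reasoning, not the exact method shown in the talk. The data are a synthetic nonlinear stand-in (a purely linear pair with Gaussian noise would not be identifiable this way), and the residual-dependence statistic is deliberately crude.

```python
# Illustrative sketch only: fit B = f(A) + noise and A = g(B) + noise, and keep
# the direction whose residuals look least dependent on the input variable.
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0, 3, 5000)                        # stand-in for log(Altitude)
B = np.exp(A / 2) + 0.1 * rng.normal(size=A.size)  # stand-in for Temperature; truth: A -> B

def residual_dependence(x, y, deg=3):
    """Regress y on x, then measure (crudely) how much the residual spread still
    varies with x: correlation between x and the squared residuals."""
    res = y - np.polyval(np.polyfit(x, y, deg), x)
    return abs(np.corrcoef(x, res ** 2)[0, 1])

score_ab = residual_dependence(A, B)   # forward model  A -> B
score_ba = residual_dependence(B, A)   # backward model B -> A
print("best fit:", "A -> B" if score_ab < score_ba else "B -> A")
```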
Slide 21: The data. Generative structures: A → B; A ← B; A ← Z → B (hidden common cause Z); A ⊥ B (independent). [Pie chart: 20% / 80%.]
Demographics: Sex → Height; Age → Wages; Country → Education; Latitude → Infant mortality.
Ecology: City elevation → Temperature; Water level → Algal frequency; Elevation → Vegetation; Dist. to hydrology → Fire.
Econometrics: Mileage → Car resell price; Num. rooms → House price; Trade price last day → Trade price.
Medicine: Cancer vol. → Recurrence; Metastasis → Prognosis; Age → Blood pressure.
Genomics (mRNA level): Transcription factor → Protein induced.
Engineering: Car model year → Horsepower; Number of cylinders → MPG; Cache memory → Compute power; Roof area → Heating load; Cement used → Compressive strength.
http://chalearn.org/
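For experimentation, here is a small generator of toy pairs under the four structures above; the functional forms and noise levels are arbitrary assumptions, not the challenge's data-generation recipe.

```python
# Illustrative sketch only: generate toy pairs under the four structures above.
# Functional forms and noise levels are arbitrary, not the challenge's recipe.
import numpy as np

rng = np.random.default_rng(2)

def make_pair(structure, n=1000):
    noise = lambda: 0.3 * rng.normal(size=n)
    if structure == "A->B":
        A = rng.normal(size=n); B = np.tanh(A) + noise()
    elif structure == "A<-B":
        B = rng.normal(size=n); A = np.tanh(B) + noise()
    elif structure == "A<-Z->B":                     # confounded, no direct link
        Z = rng.normal(size=n)
        A = Z + noise(); B = np.tanh(Z) + noise()
    else:                                            # "A _|_ B": independent
        A, B = rng.normal(size=n), rng.normal(size=n)
    return A, B

for s in ["A->B", "A<-B", "A<-Z->B", "A _|_ B"]:
    A, B = make_pair(s)
    print(s, "corr =", round(np.corrcoef(A, B)[0, 1], 2))
```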
Slide 22: The results. [Chart: test-set scores ranging roughly from 0.5 to 0.85; axis label “267 test”.] http://chalearn.org/
Slide 23: Amazing: an operational causation coefficient! http://chalearn.org/
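One way (an assumption for illustration, not any participant's method) to package a pairwise asymmetry score into such a signed causation coefficient: positive favors A → B, negative favors B → A, and values near zero mean "undecided".

```python
# Illustrative sketch only: a signed "causation coefficient" built from a crude
# asymmetry score (difference of forward/backward regression errors). This is a
# stand-in convention, not the scoring used by any particular challenge entry.
import numpy as np

def fit_error(x, y, deg=3):
    """Mean squared error of a polynomial regression y ~ f(x)."""
    return np.mean((y - np.polyval(np.polyfit(x, y, deg), x)) ** 2)

def causation_coefficient(a, b):
    """> 0 suggests a -> b, < 0 suggests b -> a, near 0 means undecided."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return fit_error(b, a) - fit_error(a, b)   # b easier to predict from a => positive

rng = np.random.default_rng(4)
a = rng.uniform(-2, 2, 3000)
b = np.sin(a) + 0.2 * rng.normal(size=a.size)  # ground truth: a -> b
print(round(causation_coefficient(a, b), 3))   # expected: positive
```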
Slide 24: Neural connectomics challenge (2014). Coordinator: Isabelle Guyon. Data providers: Demian Battaglia, Javier Orlandi, Jordi Soriano Fradera, Olav Stetter. Advisors: Gavin Cawley, Gideon Dror, Hugo-Jair Escalante, Alice Guyon, Vincent Lemaire, Sisi Ma, Eric Peskin, Florin Popescu, Bisakha Ray, Mehreen Saeed, Alexander Statnikov. http://chalearn.org/
Slide 25: Problem setting. [Diagram: a network of variables A–J, labeled INPUT and OUTPUT.] http://chalearn.org/
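For this task the input is a set of per-neuron activity time series rather than independent samples. A naive directed baseline, sketched here on toy linear dynamics (the actual challenge used simulated calcium-fluorescence recordings, and the winning methods were considerably more elaborate), scores each ordered pair by time-lagged correlation.

```python
# Illustrative sketch only: a naive directed baseline for network reconstruction
# from activity time series: score i -> j by the correlation between neuron i's
# activity and neuron j's activity one step later. Toy linear dynamics, not the
# challenge's data.
import numpy as np

rng = np.random.default_rng(5)
T, n = 5000, 4
W_true = np.zeros((n, n))
W_true[0, 1] = W_true[1, 2] = 0.8           # ground truth: 0 -> 1 -> 2, neuron 3 isolated
X = np.zeros((T, n))
for t in range(1, T):
    X[t] = X[t - 1] @ W_true + rng.normal(size=n)

def lagged_corr_scores(X):
    past, future = X[:-1], X[1:]
    scores = np.zeros((X.shape[1], X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(X.shape[1]):
            scores[i, j] = np.corrcoef(past[:, i], future[:, j])[0, 1]
    np.fill_diagonal(scores, 0.0)
    return scores

print(np.round(lagged_corr_scores(X), 2))   # largest entries should be (0, 1) and (1, 2)
```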
Slide 26: Network deconvolution. http://chalearn.org/
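Network deconvolution presumably refers to the closed-form method of Feizi et al. (2013): treat the observed dependency matrix as the sum of direct effects and all transitive (indirect) paths, and invert that relation. A minimal sketch on an assumed toy chain A → B → C:

```python
# Illustrative sketch only: the closed-form network deconvolution step
# (Feizi et al., 2013). If the observed matrix sums direct and indirect paths,
#   G_obs = G_dir + G_dir^2 + G_dir^3 + ...
# then the direct part is recovered as G_dir = G_obs (I + G_obs)^{-1},
# i.e. eigenvalues are mapped lambda -> lambda / (1 + lambda).
import numpy as np

def network_deconvolution(G_obs):
    """G_obs: symmetric observed dependency matrix (e.g., correlations)."""
    G_obs = (G_obs + G_obs.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(G_obs)
    return eigvecs @ np.diag(eigvals / (1.0 + eigvals)) @ eigvecs.T

# Toy chain A -> B -> C: the observed matrix gains a spurious A-C entry from the
# indirect path, which deconvolution pushes back toward zero (ignore the diagonal).
G_dir_true = np.array([[0.0, 0.6, 0.0],
                       [0.6, 0.0, 0.6],
                       [0.0, 0.6, 0.0]])
G_obs = G_dir_true + G_dir_true @ G_dir_true   # direct links + second-order paths
np.fill_diagonal(G_obs, 0.0)
print(np.round(network_deconvolution(G_obs), 2))
```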
Slide 27: Conclusion.
• Causal models: better explain data; support decisions.
• Challenges: fair evaluations; innovation.
• Machine learning: novel approaches to causal discovery; an operational “causation coefficient”:
  – first detect oriented pairs, then prune indirect effects and confounders;
  – or first build an undirected graph, then orient its edges.
Slide 28: AutoML Challenge. Fully automatic machine learning without ANY human intervention. automl.chalearn.org. December 2014 – January 2016. $30,000 in prizes. Thank you! http://chalearn.org/
