Exploratory Research Is More Reliable than Confirmatory Research
Clark Glymour
Carnegie Mellon University
Confirmatory “Logic”
•  Hypothesis H: A, B are causally connected
•  Null hypothesis: A, B are independent
•  Have data D, choose test statistic S, choose alpha
•  Reject null hypothesis if the p-value of S(D) < alpha
•  If null is rejected, H is confirmed.
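The recipe above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say which statistic S is used, so the sketch assumes a Fisher z test on the Pearson correlation (roughly bivariate normal data), and the function names `pearson_r`, `independence_p_value`, and `confirmatory_test` are hypothetical.

```python
import math

def pearson_r(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def independence_p_value(a, b):
    """Two-sided p-value for the null 'A, B independent' (Fisher z, assumed)."""
    n = len(a)
    # Clamp so atanh stays finite when the sample correlation is +/-1.
    r = max(min(pearson_r(a, b), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def confirmatory_test(a, b, alpha=0.05):
    """Return (null_rejected, p_value) for the confirmatory recipe."""
    p = independence_p_value(a, b)
    return p < alpha, p

# Toy data (made up for illustration): B tracks A closely.
A = [1, 2, 3, 4, 5, 6, 7, 8]
B = [2, 1, 4, 3, 6, 5, 8, 7]
rejected, p = confirmatory_test(A, B)
print(rejected, p)
```

On the confirmatory logic, a rejection here is read as confirmation of H; the following slides argue that this step is where the trouble starts.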
The Argument Against “Confirmatory Research”
•  Many, many possible positive hypotheses (typically of
causal effects) in a domain (psychology, epidemiology,
biomedical science).
•  Most are false (way fewer than 10% are true).
•  Selection of which hypotheses to test is independent
of their truth.
•  Most tests are at alpha = .05 and power < .8
•  Positive published results are from rejected null
hypotheses.
•  Conclusion: “if you use p=0.05 as a criterion for
claiming that you have discovered an effect you will
make a fool of yourself at least 30% of the time,”
Colquhoun, 2014, Royal Society Open Science
The Base Rate Calculation Illustrated
•  Suppose N >>> 1 hypotheses, 10% of which are true positives (cause-effect),
for each of which the null hypothesis of independence is tested independently
with alpha = Pr(rejecting null | null is true) = 0.05 and
power w = Pr(rejecting null | alternative is true) = .8
•  Probability of finding a false positive association:
Pr(reject null | null true) x Pr(null true) = .05 x .9 = .045
•  Probability of finding a positive association:
.045 + Pr(reject null | alt true) x Pr(alt true) = .045 + (w x .1)
•  Ratio of false positives found to all positives found:
.045 / (.045 + .8 x .1) = .045 / .125 = .36
•  E.g., if power = .8 and alpha = .05, the expected proportion of false positives
is .36
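The arithmetic on this slide can be checked with a short sketch. The function name `false_positive_share` and its parameter names are hypothetical; the formula is the one written out above. The second call uses a 1% base rate, an illustrative value chosen only because the earlier slide says far fewer than 10% of hypotheses are true.

```python
def false_positive_share(prior_true, alpha, power):
    """Expected fraction of rejected nulls that are false positives."""
    false_pos = alpha * (1.0 - prior_true)   # Pr(reject | null true) x Pr(null true)
    true_pos = power * prior_true            # Pr(reject | alt true) x Pr(alt true)
    return false_pos / (false_pos + true_pos)

# The slide's numbers: 10% of hypotheses true, alpha = .05, power = .8
share = false_positive_share(0.10, 0.05, 0.8)
print(round(share, 2))  # 0.36

# Illustrative lower base rate (1% true): the share of false positives rises sharply.
print(round(false_positive_share(0.01, 0.05, 0.8), 2))
```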
Why Do Scientists Publish “Confirmatory Studies” at .05?
•  Because they think they know most of the
causal relations?
•  Because if they used a lower alpha their
results would not be “significant”?
•  Because a real search would find that they
can’t infer much from the data?
•  Publish or perish!
The Argument that “Exploratory,” “High Throughput” Research Is the Worst
•  Because it tests more hypotheses, and so produces more false
positive effects:
•  “The greater the number and the lesser [sic] the selection of
tested relationships in a scientific field, the less likely the
research findings are to be true. Thus research findings are
more likely true in confirmatory designs…than in hypothesis
generating designs” [because in exploratory studies a lot of
false hypotheses are tested, and the more that are tested, the
more errors will be made.] “Fields considered highly
informative and creative given the wealth of the assembled
and tested information, such as microarrays and other
high-throughput discovery oriented research…should have
extremely low PPV” [Positive Predictive Value, the probability
that a reported result is true] (Ioannidis, 2005,
PLOS Medicine, p. 0698).
Balderdash! Ignorance! Dogma! Superstition!
•  Search for causal relations is just parameter
estimation.
•  Stop thinking in terms of testing and confirmation;
think accuracies of estimators. Hypothesis tests are
just cogs in an estimation procedure.
•  Consistent estimators for rare relations exist,
employing either (quasi) Bayesian calculations or
classical hypothesis tests, or both.
•  The procedures never postulate a connection without
multiple assessments.
•  Appropriately used, the procedures are amazingly
accurate.
	
Example: SGS Algorithm
Variables: X1, X2,…,XN
X1 – X2 is inferred if and only if the null
hypotheses X1 ⫫ X2 | Z are rejected for each
and every set Z ⊆ {X3,…,XN}.
In PC, the subsets to condition on are selected
dynamically, but the tests are equivalent to SGS
assuming Faithfulness.
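A minimal sketch of the adjacency rule just stated, under assumptions the slide does not make: the variables are taken to be jointly Gaussian, each null X_i ⫫ X_j | Z is tested with a Fisher z test on the partial correlation, and the input is a correlation matrix plus a sample size. The names `partial_corr`, `independence_rejected`, and `sgs_adjacencies` are hypothetical, and only the adjacency phase is shown (no orientation step). Variables are indexed from 0 in the code.

```python
import math
from itertools import combinations

def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given the tuple `cond` (recursive formula)."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

def independence_rejected(corr, n, i, j, cond, alpha):
    """Fisher z test (assumed) of the null 'i independent of j given cond'."""
    r = max(min(partial_corr(corr, i, j, cond), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < alpha

def sgs_adjacencies(corr, n, alpha=0.01):
    """Keep edge i - j only if independence is rejected for EVERY
    conditioning set drawn from the remaining variables."""
    p = len(corr)
    edges = set()
    for i, j in combinations(range(p), 2):
        others = [v for v in range(p) if v not in (i, j)]
        cond_sets = (c for size in range(len(others) + 1)
                     for c in combinations(others, size))
        if all(independence_rejected(corr, n, i, j, c, alpha) for c in cond_sets):
            edges.add((i, j))
    return edges

# Population correlations for a chain X0 -> X1 -> X2 -> X3 (illustrative numbers).
a = 0.6
chain = [[a ** abs(i - j) for j in range(4)] for i in range(4)]
print(sorted(sgs_adjacencies(chain, n=1000)))
```

With these illustrative correlations the sketch keeps the three chain edges and drops the rest, because each non-adjacent pair is independent given an intermediate variable. Note that a single failure to reject is enough to block an edge, which is the sense in which a connection is never postulated without multiple assessments. The exhaustive loop over all subsets grows exponentially in the number of variables, so this form is practical only for small examples.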
	
	
Example: FGS Algorithm
•  Iterative algorithm starting with a totally
disconnected graph of variables. A connection is
added only if it improves the likelihood more than
any other addition (or no addition) and by more than
k ln(S), where k is positive and S is the sample size.
•  E.g., in the first step a single connection is added
only if it improves the likelihood sufficiently and
more than the addition of any other edge. In a
million-variable case an edge is added only if its
likelihood is better than that of ~10^12 alternatives.
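The first step described above can be sketched as follows. This is not the FGS implementation, only an illustration of the selection rule under assumptions: linear Gaussian variables, so that adding an edge between i and j to the empty graph raises the maximized log-likelihood by -(S/2) ln(1 - r_ij^2), and all sample correlations strictly between -1 and 1. The function name `first_greedy_edge` and its parameters are hypothetical.

```python
import math
from itertools import combinations

def first_greedy_edge(corr, n, k=1.0):
    """Best single edge to add to the empty graph, or None if no edge
    improves the log-likelihood by more than k * ln(n)."""
    best_pair, best_gain = None, 0.0
    for i, j in combinations(range(len(corr)), 2):
        r = corr[i][j]
        # Log-likelihood improvement from adding edge i - j (linear Gaussian case).
        gain = -0.5 * n * math.log(1.0 - r * r)
        if gain > best_gain:
            best_pair, best_gain = (i, j), gain
    threshold = k * math.log(n)
    return best_pair if best_gain > threshold else None

# Illustrative correlation matrix for three variables (made-up numbers).
corr = [[1.0, 0.5, 0.1],
        [0.5, 1.0, 0.02],
        [0.1, 0.02, 1.0]]
print(first_greedy_edge(corr, n=1000))  # (0, 1)
```

The candidate edge has to beat every competing edge and also clear the k ln(S) penalty, so raising k makes the search more conservative about weak effects.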
Simulations
Accuracies for causal graph recovery with Fast Greedy Search
(FGS) for linear Gaussian data, sample size 1,000:

Similar results with PC-Max.
The “Anti-Exploration Argument” Has Everything Backwards
•  With very sparse causal relations, automated
search with number of variables >> sample size
has, in the simulations:
< 2% false positive causal connections
< 2% false directions
The sparser the graph, the more accurate the
positive results of the procedure.
	
Empirical Results from
•  Economics
•  Ecology
•  Planetary science
•  Climate science
•  Gene regulation
•  Educational research
•  Neuropsychology
•  Etc.
Why?
•  The procedures are asymptotically correct.
•  They use data in which LOTS of variables have been
measured.
•  Each positive causal claim is tested or assessed multiple
times, against multiple competing hypotheses in multiple
subsamples of the data.
•  The procedures are biased against positive results.
•  The procedures have an adjustable bias against weak
effects and in favor of strong effects, and can be used to
find the variables with the strongest total effect size for an
outcome of interest.
•  The reverse of Ioannidis’ concern about rare positive
relations holds: the procedures are most reliably accurate,
most informative, and most feasible when the true positive
causal relations are rare.
Morals
•  Research on fast, reliable algorithms for causal
estimation in a variety of settings is where the action is
and should be: latent structure, feedback relations,
mixed populations, sample selection bias, time series,
all in “high dimensions.”
•  Almost everything said and written in statistics about
the superiority of “confirmatory” research and the
evils of data driven hypothesis search is wrong, very
wrong.
•  “I and my friends can’t think of a way to do X,
therefore X is impossible” is a crummy inference.
Read, Then Work
•  P. Spirtes, et al., Causation, Prediction, and Search
•  B. Shipley, Cause and Correlation in Biology, 2nd
edition
•  J.D. Ramsey
•  Bühlmann
•  Webpages of:
    P. Spirtes
    T. Richardson
    Marloes Maathuis
    David Bessler
    Kevin Hoover, for a start
	
