Big Mechanism: deep reading for cancer biology
Sophia Ananiadou
National Centre for Text Mining
School of Computer Science
Manchester Institute of Biotechnology
The University of Manchester
Overview
•  From text to knowledge
–  Events
–  Big Mechanism
•  Mining textual (un)certainty
•  Visualising and ranking evidence (LitPathExplorer)
Pathway construction
mTOR pathway: 964 entities, 777 reactions, 519 papers
Caron, et al. Mol Syst Biol., 6(1)
A MANUAL PROCESS
Inevitable gaps when building models
Knowledge
•  Key to understanding biological systems
•  Models need verification and maintenance (i.e., annotation/curation)
•  Scale and speed of the literature are challenging
•  Annotation/curation remains largely a manual task of incorporating knowledge from scientific publications
Pathways
Motivation
To support pathway construction and design of experiments
•  Extracting evidence from literature
•  Events, entities, contextual interpretation
The Big Mechanism: reading, assembly, experiments
http://nactem.ac.uk/big_mechanism/
From concepts to events
1 Concept recognition
2 Interaction recognition
3 Concept and interaction identification
DrugBank:DB06712   DrugBank:DB00682   DrugBank:DB04610
EventMine
•  Machine learning pipeline event extraction system
–  Rich linguistic features
•  Several parse results: deep parser (Enju), dependency parser
•  Dictionaries
•  Coreference resolution, domain adaptation, filtering
http://www.nactem.ac.uk/EventMine/
Miwa, M. & Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal configuration, BMC Bioinformatics, 16(10), S7
Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13)
Miwa, M., Pyysalo, S., Ohta, T. and Ananiadou, S. (2013) Wide coverage biomedical event extraction using multiple partially overlapping corpora. BMC Bioinformatics, 14(175)
Event interpretation
[Figure: an example sentence annotated with events. Annotation labels: SIMPLE EVENT, COMPLEX EVENT, Event trigger, Theme, Theme 1, Theme 2, Cause, Event argument, Entity argument. Types shown: Binding, Regulation, Protein, Chemical. Words visible from the sentence: Results, suggest, that, BRAF, is, not, required, for, RAS, PKM2, in, binding, to, MUC1.]
*Complex events have at least one argument that is an event on its own
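The difference between simple and complex events can be sketched as a small data structure. This is only an illustration, not EventMine's output format: the class names `Entity` and `Event`, the field names, and the way the example arguments are arranged are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Entity:
    text: str   # surface form, e.g. a protein name
    type: str   # semantic type, e.g. "Protein"


@dataclass
class Event:
    type: str      # e.g. "Binding", "Regulation"
    trigger: str   # word in the sentence that signals the event
    # role name -> argument; an argument is an Entity or another Event
    arguments: Dict[str, Union[Entity, "Event"]] = field(default_factory=dict)

    def is_complex(self) -> bool:
        """A complex event has at least one argument that is itself an event."""
        return any(isinstance(arg, Event) for arg in self.arguments.values())


# Hypothetical example: entity names come from the slide, their arrangement is assumed
binding = Event("Binding", "binding",
                {"Theme1": Entity("PKM2", "Protein"),
                 "Theme2": Entity("MUC1", "Protein")})
regulation = Event("Regulation", "required",
                   {"Cause": Entity("BRAF", "Protein"), "Theme": binding})

print(binding.is_complex())     # False
print(regulation.is_complex())  # True
```

Nesting events as arguments is what lets a regulation event point at a whole binding event rather than at a single entity.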
Event interpretation
•  Supports users of search systems
–  Discovery of new knowledge, research hypotheses
–  Detection of uncertainty as confidence measure
•  Multiple dimensions (meta-knowledge)
–  Knowledge Type (observation, investigation, analysis, method, fact)
–  Knowledge Source (current, other)
–  Polarity (positive, negated)
Thompson, P., Nawaz, R., McNaught, J. and Ananiadou, S. (2011) Enriching a biomedical event corpus with meta-knowledge annotation. BMC Bioinformatics 12, 393
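The meta-knowledge dimensions listed above could be represented as enumerations attached to each event. The sketch below is an assumption about how such a record might look; the class names and the `is_new_positive_finding` filter are hypothetical and not part of the annotation scheme itself.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    OBSERVATION = "observation"
    INVESTIGATION = "investigation"
    ANALYSIS = "analysis"
    METHOD = "method"
    FACT = "fact"


class KnowledgeSource(Enum):
    CURRENT = "current"
    OTHER = "other"


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATED = "negated"


@dataclass(frozen=True)
class MetaKnowledge:
    knowledge_type: KnowledgeType
    source: KnowledgeSource
    polarity: Polarity


def is_new_positive_finding(mk: MetaKnowledge) -> bool:
    """Hypothetical filter: a non-negated observation with source 'current'."""
    return (mk.knowledge_type is KnowledgeType.OBSERVATION
            and mk.source is KnowledgeSource.CURRENT
            and mk.polarity is Polarity.POSITIVE)


mk = MetaKnowledge(KnowledgeType.OBSERVATION, KnowledgeSource.CURRENT, Polarity.POSITIVE)
print(is_new_positive_finding(mk))  # True
```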
(Un)certainty: a measure of confidence
•  Is this a fact, a hypothesis, a speculated outcome, a case under investigation, a certain or uncertain interaction?
•  How can this information help pathway construction?
Zerva, C., Batista-Navarro, R., Day, P. and Ananiadou, S. (2017) Using uncertainty to link and rank evidence from biomedical literature for model reconstruction, Bioinformatics
Uncertainty
Examples from Big Mechanism data
These results indicate that FLCN can interact directly with RagA via its GTPase domain.
Altogether, these results show that cobalt could affect both p53 and HIPK2 activity.
To test if endogenous hPGAM5 interacts with hPINK1, we first generated an anti-hPGAM5 antibody
We hypothesize that unphosphorylated cdr2 interacts with c-myc to prevent c-myc degradation
Therefore, AFP may interact with STAT3 in the signal pathway for chemotherapeutic efficiency of agents on AFPGC.
These data suggest that PI3K and βARK1 form a macromolecular complex within the cell.
Therefore, LiCl might inhibit GSK3β in different ways
We then examined whether netrin-2 enhances the interaction between Cdo and Stim1.
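A simple way to see what makes these sentences uncertain is to look up cue words. The sketch below does only that and is far cruder than a hybrid of machine learning and rules: the cue list is an illustrative assumption drawn from the examples above, and the function name `find_cues` is hypothetical.

```python
import re
from typing import List

# Illustrative cue list taken from the example sentences; not a real system lexicon.
UNCERTAINTY_CUES = ["indicate", "could", "to test if", "hypothesize", "may",
                    "suggest", "might", "examined whether"]


def find_cues(sentence: str) -> List[str]:
    """Return the cues found in the sentence, allowing simple -s/-d/-ed endings."""
    lowered = sentence.lower()
    return [cue for cue in UNCERTAINTY_CUES
            if re.search(r"\b" + re.escape(cue) + r"(?:s|d|ed)?\b", lowered)]


print(find_cues("Therefore, LiCl might inhibit GSK3β in different ways"))
# ['might']
print(find_cues("These data suggest that PI3K and βARK1 form a macromolecular complex."))
# ['suggest']
```

A lookup like this cannot tell whether a cue actually applies to a given event, which is why the relation between cue and event trigger matters.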
Uncertainty cues
BioNLP-ST, GENIA-MK
Hybrid model: Machine learning + Rules
Hybrid model
Machine Learner (Random Forest)
1.  Lexical (e.g. cues, POS tags, event-trigger surface form)
2.  Syntactic (e.g. shortest path, dependency cue-trigger)
3.  Semantic (e.g. event type, argument type/role)
Automated Rule Induction (from corpus)
1.  EventMine (to identify event triggers)
2.  Enju (to identify dependencies)
3.  Cue lists
BioNLP-ST
GENIA-MK
Dependency relations
•  Dependency relations between cues and event triggers
•  Rule induction: generic rule patterns capturing dependency relations between cues and trigger words
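One way to picture rule induction over cue-trigger dependencies is to count which (cue, dependency path) patterns co-occur with uncertain events in an annotated corpus and keep the reliable ones. The sketch below is an assumption about how this could work, not the published method: the `Instance` representation, the path strings, and the support and precision thresholds are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set, Tuple


class Instance(NamedTuple):
    cue: str          # lemma of a candidate uncertainty cue
    path: str         # dependency path from the cue to the event trigger
    uncertain: bool   # gold label of the event in the training corpus


Rule = Tuple[str, str]  # a (cue, path) pattern


def induce_rules(data: Iterable[Instance], min_support: int = 2,
                 min_precision: float = 0.8) -> Set[Rule]:
    """Keep patterns that are frequent and mostly mark uncertain events."""
    seen, positive = Counter(), Counter()
    for inst in data:
        key = (inst.cue, inst.path)
        seen[key] += 1
        if inst.uncertain:
            positive[key] += 1
    return {key for key, n in seen.items()
            if n >= min_support and positive[key] / n >= min_precision}


# Hypothetical training instances with made-up path labels
train = ([Instance("suggest", "cue>ARG2>trigger", True)] * 3
         + [Instance("show", "cue>ARG2>trigger", False)] * 2
         + [Instance("may", "cue>ARG1>trigger", True)])
print(induce_rules(train))  # {('suggest', 'cue>ARG2>trigger')}
```

In this toy data the "may" pattern is dropped only because it appears once; lowering `min_support` would keep it.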
Rule induction
Dealing with multiple event mentions
Results
•  Event-annotated corpora:
All results obtained using 10-fold cross-validation
Evaluation - pathway models
•  B-cell acute lymphoblastic leukemia model (Pathway Studio)
–  72 interactions, 260 evidence passages manually selected
–  12% flagged uncertain by our system
Results
Leukemia Pathway (7 annotators) ~ Pathway Studio
•  Average accuracy on sentence level: 0.96
•  Average accuracy on interaction level: 0.87
–  1-20 sentences per interaction
Event interpretation
•  Uncertainty scoring as an expressive confidence measure
•  Hybrid framework
•  Value for each event mentioned in a sentence
–  Consolidated uncertainty values from different papers
•  Aim: reduce manual effort and select more certain events
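Consolidating mention-level values into one value per event can be sketched as follows. The slides do not say how values are combined, so the plain average used here is an assumption, as are the `Mention` tuple layout and the function names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Mention = Tuple[str, str, float]  # (event id, paper id, certainty in [0, 1])


def consolidate(mentions: Iterable[Mention]) -> Dict[str, float]:
    """Average mention-level certainty per event (averaging is an assumption)."""
    by_event = defaultdict(list)
    for event_id, _paper_id, certainty in mentions:
        by_event[event_id].append(certainty)
    return {event_id: mean(scores) for event_id, scores in by_event.items()}


def rank_events(mentions: Iterable[Mention]) -> List[Tuple[str, float]]:
    """Most certain events first, so a curator can review them in that order."""
    return sorted(consolidate(mentions).items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical mentions of two events across two papers
mentions = [("E1", "paperA", 1.0), ("E1", "paperB", 0.5), ("E2", "paperA", 0.25)]
print(rank_events(mentions))  # [('E1', 0.75), ('E2', 0.25)]
```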
Deep Reading: Integrating uncertainty
•  LitPathExplorer
–  Visual analytics tool; maps events from literature to pathway interactions
–  Includes uncertainty measure
•  Robot Scientist
–  Selection of Gene expression and Regulation events for wet-lab experiments
–  Selection of interactions (using LitPathExplorer) to assemble network and predict drug effect on cell lines
LitPathExplorer: a confidence-based tool for exploring pathway models
1.  Enable flexible search and exploration of biomolecular pathway networks
–  different views of the data
–  various interactive functionalities
2.  Provide a means for making existing evidence in the scientific literature available to support corroboration
3.  Facilitate the discovery of new interactions that are not yet part of a given model
4.  Allow the user to become an active participant in the analytical process
quantify confidence in the events
Video: http://nactem.ac.uk/LitPathExplorer_BI/LitPathExplorer.mp4
1. Search
•  A pathway model can be searched by providing:
–  event types,
–  entities,
–  and/or roles for each entity in the reaction
•  Multiple queries can be combined in a Boolean search
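The search options above (event type, entity, role, Boolean combination) can be sketched as composable predicates. This does not show LitPathExplorer's query interface; the `Reaction` class, the helper names, and the toy model are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Reaction:
    event_type: str                # e.g. "Binding"
    participants: Dict[str, str]   # entity name -> role in the reaction


Query = Callable[[Reaction], bool]


def by_type(event_type: str) -> Query:
    return lambda r: r.event_type == event_type


def by_entity(name: str, role: Optional[str] = None) -> Query:
    # Match the entity, and also its role when one is given
    return lambda r: name in r.participants and role in (None, r.participants[name])


def all_of(*queries: Query) -> Query:   # Boolean AND
    return lambda r: all(q(r) for q in queries)


def any_of(*queries: Query) -> Query:   # Boolean OR
    return lambda r: any(q(r) for q in queries)


def negate(query: Query) -> Query:      # Boolean NOT
    return lambda r: not query(r)


def search(model: Iterable[Reaction], query: Query) -> List[Reaction]:
    return [r for r in model if query(r)]


# Hypothetical toy model
model = [
    Reaction("Binding", {"Ras": "Theme", "Raf": "Theme"}),
    Reaction("Phosphorylation", {"Raf": "Cause", "MEK": "Theme"}),
    Reaction("Regulation", {"Ras": "Cause", "ERK": "Theme"}),
]
hits = search(model, all_of(by_entity("Ras"), negate(by_type("Binding"))))
print([r.event_type for r in hits])  # ['Regulation']
```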
2. Network viewer
Reading against the model
Entities
Reactions/Events
•  Colour encodes event type
•  Size encodes confidence
3. Inspector, event confidence computation
Mapping IDs for entities and events
Overall event confidence
3. Inspector, quantifying the confidence
Confidence breakdown
Adjusting event confidence
4. Text Analyzer – Articles & sentences
Sentence-level language confidence
Article-level language confidence
4. Word tree visualisation: Contrast event mentions across the corpus
Sentences can be inspected further upon interaction
Vertical arrangement and gray scale denote event confidence
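A word tree groups sentences by the words that follow a shared root, so common continuations merge into one branch. The sketch below builds such a tree as nested dictionaries. It is a generic illustration, not the tool's implementation: `build_word_tree` and its whitespace tokenisation are assumptions, and it does not model the confidence-based ordering or shading described above.

```python
from typing import Dict, Iterable


def build_word_tree(sentences: Iterable[str], root: str, depth: int = 3) -> Dict:
    """Nest the words that follow `root` in each sentence, counting shared branches."""
    tree: Dict = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        if root not in tokens:
            continue
        start = tokens.index(root) + 1
        node = tree
        for token in tokens[start:start + depth]:
            child = node.setdefault(token, {"count": 0, "next": {}})
            child["count"] += 1
            node = child["next"]
    return tree


# Hypothetical event mentions sharing the trigger "interacts"
sentences = ["Ras interacts with Raf",
             "Ras interacts with PI3K",
             "Ras interacts directly with Raf"]
tree = build_word_tree(sentences, "interacts", depth=2)
print(tree["with"]["count"])   # 2
print(sorted(tree["with"]["next"]))  # ['pi3k', 'raf']
```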
Use case
•  A pathway model
–  contains reactions involving the Ras protein
–  output of querying PathwayCommons for one- and two-hop reactions centred on Ras
•  A corpus of 12,660 full papers
–  retrieved from the PubMed Central Open Access repository
–  using “breast cancer” and its synonyms as query keywords, combined with names of breast cancer cell lines, e.g., “T-47D”, “MCF-7” (and their variants)
•  Methods for event extraction, model mapping and confidence computation were applied to the events extracted from the corpus
Network Viewer: Discovery mode
Extending the model with events found in the literature
Discovery mode
Difficult to explore when too many candidate events are found
Verifying mentions in text
http://nactem.ac.uk/LitPathExplorer/
National Centre for Text Mining
•  1st publicly funded national text mining centre
•  Location: Manchester Institute of Biotechnology
•  Since 2004
•  Fully sustainable since 2011
•  Biology, Medicine, Biodiversity, Humanities, Social Sciences
www.nactem.ac.uk
BBSRC, AHRC, EPSRC, MRC, JISC, NIH, DARPA, H2020
AZ, Unilever, Pfizer, Elsevier, Nature, BBC, KISTI, AIST

OSFair2017 Workshop | Big Mechanism: deep reading for cancer biology