Citizen Science and Rare Disease Research
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
September 22, 2016
Personalized Health in the Digital Age Symposium
Slides: slideshare.net/andrewsu
Credit: http://www.slideshare.net/PhRMA/rare-disease-infographics
Rare disease case study #1
Photo: Retta Beery
Bainbridge et al., STM, 2011
Photo: Retta Beery
Rare disease case study #2
… but no obvious treatments
Bainbridge et al., STM, 2011
SPR
What differentiates SPR and NGLY1?
SPR
Photo: Sarah Olmstead, https://flic.kr/p/364dZW
NGLY1
NGLY1 (11 PubMed articles)
Related topics, with PubMed article counts:
• Congenital disorders of glycosylation (822)
• PNGase (686)
• ERAD (1,330)
• glycosylation (48,862)
• alacrima (164)
• Genetic interactors (3,016)
• symptoms (109,928)
25 million articles in PubMed
The biomedical literature is massive…
[Figure: number of new PubMed-indexed articles per year, 1983 to 2013; vertical axis from 0 to 1,200,000]
… but it is very hard to query and compute
Imatinib
Crizotinib
Erlotinib
Gefitinib
Sorafenib
Lapatinib
Dasatinib
…
AND
Acute myeloid leukemia
Acute lymphoblastic leukemia
Chronic myelogenous leukemia
Chronic lymphocytic leukemia
Hodgkin lymphoma
Non-Hodgkin lymphoma
Myeloma
…
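The slide pairs a list of drugs with a list of blood cancers, joined by AND. As a minimal sketch of why this is hard to ask of the literature by hand, the snippet below assembles one boolean search string from both lists and counts the drug-disease pairs it implicitly covers. The quoting and OR/AND grouping follow common literature-search syntax and are an assumption, as are the hypothetical helper names `or_group` and `build_query`; both lists are truncated on the slide, so the real query would be larger.

```python
from itertools import product

# Terms copied from the slide; both lists are truncated there ("…").
DRUGS = ["Imatinib", "Crizotinib", "Erlotinib", "Gefitinib",
         "Sorafenib", "Lapatinib", "Dasatinib"]
DISEASES = ["Acute myeloid leukemia", "Acute lymphoblastic leukemia",
            "Chronic myelogenous leukemia", "Chronic lymphocytic leukemia",
            "Hodgkin lymphoma", "Non-Hodgkin lymphoma", "Myeloma"]


def or_group(terms):
    """Join terms into a parenthesized, quoted OR group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(drugs, diseases):
    """Hypothetical helper: (drug1 OR drug2 ...) AND (disease1 OR ...)."""
    return f"{or_group(drugs)} AND {or_group(diseases)}"


query = build_query(DRUGS, DISEASES)

# Every drug-disease pair that the single query implicitly covers.
pairs = list(product(DRUGS, DISEASES))
```

Even with only the seven drugs and seven diseases visible on the slide, the query spans 49 distinct pairs, each of which a reader would still have to sort through in the returned articles.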
Personalized medicine relies on effective KNOWLEDGE MANAGEMENT
Photo: PietroBellini, https://flic.kr/p/k5jmja
Information extraction from biomedical text
1. Identify biomedical concepts in text
… We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment.
Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres.
GENES
DISEASES
DRUGS
VARIANTS
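Step 1 can be illustrated with a very small dictionary lookup. This is only a sketch under stated assumptions: the lexicon holds just the terms highlighted on the slide, the function name `tag_concepts` is hypothetical, and simple case-insensitive string matching stands in for whatever recognition method is actually used.

```python
import re

# Tiny hand-made lexicon taken from the slide's example; a real system
# would need far larger vocabularies or a trained model (assumption).
LEXICON = {
    "GENES": ["KIT"],
    "DISEASES": ["familial systemic mastocytosis"],
    "DRUGS": ["imatinib", "dasatinib", "PKC412"],
    "VARIANTS": ["K509I"],
}

# First two sentences of the abstract excerpt shown on the slide.
TEXT = ("We report a case of familial systemic mastocytosis with the rare "
        "KIT K509I germ line mutation. In vitro treatment with imatinib, "
        "dasatinib and PKC412 reduced cell viability of primary mast cells "
        "harboring KIT K509I mutation.")


def tag_concepts(text, lexicon):
    """Return (start, end, category, matched_text) tuples sorted by position."""
    mentions = []
    for category, terms in lexicon.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                mentions.append((m.start(), m.end(), category, m.group()))
    return sorted(mentions)


mentions = tag_concepts(TEXT, LEXICON)
for start, end, category, matched in mentions:
    print(f"{start:4d}-{end:<4d} {category:9s} {matched}")
```

A lookup like this only finds terms it already knows and cannot handle synonyms, abbreviations, or ambiguous words, which is part of why human annotators remain useful for this task.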
Information extraction from biomedical text
imatinib
dasatinib
PKC412
Familial systemic mastocytosis
KIT
K509I
1. Identify biomedical concepts in text
2. Identify relationships between concepts
[Diagram: the concepts above linked by relationship labels such as "mutation of", "causes", "treats", and "inhibits"]
Goal: Assemble a network of biomedical
knowledge that is comprehensive,
current, computable and traceable.
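One way to picture "computable and traceable" is a list of subject-predicate-object statements that each carry a pointer to their source. The sketch below is illustrative only: the `Triple` type and `neighbors` helper are hypothetical names, and the two statements are an assumed reading of the example abstract rather than a reproduction of the slide's diagram.

```python
from collections import namedtuple

# Each edge keeps a pointer to its source, so every claim is traceable.
Triple = namedtuple("Triple", ["subject", "predicate", "object", "source"])

SOURCE = "Leuk Res. 2014 Oct;38(10):1245-51"  # citation shown on the slide

# Assumed reading of the example abstract; illustrative only.
TRIPLES = [
    Triple("K509I", "mutation of", "KIT", SOURCE),
    Triple("imatinib", "treats", "familial systemic mastocytosis", SOURCE),
]


def neighbors(triples, node):
    """Return (predicate, other_node, source) for every edge touching node."""
    found = []
    for t in triples:
        if t.subject == node:
            found.append((t.predicate, t.object, t.source))
        elif t.object == node:
            found.append((t.predicate, t.subject, t.source))
    return found


for predicate, other, source in neighbors(TRIPLES, "KIT"):
    print(f"KIT -- {predicate} -- {other}   [{source}]")
```

Keeping the source on every edge means a surprising connection in the network can always be checked against the article it came from.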
http://www.navy.mil/management/photodb/photos/101104-N-6383T-508.jpg
Crowdsourcing
is to data
is to text
biomedical
Provide a database of the world’s
knowledge that anyone can edit
- Denny Vrandečić
Wikidata item Q13561329 (http://www.wikidata.org/wiki/Q13561329)
• Subclass of (Property:P279): Protein (Q8054)
• Regulates (Property:P128): Neural development (Q1345738)
• Physically interacts with (Property:P129): VLDL receptor (Q1979313), Amyloid beta A4 (Q423510)
• Decreased expression in (Property:P1910): Schizophrenia (Q41112), Bipolar disorder (Q131755)
The same statements as identifiers only:
Q13561329
• Property:P279: Q8054
• Property:P128: Q1345738
• Property:P129: Q1979313, Q423510
• Property:P1910: Q41112, Q131755
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q13561329&format=json
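The API address on the slide can be built and read programmatically. The following is a minimal sketch using only the Python standard library; the helper names (`entity_url`, `fetch_entity`, `claim_targets`) and the User-Agent string are hypothetical, and `SAMPLE` is a hand-written stand-in shaped like the JSON the API returns, not downloaded data.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build the wbgetentities URL shown on the slide for one item."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return API + "?" + urlencode(params)


def fetch_entity(qid):
    """Download and decode the JSON for one item (needs network access)."""
    # The User-Agent value is a placeholder; use one describing your script.
    request = Request(entity_url(qid), headers={"User-Agent": "example-script/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def claim_targets(data, qid, pid):
    """List the item IDs that property `pid` points to for item `qid`."""
    statements = data["entities"][qid].get("claims", {}).get(pid, [])
    targets = []
    for statement in statements:
        value = statement["mainsnak"].get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            targets.append(value["id"])
    return targets


# Hand-written stand-in for an API response, not real downloaded data.
SAMPLE = {"entities": {"Q13561329": {"claims": {"P1910": [
    {"mainsnak": {"datavalue": {"value": {"id": "Q41112"}}}},
    {"mainsnak": {"datavalue": {"value": {"id": "Q131755"}}}},
]}}}}

print(entity_url("Q13561329"))
print(claim_targets(SAMPLE, "Q13561329", "P1910"))
```

With network access, `claim_targets(fetch_entity("Q13561329"), "Q13561329", "P1910")` would read the live statements instead of the sample.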
We are seeding it with
biomedical data
• All human, mouse genes and proteins
• All Gene Ontology terms
• All FDA approved drugs
• 9,000+ human diseases
• 120 reference microbial genomes
Burgstaller et al (2016) Database (preprint in BioRxiv)
Mitraka et al (2015) Semantic Web Applications for the Life Sciences (best paper) (preprint in BioRxiv)
Putman et al (2016) Database (preprint in BioRxiv)
Inter-item links form a giant knowledge graph
Everything is connected: Reelin, heart disease, Barack Obama, everything…
https://query.wikidata.org
SPARQL endpoint for
Wikidata
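As a hedged illustration of what the SPARQL endpoint allows, the sketch below builds a request URL for a small query: which items is Q13561329 linked to through P1910 ("decreased expression in")? The query text and the helper name `sparql_url` are assumptions written for this example; only the construction of the URL is shown, and sending it requires network access.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Items that Q13561329 points to via P1910 ("decreased expression in"),
# using the identifiers shown on the earlier slide.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  wd:Q13561329 wdt:P1910 ?item .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def sparql_url(query):
    """Build a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query.strip(), "format": "json"})


url = sparql_url(QUERY)
print(url)
```

The same query can also be pasted directly into the web form at https://query.wikidata.org to see the results in a browser.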
Crowdsourcing
Question: Can a group of non-scientists collectively
perform concept recognition in biomedical texts?
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of
“disease concepts”
F = 0.87
F = 0.78
$$$
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of
“disease concepts”
F = 0.87
F = 0.87
$$$
• 9 days
• 145 workers
• Total: $630.96
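The F values on these slides summarize how closely one set of annotations matches another. The slides do not say which variant or matching rule was used, so the sketch below assumes the balanced F-measure (harmonic mean of precision and recall) with exact-match mentions; the toy mentions are invented for illustration. The last line works out the cost per abstract, assuming the $630.96 total covers all 593 abstracts.

```python
def f_score(gold, predicted):
    """Balanced F-measure between two sets of annotated mentions."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # share of predictions that are right
    recall = true_pos / len(gold)          # share of gold mentions that were found
    return 2 * precision * recall / (precision + recall)


# Toy mentions as (abstract_id, start, end); invented for illustration.
gold = {(1, 0, 7), (1, 20, 32), (2, 5, 14), (2, 40, 52)}
crowd = {(1, 0, 7), (1, 20, 32), (2, 5, 14), (2, 60, 66)}

toy_f = f_score(gold, crowd)

# Figures from the slide: $630.96 in total for 593 abstracts.
cost_per_abstract = 630.96 / 593

print(f"toy F = {toy_f:.2f}, cost per abstract = ${cost_per_abstract:.2f}")
```

In the toy data three of four mentions agree in each direction, so precision and recall are both 0.75 and so is F.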
http://mark2cure.org
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total: $630.96
Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
Mapping the biomedical network around NGLY1
NGLY1
http://mark2cure.org
A preliminary view of the NGLY1-focused biological network
1,200 contributors
3,200 documents
787,400 annotations
Personalized medicine relies on effective KNOWLEDGE MANAGEMENT
Photo: PietroBellini, https://flic.kr/p/k5jmja
If I have seen further than
others, it is by standing on the
shoulders of giants.
- Sir Isaac Newton
Jake Bruggeman
Karthik G
Ramya Gamini
Louis Gioia
Toby Li
Greg Stupp
Other group members
Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K COE: GM114833
Contact
http://sulab.org
asu@scripps.edu
@andrewsu
Mark2Cure
Jennifer Fouquier
Max Nanis
Ginger Tsueng
AMT volunteers and
Mark2Curators!
Slides: slideshare.net/andrewsu
Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys
Matt and Cristina Might
NGLY1 community
Gene Wiki
Ben Good
Sebastian Burgstaller
Tim Putman
Núria Queralt Rosinach
Julia Turner
Andra Waagmeester
BioThings API
Chunlei Wu
Julee Adesara
Cyrus Afrasiabi
Sebastien Lelong
Mike Mayers
Kevin Xin
Why do I Mark2Cure?
I am retired, have a doctorate in
medical humanities, and have two
children with Gaucher disease. I
am just looking for some way to
put my education to use.
My 4 year old daughter
Phoebe is living with and
battling rare disease.
I have Ehlers Danlos Syndrome. I hope to help people
learn about this painful and debilitating disorder, so that
others like me can receive more effective medical care.
Take part in
something that
helps humanity.
I Mark2Cure in memory of
my son Mike who had type 1
diabetes.
Studied biology in
college and I really
miss it!
In memory of my daughter
who had Cystic Fibrosis
Give back
