Gene Wiki and Wikimedia Foundation SPARQL workshop
1. CURATING BIOMEDICAL KNOWLEDGE ON WIKIDATA AND WIKIPEDIA
GENE WIKI
Benjamin Good
The Scripps Research Institute,
La Jolla, California
bgood@scripps.edu
Twitter: @bgood
2. Gene Wikidata Team
Andrew Su (Scripps)
Andra Waagmeester (Micelio)
Sebastian Burgstaller (Scripps)
Tim Putman (Scripps) – speaking next
Julia Turner (Scripps)
Elvira Mitraka (U Maryland)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Ginger Tsueng (Scripps)
ACKNOWLEDGEMENTS
10. Sample of current biomedical content
• All human and mouse genes and proteins
• All Gene Ontology terms (describe function)
• All Human Disease Ontology terms
• All FDA approved drugs
• 109+ reference microbial genomes
Burgstaller-Muelbacher et al. (2016) Database
Mitraka et al. (2015) Semantic Web Applications for the Life Sciences
Putman et al. (2016) Database
11. http://tinyurl.com/biowiki-sparql
Sample queries that are currently possible:
• “Where in the cell is the Reelin protein expressed?”
• “What diseases are treated by Metformin?”
• “What diseases might be treated by Metformin?”
http://query.wikidata.org
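Any of these questions can be pasted into http://query.wikidata.org, or sent to its SPARQL endpoint from a script. The following is a minimal standard-library Python sketch for the Metformin question. It assumes the endpoint URL https://query.wikidata.org/sparql, the property P2175 (medical condition treated), and that the drug can be matched by its exact English label. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed; returns SPARQL JSON with format=json).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def treated_by_query(drug_label: str) -> str:
    """SPARQL listing conditions the drug is recorded to treat.

    Assumes P2175 ("medical condition treated") holds the indications and
    that the drug item has exactly this English label (case-sensitive match).
    """
    return f"""
SELECT ?disease ?diseaseLabel WHERE {{
  ?drug rdfs:label "{drug_label}"@en ;
        wdt:P2175 ?disease .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

def build_request(query: str) -> urllib.request.Request:
    # GET request with the query URL-encoded; Wikimedia asks clients for a descriptive User-Agent.
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={"User-Agent": "biowiki-sparql-example/0.1 (hypothetical)"})

def run_query(query: str) -> list:
    # Network call: returns one dict per result row, mapping variable name -> value.
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        data = json.load(resp)
    return [{k: v["value"] for k, v in row.items()} for row in data["results"]["bindings"]]
```

`run_query(treated_by_query("metformin"))` performs the request. Matching on a label is fragile, so looking up the drug's item ID first and using `wd:<ID>` directly is the more robust choice.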
12. Example question: repurposing Metformin
http://tinyurl.com/zem3oxz
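Figure: query path Metformin → interacts with → protein → encoded by → gene → genetic association → ?disease, i.e. which diseases Metformin might treat. Example: Metformin interacts with solute carrier family 22 member 3 (gene SLC22A3), which has a genetic association with prostate cancer.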
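The path in the figure can be mimicked on a toy graph. This sketch hard-codes only the three relationships named on the slide (as plain labels, not real Wikidata items) and follows drug → protein → gene → disease. The SPARQL string is an assumed translation using P129 (physically interacts with), P702 (encoded by) and P2293 (genetic association), with the drug matched by label; it may differ from the query behind the tinyurl link.

```python
# Toy triples: only the relationships shown on the slide, using labels instead of QIDs.
TRIPLES = [
    ("Metformin", "interacts with", "Solute carrier family 22 member 3"),
    ("Solute carrier family 22 member 3", "encoded by", "SLC22A3"),
    ("SLC22A3", "genetic association", "prostate cancer"),
]

def objects(subject, predicate, triples):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def might_treat(drug, triples=TRIPLES):
    """Follow drug -> interacts with -> protein -> encoded by -> gene -> genetic association -> disease."""
    paths = []
    for protein in objects(drug, "interacts with", triples):
        for gene in objects(protein, "encoded by", triples):
            for disease in objects(gene, "genetic association", triples):
                paths.append((protein, gene, disease))
    return paths

# Assumed Wikidata equivalent of the same path (property IDs and label match are assumptions).
MIGHT_TREAT_SPARQL = """
SELECT DISTINCT ?protein ?gene ?disease WHERE {
  ?drug rdfs:label "metformin"@en ;
        wdt:P129 ?protein .
  ?protein wdt:P702 ?gene .
  ?gene wdt:P2293 ?disease .
}
"""

print(might_treat("Metformin"))
# → [('Solute carrier family 22 member 3', 'SLC22A3', 'prostate cancer')]
```

Each tuple is one candidate explanation (protein, gene, disease). A hit is only a repurposing hypothesis, which is why the slide labels the final edge "Might treat ?".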
14. A SPARQL-powered user interface for consuming and editing organism data in Wikidata
Timothy E. Putman, Ph.D.
The Scripps Research Institute,
La Jolla, California
tputman@scripps.edu
Twitter: @putmantime
15. Gene Wikidata Team
Andrew Su (Scripps)
Benjamin Good – just spoke
Andra Waagmeester (Micelio)
Sebastian Burgstaller (Scripps)
Elvira Mitraka (U Maryland)
Julia Turner (Scripps)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Ginger Tsueng (Scripps)
ACKNOWLEDGEMENTS
16. Centralizing and Linking the Data
Figure: linked Wikidata items for one bacterial strain: Bacteria (Q10876, domain); C. trachomatis (Q131065, species); C. trachomatis 434/BU (Q20800254, strain); trpA (Q21153861, gene); TRPA (Q21153984, protein).
Knowledge is either not shared (stuck in your head or your notebook) or it is shared as text and images in journal articles.
More than 1 million articles are added to PubMed each year.
Relying on the entire community of scientists to digest the biomedical literature:
• identification
• filtering
• extraction
• summarization
Now we can use a database instead of wikitext to store data. Great! This also opens up other possibilities.
We use this linked data model. For a bacterial genome, each genetic item is linked to the taxonomic hierarchy, and the gene and protein are distinct entities: the gene carries the genomic annotations, the protein carries the functional annotations, and the two are linked by the encodes and encoded by properties.
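The gene/protein split can be sketched as a tiny in-memory model. The `Item` class, its `claims` dictionary and the helper functions are hypothetical simplifications (real Wikidata statements also carry qualifiers and references); the QIDs are the ones shown on slide 16. The consistency check reflects the idea that encodes and encoded by should always appear as a pair.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """Simplified stand-in for a Wikidata item: an ID, a label, and outgoing statements."""
    qid: str
    label: str
    claims: dict = field(default_factory=dict)  # property name -> list of target QIDs

    def add(self, prop: str, target: "Item") -> None:
        self.claims.setdefault(prop, []).append(target.qid)

def link_gene_protein(gene: Item, protein: Item, taxon: Item) -> None:
    """Separate gene and protein items, linked both ways, each attached to the taxon."""
    gene.add("encodes", protein)
    protein.add("encoded by", gene)
    gene.add("found in taxon", taxon)
    protein.add("found in taxon", taxon)

def inverse_links_consistent(items: list) -> bool:
    """True if every "encodes" statement has the matching "encoded by" statement."""
    by_qid = {item.qid: item for item in items}
    for item in items:
        for target_qid in item.claims.get("encodes", []):
            target = by_qid.get(target_qid)
            if target is None or item.qid not in target.claims.get("encoded by", []):
                return False
    return True

# Items and QIDs taken from the slide; annotations are left out of this sketch.
strain = Item("Q20800254", "C. trachomatis 434/BU")
gene = Item("Q21153861", "trpA")
protein = Item("Q21153984", "TRPA")
link_gene_protein(gene, protein, strain)
```

In this layout genomic annotations would be added to `gene.claims` and GO terms to `protein.claims`, so each kind of annotation has exactly one home.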
So here is the Wikidata item that represents the strain (or subspecies) taxon in our data model.
Now we can navigate through the graph by following the statements that lead to other Wikidata items.
So, for example, if you click on parent taxon, you go to the species-level item, Chlamydia trachomatis. If you kept going in that direction, through genus, family, order, etc., you would eventually reach Bacteria.
(Mnemonic for the taxonomic ranks Kingdom, Phylum, Class, Order, Family, Genus, Species: King Philip Came Over From Great Spain.)
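Walking up the hierarchy is a transitive path over parent taxon, which SPARQL expresses with a property path. In this sketch `lineage_query` assumes P171 (parent taxon) and P105 (taxon rank); `wdt:P171*` means zero or more hops, so the starting item is included. The toy `TOY_PARENT` map uses only the QIDs on slide 16 and jumps straight from species to Bacteria, and `lineage` assumes a single parent per taxon; both are simplifications.

```python
def lineage_query(taxon_qid: str) -> str:
    """SPARQL listing a taxon and all its ancestors with their ranks (assumed properties)."""
    return f"""
SELECT ?taxon ?taxonLabel ?rankLabel WHERE {{
  wd:{taxon_qid} wdt:P171* ?taxon .
  OPTIONAL {{ ?taxon wdt:P105 ?rank . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

# Toy parent-taxon map built from the slide's QIDs. The real chain passes through
# genus, family, order and higher ranks; those items are omitted here.
TOY_PARENT = {
    "Q20800254": "Q131065",  # C. trachomatis 434/BU (strain) -> C. trachomatis (species)
    "Q131065": "Q10876",     # C. trachomatis -> Bacteria (shortcut over intermediate ranks)
}

def lineage(qid: str, parent_of: dict) -> list:
    """Follow parent links upward until an item has no parent; raises on a cycle."""
    path, seen = [qid], {qid}
    while qid in parent_of:
        qid = parent_of[qid]
        if qid in seen:
            raise ValueError(f"cycle in parent links at {qid}")
        path.append(qid)
        seen.add(qid)
    return path

print(lineage("Q20800254", TOY_PARENT))  # → ['Q20800254', 'Q131065', 'Q10876']
```

On the live data the property-path query does the walking server-side, and it also copes with items that have more than one parent taxon, which the toy walker does not.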
If you go in the other direction, you reach the genes found in that taxon through the predicate of that name (found in taxon). Each gene is linked to its product through encodes,
and its product is linked back to its gene through encoded by, and to the strain through found in taxon. The protein is where you find functional annotations such as GO terms.
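Going "down" from the strain can be written as a single query. This sketch assumes P703 (found in taxon), P688 (encodes) and the three GO-aspect properties P680, P681 and P682 (molecular function, cell component, biological process); the function names are hypothetical. Because GO terms sit on the protein, the query joins gene → protein and only then looks for annotations, and `group_annotations` folds the flat result rows (dicts of variable name to value) back into that gene/protein shape.

```python
from collections import defaultdict

def genes_in_taxon_query(taxon_qid: str) -> str:
    """SPARQL for genes found in a taxon, their encoded proteins, and the proteins' GO terms."""
    return f"""
SELECT ?gene ?protein ?go WHERE {{
  ?gene wdt:P703 wd:{taxon_qid} ;
        wdt:P688 ?protein .
  ?protein wdt:P703 wd:{taxon_qid} .
  OPTIONAL {{ ?protein wdt:P680|wdt:P681|wdt:P682 ?go . }}
}}
"""

def group_annotations(rows):
    """Collapse flat result rows into {gene: {protein: set of GO term IRIs}}.

    "go" is missing from a row when the OPTIONAL clause did not match,
    i.e. the protein has no GO statements; the protein is still kept.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for row in rows:
        go_terms = grouped[row["gene"]][row["protein"]]
        if "go" in row:
            go_terms.add(row["go"])
    # Convert nested defaultdicts to plain dicts for cleaner printing and comparison.
    return {gene: dict(proteins) for gene, proteins in grouped.items()}
```

For the strain on slide 16, `genes_in_taxon_query("Q20800254")` would return trpA and its TRPA protein among the rows, each repeated once per GO term before grouping.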