Gene Wiki and Wikimedia Foundation SPARQL workshop

CURATING BIOMEDICAL KNOWLEDGE ON WIKIDATA AND WIKIPEDIA
GENE WIKI
Benjamin Good
The Scripps Research Institute,
La Jolla, California
bgood@scripps.edu
Twitter: @bgood

Gene Wikidata Team
Andrew Su (Scripps)
Andra Waagmeester (Micelio)
Sebastian Burgstaller (Scripps)
Tim Putman (Scripps) – speaking next
Julia Turner (Scripps)
Elvira Mitraka (U Maryland)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Ginger Tsueng (Scripps)
ACKNOWLEDGEMENTS

“knowledge”
• A lot
• Important
• Text

More than 2 articles published/minute

Documents
Concepts
Gene Wiki: Filtering and summarizing PubMed

GENE WIKI
• Protein structure
• Symbols and identifiers
• Tissue expression pattern
• Gene Ontology annotations
• Links to structured databases
• Gene summary
• Protein interactions
• Linked references
Huss, PLoS Biol, 2008
Bot!

GENE WIKI TIMELINE
Project starts
https://en.wikipedia.org/wiki/Portal:Gene_Wiki

Gene Wiki
Version 1.
{{GNF_Protein_box | Name = Reelin| image = |
image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 |
MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 |
IUPHAR = | ChEMBL = | OMIM = None | ECnumber = |
Homologene = 9349 | GeneAtlas_image1 = |
GeneAtlas_image2 = | GeneAtlas_image3 = |
Protein_domain_image = | Function =
{{GNF_GO|id=GO:0005515 |text = protein binding}}
{{GNF_GO|id=GO:0016787 |text = hydrolase activity}}
{{GNF_GO|id=GO:0046872 |text = metal ion binding}} |
Component = {{GNF_GO|id=GO:0005739 |text =
mitochondrion}} | Process = {{GNF_GO|id=GO:0008152
|text = metabolic process}} | Hs_EntrezGene = 51110 |
Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA =
NM_016027 | Hs_RefseqProtein = NP_057111 |
Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 |
Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174
| Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 |
Mm_Ensembl = ENSMUSG00000025937 |
Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein =
NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr =
1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end =
13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}}
=
Gene Wiki
Version 2.
{{Infobox gene}}
• All data in Wikidata
• 1 Lua script works for all 11,000+ genes
=
(1 of these for every gene)
IMPACT OF WIKIDATA ON WIKIPEDIA

IMPACT BEYOND WIKIPEDIA
= SPARQL

Sample of current biomedical content
• All human, mouse genes and proteins
• All Gene Ontology terms (describe function)
• All Human Disease Ontology terms
• All FDA approved drugs
• 109+ reference microbial genomes
Burgstaller-Muehlbacher et al. (2016) Database
Mitraka et al. (2015) Semantic Web Applications for the Life Sciences
Putman et al. (2016) Database

http://tinyurl.com/biowiki-sparql
Sample queries that are currently possible:
• “Where in the cell is the Reelin protein expressed?”
• “What diseases are treated by Metformin?”
• “What diseases might be treated by Metformin?”
http://query.wikidata.org
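
As a rough illustration of how a question such as “What diseases are treated by Metformin?” could be posed to the query service, here is a minimal Python sketch. It is not the query behind the tinyurl link. The identifiers are assumptions that should be checked on Wikidata before use: Q19484 for Metformin and P2175 for "medical condition treated". The function names are hypothetical.

```python
import urllib.parse

# SPARQL endpoint of the public service at http://query.wikidata.org
ENDPOINT = "https://query.wikidata.org/sparql"


def treated_diseases_query(drug_qid):
    """Build a SPARQL query listing the diseases a drug item is stated to treat."""
    # Assumption: P2175 is the "medical condition treated" property.
    return (
        "SELECT ?disease ?diseaseLabel WHERE {\n"
        f"  wd:{drug_qid} wdt:P2175 ?disease .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def query_url(query):
    """Encode the query into a GET URL that asks for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


# Assumption: Q19484 is the Wikidata item for Metformin.
metformin_url = query_url(treated_diseases_query("Q19484"))
```

Fetching `metformin_url` with any HTTP client should return the result rows as JSON. The same query text can also be pasted into the query service page directly.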

Example question: repurposing Metformin
http://tinyurl.com/zem3oxz
Metformin -> interacts with -> protein -> encoded by -> gene -> genetic association -> ?disease
Metformin -> might treat? -> ?disease
Example: Metformin -> Solute carrier family 22 member 3 (protein) -> SLC22A3 (gene) -> prostate cancer
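
The chain drug, protein, gene, disease can be written as a single SPARQL graph pattern. The sketch below is one possible formulation and not necessarily the query behind the tinyurl link. The property IDs are assumptions to verify on Wikidata: P129 "physically interacts with", P702 "encoded by", P2293 "genetic association", P2175 "medical condition treated". The `FILTER NOT EXISTS` clause is an added assumption that limits results to diseases the drug is not already stated to treat. The function name is hypothetical.

```python
def repurposing_query(drug_qid):
    """Build a SPARQL query for diseases linked to a drug via protein and gene."""
    # Doubled braces are literal braces inside the f-string.
    return f"""
SELECT DISTINCT ?protein ?gene ?disease WHERE {{
  wd:{drug_qid} wdt:P129 ?protein .   # drug interacts with protein (assumed P129)
  ?protein wdt:P702 ?gene .           # protein encoded by gene (assumed P702)
  ?gene wdt:P2293 ?disease .          # gene has genetic association with disease (assumed P2293)
  # Added assumption: skip diseases the drug is already recorded as treating.
  FILTER NOT EXISTS {{ wd:{drug_qid} wdt:P2175 ?disease . }}
}}
""".strip()


# Assumption: Q19484 is the Wikidata item for Metformin.
metformin_candidates = repurposing_query("Q19484")
```

Each result row is a candidate hypothesis ("might treat"), not evidence of efficacy.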

A SPARQL-powered user interface for consuming and editing organism data in Wikidata
Timothy E. Putman Ph.D.
The Scripps Research Institute,
La Jolla, California
tputman@scripps.edu
Twitter: @putmantime

Gene Wikidata Team
Andrew Su (Scripps)
Benjamin Good – just spoke
Andra Waagmeester (Micelio)
Sebastian Burgstaller (Scripps)
Elvira Mitraka (U Maryland)
Julia Turner (Scripps)
Justin Leong (UBC)
Lynn Schriml (U Maryland)
Paul Pavlidis (UBC)
Ginger Tsueng (Scripps)
ACKNOWLEDGEMENTS

Centralizing and Linking the Data
• Bacteria (Q10876): domain
• C. trachomatis (Q131065): species
• C. trachomatis 434/BU (Q20800254): strain
• trpA (Q21153861): gene
• TRPA (Q21153984): protein
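
To make the linking concrete, the items can be treated as a small graph. The sketch below uses the item IDs from the slide. The edge labels ("encoded by", "found in taxon", "parent taxon") are assumptions for illustration, and the direct species-to-domain link is a simplification that skips intermediate taxonomic ranks. The helper name is hypothetical.

```python
# Item IDs as shown on the slide: QID -> (label, kind).
ITEMS = {
    "Q10876": ("Bacteria", "domain"),
    "Q131065": ("C. trachomatis", "species"),
    "Q20800254": ("C. trachomatis 434/BU", "strain"),
    "Q21153861": ("trpA", "gene"),
    "Q21153984": ("TRPA", "protein"),
}

# Assumed links (subject, relation, object); the slide does not name the relations.
LINKS = [
    ("Q21153984", "encoded by", "Q21153861"),
    ("Q21153861", "found in taxon", "Q20800254"),
    ("Q20800254", "parent taxon", "Q131065"),
    ("Q131065", "parent taxon", "Q10876"),  # simplified: intermediate ranks omitted
]


def path_from(start):
    """Follow outgoing links from `start` until an item has no outgoing link."""
    outgoing = {subject: obj for subject, _relation, obj in LINKS}
    path = [start]
    while path[-1] in outgoing:
        path.append(outgoing[path[-1]])
    return path


# Walk from the protein up to the domain and print the labels along the way.
for qid in path_from("Q21153984"):
    label, kind = ITEMS[qid]
    print(f"{label} ({qid}, {kind})")
```

Because every node is a Wikidata item, a walk like this one can also be expressed as a SPARQL query instead of being hard-coded.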

SPARQL Query
• On page load
• jQuery executes the SPARQL query as an AJAX GET request

• On organism select
• Get all gene and protein data for the organism by taxid
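
The interface described here issues its requests with jQuery. The following Python sketch shows the same idea (an HTTP GET carrying a SPARQL query that selects genes and proteins by taxid) and is not the application's actual code. The property IDs are assumptions to verify on Wikidata: P685 "NCBI Taxonomy ID", P703 "found in taxon", P351 "Entrez Gene ID", P688 "encodes", P352 "UniProt ID". The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def genes_by_taxid_query(taxid):
    """Build a SPARQL query for the genes (and encoded proteins) of one organism."""
    return f"""
SELECT ?gene ?geneLabel ?entrez ?protein ?uniprot WHERE {{
  ?taxon wdt:P685 "{int(taxid)}" .   # organism item with this NCBI taxid (assumed P685)
  ?gene wdt:P703 ?taxon ;            # gene found in that organism (assumed P703)
        wdt:P351 ?entrez .           # Entrez Gene ID (assumed P351)
  OPTIONAL {{ ?gene wdt:P688 ?protein . ?protein wdt:P352 ?uniprot . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def fetch(query):
    """Send the query as an HTTP GET request and return the parsed JSON response."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "organism-browser-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten_bindings(result):
    """Turn SPARQL JSON results into a list of plain dicts (variable -> value)."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in result["results"]["bindings"]
    ]
```

A caller would run `flatten_bindings(fetch(genes_by_taxid_query(taxid)))` once the user selects an organism, then render the rows. Optional variables that are unbound are simply absent from a row's dict.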
