Mouse-Human Research Classifier
Presented By: Osama Jomaa
Research Adviser: Dr. Iddo Friedberg
Mouse Models in Research
Inexpensive
About 99% of its genes have a counterpart in the human genome
Fewer ethical concerns than other mammal models
Short generation times
Small size
The Mouse Trap: The Danger of Using One Lab Animal to Study Every Disease. Daniel Engber. November 16, 2011. http://www.slate.com/articles/health_and_science/the_mouse_trap/2011/11/lab_mice_are_they_limiting_our_understanding_of_human_disease_.html
Designer Mice for Human Research
Photo taken from “Designer mice for human disease - A close view of Nobel Laureate: Oliver Smithies.” Yau-Sheng Tsai, Pei-Jane Tsai, Man-Jin Jiang, Cherng-Shyang Chang. http://proj.ncku.edu.tw/research/commentary/e/20071116/2.html. Last accessed December 9, 2014.
The Mouse Model Is Not Perfect, Though
Photo taken from: The Mouse Trap: The Danger of Using One Lab Animal to Study Every Disease. Daniel Engber. November 16, 2011. http://www.slate.com/articles/health_and_science/the_mouse_trap/2011/11/lab_mice_are_they_limiting_our_understanding_of_human_disease_.html
Correlation Between Mouse and Human Genomic Responses in Equivalent Diseases
Photo taken from “Genomic responses in mouse models poorly mimic human inflammatory diseases.” Seok, Warren, et al. Proceedings of the National Academy of Sciences 110, no. 9 (2013): 3507-3512.
[Figure panels: rank correlation (R²); percentage of genes changed in the same direction]
Proposed Research
What? Classify the Mouse-Human scientific literature in PubMed into different areas of research
How? Citation Networks + MeSH Thesaurus
Why? Identify and study the popular areas of Mouse-Human research
Proposed Research
What? Classify the proteins in the Mouse-Human citation pairs into different biological systems
How? Protein Co-occurrence Networks + Gene Ontology
Why? Investigate the biological systems and proteins for which the mouse is used as a model organism for humans
Agenda
1. PubMed Articles Classification
   1. Collect Mouse and Human Papers
   2. Build a Citation Network
   3. Classify the Citation Network (Cit-Net) Using the MeSH Thesaurus
   4. Statistical Study of the MeSH Disease Classification
2. PubMed Proteins Analysis
   1. Collect Human Proteins and Annotation Data
   2. Build the Entity Co-occurrence Networks
   3. Classify the PCoC Networks Using Gene Ontology
3. Summary
Getting Mouse and Human PubMed IDs
UniProt-GOA → Mouse PubMed Identifiers (PMIDs) and Human PubMed Identifiers (PMIDs)
1. Get the Mouse and Human papers from UniProt-GOA
2. Query the PubMed API for the citation list of each article
3. Parse the PubMed XML response and extract the citation list (schematic excerpt):
...
<CitationList>
  <PMID>342342</PMID>
  <PMID>423545</PMID>
  <PMID>432598</PMID>
</CitationList>
...
Very few PubMed articles include a citation list in their XML record!
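Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch, not the project's code: it builds an NCBI E-utilities `efetch` URL for a batch of PMIDs and parses the slide's schematic `<CitationList>` format. Real PubMed XML stores references under different element names, so `parse_citation_list` (a hypothetical helper) would need adapting, and the network request itself is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint; db=pubmed and retmode=xml request PubMed XML records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build one efetch URL for a batch of PMIDs (comma-separated in the id parameter)."""
    params = {"db": "pubmed", "id": ",".join(str(p) for p in pmids), "retmode": "xml"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_citation_list(xml_text):
    """Return cited PMIDs from the slide's schematic <CitationList> element.

    Hypothetical helper: real PubMed XML uses other element names for references.
    Records without a citation list (the common case) give [].
    """
    root = ET.fromstring(xml_text)
    cited = []
    # iter() visits the root too, so a bare <CitationList> document also works.
    for citation_list in root.iter("CitationList"):
        for pmid in citation_list.findall("PMID"):
            cited.append(int(pmid.text.strip()))
    return cited


SAMPLE = """<Record>
  <CitationList>
    <PMID> 342342 </PMID>
    <PMID> 423545 </PMID>
    <PMID> 432598 </PMID>
  </CitationList>
</Record>"""

if __name__ == "__main__":
    print(efetch_url([342342, 423545]))
    print(parse_citation_list(SAMPLE))  # [342342, 423545, 432598]
```

Batching several PMIDs into one `id` parameter keeps the number of requests down, which matters when the input is every Mouse and Human paper in UniProt-GOA.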
Getting the Mouse and Human Citation Lists from Scopus
UniProt-GOA → Mouse PubMed Identifiers (PMIDs) and Human PubMed Identifiers (PMIDs)
1. Get the Mouse and Human papers from UniProt-GOA
2. Compose an HTTP GET request with the PMIDs
3. Parse the Scopus JSON response and extract the citation list (schematic excerpt):
...
{"CitationList": [{"PMID": 342342}, {"PMID": 423545}, {"PMID": 432598}]}
...
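A similarly hedged sketch for step 3 of the Scopus route. It assumes the simplified JSON shape shown on the slide; the actual Scopus API response uses its own field names and requires an Elsevier API key, so `parse_scopus_citations` is a hypothetical helper for illustration only.

```python
import json


def parse_scopus_citations(json_text):
    """Return cited PMIDs from the schematic {"CitationList": [{"PMID": ...}, ...]} shape.

    Hypothetical helper: the real Scopus response uses different field names.
    A missing list, or entries without a PMID, are skipped rather than raising.
    """
    data = json.loads(json_text)
    return [int(entry["PMID"]) for entry in data.get("CitationList", []) if "PMID" in entry]


SAMPLE = '{"CitationList": [{"PMID": 342342}, {"PMID": 423545}, {"PMID": 432598}]}'

if __name__ == "__main__":
    print(parse_scopus_citations(SAMPLE))  # [342342, 423545, 432598]
```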
Building the Citation Network
[Figure: citation network of Mouse (M) and Human (H) papers with four edge types: M → H, H → H, H → M, M → M]
[Pie chart: Mouse Inter and Intra Citations. Categories: Mouse-Human, Mouse-Mouse, and Mouse-Others citations. Slice values: 62%, 3%, 34%]
[Pie chart: Human Inter and Intra Citations. Categories: Human-Others, Human-Human, and Human-Mouse citations. Slice values: 34%, 62%, 4%]
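The pie-chart percentages reduce to counting citation edges by the species of the cited paper. The sketch below assumes a dictionary labelling each PMID as Mouse or Human and treats any cited paper outside that dictionary as "Others"; `citation_breakdown` is a hypothetical name and the toy network is made up.

```python
from collections import Counter


def citation_breakdown(species, citations, source):
    """Percentage of citations made by `source` papers, split by the cited paper's species.

    species:   dict PMID -> "Mouse" or "Human"; PMIDs not in it count as "Others"
    citations: iterable of (citing_pmid, cited_pmid) edges
    source:    "Mouse" or "Human", the species of the citing papers to summarize
    """
    counts = Counter(
        species.get(cited, "Others")
        for citing, cited in citations
        if species.get(citing) == source
    )
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy network, not project data.
    species = {1: "Mouse", 2: "Human", 3: "Human", 4: "Mouse"}
    edges = [(1, 2), (1, 3), (1, 4), (1, 99), (2, 3), (2, 1)]
    print(citation_breakdown(species, edges, "Mouse"))  # {'Human': 50.0, 'Mouse': 25.0, 'Others': 25.0}
    print(citation_breakdown(species, edges, "Human"))  # {'Human': 50.0, 'Mouse': 50.0}
```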
Medical Subject Headings (MeSH)
• Controlled vocabulary used to index PubMed articles
• Organized as a DAG-like hierarchy (a descriptor can appear under more than one parent)
• 16 top-level categories at the root
• About 27K concepts (MeSH descriptors) in total
We used MeSH to group the Mouse and Human papers in the citation network into research classes.
MeSH Structure Example
[Figure: MeSH hierarchy fragment with the descriptors Neoplasms, Neoplasms by Site, Digestive System Diseases, Digestive System Neoplasms, Gastrointestinal Diseases, Gastrointestinal Neoplasms, Stomach Diseases, and Stomach Neoplasms]
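Because MeSH is a DAG, one descriptor can roll up to more than one top-level class, so a single paper may land in several research areas. The sketch below encodes this slide's fragment as parent links (our reading of the MeSH tree) and walks upward. Neoplasms and Digestive System Diseases are roots only of this fragment; in MeSH itself both sit under the Diseases category. The helper names are hypothetical.

```python
# Parent links for the MeSH fragment on this slide (child descriptor -> parent descriptors).
MESH_PARENTS = {
    "Stomach Neoplasms": ["Stomach Diseases", "Gastrointestinal Neoplasms"],
    "Stomach Diseases": ["Gastrointestinal Diseases"],
    "Gastrointestinal Neoplasms": ["Gastrointestinal Diseases", "Digestive System Neoplasms"],
    "Gastrointestinal Diseases": ["Digestive System Diseases"],
    "Digestive System Neoplasms": ["Digestive System Diseases", "Neoplasms by Site"],
    "Neoplasms by Site": ["Neoplasms"],
    "Digestive System Diseases": [],
    "Neoplasms": [],
}


def ancestors(term, parents):
    """Every descriptor reachable upward from `term` (iterative DFS, handles shared parents)."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def top_level_classes(term, parents):
    """Parentless descriptors that `term` rolls up to; there can be more than one."""
    return {node for node in ancestors(term, parents) | {term} if not parents.get(node)}


if __name__ == "__main__":
    print(sorted(top_level_classes("Stomach Neoplasms", MESH_PARENTS)))
    # ['Digestive System Diseases', 'Neoplasms']
```

A paper indexed with Stomach Neoplasms therefore counts toward both the digestive-disease and the neoplasm classes, which is why per-category totals can exceed the number of papers.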
Classifying the Citation Network
To Do: Place the papers in research areas
[Figure: Mouse (M) and Human (H) papers of the citation network grouped into research areas: Digestive System Diseases, Eye Diseases, Virus Diseases, Immune System Diseases, Cardiovascular Diseases, Skin Diseases]
[Chart: Number of Mouse and Human papers in the MeSH disease categories]
[Chart: Number of Mouse-Human citation pairs in the MeSH disease categories]
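Both charts can be produced with simple counters. This sketch assumes each paper already carries a set of disease categories (for example from a MeSH roll-up) and counts a Mouse → Human citation pair in a category only when both papers share it; that pairing rule is an assumption, not something the slides specify. Function names and data are hypothetical.

```python
from collections import Counter


def papers_per_category(paper_classes, species):
    """Count papers per (disease category, species).

    paper_classes: dict PMID -> set of disease categories (e.g. from a MeSH roll-up)
    species:       dict PMID -> "Mouse" or "Human"
    """
    counts = Counter()
    for pmid, categories in paper_classes.items():
        for category in categories:
            counts[(category, species.get(pmid, "Others"))] += 1
    return counts


def pairs_per_category(paper_classes, species, citations):
    """Count Mouse -> Human citation pairs per category.

    Assumed rule: a pair is counted in a category only if both papers belong to it.
    """
    counts = Counter()
    for citing, cited in citations:
        if species.get(citing) == "Mouse" and species.get(cited) == "Human":
            shared = paper_classes.get(citing, set()) & paper_classes.get(cited, set())
            counts.update(shared)  # +1 for each shared category
    return counts


if __name__ == "__main__":
    classes = {1: {"Neoplasms"}, 2: {"Neoplasms", "Eye Diseases"}, 3: {"Eye Diseases"}}
    species = {1: "Mouse", 2: "Human", 3: "Human"}
    print(papers_per_category(classes, species))
    print(pairs_per_category(classes, species, [(1, 2), (1, 3)]))  # Counter({'Neoplasms': 1})
```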
GenBank
Protein: NP_e342 | PMID: 432432
kicgdkssgihygvitcegckgffrrsqqc
Protein: NP_452u1 | PMID: 483232
Adtltytlglsdgqlplgaspdlpeasacp
...
1. Get the Human protein sequences and their papers
...
PMID: 3213414
NP_u4323: sgihygvitcegckgffrrsqqc
NP_i4322: lplgaspdlpeasacfewrwts
NP_w3421: kicgdkssgihygvitceg
PMID: 2346414
NP_ti3423: vitcegckgckgffrrsqqc
NP_q4322f: ygvitcegeasacfewrwts
NP_x342u2: kicgdkssgihygvitceg
2. Group the proteins by their PMID
[Figure: example protein lists]
3. Intersect the GenBank papers with the Scopus citations
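Steps 2 and 3 amount to a group-by followed by a set intersection. In this sketch the input is assumed to be already parsed `(protein_id, pmid, sequence)` tuples; the accessions are the placeholder IDs from the slide, and a paper is kept if it appears on either side of a citation pair (an assumption about how the intersection is defined). The helper names are hypothetical.

```python
from collections import defaultdict


def group_by_pmid(records):
    """records: iterable of (protein_id, pmid, sequence) tuples, e.g. parsed from GenBank."""
    groups = defaultdict(dict)
    for protein_id, pmid, sequence in records:
        groups[pmid][protein_id] = sequence
    return dict(groups)


def keep_cited_papers(groups, citations):
    """Keep only papers that appear in at least one citation pair (citing or cited)."""
    in_network = {pmid for pair in citations for pmid in pair}
    return {pmid: proteins for pmid, proteins in groups.items() if pmid in in_network}


if __name__ == "__main__":
    records = [
        ("NP_u4323", 3213414, "sgihygvitcegckgffrrsqqc"),
        ("NP_i4322", 3213414, "lplgaspdlpeasacfewrwts"),
        ("NP_ti3423", 2346414, "vitcegckgckgffrrsqqc"),
        ("NP_e342", 432432, "kicgdkssgihygvitcegckgffrrsqqc"),
    ]
    kept = keep_cited_papers(group_by_pmid(records), [(3213414, 2346414)])
    print(sorted(kept))  # [2346414, 3213414]
```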
Removing Redundancies
[Figure: example protein lists]
Use CD-HIT with a sequence identity threshold of 0.9
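CD-HIT is a command-line tool, so from Python it is typically driven with `subprocess`. This sketch only assembles the argument list: `-c` is CD-HIT's sequence identity threshold, and `-n 5` is the word size the CD-HIT guide recommends for thresholds from 0.7 to 1.0. The file names and the `cdhit_command` helper are hypothetical.

```python
def cdhit_command(in_fasta, out_fasta, identity=0.9):
    """Argument list for CD-HIT protein clustering at the given identity threshold.

    -c: sequence identity threshold; -n 5: word size recommended for -c 0.7 to 1.0.
    """
    if not 0.7 <= identity <= 1.0:
        raise ValueError("word size 5 is only appropriate for thresholds from 0.7 to 1.0")
    return ["cd-hit", "-i", in_fasta, "-o", out_fasta, "-c", str(identity), "-n", "5"]


if __name__ == "__main__":
    cmd = cdhit_command("human_proteins.fasta", "human_proteins_nr90.fasta")
    print(" ".join(cmd))
```

To run it, pass the list to `subprocess.run(cmd, check=True)` on a machine with `cd-hit` on the `PATH`; CD-HIT writes the representative sequences to the `-o` file and the cluster membership to a companion `.clstr` file.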
Gene Ontology
Photo taken from: Gene Ontology Consortium. Ontology Structure. http://geneontology.org/page/ontology-structure. Last accessed December 13, 2014.
Gene Ontology Annotation
Example: cytochrome c
• Cellular Component: mitochondrial intermembrane space
• Molecular Function: oxidoreductase activity
• Biological Process: oxidative phosphorylation
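In the Swiss-Prot flat file, GO annotations appear as `DR   GO; ...` lines whose third field is prefixed with C, F, or P for the three aspects. The sketch parses those lines into the three aspects shown on this slide. The sample entry and its accession `HYPO01` are illustrative, not a real UniProt record, and `parse_go_annotations` is a hypothetical helper.

```python
ASPECTS = {"C": "Cellular Component", "F": "Molecular Function", "P": "Biological Process"}


def parse_go_annotations(flat_text):
    """Map each entry's primary accession to its GO terms, grouped by aspect."""
    annotations, accession = {}, None
    for line in flat_text.splitlines():
        code, body = line[:2], line[5:]  # 2-letter line code, 3 spaces, then data
        if code == "AC" and accession is None:
            # First accession on the first AC line is the primary accession.
            accession = body.split(";")[0].strip()
            annotations[accession] = {aspect: [] for aspect in ASPECTS.values()}
        elif code == "DR" and body.startswith("GO;") and accession:
            fields = [f.strip() for f in body.split(";")]
            go_id, (aspect_code, _, name) = fields[1], fields[2].partition(":")
            annotations[accession][ASPECTS[aspect_code]].append((go_id, name))
        elif code == "//":  # end of entry
            accession = None
    return annotations


SAMPLE_ENTRY = """ID   HYPO01_HUMAN   Reviewed;   105 AA.
AC   HYPO01;
DR   GO; GO:0005758; C:mitochondrial intermembrane space; IDA:UniProtKB.
DR   GO; GO:0016491; F:oxidoreductase activity; IEA:InterPro.
DR   GO; GO:0006119; P:oxidative phosphorylation; IEA:UniProtKB.
//
"""

if __name__ == "__main__":
    for aspect, terms in parse_go_annotations(SAMPLE_ENTRY)["HYPO01"].items():
        print(aspect, terms)
```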
Getting GO Terms
1. Create the BLAST query in FASTA format
2. Create a BLAST database from the Swiss-Prot Human flat file
3. Run BLAST with e-value = 10⁻⁸
4. Parse the BLAST XML response and get the GO terms of the top hits
[Figure: FASTA query file (NP_u4323, NP_i4322, NP_w3421, ...) → BLAST DB → GO terms per protein, e.g. NP_u4323: GO1, GO5, GO4; NP_i4322: GO5, GO9; NP_w3421: GO4, GO6]
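The four steps can be sketched with BLAST+ driven from Python. Assumptions: the Swiss-Prot Human flat file has first been converted to FASTA with UniProt-style headers (`makeblastdb` reads FASTA, not flat files), `-parse_seqids` is used so hits report their accessions, and the top hit is the first `<Hit>` of each `<Iteration>` in the `-outfmt 5` XML, which BLAST lists best first. Helper names, file names, and the sample XML are hypothetical. The top-hit accessions would then be looked up in a GO table built from the flat file.

```python
import xml.etree.ElementTree as ET


def to_fasta(proteins):
    """proteins: dict protein_id -> sequence. Returns FASTA text, 60 residues per line."""
    lines = []
    for protein_id, seq in proteins.items():
        lines.append(f">{protein_id}")
        lines.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    return "\n".join(lines) + "\n"


def blast_commands(db_fasta, db_name, query_fasta, out_xml, evalue=1e-8):
    """BLAST+ argv lists: build a protein DB, then search it with XML output (-outfmt 5)."""
    makedb = ["makeblastdb", "-in", db_fasta, "-dbtype", "prot", "-parse_seqids", "-out", db_name]
    search = ["blastp", "-query", query_fasta, "-db", db_name,
              "-evalue", str(evalue), "-outfmt", "5", "-out", out_xml]
    return makedb, search


def top_hits(blast_xml):
    """Map each query to the accession of its first (best) hit; queries with no hits are omitted."""
    best = {}
    for iteration in ET.fromstring(blast_xml).iter("Iteration"):
        hit = iteration.find("Iteration_hits/Hit")
        if hit is not None:
            best[iteration.findtext("Iteration_query-def")] = hit.findtext("Hit_accession")
    return best


SAMPLE_XML = """<BlastOutput><BlastOutput_iterations>
  <Iteration>
    <Iteration_query-def>NP_u4323</Iteration_query-def>
    <Iteration_hits>
      <Hit><Hit_accession>ACC1</Hit_accession></Hit>
      <Hit><Hit_accession>ACC2</Hit_accession></Hit>
    </Iteration_hits>
  </Iteration>
  <Iteration>
    <Iteration_query-def>NP_i4322</Iteration_query-def>
    <Iteration_hits></Iteration_hits>
  </Iteration>
</BlastOutput_iterations></BlastOutput>"""

if __name__ == "__main__":
    print(to_fasta({"NP_u4323": "sgihygvitcegckgffrrsqqc"}), end="")
    for cmd in blast_commands("human_sprot.fasta", "human_sprot", "query.fasta", "hits.xml"):
        print(" ".join(cmd))
    print(top_hits(SAMPLE_XML))  # {'NP_u4323': 'ACC1'}
```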
Building the PCoC Network
[Figure: Mouse protein (MP) and Human protein (HP) nodes with numeric labels, connected by three edge types: Citation Edge, P-P Edge, and P-C-P Edge]
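Following the summary's description, a P-C-P edge joins a protein in a citing Mouse paper to a protein in the cited Human paper. The sketch below builds those edges plus P-P edges, which it assumes (the slides do not define them) link proteins that co-occur in the same paper. The function name and toy data are hypothetical.

```python
from itertools import combinations


def pcoc_edges(paper_proteins, citations, species):
    """Build the protein-level edge sets of a PCoC-style network.

    paper_proteins: dict PMID -> set of protein IDs mentioned in that paper
    citations:      iterable of (citing_pmid, cited_pmid) pairs
    species:        dict PMID -> "Mouse" or "Human"
    Returns (pp_edges, pcp_edges) as sets of protein-ID pairs.
    """
    # P-P (assumed meaning): proteins that co-occur in the same paper.
    pp_edges = set()
    for proteins in paper_proteins.values():
        pp_edges.update(combinations(sorted(proteins), 2))

    # P-C-P: protein in a citing Mouse paper -> protein in the cited Human paper.
    pcp_edges = set()
    for citing, cited in citations:
        if species.get(citing) == "Mouse" and species.get(cited) == "Human":
            for mp in paper_proteins.get(citing, ()):
                for hp in paper_proteins.get(cited, ()):
                    pcp_edges.add((mp, hp))
    return pp_edges, pcp_edges


if __name__ == "__main__":
    papers = {1: {"MP1", "MP2"}, 2: {"HP1"}}
    pp, pcp = pcoc_edges(papers, [(1, 2)], {1: "Mouse", 2: "Human"})
    print(sorted(pp))   # [('MP1', 'MP2')]
    print(sorted(pcp))  # [('MP1', 'HP1'), ('MP2', 'HP1')]
```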
To Do: Classifying the PCoC Network
To Do: Place the proteins in biological systems
[Figure: PCoC network nodes grouped by GO molecular-function terms: lactase activity, serotonin receptor activity, signal sequence binding, signal transducer activity, nucleotide binding, ATP binding]
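Grouping the PCoC nodes by GO term is an inversion of the protein → GO-terms map obtained from the BLAST step. A minimal sketch, with a hypothetical function name and toy annotations drawn from the molecular-function terms on the slide; ties in group size are broken alphabetically.

```python
from collections import defaultdict


def group_by_go_term(node_terms):
    """Invert protein -> GO terms into (GO term, proteins) pairs, largest groups first."""
    groups = defaultdict(set)
    for protein, terms in node_terms.items():
        for term in terms:
            groups[term].add(protein)
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    nodes = {
        "NP_u4323": {"ATP binding", "nucleotide binding"},
        "NP_i4322": {"ATP binding"},
        "NP_w3421": {"serotonin receptor activity"},
    }
    for term, proteins in group_by_go_term(nodes):
        print(term, sorted(proteins))
```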
Summary
• The citation network (Cit-Net) connects citing Mouse papers with cited Human papers in the PubMed database
• MeSH is used to classify the citation network nodes into different classes of research
• The PCoC network connects the proteins in the citing Mouse papers with the proteins in the cited Human papers
• GO is used to group the P-P and P-C-P network nodes into classes of molecular functions (MFs), biological processes (BPs), and cellular components (CCs)
Timetable (January to May)
Database Creation and Data Migration
Citation Network Classification
PCoC Networks Building
PCoC Networks Classification
PCoC Networks Analysis
Thank You!
Q & A

More Related Content

Similar to Mouse-Human Research Classifier

Basics in bioinformatics
Basics in bioinformaticsBasics in bioinformatics
Basics in bioinformatics
Mamun Billah
 
Model organisms
Model organismsModel organisms
Model organisms
Gurvinder Singh
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About It
Anita de Waard
 
Bioinformatics Lecture 1
Bioinformatics  Lecture 1Bioinformatics  Lecture 1
Bioinformatics Lecture 1
Hamid Ur-Rahman
 
Bioinformatics lecture 1
Bioinformatics lecture 1Bioinformatics lecture 1
Bioinformatics lecture 1
Hamid Ur-Rahman
 
Encode Project
Encode ProjectEncode Project
Encode Project
Anarghya Hegde
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data Seminar
Jenny Molloy
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Vidya Kalaivani Rajkumar
 
Bioinformatics group presentation
Bioinformatics group presentationBioinformatics group presentation
Bioinformatics group presentation
Naeem Ahmed
 
Bioinformatics group presentation
Bioinformatics group presentationBioinformatics group presentation
Bioinformatics group presentation
Naeem Ahmed
 
EVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionEVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - Introduction
Jonathan Eisen
 
Protein protein interactions
Protein protein interactionsProtein protein interactions
Protein protein interactions
Prianca12
 
Features of biological databases
Features of biological databasesFeatures of biological databases
Features of biological databases
Charu Sharma
 
Quality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic ModelingQuality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic Modeling
Stuti Nayak
 
Itqb talkslideshfd deritemplate
Itqb talkslideshfd deritemplateItqb talkslideshfd deritemplate
Itqb talkslideshfd deritemplate
Helena Deus
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
jmoore89
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Neuroscience Information Framework
 
Bioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureBioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of Nature
Robert Cormia
 
Data retrieval
Data retrievalData retrieval
Phenotypes and models at rgd -meet joe rat
Phenotypes and models at rgd -meet joe ratPhenotypes and models at rgd -meet joe rat
Phenotypes and models at rgd -meet joe rat
Jennifer Smith
 

Similar to Mouse-Human Research Classifier (20)

Basics in bioinformatics
Basics in bioinformaticsBasics in bioinformatics
Basics in bioinformatics
 
Model organisms
Model organismsModel organisms
Model organisms
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About It
 
Bioinformatics Lecture 1
Bioinformatics  Lecture 1Bioinformatics  Lecture 1
Bioinformatics Lecture 1
 
Bioinformatics lecture 1
Bioinformatics lecture 1Bioinformatics lecture 1
Bioinformatics lecture 1
 
Encode Project
Encode ProjectEncode Project
Encode Project
 
ContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data SeminarContentMine Presentation for WHO Health Data Seminar
ContentMine Presentation for WHO Health Data Seminar
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics group presentation
Bioinformatics group presentationBioinformatics group presentation
Bioinformatics group presentation
 
Bioinformatics group presentation
Bioinformatics group presentationBioinformatics group presentation
Bioinformatics group presentation
 
EVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - IntroductionEVE161: Microbial Phylogenomics - Class 1 - Introduction
EVE161: Microbial Phylogenomics - Class 1 - Introduction
 
Protein protein interactions
Protein protein interactionsProtein protein interactions
Protein protein interactions
 
Features of biological databases
Features of biological databasesFeatures of biological databases
Features of biological databases
 
Quality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic ModelingQuality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic Modeling
 
Itqb talkslideshfd deritemplate
Itqb talkslideshfd deritemplateItqb talkslideshfd deritemplate
Itqb talkslideshfd deritemplate
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
 
Bioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureBioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of Nature
 
Data retrieval
Data retrievalData retrieval
Data retrieval
 
Phenotypes and models at rgd -meet joe rat
Phenotypes and models at rgd -meet joe ratPhenotypes and models at rgd -meet joe rat
Phenotypes and models at rgd -meet joe rat
 

Recently uploaded

一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 

Recently uploaded (20)

一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 

Mouse-Human Research Classifier

  • 2. Mouse Models in Research
  • 3. Mouse Models in Research Shares 99% of its genome with humans
  • 4. Mouse Models in Research Shares 99% of its genome with humans Fewer ethical concerns than other mammal models
  • 5. Mouse Models in Research InexpensiveShares 99% of its genome with humans Fewer ethical concerns than other mammal models Short generation times Small
  • 6. The Mouse Trap. The Danger of Using one Lab Animal to Study Every Disease. Daniel Engber http:http://www.slate.com/articles/health_and_science/the_mouse_trap/2011/11/lab_mice_are_they_limiting_our_understanding_of_huma n_disease_.html. November 16, 2011
  • 7. Designer Mice for Human Research Photo taken from “Designer mice for human disease - A close view of Nobel Laureate : Oliver Smithies” Yau-Sheng Tsai, Pei-Jane Tsai, Man-Jin Jiang, Cherng-Shyang Chang. http://proj.ncku.edu.tw/research/commentary/e/20071116/2.html December 9, 2014
  • 8. Mouse Model is Not Perfect Though Photo taken from: The Mouse Trap. The Danger of Using one Lab Animal to Study Every Disease. Daniel Engber http:http://www.slate.com/articles/health_and_science/the_mouse_trap/2011/11/lab_mice_are_they_limiting_our_understanding_of_huma n_disease_.html. November 16, 2011
  • 9. Mouse Correlation with Human to Equivalent Diseases Photo taken from “Genomic responses in mouse models poorly mimic human inflammatory diseases.” Seok, Warren, and Others. Proceedings of the National Academy of Sciences. 110, no. 9 (2013): 3507-3512. Rank correlation (R2 ) Percentage of genes changed in the same direction
  • 10. Proposed Research Classify the Mouse-Human scientific literature in PubMed into different areas of research Citation Networks + MeSH Thesaurus Identify and study the popular areas of Mouse-Human research What? How? Why?
  • 11. Proposed Research Classify the proteins in the Mouse-Human citation pairs into different biological systems Protein Co-occurrence Networks + Gene Ontology Investigate the biological systems and proteins for which Mouse is used as a model organism for Human What? How? Why?
  • 12. Agenda 1. PubMed Articles Classification 1. Collect Mouse and Human Papers 2. Build a Citation Network 3. Classify the Cit-Net Using MeSH Thesaurus 4. Stats Study on MeSH Disease Classification 2. PubMed Proteins Analysis 1. Collect Human Protein and Annotation Data 2. Build the Entity Co-occurrence Networks 3. Classify PCoC Networks Using Gene Ontology 3. Summary
  • 13. 1. PubMed Articles Classification 1. Collect Mouse and Human Papers 2. Build a Citation Network 3. Classify the Cit-Net Using MeSH Thesaurus 4. Stats Study on MeSH Disease Classification 2. PubMed Proteins Analysis 1. Collect Human Proteins and Annotation Data 2. Build the Entity Co-occurrence Networks 3. Classify PCoC Networks Using Gene Ontology 3. Summary
  • 14. Getting Mouse and Human PubMed IDs Uniprot GOA Mouse PubMed Identifiers (PMIDs) Human PubMed Identifiers (PMIDs) 1. Get Mouse & Human papers from Uniprot
  • 15. Getting Mouse and Human PubMed IDs Uniprot GOA Mouse PubMed Identifiers (PMIDs) Human PubMed Identifiers (PMIDs) 1. Get Mouse & Human papers from Uniprot 2. Query PubMed API for the citation list for each article
  • 16. Getting Mouse and Human PubMed IDs Uniprot GOA Mouse PubMed Identifiers (PMIDs) Human PubMed Identifiers (PMIDs) 1. Get Mouse & Human papers from Uniprot 2. Query PubMed API for the citation list for each article . . <CitationList> <PMID> 342342 </PMID> <PMID> 423545 </PMID> <PMID> 432598 </PMID> </CitationList> . . 3. Parse PubMed XML response and get the citation list
  • 17. Getting Mouse and Human PubMed IDs Uniprot GOA Mouse PubMed Identifiers (PMIDs) Human PubMed Identifiers (PMIDs) 1. Get Mouse & Human papers from Uniprot 2. Query PubMed API for the citation list for each article . . <CitationList> <PMID> 342342 </PMID> <PMID> 423545 </PMID> <PMID> 432598 </PMID> </CitationList> . . 3. Parse PubMed XML response and get the citation list Very few PubMed articles have the citation list in their XML file!
  • 18. Getting Mouse and Human Citation List from Scopus Uniprot GOA Mouse PubMed Identifiers (PMIDs) Human PubMed Identifiers (PMIDs) 1. Get Mouse & Human papers from Uniprot 2. Author HTTP GET request with PMIDS 3. Parse Scopus JSON response and get the citation list . . {CitationList: {PMID: 342342}, {PMID: 423545}, {PMID: 432598}} . .
  • 19. 1. PubMed Articles Classification 1. Collect Mouse and Human Papers 2. Build a Citation Network 3. Classify the Cit-Net Using MeSH Thesaurus 4. Stats Study on MeSH Disease Classification 2. PubMed Proteins Analysis 1. Collect Human Proteins and Annotation Data 2. Build the Entity Co-occurrence Networks 3. Classify PCoC Networks Using Gene Ontology 3. Summary
  • 20. Building the Citation Network H M M H H H H M H H H M H H H H H H M H M M H H H H
  • 21. Building the Citation Network H M M H H H H M H H H M H H H H H H M H M M H H H H M → H H → H H → M M → M
  • 22. Building the Citation Network H M M H H H H M H H H M H H H H H H M H M M H H H H M → H H → H H → M M → M 62% 3% 34% Mouse Inter and Intra Citations Mouse-Human Citations Mouse-Mouse Citations Moue-Others Citations 34% 62% 4% Human Inter and Intra Citations Human-Others Citations Human-Human Citations Human-Mouse Citations
  • 23. 1. PubMed Articles Classification 1. Collect Mouse and Human Papers 2. Build a Citation Network 3. Classify the Cit-Net Using MeSH Thesaurus 4. Stats Study on MeSH Disease Classification 2. PubMed Proteins Analysis 1. Collect Human Proteins and Annotation Data 2. Build the Entity Co-occurrence Networks 3. Classify PCoC Networks Using Gene Ontology 3. Summary
  • 24. Medical Subject Headings  Controlled vocabulary to index PubMed articles  Stored in a DAG-like structure  16 top level concepts at the root  Includes ~27K concepts (MeSH descriptors) all together
  • 25. Medical Subject Headings  Controlled vocabulary to index PubMed articles  Stored in a DAG-like structure  16 top level concepts at the root  Includes ~27K concepts (MeSH descriptors) all together We used MeSH to group the Mouse and Human papers in the citation network into classes of research
  • 26. MeSH Structure Example Digestive System Diseases Gastrointestinal Diseases Digestive System Neoplasms Neoplasms by Site Neoplasms Stomach Diseases Gastrointestinal Neoplasms Stomach Neoplasms
  • 27. Classifying the Citation Network H M M H H H M H H H M H H H H H H M H M M H H H
  • 28. To Do: Place in research areas H M M H H H M H H H M H H H H H H M H M M H H H Digestive System Diseases Eye Diseases Virus Diseases Immune System Diseases Cardiovascular DiseasesSkin Diseases
  • 30. Number of Mouse and Human Papers in the MeSH Disease Categories
  • 31. Number of Mouse-Human Citation Pairs in the MeSH Disease Categories
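Counts like these can be sketched as follows, assuming each paper comes with the tree numbers of its MeSH descriptors, that the Diseases branch is the one whose tree numbers start with "C", and that a category is the first dotted segment (e.g. "C06"). The rule that a Mouse-Human pair counts under a category only when both papers are in it is an assumption for illustration, as are the function names and the example tree numbers.

```python
from collections import Counter


def disease_categories(tree_numbers):
    """Map a paper's MeSH tree numbers to disease categories such as "C06"."""
    return {tn.split(".")[0] for tn in tree_numbers if tn.startswith("C")}


def count_by_category(paper_trees, mouse_human_pairs):
    """Count papers per category and (citing mouse, cited human) pairs per category.

    Assumed rule: a pair is counted under a category only when both papers are in it.
    """
    cats = {pmid: disease_categories(tns) for pmid, tns in paper_trees.items()}
    papers = Counter(c for cs in cats.values() for c in cs)
    pairs = Counter()
    for mouse, human in mouse_human_pairs:
        for c in cats.get(mouse, set()) & cats.get(human, set()):
            pairs[c] += 1
    return papers, pairs


# Illustrative tree numbers, not real MeSH codes below the top level.
paper_trees = {
    1: ["C06.100.200", "D12.500"],
    2: ["C06.100", "C11.300"],
    3: ["C11.300.400"],
    4: ["G03.500"],
}
papers, pairs = count_by_category(paper_trees, [(1, 2), (3, 2), (4, 1)])
print(papers)  # papers per disease category
print(pairs)   # Mouse-Human citation pairs per disease category
```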
  • 35. Collecting Human Proteins from GenBank
    1. Get the human protein sequences and their papers from GenBank, e.g. Protein NP_e342 (PMID 432432): kicgdkssgihygvitcegckgffrrsqqc; Protein NP_452u1 (PMID 483232): Adtltytlglsdgqlplgaspdlpeasacp; …
    2. Group the proteins by their PMID, e.g. PMID 3213414: NP_u4323, NP_i4322, NP_w3421; PMID 2346414: NP_ti3423, NP_q4322f, NP_x342u2
    3. Intersect the GenBank papers with the Scopus citations
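Steps 2 and 3 amount to a group-by and a set intersection. A minimal sketch, assuming the GenBank records have already been reduced to (protein ID, PMID, sequence) tuples and the Scopus citations to (citing PMID, cited PMID) pairs. Keeping only PMIDs on the cited side is an assumption, based on the PCoC network linking proteins of citing Mouse papers to proteins of cited Human papers; the function names are hypothetical.

```python
from collections import defaultdict


def group_by_pmid(records):
    """Group (protein_id, pmid, sequence) records into {pmid: {protein_id: sequence}}."""
    grouped = defaultdict(dict)
    for protein_id, pmid, sequence in records:
        grouped[pmid][protein_id] = sequence
    return dict(grouped)


def keep_cited_papers(grouped, citation_pairs):
    """Keep only PMIDs that appear as the cited paper in some citation pair."""
    cited = {cited_pmid for _, cited_pmid in citation_pairs}
    return {pmid: prots for pmid, prots in grouped.items() if pmid in cited}


records = [
    ("NP_u4323", 3213414, "sgihygvitcegckgffrrsqqc"),
    ("NP_i4322", 3213414, "lplgaspdlpeasacfewrwts"),
    ("NP_ti3423", 2346414, "vitcegckgckgffrrsqqc"),
    ("NP_e342", 432432, "kicgdkssgihygvitcegckgffrrsqqc"),
]
citations = [(111, 3213414), (222, 2346414)]  # hypothetical citing PMIDs
grouped = group_by_pmid(records)
cited_only = keep_cited_papers(grouped, citations)
print(sorted(cited_only))  # [2346414, 3213414]
```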
  • 36. Removing Redundancies
    Use CD-HIT with a sequence identity threshold of 0.9 to cluster the grouped protein sequences and keep one representative per cluster (e.g. NP_w3421 and NP_x342u2 share the identical sequence kicgdkssgihygvitceg, and NP_w3421 and NP_ti3423 occur in more than one paper).
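CD-HIT itself is a command-line tool, typically run as `cd-hit -i proteins.fasta -o proteins_nr.fasta -c 0.9 -n 5` (file names hypothetical). To show what the 0.9 threshold does, the sketch below reimplements CD-HIT's greedy incremental idea in pure Python: sequences are visited longest first, and each joins the first cluster representative it matches at ≥ 0.9 identity; otherwise it founds a new cluster. The identity measure here (residues matched by `difflib.SequenceMatcher`, divided by the shorter length) is a crude stand-in for CD-HIT's alignment-based identity and short-word filter, so cluster boundaries will not match the real tool.

```python
from difflib import SequenceMatcher


def identity(shorter, longer):
    """Crude stand-in for alignment identity: matched residues / length of the shorter sequence."""
    if not shorter:
        return 0.0
    blocks = SequenceMatcher(None, shorter, longer, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / len(shorter)


def greedy_cluster(sequences, threshold=0.9):
    """CD-HIT-style greedy incremental clustering of {id: sequence}.

    Returns {representative_id: [member ids]}, representatives first in each list.
    """
    clusters = {}
    for seq_id in sorted(sequences, key=lambda s: len(sequences[s]), reverse=True):
        seq = sequences[seq_id]
        for rep_id, members in clusters.items():
            # Representatives are never shorter than the current sequence.
            if identity(seq, sequences[rep_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # no close representative: new cluster
    return clusters


seqs = {
    "NP_u4323": "sgihygvitcegckgffrrsqqc",
    "NP_i4322": "lplgaspdlpeasacfewrwts",
    "NP_w3421": "kicgdkssgihygvitceg",
    "NP_x342u2": "kicgdkssgihygvitceg",
}
print(greedy_cluster(seqs))
```

Visiting sequences longest first is what lets each cluster be represented by its longest member, as in CD-HIT.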
  • 37. Gene Ontology
    [Figure: GO ontology structure. Photo taken from: Gene Ontology Consortium. Ontology Structure. http://geneontology.org/page/ontology-structure. Last accessed December 13, 2014]
  • 38. Gene Ontology Annotation
    Example: cytochrome c
    - Biological Process: oxidative phosphorylation
    - Cellular Component: mitochondrial matrix
    - Molecular Function: oxidoreductase activity
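GO annotations for Swiss-Prot entries appear in the flat file as `DR   GO;` cross-reference lines whose term field is prefixed with `P:`, `C:` or `F:` for the three aspects above. A minimal parser sketch, assuming the standard two-letter line codes and `//` entry terminators; the entry below is a trimmed, made-up record (accession EXAMPLE1), not a verbatim UniProt entry.

```python
from collections import defaultdict

ASPECTS = {"P": "biological_process", "C": "cellular_component", "F": "molecular_function"}


def parse_go_annotations(lines):
    """Return {primary_accession: {aspect: set of (GO id, term)}} from Swiss-Prot text.

    Uses the first accession on the first AC line of each entry and every
    'DR   GO;' cross-reference; entries end with a '//' line.
    """
    annotations = {}
    accession, terms = None, defaultdict(set)
    for line in lines:
        if line.startswith("AC   ") and accession is None:
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("DR   GO;"):
            fields = [f.strip() for f in line[5:].split(";")]
            go_id, aspect_term = fields[1], fields[2]
            aspect, term = aspect_term.split(":", 1)  # e.g. "C:mitochondrial matrix"
            terms[ASPECTS[aspect]].add((go_id, term))
        elif line.startswith("//"):
            if accession is not None:
                annotations[accession] = dict(terms)
            accession, terms = None, defaultdict(set)
    return annotations


entry = """ID   CYC_EXAMPLE             Reviewed;         105 AA.
AC   EXAMPLE1;
DR   GO; GO:0005759; C:mitochondrial matrix; IEA:Example.
DR   GO; GO:0016491; F:oxidoreductase activity; IEA:Example.
DR   GO; GO:0006119; P:oxidative phosphorylation; IEA:Example.
//""".splitlines()
annotations = parse_go_annotations(entry)
print(annotations["EXAMPLE1"]["cellular_component"])
```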
  • 40. Getting GO Terms
    1. Create a BLAST query in FASTA format from the protein sequences (e.g. NP_u4323, NP_i4322, NP_w3421, NP_ti3423, NP_q4322f, NP_x342u2)
    2. Create a BLAST database from the Swiss-Prot human flat file
    3. Run BLAST with an e-value cutoff of 10^-8
    4. Parse the BLAST XML output and get the GO terms of the top hits, e.g. NP_u4323: GO1, GO5, GO4; NP_i4322: GO5, GO9; NP_w3421: GO4, GO6; ...
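Steps 1 and 4 can be sketched with the Python standard library. A typical BLAST+ run for steps 2 and 3 would be `makeblastdb -in human_sprot.fasta -dbtype prot -parse_seqids -out human_sprot` followed by `blastp -query proteins.fasta -db human_sprot -evalue 1e-8 -outfmt 5 -out hits.xml`, assuming the human Swiss-Prot entries have first been exported to FASTA (file names hypothetical). The parser assumes BLAST+ XML output, where each query's hits are listed best first and `Hit_accession` holds the Swiss-Prot accession; the GO lookup table is assumed to have been built from the flat file, and the sample XML is trimmed to the elements the parser reads.

```python
import xml.etree.ElementTree as ET


def to_fasta(sequences):
    """Render {protein_id: sequence} as FASTA text for a BLAST query file."""
    return "".join(f">{pid}\n{seq}\n" for pid, seq in sequences.items())


def top_hits(blast_xml, max_evalue=1e-8):
    """Return {query id: (hit accession, e-value)} for each query's best hit.

    Hits are listed best first, so the first hit whose best HSP passes the
    e-value cutoff is kept; queries without such a hit are left out.
    """
    root = ET.fromstring(blast_xml)
    best = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def").split()[0]
        for hit in iteration.iter("Hit"):
            evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
            if evalues and min(evalues) <= max_evalue:
                best[query] = (hit.findtext("Hit_accession"), min(evalues))
                break
    return best


def go_terms_for_queries(hits, go_by_accession):
    """Attach the GO terms of each query's top Swiss-Prot hit (empty set if unknown)."""
    return {query: go_by_accession.get(acc, set()) for query, (acc, _) in hits.items()}


sample = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>NP_u4323</Iteration_query-def><Iteration_hits>
<Hit><Hit_accession>EXAMPLE1</Hit_accession><Hit_hsps><Hsp><Hsp_evalue>2e-30</Hsp_evalue></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration>
<Iteration><Iteration_query-def>NP_i4322</Iteration_query-def><Iteration_hits>
<Hit><Hit_accession>EXAMPLE2</Hit_accession><Hit_hsps><Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""
hits = top_hits(sample)
go = go_terms_for_queries(hits, {"EXAMPLE1": {"GO:0016491"}})
print(hits, go)
```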
  • 45. Building the PCoC Network
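One way to build the network, sketched under an assumed reading of the P-P and P-C-P edge types named in the summary: P-P edges join proteins reported in the same paper, and P-C-P edges join a protein in a citing Mouse paper to a protein in the cited Human paper. The slides do not spell out these rules; the function name and the mouse protein IDs (Mm_A, Mm_B) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_pcoc(proteins_by_pmid, citation_pairs):
    """Build weighted P-P and P-C-P edges (assumed interpretation).

    proteins_by_pmid: {pmid: [protein ids]}
    citation_pairs: (citing_pmid, cited_pmid) tuples
    Edge weights count how many papers or citation pairs support each link.
    """
    pp, pcp = Counter(), Counter()
    for proteins in proteins_by_pmid.values():
        # Unordered pairs of distinct proteins co-occurring in one paper.
        for a, b in combinations(sorted(set(proteins)), 2):
            pp[(a, b)] += 1
    for citing, cited in citation_pairs:
        for a in set(proteins_by_pmid.get(citing, ())):
            for b in set(proteins_by_pmid.get(cited, ())):
                pcp[(a, b)] += 1  # directed: citing protein -> cited protein
    return pp, pcp


proteins_by_pmid = {
    10: ["Mm_A", "Mm_B"],                   # hypothetical citing Mouse paper
    3213414: ["NP_u4323", "NP_i4322"],
    2346414: ["NP_u4323"],
}
pp, pcp = build_pcoc(proteins_by_pmid, [(10, 3213414), (10, 2346414)])
print(pp)
print(pcp)
```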
  • 47. To Do: Classifying the PCoC Network
  • 48. To Do: Place in Protein Biological Systems
    [Figure: PCoC network nodes grouped by GO terms: lactase activity, serotonin receptor activity, signal sequence binding, signal transducer activity, nucleotide binding, ATP binding]
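Grouping network nodes by GO terms usually follows the true-path rule: a protein annotated to a term also belongs to every ancestor of that term. A minimal sketch, assuming each protein already has its GO term names and that a parent map is available; the one-edge parent map below is a simplified illustration (in GO, ATP binding reaches nucleotide binding through intermediate terms), and the protein-to-term assignments are made up.

```python
from collections import defaultdict


def ancestors(term, parents):
    """All terms reachable from `term` through the parent map, including itself."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def group_by_go(protein_terms, parents):
    """Group proteins under every GO term they carry or inherit (true-path rule)."""
    groups = defaultdict(set)
    for protein, terms in protein_terms.items():
        for term in terms:
            for t in ancestors(term, parents):
                groups[t].add(protein)
    return dict(groups)


# Simplified, illustrative parent map and annotations.
parents = {"ATP binding": ["nucleotide binding"]}
protein_terms = {
    "NP_u4323": {"ATP binding"},
    "NP_i4322": {"nucleotide binding"},
    "NP_w3421": {"serotonin receptor activity"},
}
groups = group_by_go(protein_terms, parents)
print(groups["nucleotide binding"])  # both nucleotide-binding proteins
```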
  • 50. Summary
    - The Cit-Net connects citing Mouse papers with cited Human papers in the PubMed database
    - MeSH is used to classify the citation network nodes into different research classes
    - The PCoC network connects the proteins in the citing Mouse papers with the proteins in the cited Human papers
    - GO is used to group the P-P and P-C-P network nodes into different classes of molecular functions (MFs), biological processes (BPs) and cellular components (CCs)
  • 51. Timetable (January to May)
    - Database creation and data migration
    - Citation network classification
    - PCoC network building
    - PCoC network classification
    - PCoC network analysis