Cell-Centered Database for Immunology and Cancer Research
Determining the cellular mechanisms of a disease is crucial for understanding its causes and progression, predicting outcomes, and developing new treatments. Relevant information, such as which cells are involved in a disease or what effects a drug has on cells, is often scattered across many papers and journals, which makes it difficult for researchers to be sure they have the complete picture. Using Elsevier’s automated text mining technology, we have created a new cell-centered database for use in Pathway Studio, consisting of 850,000 facts captured from more than 24 million PubMed abstracts and 3.5 million full-text articles. The database focuses primarily on cellular aspects of immunology and immuno-oncology and can be used to summarize and visualize published research and to analyze experimental data. This webinar provides an overview of the database and examples of workflows that Pathway Studio can facilitate.
1. 04.29.2015
Maria Shkrob, PhD
Sr. Bioinformatics Scientist
R&D Solutions, Elsevier
m.shkrob@elsevier.com
4. | 4
Below the tip of the iceberg
Image source: Tumor-altered dendritic cell function: implications for antitumor immunity.
Hargadon KM. Front Immunol. 2013 Jul 11;4:192.
Drug Targets
Biomarkers
Cancer immunotherapy
5. Today:
• Too much information: how to survive
• Our approach
• What Pathway Studio has to offer
• Too much information about cells in particular
• Examples of problems that the database helps to solve
6. | 6
If only we knew what we know…
Image Source: http://www.thesocialleader.com/wp-content/uploads/2011/03/paper-piles.jpg
Text mining: analyzing text to extract information that is useful for particular purposes
Amorphous information
• Hard to deal with
• Hard to deal with algorithmically
• Not scalable
Structured information (after text mining)
• Search
• Visualize
• Network analysis
• Scalable
• Compressed
20km
7. | 7
From text to fact
Sentence (text): "Tregs contribute to the progression of HNSCC"
Fact:
• Regulatory T cell (standard name)
• positive regulation (standard link)
• Head and neck squamous cell carcinoma (standard name)
5.6 M facts from 24 M abstracts and 3.5 M full texts
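The step from sentence to fact on this slide can be sketched as a dictionary lookup plus a trigger-phrase match. This is only a toy illustration and not Elsevier's text mining technology: the synonym dictionary, the trigger pattern, and the function name `sentence_to_fact` are assumptions made for the example.

```python
import re

# Hypothetical synonym dictionary: lower-cased surface form -> standard name.
STANDARD_NAMES = {
    "tregs": "regulatory T cell",
    "hnscc": "head and neck squamous cell carcinoma",
}

# Hypothetical trigger phrases: regular expression -> standard link type.
LINK_PATTERNS = {
    r"contributes? to the progression of": "positive regulation",
}


def sentence_to_fact(sentence):
    """Return (standard name, standard link, standard name) or None."""
    for pattern, link in LINK_PATTERNS.items():
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if not match:
            continue
        # Text before the trigger is the source, text after it is the target.
        left = sentence[:match.start()].strip().lower()
        right = sentence[match.end():].strip(" .").lower()
        if left in STANDARD_NAMES and right in STANDARD_NAMES:
            return (STANDARD_NAMES[left], link, STANDARD_NAMES[right])
    return None


fact = sentence_to_fact("Tregs contribute to the progression of HNSCC")
print(fact)
```

Once sentences are reduced to triples of standard names and standard links, they can be counted, searched, and combined without re-reading the text.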
8. | 8
Find cells involved in HNSCC
Pathway Studio: manipulate facts not text
96 publications
• type of connection
• sign
• intersect, combine and expand
Visualize
Summarize
9. | 9
Pathway Studio Overview
Pathway Studio
Knowledgebase
• Manually curated pathways
• Ontologies
• Biological relations extracted from literature (24M abstracts, 3.5M full texts): 4.8M relations, +836K new relations
Tools
• Experiment analysis: Gene expression, Proteomics, Metabolomics, NGS (beta)
• Search
• Summarization
• Navigation
• Visualization
10. | 10
Pathway Studio databases
Fact structure: standard name, standard link, standard name
Mammal: protein-centered
ChemEffect: drug-centered (+Drugs)
DiseaseFX: biomarker-centered (+Biomarker facts)
Cells (+Cell facts)
Proteins +
• diseases
• clinical parameters
• small molecules
• cell processes
• treatments
• genetic change
• state change
• quantitative change
12. | 12
Have you seen this cell?
full name
nickname
aka
formerly known as
scars and marks
for short
13. | 13
From inconsistent names to standard names
Example: CD4+ CD25 regulatory T cell
Epitope
• CD4+CD25+
• CD25+FOXP3+
• CD4+ CD25+ FOXP3+
“Attribute”
• regulatory
• immunoregulatory
Basic cell
• T-lymphocyte
• T-cell
Spelling variants
• leukocyte, leucocyte
• hemopoetic, hemopoietic, haemopoetic, haemopoietic, hematopoetic, hematopoietic, haematopoetic, haematopoietic
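Spelling variants like the ones listed above can be collapsed with a few regular expressions. The sketch below is an assumption about how such normalization might look, not the method used to build the database; the patterns are derived only from the variants on the slide, and treating "immunoregulatory" as a synonym of "regulatory" is also an assumption.

```python
import re

# Assumed patterns: (compiled regex, replacement with the preferred spelling).
VARIANTS = [
    # hemopoetic, haemopoietic, hematopoetic, haematopoietic, ...
    (re.compile(r"\bha?em(?:at)?opoi?etic\b", re.IGNORECASE), "hematopoietic"),
    # leukocyte / leucocyte, keeping an optional plural "s"
    (re.compile(r"\bleu[ck]ocyte(s?)\b", re.IGNORECASE), r"leukocyte\1"),
    # T-cell, T cell, T-lymphocyte, T lymphocyte
    (re.compile(r"\bT[- ]?(?:cell|lymphocyte)(s?)\b", re.IGNORECASE), r"T cell\1"),
    (re.compile(r"\bimmunoregulatory\b", re.IGNORECASE), "regulatory"),
]


def normalize(name):
    """Rewrite known spelling variants in a cell name to one preferred form."""
    for pattern, replacement in VARIANTS:
        name = pattern.sub(replacement, name)
    return name


print(normalize("immunoregulatory T-lymphocyte"))
print(normalize("haemopoetic"))
```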
14. | 14
From inconsistent names to standard names
Epitopes + “Attribute” + Basic cell → Standard cell name
Image source: http://www.biooncology.com/images/therapeutic-targets/b-cell.png

Combination      Frequency   Standard name
Ep1+ Ep2- BC3    1000        BC4
Ep5+ BC6         2           BC6

Combination      Frequency   Standard name
Ep1+ Ep2- BC3    1000        BC4
Ep5+ BC6         100         Ep5+ BC6
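One reading of the two tables is that a known combination maps to its established name, a frequent combination gets its own standard name, and a rare combination falls back to the basic cell. The sketch below encodes that reading. The alias table and the cut-off of 50 mentions are assumptions: the slide only shows that 2 mentions are too few and 100 are enough.

```python
# Hypothetical curated aliases: a marker combination that defines a named cell.
CURATED_ALIASES = {("Ep1+", "Ep2-", "BC3"): "BC4"}

# Assumed cut-off, chosen only to separate the slide's two cases (2 vs 100).
MIN_FREQUENCY = 50


def standard_name(epitopes, basic_cell, frequency):
    """Pick a standard name for an epitope combination on a basic cell."""
    key = (*epitopes, basic_cell)
    if key in CURATED_ALIASES:
        return CURATED_ALIASES[key]
    if frequency >= MIN_FREQUENCY:
        # Frequent enough in the literature to be kept as its own cell type.
        return " ".join([*epitopes, basic_cell])
    # Too rare: fall back to the basic cell.
    return basic_cell


print(standard_name(["Ep1+", "Ep2-"], "BC3", 1000))
print(standard_name(["Ep5+"], "BC6", 2))
print(standard_name(["Ep5+"], "BC6", 100))
```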
15. | 15
Adding cell processes to the mixture
Terms combined with the standard cell name: proliferation of, death of, migration of, human, polarization, cytotoxicity, quantity
This approach allows us to
• Have more cell processes in the database
• Assign cell processes to rare cells
• Quickly introduce changes if needed
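Combining process terms with standard cell names can be sketched as a cross product of templates and names. The template wording (for example whether "polarization" precedes or follows the cell name) and the two example cells are assumptions for illustration only.

```python
from itertools import product

# Assumed templates built from the process terms on the slide.
PROCESS_TEMPLATES = [
    "proliferation of {cell}",
    "death of {cell}",
    "migration of {cell}",
    "{cell} polarization",
    "{cell} cytotoxicity",
    "{cell} quantity",
]


def build_cell_processes(cells, templates=PROCESS_TEMPLATES):
    """Generate one cell process name per (cell, template) pair."""
    return [t.format(cell=c) for c, t in product(cells, templates)]


# Hypothetical standard cell names.
processes = build_cell_processes(["regulatory T cell", "macrophage"])
print(len(processes))
print(processes[0])
```

Because the processes are generated rather than listed by hand, adding a rare cell type or a new template extends the whole set in one step.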
16. | 16
Recognizing cell processes in text
• Information about more specific cell types
• Doubles the number of cell processes compared to Gene Ontology + EmTree
18. | 18
What questions do we want answered?
• What cells are involved in the disease of interest, and how?
• What proteins/small molecules affect those cells, and how?
• What proteins are exposed on the surface of the cell?
• What proteins are secreted from the cell?
20. | 20
Cells and small molecules
Small molecules affect cells
Cell produces small molecule
Small molecules include:
• chemicals naturally occurring in the human body
• pharmaceuticals
• biologically active peptides
• antibody drugs
• environmental chemicals
• products of metabolism
21. | 21
Cells and diseases
• State is changed
• Cells are related to disease
• Quantity is changed
• Cells play active role in diseases
22. | 22
Cells and clinical parameters
Cells affect clinical parameters
• General disease properties
• Measurable parameters
• Scores
23. | 23
Database statistics (beta)

Objects
Cells                          617
Cell Processes                 7 K
Diseases                       7.3 K
Clinical Parameters            1.5 K
Proteins and protein classes   17.7 K
Small molecules                13.5 K

Relations
Regulation                     540 K
Cell expression                250 K
Quantitative Change            10 K
State change                   7.7 K
Functional Association         31.5 K

Over 830 K new relations on top of 4.8 million previously extracted relations
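As a quick consistency check, the relation counts above can be added up. The figures on the slide are rounded, so the totals are approximate.

```python
# Rounded relation counts as given on the slide.
new_relations = {
    "Regulation": 540_000,
    "Cell expression": 250_000,
    "Quantitative Change": 10_000,
    "State change": 7_700,
    "Functional Association": 31_500,
}
previously_extracted = 4_800_000

new_total = sum(new_relations.values())
combined_total = previously_extracted + new_total

print(new_total)       # 839200, i.e. "over 830 K"
print(combined_total)  # 5639200, i.e. roughly 5.6 M facts
```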
25. | 25
How do proteins secreted from breast carcinoma affect cells involved in its mechanism?
Step 1: What proteins are secreted from breast carcinoma cells?
26. | 26
Disease Secretion Protein
27. | 27
How do proteins secreted from breast carcinoma affect cells involved in its mechanism?
Step 1: Disease Secretion Protein
Step 2: What cells are involved in breast carcinoma?
Cell Regulation Disease
28. | 28
How do proteins secreted from breast carcinoma affect cells involved in its mechanism?
Step 1: Disease Secretion Protein
Step 2: Cell Regulation Disease
Step 3: Find proteins that stimulate pro-disease cells and inhibit anti-disease cells
Protein Regulation Cell
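The three steps can be sketched as filters over a list of facts. This is a toy model, not the Pathway Studio query interface: the tuple layout, the function names, and every protein and cell name below are placeholders, not real findings.

```python
# Toy facts as (source, relation, target, sign). All entries are placeholders.
FACTS = [
    ("breast carcinoma", "Secretion", "protein A", None),
    ("breast carcinoma", "Secretion", "protein B", None),
    ("cell X", "Regulation", "breast carcinoma", "positive"),  # pro-disease cell
    ("cell Y", "Regulation", "breast carcinoma", "negative"),  # anti-disease cell
    ("protein A", "Regulation", "cell X", "positive"),
    ("protein A", "Regulation", "cell Y", "negative"),
    ("protein B", "Regulation", "cell X", "negative"),
]


def secreted_proteins(disease):
    """Step 1: Disease Secretion Protein."""
    return {t for s, r, t, _ in FACTS if r == "Secretion" and s == disease}


def cells_involved(disease):
    """Step 2: Cell Regulation Disease, mapped to the sign of the effect."""
    return {s: sign for s, r, t, sign in FACTS
            if r == "Regulation" and t == disease}


def disease_promoting_links(disease):
    """Step 3: proteins that stimulate pro-disease or inhibit anti-disease cells."""
    proteins = secreted_proteins(disease)
    cells = cells_involved(disease)
    hits = []
    for s, r, t, sign in FACTS:
        if r == "Regulation" and s in proteins and t in cells:
            # Same sign on both links means the protein pushes the disease forward.
            if sign == cells[t]:
                hits.append((s, sign, t))
    return sorted(hits)


print(disease_promoting_links("breast carcinoma"))
```

In this toy data, protein A is reported twice (it stimulates the pro-disease cell and inhibits the anti-disease cell), while protein B is filtered out because it inhibits a pro-disease cell.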
29. | 29
How do proteins secreted from breast carcinoma affect cells involved in its mechanism?
31. | 31
High grade astrocytoma survival
Background: 3-5% of glioblastoma patients survive longer than 3 years
What are the predictors of long survival?
High grade astrocytoma patient survival: brain tumor (GSE33331)
Comparison of surgical brain tumor samples from patients with long vs average survival
Source: Donson AM, Birks DK, Schittone SA, Kleinschmidt-DeMasters BK, Sun DY, Hemenway MF, Handler MH, Waziri AE, Wang M, Foreman NK. Increased Immune Gene Expression and Immune Cell Infiltration in High Grade Astrocytoma Distinguish Long from Short-Term Survivors. J Immunol. 2012 Aug 15; 189(4): 1920–1927.
32. | 32
What is needed for long survival?
Survival over 70 months vs standard survival
Functionally annotate genes
• What processes are they involved in?
• What pathways are they involved in?
Results depend on the quality and granularity of the database
34. | 34
Analysis of genes positively correlating with prognosis
Genes positively correlated with survival
Gene Set Enrichment Analysis using Gene Ontology
Immune system is involved, but can we be more specific?
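To give a feel for what an enrichment score measures, here is a simple over-representation test based on the hypergeometric distribution. This is a simpler technique than Gene Set Enrichment Analysis proper and is not Pathway Studio's implementation; the function name and the example numbers are made up for illustration.

```python
from math import comb


def overrepresentation_p(universe, gene_set, hits, overlap):
    """P(X >= overlap) when `hits` genes are drawn at random from `universe`
    genes, of which `gene_set` belong to the annotated category."""
    upper = min(gene_set, hits)
    tail = sum(
        comb(gene_set, k) * comb(universe - gene_set, hits - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, hits)


# Made-up example: 20 measured genes, 5 annotated to a category,
# 4 genes correlated with survival, 3 of which are in the category.
p = overrepresentation_p(universe=20, gene_set=5, hits=4, overlap=3)
print(round(p, 4))
```

A small p-value means the category contains more of the selected genes than random sampling would explain. Which categories can be detected depends on which gene sets the database provides.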
35. | 35
Dig deeper into immune component
Gene Ontology
• Immune processes in top 100: 28
• Immune cells in top 100: T cells, B cells, Leukocytes, Lymphocytes, Neutrophils
Mammal, ChemEffect, DiseaseFX
• Immune processes in top 100: 55
• Immune cells in top 100: + Macrophages, Dendritic cells, Monocytes, T helpers
Mammal, ChemEffect, DiseaseFX + cells
• Immune processes in top 100: 63
• Immune cells in top 100: + Th1 cells, Th2 cells, Th17 cells, Tregs, CD8+ T cells
37. | 37
Does this new information make sense?
Known connections between immune cells identified in analysis and brain tumors
38. | 38
Summary
• The new database focuses on cell-centered facts
• It adds over 800K new relations and 4.4K entities to the knowledgebase
• The new data can be used to answer complex biological questions and analyze experiments
• Flexible recognition of cells and cell processes allows customization