Enrichr is a web server that provides comprehensive gene set enrichment analysis. It contains over 31,000 gene sets across 35 libraries covering categories like transcription, pathways, cell types, and diseases. These gene sets represent human and mouse genomes and proteomes. Enrichr calculates enrichment using methods like Fisher exact test, z-score, and combined score. It offers advantages of being easy to use with interactive visualization, but lacks some flexibility and features of other tools.
Comprehensive gene set enrichment analysis web server Enrichr
1. Enrichr
a comprehensive gene set enrichment analysis web server
INFO-703 Biological Data Management
March 27th, 2019
Presented by Thi Nguyen
http://amp.pharm.mssm.edu/Enrichr/
2. Enrichr gene-set libraries
• 35 gene-set libraries: transcription, pathways, ontologies, diseases/drugs,
cell types and misc.
• total = 31,026 gene-sets that completely cover the human and mouse
genome + proteome
• on average, each gene-set has ~350 genes; in total there are > 6 million
connections between genes and terms.
• gene frequencies for most gene-set libraries follow a power law.
3. Enrichr gene-set libraries
I. Transcription category: links differentially expressed genes (DEGs) with transcription factors:
1. ChIP-x Enrichment Analysis (ChEA) database
2. Position weight matrices (PWMs) from the TRANSFAC and JASPAR databases
3. Transcription factor target genes inferred from PWMs
4. ENCODE transcription factor gene-set library
5. Histone modifications extracted from the
NIH Roadmap Epigenomics project
6. microRNA gene-set library from TargetScan
4. Enrichr gene-set libraries
II. Pathway Category includes gene-set libraries from well-known databases:
• WikiPathways
• KEGG
• BioCarta
• Reactome
and other libraries are created from their own resources:
• kinase enrichment analysis (KEA)
• PPI hubs
• CORUM
• complexes from IP-MS study
• manually assembled lists of phosphoproteins from SILAC phosphoproteomics
III. Ontology Category: contains gene-set libraries from the 3 Gene Ontology trees
and from the knockout mouse phenotype ontology from the MGI-MP browser (Jackson
Lab)
5. Enrichr gene-set libraries
IV. Disease/drug category:
• CMAP database
• GeneSigDB
• MSigDB
• OMIM
• VirusMINT
V. Cell type category:
• highly expressed genes from Mouse and Human Gene Atlases
• highly expressed genes from cancer cells from Cancer Cell Line Encyclopedia (CCLE)
• NCI-60 Cell line data set
VI. Misc category:
• chromosome location (MSigDB)
• metabolites (HMDB)
• structural domains (PFAM and InterPro)
6. 3 methods to rank enrichment scores
1. Fisher exact test
2. z-score of the deviation from the expected rank by the Fisher exact test
3. combined score that multiplies the log of the p-value (Fisher exact test) by the z-score
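As a rough illustration of method 1, the sketch below computes a one-sided Fisher exact test p-value for the overlap between an input list and a single gene set, written as the upper tail of the hypergeometric distribution. The function name, the toy gene symbols, and the `universe_size` background of 20 genes are assumptions made for illustration; this is not Enrichr's own implementation.

```python
from math import comb


def fisher_exact_enrichment(input_genes, gene_set, universe_size):
    """Return (overlap, p_value) for over-representation of gene_set in input_genes.

    The p-value is the probability of drawing at least the observed overlap
    when len(input_genes) genes are sampled without replacement from a
    background of universe_size genes (assumed, not an Enrichr parameter).
    """
    input_genes, gene_set = set(input_genes), set(gene_set)
    k = len(input_genes & gene_set)  # observed overlap
    n = len(input_genes)             # size of the input list
    K = len(gene_set)                # size of the gene set (term)
    N = universe_size                # assumed background size
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return k, tail / comb(N, n)


# Toy data (hypothetical): 3 of the 4 input genes fall in a 5-gene set.
overlap, p_value = fisher_exact_enrichment(
    input_genes=["A", "B", "C", "X"],
    gene_set=["A", "B", "C", "D", "E"],
    universe_size=20,
)
print(overlap, round(p_value, 4))
```

With these toy numbers the tail probability is 155/4845, about 0.032. A smaller p-value means the overlap is less likely to arise by chance.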
8. 2016 NAR update
• 180,184 annotated gene sets from 102 gene set libraries, including new libraries extracted from GEO
• new features:
submit fuzzy sets
upload BED files
improved API
visualization tool: clustergram
updated benchmarking of the different scoring schemes
visualize overlap between Enrichr and other gene set libraries
10. 2016 NAR update
comparison of resources
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987924/
User interface pros and cons
There are many other gene set enrichment analysis tools that could be compared with
Enrichr; for example, some leading tools are Fidea (39), DAVID (13), WebGestalt (12), g:Profiler
(12) and GSEA (40). The advantages of Enrichr over some of these tools are its
comprehensiveness, ease of use and interactive visualization of the results. Enrichr is lacking
some of the flexibility available with those other tools. For example, Enrichr merges human,
mouse and rat genes, which has advantages and disadvantages. Enrichr does not have an ID
conversion tool, which is highly desired by many users. Enrichr also does not have the ability
to upload a background list, and it does not have implementation of parametric tests such as
Gene Set Enrichment Analysis (GSEA) (40), Parametric Analysis of Gene set Enrichment (PAGE)
(9), and our own Principal Angle Enrichment Analysis (PAEA) (41). These features are planned.
Editor's Notes
It is a database of gene set libraries.
An enrichment analysis tool developed by the Ma'ayan Lab.
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments.
Enrichr includes 35 gene-set libraries totaling 31,026 gene-sets that completely cover the human and mouse genome and proteome (Table 1). On average, each gene-set has ~350 genes and there are over six million connections between terms and genes. Further statistics and information on where the gene-set libraries were derived from can be found in the “Dataset Statistics” tab of the Enrichr main page. Histograms of gene frequencies for most gene-set libraries follow a power law, suggesting that some genes are much more common in gene-set libraries than others (Figure 2a). This has an implication for enrichment computations that we did not consider yet in Enrichr. Some genes are more likely than others to appear in various enrichment analyses; this tendency can stem from various sources, including the fact that some genes are well studied. This research focus bias is present in several of the libraries.
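The gene-frequency histogram mentioned here can be sketched as follows: count how many gene sets each gene appears in, then tally how many genes share each count. The library below is hypothetical toy data, not taken from Enrichr, and the function names are illustrative.

```python
from collections import Counter

# Toy library mapping term -> genes (hypothetical data for illustration).
toy_library = {
    "term_A": ["TP53", "MYC", "EGFR"],
    "term_B": ["TP53", "MYC"],
    "term_C": ["TP53", "BRCA1"],
    "term_D": ["TP53"],
}


def gene_frequencies(library):
    """Count the number of gene sets in which each gene appears."""
    counts = Counter()
    for genes in library.values():
        counts.update(set(genes))  # set() so a gene counts once per term
    return counts


def frequency_histogram(counts):
    """Map 'number of gene sets' to 'number of genes with that frequency'."""
    return Counter(counts.values())


freqs = gene_frequencies(toy_library)
hist = frequency_histogram(freqs)
print(freqs.most_common(2))  # the most widely annotated genes
print(sorted(hist.items()))  # (frequency, number of genes) pairs
```

In a power-law-like library, most genes sit at low frequencies while a few well-studied genes (here the toy `TP53`) appear in many terms, which is the bias the paragraph describes.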
Enrichr contains 35 gene-set libraries where some libraries are borrowed from other tools while many other libraries are newly created and only available in Enrichr.
The gene-set libraries provided by Enrichr are divided into six categories: transcription, pathways, ontologies, diseases/drugs, cell types and miscellaneous. The following is a description of each library and how it was created:
The transcription category provides six gene-set libraries that attempt to link differentially expressed genes with the transcriptional machinery. These six libraries include the ability to identify transcription factors that are enriched for target genes within the input list using four different options: 1) ChEA [10]; 2) position weight matrices (PWMs) from TRANSFAC [11] and JASPAR [12]; 3) target genes generated from PWMs downloaded from the UCSC genome browser [13]; and 4) transcription factor targets extracted from the ENCODE project [14, 15]. In addition, the two other gene-set libraries in the transcription category are gene sets associated with: 5) histone modifications extracted from the Roadmap Epigenomics Project [16]; and 6) microRNA targets computationally predicted by TargetScan [17].
Enrichr computes three types of enrichment scores to assess the significance of overlap between the input list and the gene sets in each gene-set library for ranking a term’s relevance to the input list. These tests are: 1) the Fisher exact test, a test that is implemented in most gene list enrichment analysis programs; 2) a test statistic that we developed which is the z-score of the deviation from the expected rank by the Fisher exact test; and 3) a combined score that multiplies the log of the p-value computed with the Fisher exact test by the z-score computed by our correction to the test.
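Scores 2 and 3 can be sketched in a few lines. The sketch assumes that the expected rank of a term and its standard deviation are already available from a lookup table (for example, one built by ranking the term for many random input lists) and that the log is the natural log. The function names and all numbers are hypothetical, and the code is an illustration of the description above rather than Enrichr's implementation.

```python
from math import log


def rank_z_score(observed_rank, expected_rank, rank_std):
    """z-score of the deviation of a term's observed rank from its expected rank.

    expected_rank and rank_std are assumed to come from a precomputed
    lookup table; a negative z means the term ranks better than expected.
    """
    return (observed_rank - expected_rank) / rank_std


def combined_score(p_value, z):
    """Combined score: log of the Fisher exact test p-value times the z-score."""
    return log(p_value) * z


# Hypothetical values for one term.
p_value = 0.001        # Fisher exact test p-value for the term
observed_rank = 2      # rank of the term by p-value for the input list
expected_rank = 50.0   # assumed mean rank over random input lists
rank_std = 12.0        # assumed standard deviation of that rank

z = rank_z_score(observed_rank, expected_rank, rank_std)
c = combined_score(p_value, z)
print(z, round(c, 2))
```

Because both the log of a small p-value and the z-score of a better-than-expected rank are negative, their product is positive, so under these assumptions larger combined scores indicate stronger enrichment.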
Enrichr workflow. Enrichr receives lists of human or mouse genes as input. It uses 35 gene-set libraries to compute enrichment. The enrichment results are interactively displayed as bar graphs, tables, grids of terms with the enriched terms highlighted, and networks of enriched terms.
After submitting the list for analysis, the user is presented with the results page, which is divided into the six different categories: transcription, pathways, ontologies, disease/drugs, cell types, and miscellaneous. Clicking on the name of the gene-set library expands a box that reveals the enrichment analysis results for that gene-set library. Users are first presented with a bar graph that shows the top 10 enriched terms for the selected gene-set library (Figure 1 and Additional file 2: Figure S2). The bar graph provides a visual representation of how significant each term is based on the overlap with the user’s input list. The longer bars and lighter colored bars mean that the term is more significant. It is possible to export the bar graph as a figure for publication or other form of presentation into three formats: JPEG, SVG and PNG. In addition, the color of the bar graph can be customized using a hexagonal color selection wheel populated with colors that provide the best contrast. There are three methods to compute enrichment and the user can toggle between them by clicking on any bar of the bar graph: Fisher exact test based ranking, rank based ranking, and combined score ranking.
Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries.
The new gene set libraries that were added include differentially expressed genes after drug, gene, disease and pathogen perturbations extracted from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) through a crowdsourcing project. Furthermore, we have implemented the ability to submit fuzzy sets, upload BED files, a calendar that shows the number of lists submitted each day, an improved application programming interface (API), enhanced help documentation, an improved Find a Gene feature, and visualization of the results as clustergrams. In this manuscript, we also provide updated benchmarking results of the different scoring schemes implemented in Enrichr and visualize the overlap between the data sets currently within Enrichr compared with other comparable web-server tools and resources that serve gene set libraries.