Multiple Alignment Sequence using Clustal Omega/ Shumaila RiazShumailaRiaz6
Alignment of three or more biological sequences of Protein, DNA, RNA of similar length
Clustal Omega is tool for analyzing the Multiple sequence alignments of proteins
Short tutorials on how to use the web-based tool DAVID - Database for Annotation, Visualization and Integrated Discovery) - http://david.abcc.ncifcrf.gov/
DAVID provides a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes.
Archive of experimentally determined 3D structures of biological macromolecules.
Established in 1971, by Research Collaboratory for Structural Bioinformatics (RCSB), Brookhaven National Laboratories, USA.
Archive contain atomic coordinates, bibliographic citations, primary and secondary structure information, crystallographic structure factors, NMR experimental data.
Multiple Alignment Sequence using Clustal Omega/ Shumaila RiazShumailaRiaz6
Alignment of three or more biological sequences of Protein, DNA, RNA of similar length
Clustal Omega is tool for analyzing the Multiple sequence alignments of proteins
Short tutorials on how to use the web-based tool DAVID - Database for Annotation, Visualization and Integrated Discovery) - http://david.abcc.ncifcrf.gov/
DAVID provides a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes.
Archive of experimentally determined 3D structures of biological macromolecules.
Established in 1971, by Research Collaboratory for Structural Bioinformatics (RCSB), Brookhaven National Laboratories, USA.
Archive contain atomic coordinates, bibliographic citations, primary and secondary structure information, crystallographic structure factors, NMR experimental data.
In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences.
Contents:
What does sequence mean?
Examples of sequences
Sequence Homology
Sequence Alignment
What is the use of sequence alignment?
Alignment methods
Tools for Sequence Alignment
FASTA Format
BLAST
Principle of BLAST
Variants of BLAST Program
BLAST input
BLAST output
Multiple sequence alignment
What is the use of multiple alignments?
Multiple Alignment Method
Tool for multiple alignments
ClustalW input
ClustalW output
E (Expectation) value
Demerits of progressive alignment
Course: Bioinformatics for Biomedical Research (2014).
Session: 1.3- Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensembl, Biomart and IGV.
Statistics and Bioinformatisc Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Functional proteomics, methods and toolsKAUSHAL SAHU
INTRODUCTION
HISTORY
DEFINITION
PROTEOMICS
FUNCTIONAL PROTEOMICS
PROTEOMICS SOFTWARE
PROTEOMICS ANALYSIS
TOOLS FOR PROTEOM ANALYSIS
DIFFERENTS METHODS FOR STUDY OF FUNCTIONAL PROTEOMICS
APLLICATIONS
LIMITATIONS
CONCLUSION
BITS: Overview of important biological databases beyond sequencesBITS
Module 4 Other relevant biological data sources beyond sequences
Part of training session "Basic Bioinformatics concepts, databases and tools" - http://www.bits.vib.be/training
A plant genome project aims to discover all genes and their function in a particular plant species.
The main objective of genomic research in any species is to sequence the whole genome and functions of all the different coding and non-coding sequences.
These techniques helped in preparation of molecular maps of many plant genomes.
Plant genome projects initially focused on a few model organisms that are characterized by small genomes or their amenability to genetic studies
Since sequencing technologies have moved on, sequencing cost have dropped and bioinformatics tools advanced, the genomes of many plant species including the enormous genome of bread wheat have been assembled
Genome sequencing projects have been carried out on all three plant genomes: the nuclear, chloroplast and mitochondrial genomes
This opened venues for advanced molecular breeding and manipulation of plant species, but also have accelerated phylogenetics studies amongst species
Several excellent curated plant genome databases, besides the general nucleotide data base archives, allow public access of plant genomes
Sequence homology search and multiple sequence alignment(1)AnkitTiwari354
Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal (or lateral) gene transfer event (xenologs).[1]
Homology among DNA, RNA, or proteins is typically inferred from their nucleotide or amino acid sequence similarity. Significant similarity is strong evidence that two sequences are related by evolutionary changes from a common ancestral sequence. Alignments of multiple sequences are used to indicate which regions of each sequence are homologous.
Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble and analyze the function and structure of genomes
INTRODUCTION OF BIOINFORMATICS
HISTORY
WHAT IS DATABASE
NEED FOR DATABASE
TYPES OF DATABASE
PRIMARY DATABASE
NUCLEIC ACID SEQUENCE DATABASE
GENE BANK
INTRODUCTION
GENE BANK SUBMISSION TOOL
GENE BANK SUBMISSION TYPE
HOW TO RETRIEVE DATA FROM GENEBANK
APPLICATION
CONCLUSION
REFERENCE
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data.
In bioinformatics and biochemistry, the FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences.
Contents:
What does sequence mean?
Examples of sequences
Sequence Homology
Sequence Alignment
What is the use of sequence alignment?
Alignment methods
Tools for Sequence Alignment
FASTA Format
BLAST
Principle of BLAST
Variants of BLAST Program
BLAST input
BLAST output
Multiple sequence alignment
What is the use of multiple alignments?
Multiple Alignment Method
Tool for multiple alignments
ClustalW input
ClustalW output
E (Expectation) value
Demerits of progressive alignment
Course: Bioinformatics for Biomedical Research (2014).
Session: 1.3- Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensembl, Biomart and IGV.
Statistics and Bioinformatisc Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Functional proteomics, methods and toolsKAUSHAL SAHU
INTRODUCTION
HISTORY
DEFINITION
PROTEOMICS
FUNCTIONAL PROTEOMICS
PROTEOMICS SOFTWARE
PROTEOMICS ANALYSIS
TOOLS FOR PROTEOM ANALYSIS
DIFFERENTS METHODS FOR STUDY OF FUNCTIONAL PROTEOMICS
APLLICATIONS
LIMITATIONS
CONCLUSION
BITS: Overview of important biological databases beyond sequencesBITS
Module 4 Other relevant biological data sources beyond sequences
Part of training session "Basic Bioinformatics concepts, databases and tools" - http://www.bits.vib.be/training
A plant genome project aims to discover all genes and their function in a particular plant species.
The main objective of genomic research in any species is to sequence the whole genome and functions of all the different coding and non-coding sequences.
These techniques helped in preparation of molecular maps of many plant genomes.
Plant genome projects initially focused on a few model organisms that are characterized by small genomes or their amenability to genetic studies
Since sequencing technologies have moved on, sequencing cost have dropped and bioinformatics tools advanced, the genomes of many plant species including the enormous genome of bread wheat have been assembled
Genome sequencing projects have been carried out on all three plant genomes: the nuclear, chloroplast and mitochondrial genomes
This opened venues for advanced molecular breeding and manipulation of plant species, but also have accelerated phylogenetics studies amongst species
Several excellent curated plant genome databases, besides the general nucleotide data base archives, allow public access of plant genomes
Sequence homology search and multiple sequence alignment(1)AnkitTiwari354
Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal (or lateral) gene transfer event (xenologs).[1]
Homology among DNA, RNA, or proteins is typically inferred from their nucleotide or amino acid sequence similarity. Significant similarity is strong evidence that two sequences are related by evolutionary changes from a common ancestral sequence. Alignments of multiple sequences are used to indicate which regions of each sequence are homologous.
Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble and analyze the function and structure of genomes
INTRODUCTION OF BIOINFORMATICS
HISTORY
WHAT IS DATABASE
NEED FOR DATABASE
TYPES OF DATABASE
PRIMARY DATABASE
NUCLEIC ACID SEQUENCE DATABASE
GENE BANK
INTRODUCTION
GENE BANK SUBMISSION TOOL
GENE BANK SUBMISSION TYPE
HOW TO RETRIEVE DATA FROM GENEBANK
APPLICATION
CONCLUSION
REFERENCE
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data.
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Rothamsted Research, UK
Workshop within the Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).
We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.
Towards semantic systems chemical biology Bin Chen
introduce a semantic framework for studying systems chemical biology / systems pharmacology, in which three major projects (Chem2Bio2RDF, Chem2Bio2OWL, SLAP (semantic link association prediction) are covered.
With the unprecedented growth of chemical databases incorporating up to several hundred billions of synthetically feasible chemicals, modelers are not in shortage of chemicals to process. Importantly, such "Big Chemical Data" offers humongous opportunities for discovering novel bioactive molecules. However, the current generation of cheminformatics software tools is not capable of handling, characterizing, and processing such extremely large chemical libraries. In this presentation, we will discuss the rationale and the main challenges (theoretical and technical) for screening very large repositories of compounds in the current context of drug discovery. We will present several proof-of-concept studies regarding the screening of extremely large libraries (1+ billion compounds) using our novel GPU-accelerated cheminformatics platform to identify molecules with defined bioactivity. Overall, we will show that GPU computing represents an effective and inexpensive architecture to develop, employ, and validate a new generation of cheminformatics methods and tools ready to process billions of compounds.
Step by step tutorial for conducting GO enrichment analysis and then creating a network from the results.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
An update version of the genome assembly including the mention of techniques such as HiC and Bionano. Also include the QC. These are the same slides used in the course for the UNL in Argentina.
Presentation about how to perform data processing for genomics data in population genetics and quantitative genetics studies. It explains how to process the reads, map them, get variants and quantify them. It also presents 25 common Linux commands that are required in order to interact with the Linux system and be able to run different tools.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
GoTermsAnalysisWithR
1. Using R for GO Terms Analysis
Boyce Thompson Institute for Plant
Research
Tower Road
Ithaca, New York 14853-1801
U.S.A.
by
Aureliano Bombarely Gomez
2. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
3. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
4. 1. What are GO Terms ?
Gene ontologies:
Structured controlled vocabularies
(ontologies) that describe gene products
in terms of their associated
biological processes,
cellular components and
molecular functions
in a species-independent manner
http://www.geneontology.org/GO.doc.shtml
5. 1. What are GO Terms ?
Biological processes,
Recognized series of events or molecular functions. A process is
a collection of molecular events with a defined beginning and end.
Cellular components,
Describes locations, at the levels of subcellular structures and
macromolecular complexes.
Molecular functions
Describes activities, such as catalytic or binding activities, that
occur at the molecular level.
6. 1. What are GO Terms ?
Gene ontologies:
Structured controlled vocabulariesbetween the terms
relationships
(ontologies) that describe gene products
in terms ofbe described in terms of a acyclic graph, where:
GO can their associated
each GO term is a node, and the
biological processes, the terms are arcs between the nodes
relationships between
cellular components and
molecular functions
in a species-independent manner
7. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
8. 2. What is R ?
R is a language and environment for statistical
computing and graphics. .
9. 2. What is R ?
WEB:
OFICIAL WEB: http://www.r-project.org/index.html
QUICK-R: http://www.statmethods.net/index.html
BOOKS:
Introductory Statistics with R (Statistics and Computing), P. Dalgaard
[available as manual at R project web]
The R Book, MJ. Crawley
R itself: help() and example()
10. 2. What is R ?
INTEGRATED DEVELOPMENT ENVIRONMENT (IDE):
R-Studio: http://rstudio.org/
11. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
12. 3. What is Bioconductor ?
Bioconductor is an open source, open development
software project to provide tools for the analysis and
comprehension of high-throughput genomic data. It is
based primarily on the R programming language.
13. 3. What is Bioconductor ?
WEB:
OFICIAL WEB: http://www.bioconductor.org/
BOOKS:
Bioinformatics and Computational Biology Solutions Using R and
Bioconductor, 2005, Gentleman R. et al. (Springer)
R Programming for Bioinformatics, 2008, Gentleman R. (CRC Press)
MAILING LIST:
http://www.bioconductor.org/help/mailing-list/
14. 3. What is Bioconductor ?
Installation of Bioconductor:
15. 3. What is Bioconductor ?
Use of Bioconductor:
16. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Similarities: GOSim
4.3. GO profiles: GOProfiles
4.4. Gene Enrinchment Analysis:TopGO
5. Example.
5.1. Comparing two arabidopsis chromosomes.
18. 4. Bioconductor modules for GOterms
Bioconductor Packages for GO Terms:
GO.db A set of annotation maps describing the entire Gene Ontology
Gostats Tools for manipulating GO and microarrays
GOSim functional similarities between GO terms and gene products
GOProfiles Statistical analysis of functional profiles
TopGO Enrichment analysis for Gene Ontology
19. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
20. 4.1 GO.db
http://www.bioconductor.org/packages/2.9/data/annotation/html/GO.db.html
1) GO terms stored in objects
GOTERM,
GOBPPARENTS, GOCCPARENTS, GOMFPARENTS
GOBPANCESTOR, GOCCANCESTOR, GOMFANCESTOR
GOBPCHILDREN, GOCCCHILDREN, GOMFCHILDREN
GOBPOFFSPRING, GOCCOFFSPRING, GOMFOFFSPRING
More Information:
http://www.bioconductor.org/packages/2.6/bioc/vignettes/annotate/inst/doc/GOusage.pdf
22. 4.1 GO.db
Exercise 1:
1.1 Which term correspond top the GOID: GO:0006720 ?
1.2 How many synonyms has this term ?
1.3 How many children ?
1.4 … and parents ?
1.5 … and offsprings ?
1.6 … and ancestors ?
23. 4.1 GO.db
2) Mapping between gene and GO terms stored in objects or
dataframes.
org.At.tair.db
Genome wide annotation for Arabidopsis, primarily based on mapping
using TAIR identifiers.
24. 4.1 GO.db
2) Mapping between gene and GO terms stored in objects
(annotate package) or dataframes.
> library("org.At.tair.db")
> org.At.tairGO[["AT5G58560"]]
Use a list:
> org.At.tairGO[["AT5G58560"]][[1]]$Ontology
Functions (“annotate” package):
+ getOntology(inlist, gocategorylist)
+ getEvidence(inlist)
> getOntology(org.At.tairGO[["AT5G58560"]])
> getEvidence(org.At.tairGO[["AT5G58560"]])
25. 4.1 GO.db
Exercise 2:
2.1 How many terms map with the gene: AT5G58560 ?
2.2 What biological process terms map with this gene?
2.3 What evidences are associated with these terms ?
26. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
27. 4.2 TopGO
Gene Set Enrinchment Analysis (GSEA)
Computational method that determines whether an a priori
defined set of genes shows statistically significant, concordant
differences between two biological states (e.g. phenotypes).
Documentation: http://www.broadinstitute.org/gsea/index.jsp
29. 4.2 TopGO
Methodology:
1) Data preparation:
a) Gene Universe. OBJECT:
b) GO Annotation. topGOdata
c) Criteria to select interesting genes.
> sampleGOdata <- new("topGOdata",
description = "Simple session",
ontology = "BP",
allGenes = geneList, Gene Universe
geneSel = topDiffGenes, Selected Genes
nodeSize = 10,
Function to map data provided
annot = annFUN.db, in the annotation data packages
affyLib = “hgu95av2.db”) Data package
30. 4.2 TopGO
> sampleGOdata
------------------------- topGOdata object ------------------------- OBJECT:
Description: topGOdata
- Simple session
Ontology:
- BP
323 available genes (all genes from the array):
- symbol: 1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at ...
- score : 1 1 0.62238 0.541224 1 ...
- 50 significant genes.
316 feasible genes (genes that can be used in the analysis):
- symbol: 1095_s_at 1130_at 1196_at 1329_s_at 1340_s_at ...
- score : 1 1 0.62238 0.541224 1 ...
- 50 significant genes.
GO graph (nodes with at least 10 genes):
- a graph with directed edges
- number of nodes = 672
- number of edges = 1358
------------------------- topGOdata object -------------------------
31. 4.2 TopGO
Methodology:
2) Running the enrinchment test: (runTest function)
> resultFisher <- runTest(sampleGOdata,
algorithm = "classic", whichAlgorithms()
statistic = "fisher") whichTest()
> resultFisher "fisher"
"ks"
Description: Simple session "t"
Ontology: BP "globaltest"
'classic' algorithm with the 'fisher' test "sum"
672 GO terms scored: 66 terms with p < 0.01 "ks.ties"
Annotation data:
Annotated genes: 316
Significant genes: 50
Min. no. of genes annotated to a GO: 10
Nontrivial nodes: 607
32. 4.2 TopGO
Methodology:
2) Running the enrinchment test:
> resultKS <- runTest(sampleGOdata,
algorithm = "classic", whichAlgorithms()
statistic = "ks") whichTest()
> resultKS "fisher"
"ks"
Description: Simple session "t"
Ontology: BP "globaltest"
'classic' algorithm with the 'ks' test "sum"
672 GO terms scored: 57 terms with p < 0.01 "ks.ties"
Annotation data:
Annotated genes: 316
Significant genes: 50
Min. no. of genes annotated to a GO: 10
Nontrivial nodes: 672
34. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
35. 4.3 goProfiles
http://www.bioconductor.org/packages/2.8/bioc/html/goProfiles.html
1) Profiles are built by slicing the GO graph at a given level
Level 1
Level 2
Level 3
Level 4
36. 4.3 goProfiles
2) Functional profile at a given GO level is obtained by counting the
number of identifiers having a hit in each category of this level
3) Profiles comparissons:
1. How different or representative of a given gene universe is a given set of genes?
• Universe: All genes analyzed,
Gene Set: Differentially expressed genes in a microarray experiment
• Universe: All genes in a database,
Gene Set: Arbitrarily selected set of genes
2. How biologically different are two given sets of genes?
• Differentially expressed genes in two experiments
• Arbitrarily chosen lists of genes
37. 4.3 goProfiles
Methodology:
1) Data preparation:
a) Gene GO term map (object or dataframe).
Dataframe with 4 columns:
+ GeneID
+ Ontology
+ Evidence
+ GOID
2) Profile creation:
a) basicProfile function
BasicProfile( genelist, “Entrez” (default),
idType = "Entrez", “BiocProbes”,
onto = "ANY", “GoTermsFrame”
Level = 2,
orgPackage=NULL, Requested for “Entrez”
anotPackage=NULL, Requested for
...) “BiocProbes”
38. 4.3 goProfiles
Methodology:
2) Profile creation:
a) basicProfile function
> printProfiles(bpprofile)
Functional Profile
==================
[1] "BP ontology"
Description GOID Frequency
25 cellular component biogen... GO:0044085 5
14 cellular component organi... GO:0016043 6
12 cellular process GO:0009987 122
32 establishment of localiza... GO:0051234 4
4 immune system process... GO:0002376 1
31 localization GO:0051179 4
2 metabolic process GO:0008152 3
33 multi-organism process... GO:0051704 2
30 response to stimulus GO:0050896 4
40. 4.3 goProfiles
Methodology:
2) Profile creation:
b) expandedProfile function
Used mainly for comparisons of profiles.
ExpandedProfile( genelist,
idType = "Entrez",
onto = "ANY",
Level = 2,
orgPackage=NULL,
anotPackage=NULL
...)
3) Profile comparisons:
- Case I (INCLUSION): One list incluided in the other.
- Case || (DISJOINT): Non overlapping gene sets
- Case III (INTERSECTION): Overlapping genes
41. 4.3 goProfiles
Methodology:
3) Profile comparisons:
- Case I (INCLUSION): compareProfilesLists()
- Case || (DISJOINT): compareGeneLists()
- Case III (INTERSECTION): compareGeneLists()
> comp_ath_genes <- compareGeneLists(
genelist1=ath_chl_list,
genelist2=ath_mit_list,
idType="Entrez", orgPackage="org.At.tair.db",
onto="BP", level=2)
> print(compSummary(comp_ath_genes))
Sqr.Euc.Dist StdErr pValue 0.95CI.low 0.95CI.up
0.031401 0.024101 0.005000 -0.015837 0.078639
44. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
46. Using R for GO Terms Analysis:
1. What are GO terms ?
2. What is R ?
3. What is Bioconductor ?
4. Bioconductor modules for GO terms
4.1. Basics: GO.db
4.2. Gene Enrinchment Analysis: TopGO
4.3. GO profiles: GOProfiles
4.4. Gene Similarities: GOSim
5. Example.
5.1. Comparing two arabidopsis chromosomes.
47. 5 Example
Exercise 3:
Compare the goProfiles for A. thaliana chromosomes 1 and 5.
Are they significantly differents ?
Source:
ftp://ftp.arabidopsis.org/home/tair/Genes/
TAIR10_genome_release/TAIR10_gene_lists/TAIR10_all_gene_models
R Package:
GoProfiles