This document provides an overview of comparative genomics. It begins by defining genomics and its subfields, including comparative genomics, which compares complete genome sequences across species. Tools and concepts used in comparative genomics, such as BLAST and synteny analysis, are discussed. The history of comparative genomics, from early virus comparisons to current eukaryote analyses, is summarized. Methods for comparative analysis include examining genome structure, coding regions, protein content, and non-coding regions. General databases useful for comparative genomics are also listed.
2. Genomics
Genomics is an area within genetics that concerns the sequencing and analysis of an
organism’s genome.
Development and application of genetic mapping, sequencing, and computation
(bioinformatics) to analyze the genomes of organisms.
Sub-fields of genomics:
Structural genomics: genetic and physical mapping of genomes.
Functional genomics: analysis of gene function (and non-genes).
Comparative genomics: comparison of genomes across species.
Includes structural and functional genomics.
Evolutionary genomics.
3. Comparative genomics
Comparative genomics is an exciting field of biological research in which
researchers use a variety of tools, including computer-based analysis, to
compare the complete genome sequences of different species.
It compares gene numbers, gene locations and the biological functions of
genes in the genomes of different organisms. One objective is to
identify groups of genes that play a unique biological role in a particular
organism.
4. History
• Comparative genomics has a root in the comparison of virus genomes in
the early 1980s.
• For example, small RNA viruses infecting animals (picornaviruses) and
those infecting plants (cowpea mosaic virus) were compared and turned
out to share significant sequence similarity and, in part, the order of their
genes.
• In 1986, the first comparative genomic study at a larger scale was
published, comparing the genomes of varicella-zoster virus and
Epstein-Barr virus, which contain more than 100 genes each.
5. Contd..
• The first complete genome sequence of a cellular organism, that of
Haemophilus influenzae Rd, was published in 1995.
• The second genome sequencing paper was of the small parasitic
bacterium Mycoplasma genitalium published in the same year.
• Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote
to have its complete genome sequence published in 1996.
• After the publication of the roundworm Caenorhabditis elegans genome
in 1998 and the fruit fly Drosophila melanogaster genome
in 2000, Gerald M. Rubin and his team published a paper titled
"Comparative Genomics of the Eukaryotes", in which they compared the
genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae,
as well as the prokaryote H. influenzae.
6. Related Terminologies
• Homology is the relationship of any two characters (such as two proteins that have similar
sequences) that have descended, usually through divergence, from a common ancestral
character
• Homologues can be either orthologues or paralogues
• Orthologues are homologues that have evolved from a common ancestral gene by
speciation. They usually have similar function
• Paralogues are homologues that are related or produced by duplication within a
genome. They often have evolved to perform different functions
7. Comparative Genomics Tools
Similarity search programs
• BLAST2 (Basic Local Alignment Search Tool)
• FASTA
• MUMmer (Maximal Unique Match) (Comparisons and analyses at both
Nucleic acid and protein level)
Other alignment programs
• DBA [DNA Block Aligner]
• Blastz
• BLAT
• AVID
• WABA [Wobble Aware Bulk Aligner]
• DIALIGN [Diagonal ALIGNment]
• SSAHA [Sequence Search and Alignment by Hashing Algorithm]
8. Contd..
Comparative gene prediction programs
Twinscan
Doublescan
SGP-1
Regulatory region prediction
ConSite
Visualization/ Sequence analysis programs
Dot plot (e.g. Dotter)
PipMaker (Percent Identity Plot)
Alfresco
VISTA (VISualization Tools for Alignments)
ACT (Artemis Comparison Tool)
9. Comparative Genomics Tool
The UCSC Genome Browser is an on-line genome
browser hosted by the University of California, Santa
Cruz.
10. Synteny Regions
Synteny: regions of two genomes that show considerable similarity in
terms of sequence and conservation of the order of genes.
Syntenic genes are genes that are in the same relative position on two different
chromosomes.
Closely related species generally have a similar order of genes on their
chromosomes.
Synteny can be used to identify genes in one species based on map
position in another.
11. Interactive DAGchainer Algorithm:
Tool for mining genome duplication and synteny
Finding putative genes or regions of homology between two
genomes
Identifying collinear sets of genes or regions of sequence
Generating a dot plot of the results and coloring syntenic pairs.
Comparative Genomics Tool
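The step of identifying collinear sets of genes can be illustrated with a small sketch. This is not DAGchainer's own implementation. It is a simplified chaining routine, written here under the assumption that homologous gene pairs are already available as (position in genome A, position in genome B) tuples. The function name and the max_gap parameter are hypothetical, and only chains running in the same direction in both genomes are considered, so inverted blocks are ignored.

```python
def longest_collinear_chain(pairs, max_gap=3):
    """Return the longest chain of homologous gene pairs that increases in both genomes.

    pairs   : iterable of (index_in_genome_a, index_in_genome_b) tuples
    max_gap : largest allowed step between consecutive chained genes (assumed parameter)
    """
    pairs = sorted(set(pairs))
    if not pairs:
        return []
    best_len = [1] * len(pairs)   # best chain length ending at each pair
    prev = [-1] * len(pairs)      # back-pointer to the previous pair in that chain
    for j, (a_j, b_j) in enumerate(pairs):
        for i in range(j):
            a_i, b_i = pairs[i]
            # A pair extends a chain only if it moves forward in both genomes
            # and stays within the allowed gap.
            if 0 < a_j - a_i <= max_gap and 0 < b_j - b_i <= max_gap:
                if best_len[i] + 1 > best_len[j]:
                    best_len[j] = best_len[i] + 1
                    prev[j] = i
    end = max(range(len(pairs)), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(pairs[end])
        end = prev[end]
    return chain[::-1]
```

The pairs in such a chain are the ones that would appear as a diagonal run of points on a syntenic dot plot.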
13. Syntenic dot plot: Syntenic dot plots give biologists very
valuable information about how organisms diverged from a
common ancestor.
Biologists can easily look at one of these dot plots and see
where large sections of DNA have been deleted, inserted,
copied, or moved.
The dot plots are also very good at depicting how closely two
organisms are related through the quantity and linearity of
green dots over an entire genome.
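To make the idea of a dot plot concrete, here is a minimal sketch that computes the coordinates of the dots rather than drawing them. It assumes plain DNA strings and exact k-mer matches. Real tools such as Dotter use scoring windows instead, and the function name and default k are illustrative choices.

```python
from collections import defaultdict

def dot_plot_points(seq_x, seq_y, k=3):
    """Return sorted (i, j) positions where the k-mers starting at seq_x[i] and seq_y[j] match."""
    # Index every k-mer of seq_y by its start position.
    index = defaultdict(list)
    for j in range(len(seq_y) - k + 1):
        index[seq_y[j:j + k]].append(j)
    # Look up every k-mer of seq_x in that index.
    points = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], []):
            points.append((i, j))
    return sorted(points)
```

Two identical sequences give an unbroken diagonal. Deletions, insertions, duplications and moved segments show up as breaks, shifts or extra diagonals in the returned points.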
14. Sequence Similarity Search
The most frequently performed type of sequence comparison is the
sequence similarity search
Sequence comparisons that implicate function are widely used:
To determine if newly sequenced cDNA or genomic region encodes gene
of known function.
Search for similar sequence in other species (or in same species)
15. Contd..
Search databases of DNA sequences
Use computer algorithms to align sequences
Don’t require perfect matches between sequences
Most commonly used algorithms for homology searches:
BLAST
FASTA
16. BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between
sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional
and evolutionary relationships between sequences as well as help identify members of gene
families.
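The notion of "local similarity" can be shown with a small scoring sketch. BLAST itself uses a faster seed-and-extend heuristic and reports statistical significance. The code below is instead the exact Smith-Waterman local alignment score, which is the quantity that heuristic approximates. The scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment between a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, two otherwise unrelated sequences that share only the stretch ACG score 6 with these parameters, because only the shared region contributes.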
18. General Databases Useful for Comparative Genomics
• LocusLink/RefSeq:
http://www.ncbi.nih.gov/LocusLink/
• PEDANT-Protein Extraction Description Analysis Tool
http://pedant.gsf.de
• COGs - Cluster of Orthologous Groups (of proteins)
http://www.ncbi.nih.gov/COG/
• KEGG- Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
• MBGD - Microbial Genome Database
http://mbgd.genome.ad.jp/
• GOLD - Genome Online Database
http://wit.integratedgenomics.com/GOLD/
• TIGR - The Institute for Genomic Research
Comparative genomics of Parasites
19. Comparative genomic process
Alignment of DNA sequences is the core process in comparative
genomics.
An alignment is a mapping of the nucleotides in one sequence onto the
nucleotides in the other sequence, with gaps introduced into one or the
other sequence to increase the number of positions with matching
nucleotides.
Several powerful alignment algorithms have been developed to align two
or more sequences
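The definition of an alignment given above (a mapping between nucleotides with gaps introduced to increase matching positions) can be sketched with the classic Needleman-Wunsch global alignment. This is one of several possible algorithms and not necessarily the one used by any tool named in these slides. The scoring values are assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment. Returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])   # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")        # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Calling global_align("ACGT", "AGT") places a single gap opposite the C, so that the remaining three positions match.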
20. Methods for comparative genomics
• Comparative analysis of genome structure
• Comparative analysis of coding regions (exons)
• Comparative analysis of non-coding regions (e.g. introns)
21. Comparative analysis of genome structure
Analysis of the global structure of genomes, such as nucleotide
composition, syntenic relationships, and gene ordering, offers insight into the
similarities and differences between genomes.
This provides information on the organization and evolution of the
genomes and highlights the unique features of individual genomes.
The structure of different genomes can be compared at three levels:
• Overall nucleotide statistics,
• Genome structure at DNA level
• Genome structure at gene level.
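The first level, overall nucleotide statistics, is simple to compute. The sketch below counts the four bases and reports GC content for a DNA string. The function name and the returned dictionary layout are illustrative assumptions, and ambiguous symbols such as N are ignored.

```python
from collections import Counter

def nucleotide_stats(seq):
    """Return base counts and GC content for a DNA sequence, ignoring non-ACGT symbols."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    return {"counts": {base: counts[base] for base in "ACGT"}, "gc_content": gc}
```

Comparing the gc_content values of two genomes is one of the simplest whole-genome comparisons.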
22. Comparison of genome structure at DNA level
Chromosomal breakage and exchange of chromosomal fragments are
common modes of gene evolution. They can be studied by comparing
genome structures at the DNA level.
• Identification of conserved synteny and genome rearrangement events
• Analysis of breakpoints
• Analysis of content and distribution of DNA repeats
23. Comparison of genome structure at gene level
Chromosomal breakage and exchange of chromosomal fragments
disrupt gene order.
Therefore the degree of gene-order conservation correlates with the evolutionary
distance between genomes: the more distant two genomes are, the less gene order
they share.
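One simple way to quantify disrupted gene order is to count breakpoints, that is, pairs of genes that are neighbours in one genome but not in the other. The sketch below assumes both genomes contain the same genes on a single linear chromosome and ignores gene orientation. The gene names in the usage note are hypothetical.

```python
def count_breakpoints(order_a, order_b):
    """Count adjacencies of order_a that are not adjacencies of order_b (orientation ignored)."""
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        1
        for pair in zip(order_a, order_a[1:])
        if frozenset(pair) not in adjacent_in_b
    )
```

For the orders g1 g2 g3 g4 g5 g6 and g1 g2 g5 g4 g3 g6, a single inversion of the block g3 to g5 produces two breakpoints, one at each end of the inverted block.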
24. Comparative analysis of coding regions
The analysis and comparison of the coding regions starts with the gene
identification algorithm that is used to infer what portions of the genomic
sequence actively code for genes.
There are four basic approaches for gene identification
25. Comparative analysis of coding regions
A number of algorithms have been used in comparative genomics
to aid function prediction of genes:
Identification of gene-coding regions
Comparison of gene content
Comparison of protein content
Comparative genome-based function prediction
26. Comparison of gene content
After the predicted gene set is generated, it is very interesting and important to
compare the content of genes across genomes.
The first statistic to compare is the estimated total number of genes in a genome.
Other statistics that elucidate the similarities and differences between the genomes
include the percentage of the genome that codes for genes, the distribution of coding
regions across the genome, average gene length, and codon usage.
Gene sets are often compared using a pairwise sequence comparison tool such as BLASTN or
TBLASTX.
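Two of the statistics listed above, codon usage and average gene length, can be computed directly from a set of predicted coding sequences. The sketch below assumes each gene is supplied as an in-frame DNA string. The function names are illustrative.

```python
from collections import Counter

def codon_usage(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    return {codon: n / total for codon, n in Counter(codons).items()} if total else {}

def mean_gene_length(genes):
    """Return the average length, in bases, of a list of gene sequences."""
    return sum(len(g) for g in genes) / len(genes) if genes else 0.0
```

Running both functions on the gene sets of two genomes gives values that can be compared side by side.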
27. Comparison of protein content
A second level of analysis that can be performed is to compare the set of
gene products (proteins) between the genomes, which has been termed
"comparative proteomics".
It is important to compare the protein contents in critical pathways and
important functional categories across genomes
Two widely used resources for pathways and functional categories are the
KEGG pathway database and the Gene Ontology (GO) hierarchy
28. Interesting statistics to compare include:
• Level of sequence identity between orthologous pairs across genomes
• Paralogous pairs within a genome
• Number of replicated copies in corresponding paralog families
• Functions of the paralogs
29. Comparative analysis of noncoding regions
Noncoding regions of the genome have gained a lot of attention in recent years
because of their predicted role in the regulation of transcription, DNA replication,
and other biological functions.
30. Insights into Genome Fluxes and the Processes of Evolution
• From an evolutionary biology perspective, whole genome comparisons
provide molecular insights into the processes of evolution that include the
molecular events responsible for the variations and fluxes that occur throughout
a genome. These include processes such as inversions, translocations,
deletions, duplications and insertions.
31. The Impact of Comparative Genomics in Phylogenetic Analysis
Schematic depiction of the phylogenetic position of Microsporidia based on small subunit
ribosomal RNA (SSU rRNA), as an early-branching eukaryote that evolved prior to the
acquisition of mitochondria, and its subsequent placement based on a composite gene
phylogeny, where it was placed closer to fungi. The latter placement has been confirmed
by the complete sequence of the microsporidian Encephalitozoon cuniculi, in which, despite
the absence of mitochondria, the presence of several mitochondrial genes could be observed.
33. Contd…
We have learned from homologous sequence alignment that the information that
can be gained by comparing two genomes together is largely dependent upon
the phylogenetic distance between them.
Phylogenetic distance is a measure of the degree of separation between two
organisms or their genomes on an evolutionary scale, usually expressed as the
number of accumulated sequence changes, number of years, or number of
generations.
The more distantly related two organisms are, the less sequence similarity or
shared genomic features will be detected between them.
Thus, only general insights about classes of shared genes can be gathered by
genomic comparisons at very long phylogenetic distances (e.g., over one billion
years since their separation). Over such very large distances, the order of genes
and the signatures of sequences that regulate their transcription are rarely
conserved
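Phylogenetic distance expressed as accumulated sequence changes can be estimated from an alignment. The sketch below computes the observed proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The choice of the Jukes-Cantor model is an assumption made here for illustration, since the slides do not name a substitution model.

```python
import math

def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared

def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from an observed p-distance."""
    if p >= 0.75:
        return math.inf   # the model is saturated at 75% observed differences
    return -0.75 * math.log(1 - 4 * p / 3)
```

The corrected distance is always at least as large as the observed one, because it accounts for multiple changes at the same site. This matters more as organisms become more distantly related.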
34. How Are Genomes Compared?
• A simple comparison of the general features of genomes such as genome
size, number of genes, and chromosome number presents an entry point
into comparative genomic analysis.
• Data for several fully-sequenced model organisms is shown in Table 1.
35. Contd…
• For example, the tiny flowering plant Arabidopsis thaliana has a
smaller genome than that of the fruit fly Drosophila melanogaster
(157 million base pairs vs. 165 million base pairs, respectively),
yet it possesses nearly twice as many genes (25,000 vs. 13,000).
• In fact, A. thaliana has approximately the same number of genes as
humans (~25,000).
• Thus, a very early lesson learned in the "genomic era" is that genome
size does not correlate with evolutionary status, nor is the number of
genes proportionate to genome size.
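The point that gene number is not proportionate to genome size can be checked with a quick gene-density calculation using the figures quoted above. The dictionary below contains only those two organisms, and the function name is illustrative.

```python
# Genome size in base pairs and approximate gene count, as quoted in the slide text.
GENOMES = {
    "Arabidopsis thaliana": (157_000_000, 25_000),
    "Drosophila melanogaster": (165_000_000, 13_000),
}

def genes_per_megabase(size_bp, gene_count):
    """Return the number of genes per million base pairs."""
    return gene_count / (size_bp / 1_000_000)

densities = {name: genes_per_megabase(*values) for name, values in GENOMES.items()}
```

With these numbers A. thaliana has roughly 159 genes per megabase and D. melanogaster roughly 79, about a twofold difference in density despite similar genome sizes.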
36. Contd..
• Figure 1 depicts a chromosome-level comparison of the human and mouse
genomes that shows the level of Synteny between these two mammals
• Synteny is a situation in which genes are arranged in similar blocks in
different species.
• The nature and extent of conservation of synteny differ substantially among
chromosomes.
• For example, the X chromosomes are represented as single, reciprocal
syntenic blocks.
• Human chromosome 20 corresponds entirely to a portion of mouse
chromosome 2, with nearly perfect conservation of order along almost the
entire length, disrupted only by a small central segment
• Human chromosome 17 corresponds entirely to a portion of mouse
chromosome 11.
• Other chromosomes, however, show evidence of more extensive
interchromosomal rearrangement.
• Results such as these provide an extraordinary glimpse into the
chromosomal changes that have shaped the mouse and human genomes
since their divergence from a common ancestor 75–80 million years ago.
37. Comparing Human, Chimp, and Mouse Genomes
The graphs below indicate the similarity between the human genome and those of the chimpanzee
and the mouse as they are mapped to identical locations in the human genome.
Since the chimpanzee genome is closer in evolutionary time to the human genome, the chimp
chromosomes map very closely to human chromosomes
The mouse genome is more distant in evolutionary time from human, and thus its chromosomes do
not map as closely as do the chimp chromosomes.
The white areas indicate areas of the human genome that either do not map well to the other
genome, or are areas of centromeres and telomeres where the genome sequence is unknown.
Chromosome numbering is purely arbitrary, based upon early microscopic estimates of
chromosome length.
The chimpanzee genome has 23 numbered chromosomes, the human genome has 22 numbered
chromosomes (chimp chromosomes 2a and 2b map to human chromosome 2), the mouse genome
has 19 numbered chromosomes.
The X and Y sex chromosomes have unique names, as well as other unique characteristics.
38. Mouse genome mapped on the human genome
• This image shows the 34% of the
mouse genome that maps to identical
sequence in the human genome.
• The matching locations are jumbled,
indicating rearrangements of the two
genomes since their last common
ancestor, approximately 75 million
years before present.
• Data for this figure comes from
assemblies of the human and mouse
genomes available from the UCSC
Genome Browser in June 2006.
39. Chimpanzee genome mapped on the human genome
• This image shows the 95% of the
chimpanzee genome that maps to identical
sequence in the human genome.
• The consistency of the color indication
demonstrates the close identity between the
two genomes since their last common
ancestor, approximately 5 million years
before present.
• Human chromosome 2 actually aligns
to two separate chimp chromosomes, now
called chr2a and chr2b and represented
here by the same color.
• Data for this figure comes from assemblies
of the human and chimpanzee genomes
available from the UCSC Genome Browser
in June 2006.
40. Benefits of comparative genomics
Comparative genomics identifies DNA sequences that have been "conserved" across species.
It pinpoints genes that are essential to life and highlights genomic signals
that control gene function across many species.
Comparative genomics also provides a powerful tool for studying evolution.
Applications
• Agriculture
• Biotechnology
• Zoology
• Evolutionary trees
• Drug discovery
41. Comparative Genomics in Drug Discovery
Comparative genomic studies shed light on the
pathogenesis of organisms, opening opportunities for
therapeutic intervention and helping to understand and
identify disease genes.
One of the most important outcomes of comparative analyses at a
genome-wide scale is the ability to identify and develop novel
drug targets.
42. Comparative genomics in drug discovery programs. A flow chart diagram
explaining how comparative genomics can facilitate drug discovery programs for
the discovery of new antimicrobials
The UCSC Genome Browser is an on-line genome browser. It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations.
e Statistics main menu option allows you to calculate Nucleotide Composition, Nucleotide Pair Frequencies and Codon
The distances are often placed on phylogenetic trees, which show the deduced relationships among the organisms
If one is looking for antibacterial, antifungal, or antiprotozoal proteins to be used as targets, comparative genome analysis can reveal virulence genes, uncharacterized essential genes, species-specific genes, and organism-specific genes, while ensuring that the chosen genes have no homologues in humans.
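The target-selection logic described here, keeping pathogen genes that are essential but have no human homologue, amounts to a set filter. The sketch below assumes the essential genes and the genes with human homologues have already been determined by other means, such as a BLAST search against human proteins. All gene names in the usage note are hypothetical.

```python
def candidate_targets(pathogen_genes, essential_genes, genes_with_human_homologue):
    """Return pathogen genes that are essential and have no detected human homologue."""
    essential = set(essential_genes)
    excluded = set(genes_with_human_homologue)
    return sorted(g for g in set(pathogen_genes) if g in essential and g not in excluded)
```

For example, with pathogen genes geneA to geneD, essential genes geneA, geneB and geneD, and a human homologue found only for geneB, the candidates are geneA and geneD.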