1. Genomics – types and applications
Genomes – E. coli and Saccharomyces cerevisiae
Dr. P. Samuel
2. History of Genomics
• The history of genomics dates back to the
1970s, when scientists determined
the DNA sequence of simple
organisms.
• The greatest breakthrough in the field
of genomics occurred in the mid-
1990s, when scientists sequenced
the entire genome of Haemophilus
influenzae, a free-living organism
which, despite its name, does not cause
influenza.
• The bacterium was thought to be the
cause of flu until 1933 when it was
proven that influenza is caused by a
virus.
• In 2001, scientists sequenced most
of the human genome.
• Since then, genomes have been
sequenced with relative ease.
3. Contd
• By the end of 2011, scientists had
sequenced the genomes of over 2,700
viruses, more than 1,200 bacteria and
archaea, and 36 eukaryotes, about 50
percent of which are fungi.
• Scientists obtain a great deal of highly
useful information from the sequenced
DNA of organisms.
• Most importantly, sequenced genomes
allow scientists to determine
the relationships between genes
and different sections of DNA, which
in turn allows them to determine
which areas could offer benefits to
science and to make the
knowledge useful for medical
applications.
4. What is Genomics?
• Genomics is the study of whole genomes
of organisms, and incorporates elements
from genetics.
• Genomics uses a combination
of recombinant DNA, DNA sequencing
methods, and bioinformatics to
sequence, assemble, and analyse the
structure and function of genomes.
• Genomics harnesses the availability of
complete DNA sequences for entire
organisms and was made possible by
both the pioneering work of Fred
Sanger and the more recent next-
generation sequencing technology.
• Fred Sanger's group established
techniques of sequencing, genome
mapping, data storage, and
bioinformatic analyses in the 1970s and
1980s. This work paved the way for the
human genome project in the 1990s.
• Today, next-generation sequencing
technologies have led to spectacular
improvements in the speed, capacity and
affordability of genome sequencing.
5. Types of Genomics
• Structural genomics: Aims to
determine the structure of every
protein encoded by the genome.
• Functional genomics: Aims to
collect and use data from sequencing
for describing gene and protein
functions.
• Comparative genomics: Aims to
compare genomic features between
different species.
• Mutation genomics: Studies the
genome in terms of mutations that
occur in a person's DNA or genome.
6. Structural genomics
• Structural genomics is a field of
genomics that involves the
characterization of genome
structures.
• This knowledge can be useful in the
practice of manipulating the genes
and DNA segments of a species.
• As an example, it is important to
understand the locus of a gene
within the genome before it is
possible to clone the gene
successfully.
• Likewise, knowledge about the
composition of the gene is useful
when attempting to understand its
function and how it can be altered
for practical purposes, such as to
ultimately improve health.
7. • Structural genomics describes the
3-dimensional structure of each
and every protein that may be
encoded by a genome – when
specifically analyzing proteins,
this is more commonly referred to
as structural proteomics.
• The field aims to characterize the
structure of the entire genome by
using both experimental and
computational techniques.
• Whilst traditional structural
prediction focuses on the structure
of a particular protein in question,
structural genomics considers a
larger scale by aiming to
determine the structure of every
constituent protein encoded by a
genome.
• Objectives of structural genomics
• It is hoped that more extensive
knowledge of the structure of
genomes, and comparing
different examples, could lead
to the deduction of principles
that govern overall genomic
structure.
• As the protein structure and
function are closely linked, the
importance of structural
genomics in understanding the
function of proteins is
paramount.
• Structural genomics can also
provide insight into dynamic
properties such as protein
folding and help identify possible
targets that may be used for
drug discovery.
8. Process and techniques
• De novo methods: every open
reading frame (ORF) in a complete
genome sequence can be cloned
and expressed as protein.
• The purified and crystallized proteins
can be analyzed with X-ray
crystallography or Nuclear Magnetic
Resonance.
• This allows the structure of every
protein encoded by the genome to be
determined.
• Ab initio modeling: information
about the protein sequence and amino
acid interactions is used to predict the
3D structure of proteins.
• Sequence-based modeling:
compares the gene sequence of the
protein with other protein sequences
of known structure. It uses protein
homology to create a model for the
structure of the unknown protein.
• Threading: uses similarities in the
structural modeling and folding of the
unknown protein with a protein of a
known structure to model the
structure of the new protein.
• Structural genomics promotes the
ability to share all new findings about
protein structures with other members
of the scientific community
immediately.
https://www.news-medical.net/life-sciences/What-is-Structural-Genomics.aspx
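To make "every open reading frame" concrete, here is a minimal Python sketch that scans the forward strand of a DNA string for stretches running from an ATG start codon to an in-frame stop codon. The function name find_orfs, the min_codons threshold and the toy sequence are assumptions made for illustration; real gene finders also scan the reverse strand and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    An ORF here is an ATG followed in-frame by a stop codon.
    `end` is exclusive and includes the stop codon.
    `min_codons` is an illustrative cutoff, not a standard value.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy_sequence = "CCATGAAATTTTAGGG"  # made-up sequence
for start, end in find_orfs(toy_sequence):
    print(start, end, toy_sequence[start:end])
```

Each reported stretch is only a candidate: whether it really encodes a protein still has to be tested by cloning and expression, as described above.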
9. Functional genomics
• The aim of functional genomics studies is to understand the complex
relationship between genotype and phenotype on a global (genome-wide)
scale.
10. Questions to be answered
• When and where are genes
expressed?
• How do gene expression levels
differ in various cell types and
states?
• What are the functional roles
of different genes and in what
cellular processes do they
participate?
• How are genes regulated?
Where are the active gene
promoters in a particular cell
type?
• How do genes and gene
products interact?
• How does gene expression
change in various diseases or
following a treatment?
• Functional genomic experiments typically
utilize large-scale, high-throughput assays
to measure and track many genes or
proteins in parallel under different
experimental or environmental conditions
(e.g. with samples from patients and
healthy individuals).
• This "genome-wide" approach allows the
function of different parts of the genome to
be discovered by combining information
from genes, transcripts and proteins.
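One simple way to compare many genes in parallel between two conditions is a log2 fold change per gene. The Python sketch below assumes expression values have already been measured and normalized; the gene names, the numbers and the pseudocount are made up for illustration, and real studies add replicate-aware statistics on top of this.

```python
import math
from statistics import mean


def log2_fold_change(case_values, control_values, pseudocount=1.0):
    """Log2 ratio of mean expression, case over control.

    The pseudocount (an assumption here) avoids division by zero
    for genes that are not expressed in one condition.
    """
    case_mean = mean(case_values) + pseudocount
    control_mean = mean(control_values) + pseudocount
    return math.log2(case_mean / control_mean)


# Hypothetical expression values for two genes in two groups.
expression = {
    "geneA": {"patient": [7.0, 7.0], "healthy": [1.0, 1.0]},
    "geneB": {"patient": [3.0, 3.0], "healthy": [3.0, 3.0]},
}

changes = {
    gene: log2_fold_change(groups["patient"], groups["healthy"])
    for gene, groups in expression.items()
}
for gene, change in changes.items():
    print(gene, round(change, 2))
```

A positive value means higher expression in the patient samples, a negative value means lower, and zero means no difference.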
11. Technologies used in functional genomic studies
• Microarrays
• Expression-profiling - used to measure the expression of
thousands of genes at once, using oligonucleotide probes
(usually ≤50 basepairs in length) designed from transcript
cDNA or exon sequences across the genome.
• Tiling microarrays - often used for mapping transcription
factor binding sites or locations of epigenetic marks (e.g.
histone modifications). They use overlapping oligonucleotide
probes (usually ≤50bp) covering several megabases of
genomic sequences.
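The idea of overlapping probes can be sketched in a few lines of Python. The probe length of 50 follows the "≤50bp" figure above; the step of 25 (so that neighbouring probes overlap by half), the function name tiling_probes and the toy region are assumptions for illustration only.

```python
def tiling_probes(sequence, probe_length=50, step=25):
    """Return (start, probe) pairs covering `sequence` with overlap.

    Neighbouring probes overlap by probe_length - step bases.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    probes = []
    for start in range(0, len(sequence) - probe_length + 1, step):
        probes.append((start, sequence[start:start + probe_length]))
    return probes


region = "ACGT" * 50  # made-up 200 bp region
probes = tiling_probes(region)
print(len(probes), "probes, first starts at", probes[0][0],
      "last starts at", probes[-1][0])
```

A smaller step gives denser coverage at the cost of more probes on the array.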
12. Technologies used in functional genomic studies
• High-throughput sequencing (HTS)
• RNA sequencing (RNA-Seq) - is used to sequence cDNA in order to get
information about a sample's RNA content.
• ChIP sequencing (ChIP-Seq) - uses Chromatin
ImmunoPrecipitation (ChIP) with DNA sequencing to identify protein-
binding sites on DNA.
https://www.ebi.ac.uk/arrayexpress/
13. Comparative genomics
• Comparative genomics is a field of biological research in
which the genomic features of different organisms are
compared.
• The major principle of comparative genomics is that common
features of two organisms will often be encoded within
the DNA that is evolutionarily conserved between them.
• Therefore, comparative genomic approaches start with making
some form of alignment of genome sequences and looking
for orthologous sequences (sequences that share a common
ancestry) in the aligned genomes and checking to what extent
those sequences are conserved.
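Once two sequences are aligned, "to what extent they are conserved" can be summarized as percent identity. The Python sketch below assumes the alignment has already been produced by some other tool and that gaps are written as "-"; the function name and the two toy sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical bases over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments from two species.
species_1 = "ATGGCC-TAA"
species_2 = "ATGGACGTAA"
print(round(percent_identity(species_1, species_2), 1))
```

Ignoring gap columns is one convention among several; other definitions divide by the full alignment length instead.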
14. Methods
• Computational approaches to genome comparison have
recently become a common research topic in computer
science.
• A public collection of case studies and demonstrations is
growing, ranging from whole genome comparisons to gene
expression analysis.
• This has increased the introduction of different ideas,
including concepts from systems and control, information
theory, strings analysis and data mining.
• It is anticipated that computational approaches will become
and remain a standard topic for research and teaching, while
multiple courses will begin training students to be fluent in
both topics.
15. Tools
• UCSC Browser: This site contains the reference sequence and
working draft assemblies for a large collection of genomes.
• Ensembl: The Ensembl project produces genome databases for
vertebrates and other eukaryotic species, and makes this information
freely available online.
• MapView: The Map Viewer provides a wide variety of genome
mapping and sequencing data.
• VISTA: a comprehensive suite of programs and databases for
comparative analysis of genomic sequences. It was built to visualize
the results of comparative analysis based on DNA alignments. The
presentation of comparative data generated by VISTA suits
both small- and large-scale data.
• BlueJay Genome Browser: a stand-alone visualization tool for the
multi-scale viewing of annotated genomes and other genomic
elements.
16. Applications of genomics
• Medical application
• Oral immunization with plants: Oral plant vaccines, which use DNA and
transgenes to create surface antigens that stimulate immunity when
consumed, show promise in the quest to immunize humans against hepatitis
B.
• Heterologous prime-boost vaccine for malaria: Two-part vaccines with
DNA from P. falciparum followed by modified vaccinia virus Ankara (MVA)
are expected to reduce the risk of malaria infection by up to 80%.
• Anti-malarial drugs: The chemicals fosmidomycin and FR-900098 are
being tested for their targeted effects in inhibiting DOXP reductoisomerase,
an enzyme involved in the lifecycle of P. falciparum, the most
dangerous of the parasites that can cause malaria.
• Screening for thalassemias: Tests have been developed that use the
polymerase chain reaction to detect mutations in the genes that
encode the hemoglobin molecule. Genetic
counseling as a result of the screening test has reduced rates of thalassemia
in Sardinia from 1 in 250 to 1 in 4000 live births.
17. Biotechnology applications
• There are several applications of genomic knowledge in the field of
synthetic biology and bioengineering.
• Some scientific research has demonstrated the creation of a partially
synthetic species of bacteria. For example, the genome
of Mycoplasma genitalium was used to synthesize the
bacterium Mycoplasma laboratorium, which has distinct
characteristics from the original bacteria.
18. Social science applications
• Conservationists have made use of genomic sequencing
data to evaluate key factors that are involved in the
conservation of a species.
• For example, the genetic diversity of a population or the
heterozygosity of an individual for a hereditary condition with a
recessive inheritance pattern can be used to predict the health
and conservation of the population.
• This data can also be useful in determining the effects of
evolutionary processes and picking up genetic patterns of a
specific population, including human and animal life. Insights
into these patterns can help to devise plans to aid the species
and enable it to thrive into the future.
19. Genome of E. coli
• Physical Characteristics of
the E. coli genome:
• single chromosome/cell (haploid)
• 4.6 × 10^6 bp (4,600 kilobases)
– about 4300 potential coding sequences
– only about 1800 known E. coli proteins
• 70% is composed of single (monocistronic)
genes
• 6% is polycistronic
• Roughly equal number of genes on each strand
• About 30% of the sequenced ORFs (Open
Reading Frames, stretches of sequence that look
like they could encode a protein) have
unknown function.
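The figures above allow a quick back-of-the-envelope check of gene density. The short Python sketch below uses only the numbers quoted on this slide; the variable names are chosen for illustration, and the per-sequence figure is an average spacing that also includes non-coding DNA, not a measured gene length.

```python
GENOME_BP = 4_600_000      # 4.6 × 10^6 bp, from the slide
CODING_SEQUENCES = 4300    # potential coding sequences, from the slide
KNOWN_PROTEINS = 1800      # known E. coli proteins, from the slide

# Average stretch of genome available per potential coding sequence.
bp_per_coding_sequence = GENOME_BP / CODING_SEQUENCES

# Share of potential coding sequences matched by a known protein.
known_fraction = KNOWN_PROTEINS / CODING_SEQUENCES

print(f"about {bp_per_coding_sequence:.0f} bp per coding sequence")
print(f"about {known_fraction:.0%} of coding sequences have a known protein")
```

On these numbers there is roughly one potential coding sequence per kilobase, which reflects how densely packed a bacterial genome is.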
20. • The “100 minute map” is a time-based map of the E. coli genome.
• Based on the observation that it takes about 100 minutes to transfer the
entire chromosome from a donor to a recipient cell during conjugation, the
map lists the point in time at which a particular gene is transferred; in this
case, it is looking at clusters of genes (it is important to
note that most genes are not clustered).
• From the map, we can say “At time X, genes for traits A, B, and C have
been transferred”; conversely, we can say “I need to wait X minutes for my
desired trait to transfer.”
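If the 100 map minutes are treated as evenly spread over the 4.6 × 10^6 bp chromosome, one map minute corresponds to about 46 kilobases. The Python sketch below makes that conversion explicit; the even-spacing assumption and the function names are illustrative, and real map positions only approximate physical distance.

```python
GENOME_BP = 4_600_000  # genome size quoted earlier
MAP_MINUTES = 100      # length of the map in minutes


def minutes_to_bp(minutes):
    """Approximate base-pair position for a map position in minutes.

    The chromosome is circular, so positions wrap at 100 minutes.
    """
    return round((minutes % MAP_MINUTES) * GENOME_BP / MAP_MINUTES)


def bp_to_minutes(position_bp):
    """Approximate map position in minutes for a base-pair position."""
    return (position_bp % GENOME_BP) * MAP_MINUTES / GENOME_BP


print(minutes_to_bp(1))          # size of one map minute in bp
print(bp_to_minutes(2_300_000))  # map position of the halfway point
```

The modulo step reflects the circular chromosome: minute 100 and minute 0 are the same point.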
21. The E. coli chromosome is shown on
the right side; notice how tightly
wound it all is.
The DNA is supercoiled and bound
by histone-like proteins (they act much
like eukaryotic histones) into 50
separate supercoiled domains. This
serves to protect, condense, and
organize the DNA.
22. Genotype Vs Phenotype
Genotype   Phenotype
dut-1      No functional deoxyuridine triphosphatase; cannot degrade dUTP, so it accumulates and becomes incorporated into DNA during replication.
ung-1      No functional uracil-N-glycosylase; cannot remove U's once they are incorporated into the DNA.
thi-1      Thiamine auxotroph; cannot produce its own thiamine, and thus must be supplied with it from the environment.
Cam^r      Chloramphenicol resistance
wt: tra    Wild-type F' for transfer genes
wt: pili   Wild-type F' for pilus genes