What is a Genome?
A genome is the complete set of DNA in an organism — kind of like the
instruction manual for building and running a living thing.
• Human genome: Contains all the genetic instructions that make you you.
• Microorganism genome: The genetic code of microbes such as
bacteria, viruses, and fungi.
What is Genome Analysis?
Genome analysis is the process of reading and studying an organism's
DNA to understand:
1. What genes are present
2. How they work
3. How organisms are related to each other
4. What diseases they might cause (in microorganisms)
Comparing Human and Microorganism Genomes
Human Genome
• Humans have around 20,000–25,000 genes
• DNA is stored in 23 pairs of chromosomes
• Total length: about 3 billion base pairs
• Almost 99.9% of human DNA is the same in every
person — the tiny differences make us unique!
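To put the 99.9% figure into numbers, here is a back-of-the-envelope sketch. It assumes the round values from the bullets above (about 3 billion base pairs, about 99.9% shared), so the result is only an order-of-magnitude estimate.

```python
# Rough estimate using the round numbers from the slide (assumed, not measured).
GENOME_LENGTH_BP = 3_000_000_000   # about 3 billion base pairs
SHARED_FRACTION = 0.999            # about 99.9% identical between two people

# The remaining ~0.1% is what differs between any two people.
differing_bp = round(GENOME_LENGTH_BP * (1 - SHARED_FRACTION))
print(f"Roughly {differing_bp:,} base pairs differ between two people")
```

Even a 0.1% difference therefore corresponds to millions of positions in the genome.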
Microorganism Genome
(Example: Bacteria)
• Much smaller genome
• Might have only a few thousand
genes
• DNA is usually a single circular
chromosome, not many linear ones
• Can mutate and evolve very fast
How to Read a DNA Sequence
DNA is made of 4 letters:
A = Adenine
T = Thymine
C = Cytosine
G = Guanine
They pair up:
A with T
C with G
Example:
Original DNA: 5’ ATCGGACT 3’
Complement: 3’ TAGCCTGA 5’
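The pairing rules above can be sketched in a few lines of Python. The function names `complement` and `reverse_complement` are illustrative choices, not part of any particular library.

```python
# Base-pairing rules from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq: str) -> str:
    """Return the complementary strand, base by base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in seq.upper())

def reverse_complement(seq: str) -> str:
    """Return the complementary strand written in its own 5' to 3' direction."""
    return complement(seq)[::-1]

print(complement("ATCGGACT"))          # TAGCCTGA
print(reverse_complement("ATCGGACT"))  # AGTCCGAT
```

The first call reproduces the complement shown in the example. The second writes that same strand in the 5' to 3' direction, which is how sequences are normally stored in files.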
Comparing Human and Microorganism Genomes
Feature | Humans | E. coli Bacteria
Genome Size | ~3 billion letters | ~4.6 million letters
Chromosomes | 23 pairs | 1 circular chromosome
Reproduction | Slow | Very fast (every 20 mins!)
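The numbers in this comparison can be made concrete with a short calculation. This is a sketch using the round figures from the table. The growth function assumes ideal conditions (unlimited food and space), which real bacteria never have for long.

```python
HUMAN_GENOME_BP = 3_000_000_000   # ~3 billion letters
ECOLI_GENOME_BP = 4_600_000       # ~4.6 million letters
DOUBLING_MINUTES = 20             # E. coli division time under ideal conditions

def size_ratio() -> float:
    """How many times larger the human genome is than the E. coli genome."""
    return HUMAN_GENOME_BP / ECOLI_GENOME_BP

def cells_after(minutes: int, start_cells: int = 1) -> int:
    """Idealized growth: the population doubles once per completed doubling period."""
    return start_cells * 2 ** (minutes // DOUBLING_MINUTES)

print(f"The human genome is about {size_ratio():.0f} times larger")
print(f"One cell becomes {cells_after(8 * 60):,} cells after 8 hours")
```

Under these assumptions a single cell goes through 24 doublings in 8 hours, which is one reason bacteria can mutate and evolve so quickly.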
What Scientists Look for in a Genome
In Humans:
• Mutations that cause diseases
like cystic fibrosis or cancer
• Ancestry (Where your family
comes from)
• Traits like eye color or height
• Response to medicine (some
drugs work better for certain
genes)
In Microorganisms:
• What species is it?
• Is it dangerous (pathogen)?
• Is it resistant to antibiotics?
• How did it evolve?
Real-Life Applications
Field | Use of Genome Analysis
Medicine | Find genetic diseases, personalize treatment
Agriculture | Make disease-resistant crops
Forensics | Solve crimes using DNA evidence
Public Health | Track virus outbreaks like COVID-19
Evolution | See how species changed over time
STEP 1: Define Your Goal
What do you want to find out?
Ask yourself:
• Am I looking for genetic mutations in humans?
• Do I want to identify what microorganisms are in a sample?
• Am I comparing two different genomes?
STEP 2: Collect Your Sample
Get DNA from the organism
Human Samples:
• Spit (saliva)
• Blood
• Cheek swab
Microorganism Samples:
• Yogurt (for bacteria)
• Soil/water
• Feces (for gut microbiome)
• Phone or hand swab
📌 Be clean and label your sample!
STEP 3: Extract the DNA
Break open the cells and pull out the DNA
Basic steps:
• Use detergent to break membranes
• Add salt to help DNA stick together
• Add alcohol (ethanol or isopropanol) —
DNA becomes visible!
You can even do this with a strawberry at home
Scientists take DNA out of human cells or microorganisms.
STEP 4: Choose the Right Sequencing Method
Choose based on your sample and goal:
Goal | Method | Generation
Small gene | Sanger | 1st
Whole genome | Illumina | 2nd
Full bacterial genome | Nanopore or PacBio | 3rd
Identify bacteria types | 16S rRNA + Illumina | 2nd
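The table above is essentially a lookup from goal to method, which can be written as a small dictionary. The goal keys (`small_gene` and so on) are hypothetical labels invented for this sketch. Real projects also weigh budget, sample quality, and turnaround time.

```python
# Lookup table mirroring the slide. The goal keys are hypothetical labels.
METHODS = {
    "small_gene": ("Sanger", "1st"),
    "whole_genome": ("Illumina", "2nd"),
    "full_bacterial_genome": ("Nanopore or PacBio", "3rd"),
    "identify_bacteria_types": ("16S rRNA + Illumina", "2nd"),
}

def choose_method(goal: str) -> str:
    """Return the suggested method and its generation for a given goal."""
    method, generation = METHODS[goal]
    return f"{method} ({generation} generation)"

print(choose_method("small_gene"))             # Sanger (1st generation)
print(choose_method("full_bacterial_genome"))  # Nanopore or PacBio (3rd generation)
```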
Sequencing
• This means figuring out the exact order of DNA letters (A, T, C, G).
• Example: a sequenced gene is read out as a long string of these letters.
STEP 5: Analyze the Data (Bioinformatics)
Computers analyze the DNA to:
• Find genes
• Compare with other genomes
• Identify mutations
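One of the tasks in this step, identifying mutations, can be sketched as a position-by-position comparison of a sample against a reference. This is a deliberately simplified illustration. It assumes both sequences are already aligned and of equal length, whereas real pipelines first align the reads and also handle insertions and deletions. The function name `find_point_mutations` is made up for this example.

```python
def find_point_mutations(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles sequences of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]

reference = "ATCGGACT"
sample = "ATCGTACT"   # differs from the reference at position 5
print(find_point_mutations(reference, sample))  # [(5, 'G', 'T')]
```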
How to: Find Genes, Compare Genomes & Identify Mutations
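As a rough illustration of gene finding, the sketch below scans a sequence for open reading frames (ORFs): stretches that begin with the start codon ATG and run, three letters at a time, to a stop codon. It only checks the forward strand and ignores everything real gene finders consider (reverse strand, introns, promoters). The name `find_orfs` and the `min_codons` parameter are assumptions of this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2) -> list[str]:
    """Return ORFs (ATG through stop codon, inclusive) in the 3 forward reading frames.

    min_codons counts every codon in the ORF, including the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(seq[start:i + 3])
                start = None                   # close it and keep scanning
    return orfs

print(find_orfs("CCATGGCTTGATAAGG"))  # ['ATGGCTTGA']
```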
Sequencing Output
Feature | 1st Gen (Sanger) | 2nd Gen (NGS) | 3rd Gen (Long-read)
Data Type | Few long reads (500–1000 bp) | Millions of short reads (~150–300 bp) | Fewer, ultra-long reads (1,000–2,000,000 bp)
Output File | .ab1 or .fasta | .fastq, .bam, .vcf | .fastq, .bam, .vcf
Speed | Slow | Fast | Fast
Cost per Base | High | Low | Lower (but high per run)
Real Tools Scientists Use
• BLAST: Compares DNA sequences.
• CRISPR: Edits genes.
• Genome browsers: Let scientists look at DNA on computers (like
Google Maps for genes).
1st Generation – Sanger Sequencing
Type of Analysis | Description
Gene identification | Find and confirm single gene sequences.
Mutation validation | Confirm mutations/SNPs found from other methods.
Cloning verification | Check inserted DNA sequences.
Phylogenetic analysis | Compare genes between species.
Barcode/Species ID (COI) | Mainly for animals, using the mitochondrial COI gene (plants typically use chloroplast markers instead).
2nd Generation – NGS (Next Generation Sequencing)
Designed for high-throughput and short reads, ideal for large studies.
Type of Analysis | Description
Whole Genome Sequencing (WGS) | For human, bacteria, virus genomes.
RNA-seq (Transcriptome) | Measure gene expression.
Variant Calling (SNP/INDEL) | Detect mutations and polymorphisms.
Microbiome/Metagenomics | Identify all microbes in a sample.
Epigenomics (ChIP-seq, Bisulfite) | Study DNA-protein interaction or methylation.
De novo Assembly | Assemble genomes from scratch.
3rd Generation – Long-Read Sequencing (PacBio, ONT)
Perfect for structural insights, large genomes, and complex regions.
Type of Analysis | Description
Structural Variant Detection | Large insertions, deletions, translocations.
Full-Length Gene Isoform (Iso-Seq) | Identify complete mRNA molecules.
De novo Assembly (High Quality) | Assemble large genomes with high continuity.
Epigenetics (direct methylation detection) | ONT and PacBio can detect methylation without chemical treatment.
Metagenomics (complex samples) | Resolve strain-level differences.
Feature | 1st Generation: Sanger Sequencing | 2nd Generation: Next-Gen (e.g. Illumina) | 3rd Generation: Real-Time Long Reads (e.g. Nanopore, PacBio)
Start Year | 1977 | ~2005 | ~2010
Read Length | 500–1000 bp | 50–600 bp | Up to 2 million bp (often 10,000–100,000 bp)
Throughput | Very low (1 at a time) | High (millions of reads) | Moderate–High (real-time, long reads)
Speed | Slow (days per sample) | Fast (hours–days for whole genome) | Real-time (can be fast)
Cost per Base | Expensive | Cheap | Cheaper over time (especially for long reads)
Accuracy | Very high (99.99%) | High (typically above 99%) | Lower raw accuracy (historically ~90–95%), but improved with correction and newer chemistries
Instruments | Capillary electrophoresis machines | Illumina, Ion Torrent | Oxford Nanopore, PacBio SMRT
Sample Size Needed | Small | Small | Small (can do single cells!)
Best For | Small genes, mutation checks, confirmation | Whole genomes, large-scale studies | Full genome structure, complex regions, fieldwork
Example Uses | Diagnosing a known gene mutation | Cancer, ancestry, microbiome | COVID-19 tracking, full microbial genomes
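Accuracy percentages like those in the comparison are usually reported by sequencers as Phred quality scores, where Q = -10 × log10(error probability). The sketch below converts a few illustrative accuracy values (99.99%, 99%, and 90%) into Q scores. The labels are only examples, not vendor specifications.

```python
import math

def phred_quality(accuracy: float) -> float:
    """Convert per-base accuracy (0 to 1) to a Phred score: Q = -10 * log10(1 - accuracy)."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)

# Illustrative accuracy values only.
for label, accuracy in [("99.99% accurate", 0.9999),
                        ("99% accurate", 0.99),
                        ("90% accurate", 0.90)]:
    print(f"{label}: about Q{phred_quality(accuracy):.0f}")
```

Each step of 10 in Q means ten times fewer errors, so Q40 reads have one expected error per 10,000 bases while Q10 reads have one per 10.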
Key Genome Sequencing Output File Formats
Format | Extension | What's Inside | When It's Used
FASTQ | .fastq or .fq | Raw reads + quality scores | After sequencing (main raw data)
FASTA | .fasta or .fa | DNA sequences only (no quality) | Reference genome or assembled contigs
SAM/BAM | .sam (text) / .bam (binary) | Reads aligned to a reference genome | After mapping/alignment
VCF | .vcf | List of genetic variants (SNPs, indels) | After variant calling (mutation finding)
GFF/GTF | .gff, .gtf | Gene feature info (start, stop, exon) | Genome annotation
FAST5 | .fast5 | Nanopore raw signal data | Oxford Nanopore (3rd gen) only
BED | .bed | Regions of interest in the genome | For viewing intervals/coverage
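To show what "raw reads + quality scores" looks like in practice, here is a minimal FASTQ reader. It assumes the common four-lines-per-record layout and the Phred+33 quality encoding. The function name `read_fastq` and the example record are made up for illustration, and established libraries such as Biopython are the better choice for real data.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break                          # end of file (or a blank line)
        sequence = handle.readline().strip()
        handle.readline()                  # the '+' separator line, ignored here
        quality = handle.readline().strip()
        # Phred+33 encoding (assumed): score = ASCII code of the character - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@' from the ID

# A made-up single-read example. 'I' encodes a quality score of 40.
example = io.StringIO("@read1\nATCGGACT\n+\nIIIIIIII\n")
for read_id, sequence, scores in read_fastq(example):
    print(read_id, sequence, sum(scores) / len(scores))
```

A FASTA file carries only the ID and sequence lines, which is why it cannot store per-base quality.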