Introduction to Bioinformatics
BCH 433
Lecture 1
Dr. J. Ikwebe
What is Bioinformatics?
Molecular Bioinformatics
Molecular Bioinformatics involves the use
of computational tools to discover new
information in complex data sets (from the
one-dimensional information of DNA through
the two-dimensional information of RNA and
the three-dimensional information of proteins,
to the four-dimensional information of
evolving living systems).
Bioinformatics (Oxford English Dictionary):
The branch of science concerned with
information and information flow in biological
systems, especially the use of computational
methods in genetics and genomics.
What is bioinformatics?
• The application of computational tools on
molecular data, including the means to
acquire, analyse, or visualize such data.
• Key tools to handle and analyze the large
amount of data generated by large-scale
DNA, RNA and protein characterization
projects (genomics, transcriptomics,
proteomics).
Biologists
collect molecular data:
DNA and protein sequences,
gene expression, etc.
Computer scientists
(+ mathematicians, statisticians, etc.)
develop tools, software and algorithms
to store and analyze the data.
Bioinformaticians
study biological questions by
analyzing molecular data.
The field of science in which biology, computer science and
information technology merge into a single discipline
....
• Bioinformatics uses computers, computing technology
and software to manage large amounts of biological data
and enable their analysis.
• At the end of this course students will be expected to:
– understand biological data and data management and
integration
– have a broad knowledge of computing and biological methods in
bioinformatics
– understand genomes, genome sequencing, genomic structure
and comparison
– know about the technology used in modern post-genomic
biology, the data produced and the software to manage it.
Introduction
Large databases that can be accessed and analyzed with
sophisticated tools have become central to biological
research and education.
The information content in the genomes of organisms,
in the molecular dynamics of proteins, and in population
dynamics, to name but a few areas, is enormous.
 Biologists are increasingly finding that the management
of complex data sets is becoming a bottleneck for
scientific advances.
Therefore, bioinformatics is rapidly becoming a key
technology in all fields of biology.
Bottlenecks
The present bottlenecks in bioinformatics include:
the education of biologists in the use of advanced computing
tools;
the recruitment of computer scientists into this evolving field;
the limited availability of developed databases of biological
information;
the need for more efficient and intelligent search engines for
complex databases.
The hereditary information of all living organisms, with
the exception of some viruses, is carried by
deoxyribonucleic acid (DNA) molecules.
2 purines (two rings): adenine (A), guanine (G)
2 pyrimidines (one ring): cytosine (C), thymine (T)
Eukaryotes may have up to 3
subcellular genomes:
1. Nuclear
2. Mitochondrial
3. Plastid
Bacteria have either circular
or linear genomes and may
also carry plasmids
The entire complement of genetic material carried by
an individual is called the genome.
Human chromosomes
Circular genome
Central dogma: DNA makes RNA makes Protein
Modified dogma: DNA makes DNA and RNA; RNA
makes DNA, RNA and protein
Amino acids - The protein building blocks
Any region of the DNA sequence can, in principle,
code for six different amino acid sequences, because
any one of three different reading frames can be used
to interpret each of the two strands.
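The six reading frames can be made concrete with a short sketch. The Python below is an illustration, not a tool used in this course: it splits a DNA string into codons for the three frames of each strand. It assumes the input contains only A, C, G and T; the labels +1 to +3 and -1 to -3 are a naming convention chosen here.

```python
# Complement table for the four DNA bases (other characters are left unchanged).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reading_frames(seq):
    """Split both strands of `seq` into codons for all six reading frames.

    Frames +1, +2, +3 start at offsets 0, 1, 2 of the given strand;
    frames -1, -2, -3 do the same on the reverse complement.
    Incomplete trailing codons are dropped.
    """
    seq = seq.upper()
    strands = {"+": seq, "-": reverse_complement(seq)}
    frames = {}
    for sign, strand in strands.items():
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)
            ]
    return frames


for name, codons in reading_frames("ATGGCC").items():
    print(name, codons)
```

For ATGGCC, frame +1 reads ATG GCC, while frame -1 reads GGC CAT from the opposite strand; each of the six codon lists could then be translated into a different amino acid sequence.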
Protein folding
Human haemoglobin
Some basic definitions
• Genomics---- Genome: The total genetic content contained in a
haploid set of chromosomes in eukaryotes, in a single
chromosome in bacteria, or in the DNA or RNA of viruses.
• Transcriptomics---- Transcriptome: the complete set of RNA
transcripts produced from the genes encoded on a genome.
• Proteomics---- Proteome: the complete set of proteins encoded
on a genome that can be expressed and modified by a cell,
tissue, or organism (Etymology: Protein+genome).
– Sub-cellular proteome: the complete set of proteins for a given
membrane or organelle (e.g. mitochondrial proteome).
– Membranome: the complete set of membranes from a cell.
– Metabolome: The metabolic products of the cell, that is, all the
metabolites
– Secretome: the secreted proteins of a cell.
– Phosphoproteome (phosphome): the total set of phosphorylated proteins of a cell.
What does it all look like on a computer monitor?
A cDNA sequence
>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCG
CCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACC
ACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGA
CGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACA
AGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCC
GAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCG
TTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGT
ACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC
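The sequence above is in FASTA format: a header line starting with ">" followed by one or more sequence lines. A minimal Python sketch of reading such text follows. It assumes the FASTA text is already loaded into a string, and accession_from_header only handles the gi|…|ref|…| header style shown on this slide. Both function names are hypothetical; for real work an established parser such as Biopython's SeqIO module is the safer choice.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def accession_from_header(header):
    """Pull the RefSeq accession out of a 'gi|...|ref|ACCESSION|' header,
    falling back to the first word of the header."""
    fields = header.split("|")
    if len(fields) >= 4 and fields[2] == "ref":
        return fields[3]
    return header.split()[0]


text = """>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGG
TCCCCACAGA
"""
for header, seq in parse_fasta(text):
    print(accession_from_header(header), len(seq))
```

The example text is a shortened copy of the HBA1 entry above, so the printed length is only that of the excerpt.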
A cDNA sequence (reading frame)
A protein sequence
>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCC
GCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCAC
CACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG
ACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCAC
AAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGC
CGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC
GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCC
GTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC
>gi|4504347|ref|NP_000549.1| alpha 1 globin [Homo sapiens]
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH
VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
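The protein sequence is obtained by reading the cDNA codon by codon in the correct reading frame, from the start codon ATG to the first stop codon. The Python sketch below uses the standard genetic code. Its rule of starting at the first ATG in the sequence is a simplifying assumption that happens to hold for this HBA1 cDNA; real start-codon selection also depends on the sequence context around the ATG.

```python
# Standard genetic code, laid out in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(seq):
    """Translate from the first ATG to the first stop codon ('*').

    Unknown codons (e.g. containing N) become 'X'. If no stop codon is
    reached, translation runs to the end of the sequence.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Opening stretch of the HBA1 cDNA shown above (no stop codon yet).
fragment = ("ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCC"
            "GACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTC")
print(translate_cds(fragment))
```

The fragment translates to MVLSPADKTNVKAAWGKV, which matches the first 18 residues of the alpha 1 globin sequence shown above.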
And, a whole genome…
[Slide illustration: the sequence above repeated as one continuous block of nucleotides to suggest the size of a whole genome]
How big are whole genomes?
E. coli: 4.6 × 10^6 nucleotides
– Approx. 4,000 genes
Yeast: 15 × 10^6 nucleotides
– Approx. 6,000 genes
Human: 3 × 10^9 nucleotides
– Approx. 30,000 genes
Smallest human chromosome: 50 × 10^6 nucleotides
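Dividing genome size by the approximate gene count on this slide gives a rough gene density. A small Python sketch of that arithmetic, using only the slide's rounded figures (the dictionary layout is an illustrative choice):

```python
# Approximate figures from the slide: (genome size in nucleotides, gene count).
GENOMES = {
    "E. coli": (4.6e6, 4_000),
    "Yeast": (15e6, 6_000),
    "Human": (3e9, 30_000),
}


def nucleotides_per_gene(genomes):
    """Average number of nucleotides per gene for each genome."""
    return {name: size / genes for name, (size, genes) in genomes.items()}


for name, density in nucleotides_per_gene(GENOMES).items():
    print(f"{name}: about {density:,.0f} nucleotides per gene")
```

With these numbers, E. coli has roughly one gene every 1,150 nucleotides and yeast one every 2,500. The human genome has roughly one gene every 100,000 nucleotides, which suggests that most of the human genome lies outside genes.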
What do we actually do with bioinformatics?
From DNA to Genome
[Timeline figure, 1955 to 2000. Milestones shown: Watson and Crick DNA model; Sanger sequences insulin protein; Sanger dideoxy DNA sequencing; PCR (Polymerase Chain Reaction); ARPANET (early Internet); PDB (Protein Data Bank); sequence alignment; GenBank database; Dayhoff’s Atlas; SWISS-PROT database; NCBI; World Wide Web; BLAST; FASTA; EBI; Human Genome Initiative; first human genome draft; first bacterial genome; yeast genome]
Origin of bioinformatics and
biological databases:
The first protein sequence reported was that of
bovine insulin in 1956, consisting of 51
residues.
Nearly a decade later, the first nucleic acid
sequence was reported, that of yeast alanine
tRNA (tRNA-Ala) with 77 bases.
In 1965, Dayhoff gathered all the available
sequence data to create the first bioinformatic
database (Atlas of Protein Sequence and
Structure).
The Protein Data Bank followed in 1972 with a
collection of ten X-ray crystallographic protein
structures. The SWISS-PROT protein sequence
database began in 1987.
Complete Genomes (as of August 2011)
Eukaryotes: 37
Prokaryotes: 1708
Total: 1745
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT ......
.............. TGAAAAACGTA
Features labelled on the annotated sequence: TF binding site; promoter; Transcription Start Site; Ribosome binding Site
ORF = Open Reading Frame
CDS = Coding Sequence
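An open reading frame can be located by scanning each frame for an ATG followed, in the same frame, by a stop codon. The Python sketch below shows that idea in simplified form. It scans only the three frames of the given strand, uses ATG as the only start codon, and the min_codons cutoff is an arbitrary assumption. Tools such as ORFFinder also scan the opposite strand and offer more options.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for ATG...stop ORFs on the given strand.

    `end` is exclusive and includes the stop codon, so seq[start:end] is
    the full ORF. The first ATG before each stop is used; ORFs with fewer
    than `min_codons` codons before the stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None  # close it at the stop codon
    return sorted(orfs, key=lambda orf: orf[1])


seq = "CCATGAAATTTTAGCC"
for frame, start, end in find_orfs(seq):
    print(frame, start, end, seq[start:end])
```

Here the only ORF is ATGAAATTTTAG, found in the frame that starts at offset 2.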
Biological databases
What is a Database?
A structured collection of data held in computer storage; esp. one
that incorporates software to make it accessible in a variety of ways;
transf., any large collection of information.
database management: the organization and manipulation of data in
a database.
database management system (DBMS): a software package that
provides all the functions required for database management.
database system: a database together with a database
management system.
Oxford Dictionary
What is a database?
• A collection of data
– structured
– searchable (index) -> table of contents
– updated periodically (release) -> new edition
– cross-referenced (hyperlinks) -> links with other db
• Includes also associated tools (software) necessary
for access, updating, information insertion,
information deletion….
• Data storage management: flat files, relational
databases…
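A minimal sketch of these ideas uses Python's built-in sqlite3 module. Each record type is held in a structured table, a primary key acts as the search index, and a protein record is cross-referenced to its mRNA record. The two-table schema and the function names are hypothetical. The accessions and descriptions are the HBA1 records shown earlier in this lecture.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one mRNA and one protein record."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE nucleotide (
            accession   TEXT PRIMARY KEY,   -- indexed, so lookups are fast
            description TEXT
        );
        CREATE TABLE protein (
            accession   TEXT PRIMARY KEY,
            description TEXT,
            mrna        TEXT REFERENCES nucleotide(accession)  -- cross-reference
        );
    """)
    con.execute("INSERT INTO nucleotide VALUES (?, ?)",
                ("NM_000558.3", "Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA"))
    con.execute("INSERT INTO protein VALUES (?, ?, ?)",
                ("NP_000549.1", "alpha 1 globin [Homo sapiens]", "NM_000558.3"))
    con.commit()
    return con


def protein_for_mrna(con, mrna_accession):
    """Follow the cross-reference from an mRNA accession to its protein."""
    return con.execute(
        "SELECT accession, description FROM protein WHERE mrna = ?",
        (mrna_accession,),
    ).fetchone()


con = build_demo_db()
print(protein_for_mrna(con, "NM_000558.3"))
```

Real biological databases add release versions, many more fields and links to other databases, but the basic layout of structured, indexed and cross-referenced records is the same.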
Database or databank?
Initially
• Databank (in UK)
• Database (in the USA)
Solution
• The abbreviation db
Why biological databases?
• Exponential growth in biological data.
• Data (genomic sequences, 3D structures, 2D
gel analysis, MS analysis, Microarrays….) are
no longer published in a conventional
manner, but directly submitted to databases.
• Essential tools for biological research. The
only way to publish massive amounts of data
without using all the paper in the world.
Distribution of sequences
• Books, articles 1968 -> 1985
• Computer tapes 1982 -> 1992
• Floppy disks 1984 -> 1990
• CD-ROM 1989 ->
• FTP 1989 ->
• On-line services 1982 -> 1994
• WWW 1993 ->
• DVD 2001 ->
Some statistics
• More than 1000 different ‘biological’ databases
• Variable size: <100Kb to >20Gb
– DNA: > 20 Gb
– Protein: 1 Gb
– 3D structure: 5 Gb
– Other: smaller
• Update frequency: daily to annually to seldom to forget
about it.
• Usually accessible through the web (some free, some not)
International nucleotide data banks
EMBL (Europe): EMBL, EBI
GenBank (USA): NLM, NCBI
DDBJ (Japan): NIG, CIB
[Figure: the three data banks linked through an International Advisory Meeting and a Collaborative Meeting; TrEMBL and NRDB also shown]
Databases
• NCBI (National Center for Biotechnology Information):
http://www.ncbi.nlm.nih.gov/
• EBI: http://www.ebi.ac.uk/
• DDBJ: http://www.ddbj.nig.ac.jp/
• InterPro: http://www.ebi.ac.uk/interpro/
• InterPro is a database of protein families, domains and functional sites in
which identifiable features found in known proteins can be applied to
unknown protein sequences
• b) Search and analytical tools
• ORFFinder: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
• It is an analysis tool which finds all open reading frames in a user's
sequence or in a sequence already in the database.
• InterProScan server: http://www.ebi.ac.uk/InterProScan/
• InterProScan is used to search various protein domain/motifs/functional
sites databases and can combine other analyses such as the identification
of potential transmembrane domains and signal peptides.
……
• PSORT: http://www.psort.org/
• This site provides links to the PSORT family of programs for
subcellular localization prediction, as well as other datasets
and resources relevant to localization prediction.
• SignalP v3.0 Server:
http://www.cbs.dtu.dk/services/SignalP/
• SignalP aims at identifying signal peptides in eukaryotic
and bacterial query proteins.
• TMHMM v2.0 server:
http://www.cbs.dtu.dk/services/TMHMM/
• TMHMM aims at identifying trans-membrane domains in
proteins (eukaryotic or prokaryotic).
Some databases in the field of molecular biology…
AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb,
ARR, AsDb, BBDB, BCGD, Beanref, BioImage,
BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,
CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,
Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db,
ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,
GCRDB, GDB, GENATLAS, GenBank, GeneCards,
Genline, GenLink, GENOTK, GenProtEC, GIFTS,
GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,
HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB,
HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,
KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP,
Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us,
MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase,
OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB,
PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD,
PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,
PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE,
SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase,
SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D,
SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE,
SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB,
TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE,
VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD,
YPM, etc.
Categories of databases for Life Sciences
• Sequences (DNA, protein)
• Genomics
• Mutation/polymorphism
• Protein domain/family
• Proteomics (2D gel, Mass Spectrometry)
• 3D structure
• Metabolic networks
• Regulatory networks
• Bibliography
• Expression (Microarrays,…)
• Specialized
Literature Databases:
Bookshelf: A collection of searchable biomedical books linked to
PubMed.
PubMed: Allows searching by author names, journal titles, and a
new Preview/Index option. The PubMed database provides access to
over 12 million MEDLINE citations back to the mid-1960s. It
includes History and Clipboard options which may enhance your
search session.
PubMed Central: The U.S. National Library of Medicine digital
archive of life science journal literature.
OMIM: Online Mendelian Inheritance in Man is a database of
human genes and genetic disorders (also OMIA).
.....
• BLAST is…
Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein
databases
• 80,000 searches per day
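BLAST is a fast heuristic: rather than filling in a full alignment matrix for every database sequence, it looks for short matching words and extends them. The exact local alignment score it approximates is given by the Smith-Waterman dynamic-programming algorithm, sketched below in Python. The scores used (match +2, mismatch -1, gap -2) are arbitrary assumptions. Real protein searches use substitution matrices such as BLOSUM62 and separate gap-opening and gap-extension penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            # Clamping at 0 is what makes the alignment *local*:
            # a poor prefix is simply dropped instead of dragging the score down.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

The two example sequences share only the stretch ACGT, so the best local alignment is those four matching bases (score 8), regardless of the unrelated flanking sequence.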
Why use BLAST?
• BLAST searching is fundamental to understanding
the relatedness of any favourite query sequence to
other known proteins or DNA sequences.
• Applications include:
– identifying orthologs and paralogs
– discovering new genes or proteins
– discovering variants of genes or proteins
– investigating expressed sequence tags (ESTs)
– exploring protein structure and function
....
• TaxBrowser is…
• browser for the major divisions of living
organisms (archaea, bacteria, eukaryota,
viruses).
• taxonomy information such as genetic
codes.
• molecular data on extinct organisms.
What is an accession number?
• An accession number is a label that is used to identify a
sequence. It is a unique string of letters and/or numbers
that corresponds to a given molecular sequence.
• Examples:
– DNA
AF492453 GenBank genomic sequence (same at EBI)
– Protein
AAM97590 GenBank protein
Q8MV55 SwissProt protein
Non Protein Data Bank structure record
– Publication
12192407 PubMed ID - Williams et al. Nature 418: 865-9 (2002).
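Because each resource uses its own accession style, the format alone often hints at where an identifier comes from. The Python sketch below recognises a few common patterns, using the examples above. It is a simplified assumption-based sketch: GenBank, RefSeq and UniProt also use other formats, which the official documentation of each resource defines, so "unknown" here does not mean the identifier is invalid.

```python
import re

# Checked in order; the first pattern that matches the whole string wins.
PATTERNS = [
    ("RefSeq", re.compile(r"[A-Z]{2}_\d+(\.\d+)?")),                      # NM_000558.3
    ("GenBank nucleotide", re.compile(r"([A-Z]\d{5}|[A-Z]{2}\d{6})(\.\d+)?")),  # AF492453
    ("GenBank protein", re.compile(r"[A-Z]{3}\d{5}(\.\d+)?")),            # AAM97590
    ("UniProtKB protein",
     re.compile(r"[OPQ]\d[A-Z\d]{3}\d|[A-NR-Z]\d([A-Z][A-Z\d]{2}\d){1,2}")),  # Q8MV55
    ("PubMed ID", re.compile(r"\d+")),                                    # 12192407
]


def classify_accession(accession):
    """Guess the source database of an identifier from its format."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(accession):
            return label
    return "unknown"


for acc in ["AF492453", "AAM97590", "Q8MV55", "12192407", "NM_000558.3"]:
    print(acc, "->", classify_accession(acc))
```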
PubMed (Medline)
• MEDLINE covers the fields of medicine, nursing, dentistry,
veterinary medicine, public health, and preclinical sciences
• Contains citations from approximately 5,200 worldwide journals in
37 languages; 60 languages for older journals.
• Contains over 20 million citations since 1948
• Contains links to biological db and to some journals
• New records are added to PreMEDLINE daily!
Type in a Query term
• Enter your search words in the
query box and hit the “Go” button
http://www.ncbi.nlm.nih.gov/entrez/query/static/help/helpdoc.html#Searching
The Syntax …
1. Boolean operators: AND, OR, NOT must be entered in
UPPERCASE (e.g., promoters OR response elements). The default
is AND.
2. Entrez processes all Boolean operators in a left-to-right sequence.
The order in which Entrez processes a search statement can be
changed by enclosing individual concepts in parentheses. The terms
inside the parentheses are processed first. For example, the search
statement: g1p3 OR (response AND element AND promoter).
3. Quotation marks: The term inside the quotation marks is read as one
phrase (e.g. “public health” differs from public health, which will
also include articles on public latrines and their effect on health
workers).
4. Asterisk: Extends the search to all terms that start with the letters
before the asterisk. For example, dia* will include such terms as
diaphragm, dial, and diameter.
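The syntax rules above can also be applied programmatically. The Python sketch below assembles a Boolean query string and places it in a URL for NCBI's E-utilities esearch service (db, term and retmax are parameters of that service). The helper names are hypothetical, and the code only builds the URL; it does not send a request, which would be subject to NCBI's usage guidelines.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def quote_if_phrase(term):
    """Wrap multi-word terms in quotation marks so they are read as one phrase."""
    return f'"{term}"' if " " in term else term


def build_query(required, alternatives=(), excluded=()):
    """Combine terms with uppercase Boolean operators.

    Required terms are joined with AND, alternatives are grouped in
    parentheses with OR, and excluded terms are appended with NOT.
    Since Entrez works left to right, each NOT applies to everything before it.
    """
    parts = [quote_if_phrase(t) for t in required]
    if alternatives:
        parts.append("(" + " OR ".join(quote_if_phrase(t) for t in alternatives) + ")")
    query = " AND ".join(parts)
    for term in excluded:
        query += " NOT " + quote_if_phrase(term)
    return query


def esearch_url(query, db="pubmed", retmax=20):
    """Return an esearch URL for `query` (no request is made)."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


q = build_query(["hemoglobin", "public health"],
                alternatives=["promoter", "response element"],
                excluded=["mouse"])
print(q)
print(esearch_url(q))
```

The first line printed is hemoglobin AND "public health" AND (promoter OR "response element") NOT mouse. Truncation with an asterisk (e.g. dia*) passes through unchanged and is URL-encoded like any other character.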
Refine the Query
• Often a search finds too many (or too few) sequences, so you
can go back and try again with more (or fewer) keywords in
your query
• The “History” feature allows you to combine any of your past
queries.
• The “Limits” feature allows you to limit a query to specific
organisms, sequences submitted during a specific period of
time, etc.
• [Many other features are designed to search for literature in
MEDLINE]
The OMIM (Online Mendelian
Inheritance in Man)
– Genes and genetic disorders
– Edited by a team at Johns Hopkins
– Updated daily
MIM Number Prefixes
* gene with known sequence
+ gene with known sequence and
phenotype
# phenotype description, molecular
basis known
% Mendelian phenotype or locus,
molecular basis unknown
no prefix other, mainly phenotypes with
suspected Mendelian basis
Searching OMIM
• Search Fields
– Name of trait, e.g., hypertension
– Cytogenetic location, e.g., 1p31.6
– Inheritance, e.g., autosomal dominant
– Gene, e.g., coagulation factor VIII
OMIM search tags
All Fields [ALL]
Allelic Variant [AV] or [VAR]
Chromosome [CH] or [CHR]
Clinical Synopsis [CS] or [CLIN]
Gene Map [GM] or [MAP]
Gene Name [GN] or [GENE]
Reference [RE] or [REF]
Online Literature databases
1. Google Scholar
2. Google Books
3. Web of Science
4. Google Scholar
http://www.scholar.google.com/
What is Google Scholar?
Google Scholar enables you to search specifically for scholarly
literature, including peer-reviewed papers, theses, books, preprints,
abstracts and technical reports from all broad areas of research.
Use Google Scholar to find articles from a wide variety of academic
publishers, professional societies, preprint repositories and
universities, as well as scholarly articles available across the web.
Google Scholar orders your search results by how relevant they are
to your query, so the most useful references should appear at the
top of the page.
This relevance ranking takes into account the full text of each
article, the article's author, the publication in which the article
appeared and how often it has been cited in scholarly literature.
What other DATA can we retrieve from the record?
5. Google Book Search
6. Web of science
http://apps.webofknowledge.com.ezproxy.lib.uh.edu/WOS_GeneralSearch_input.do?product=WOS&search_mode=GeneralSearch&SID=4FB7LbbLgDMhG9fDiLh&preferencesSaved=
Areas in Bioinformatics…
Genomics
• In a multicellular organism, each cell type expresses its
genes in a different way, although every cell has the same
genetic constitution.
• For example, all the information needed for a liver cell to be
a liver cell is also present in a nose cell, so gene expression
is the only thing that differentiates them.
Genomics - Finding Genes
• Finding a gene in sequence data is like finding a needle in a
haystack
• However, whereas a needle looks different from the hay, genes
do not look obviously different from the rest of the sequence
data
• Given a whole array of nucleotides, we try to find the borders
of a stretch of nucleotides and mark it as a gene
• This is one of the challenges of bioinformatics
• Neural networks and dynamic programming are
being employed
Organism | Genome size (Mb = bp × 1,000,000) | Gene number | Web site
Yeast | 13.5 | 6,241 | http://genome-www.stanford.edu/Saccharomyces
Fruit flies | 180 | 13,601 | http://flybase.bio.indiana.edu
Homo sapiens | 3,000 | 45,000 | http://www.ncbi.nlm.nih.gov/genome/guide
Proteomics
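• The proteome is the sum total of an organism's
proteins
• Proteomics is more difficult than genomics:
– DNA is built from 4 nucleotides; proteins from 20 amino acids
– DNA has a simple chemical makeup; proteins are chemically complex
– DNA can be duplicated (copied); proteins can't
• We are entering the 'post-genome era'
• Meaning that much has been done with genes,
not that the work on genes is over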
Proteomics…..
• The relationship between an RNA and the protein it codes
for is often not straightforward
• Proteins change after translation
– So the amino acid sequence does not tell us anything about
these post-translational changes
• Many proteins are not active until they are combined
into a larger complex or moved to a relevant
location inside or outside the cell
• So the amino acid sequence only hints at these things
• Also, proteins must be handled more carefully in
labs, as they tend to change when in contact with
an inappropriate material
Protein Structure Prediction
• Is one of the biggest challenges of
bioinformatics, and especially of biochemistry
• There is currently no algorithm that consistently
predicts the structure of proteins
Structure Prediction methods
• Comparative Modeling
– The target protein's structure is predicted by comparison
with related proteins
– Proteins with similar sequences are searched
for known structures
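A common first measure of how "similar" a candidate template is to the target is percent sequence identity over an alignment. The Python sketch below assumes the two sequences are already aligned, with gaps written as "-", and counts only columns where neither sequence has a gap. That is one of several conventions; some tools divide by the length of the shorter sequence instead, which gives different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which the residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]  # ignore columns containing a gap
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(round(percent_identity("MVLS-ADK", "MVHSPADK"), 1))
```

In the example, one column is skipped because of the gap, and 6 of the remaining 7 columns are identical (about 85.7%).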
Phylogenetics
• The taxonomical system reflects
evolutionary relationships
• Phylogenetic trees reflect evolutionary
relationships through a picture/graph
• Rooted trees: there is only one (common)
ancestor
• Unrooted trees: just show the
relationships
• Phylogenetic tree reconstruction
algorithms are also an area of research
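UPGMA is one classic tree reconstruction algorithm, shown here only as an example and not necessarily the method used in this course. It assumes a roughly constant rate of evolution and produces a rooted tree. The Python sketch below repeatedly merges the two closest clusters and averages their distances to everything else, weighted by cluster size. It returns only the branching order as a nested string, without branch lengths. The distance matrix in the example is made up.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree topology as a nested string.

    `dist` is a symmetric nested dict: dist[a][b] = distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = {frozenset(pair): dist[pair[0]][pair[1]] for pair in combinations(names, 2)}
    while len(sizes) > 1:
        # Merge the two closest clusters (ties go to the first pair found).
        a, b = sorted(min(d, key=d.get))
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster = size-weighted average of the old distances.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        # Drop every distance that still refers to a or b.
        d = {pair: value for pair, value in d.items() if a not in pair and b not in pair}
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances between four taxa.
dist = {
    "A": {"B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10},
}
print(upgma(["A", "B", "C", "D"], dist))
```

With these distances, A and B are joined first, then C, then D, giving (((A,B),C),D). The outermost pair of parentheses corresponds to the root of the rooted tree.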
Applications….
Medical Implications
• Pharmacogenomics
– Not all drugs work on all patients; some good drugs
cause death in some patients
– So, by doing a gene analysis before treatment, the
harmful drugs can be avoided
– Also, drugs which cause death in most patients can be used
on a minority whose genes make them well suited to the drug
– volunteers wanted!
– Customized treatment
• Gene Therapy
– Replace or supply the defective or missing gene
– E.g. insulin, and Factor VIII for haemophilia
• BioWeapons (??)
Diagnosis of Disease
• Diagnosis of disease
– Identification of the genes that cause a
disease will help detect the disease at an early stage
e.g. Huntington disease
• Symptoms: uncontrollable dance-like
movements, mental disturbance, personality
changes and intellectual impairment
• Death in 10-15 years
• The gene responsible for the disease has been
identified
• It contains excessively repeated sections of CAG
• So, once the gene has been analyzed, the couple can be counseled
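Detecting an "excessively repeated" CAG section comes down to measuring the longest uninterrupted run of CAG triplets in the relevant stretch of the gene. The Python sketch below does this on a plain DNA string. Clinical interpretation depends on published repeat-length thresholds and on laboratory methods, neither of which this sketch encodes.

```python
def longest_repeat_run(seq, unit="CAG"):
    """Length (in repeat units) of the longest uninterrupted run of `unit`."""
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Count consecutive copies of the unit beginning at `start`.
        while seq.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


print(longest_repeat_run("TTCAGCAGCAGAA"))
```

The interruption in CAGCAACAGCAG (CAA instead of CAG) splits it into runs of 1 and 2, so the function reports 2 for that sequence.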
Drug Design
• Developing a drug can take up to 15 years and cost $700 million
• One of the goals of bioinformatics is to
reduce the time and cost involved.
• The process
– Discovery
• Computational methods can improve this
– Testing
Discovery
Target identification
– Identifying the molecule on which the
germ relies for its survival
– Then we develop another molecule, i.e. a
drug, which will bind to the target
– So the germ will not be able to interact
with the target.
– Proteins are the most common targets
Discovery…
• For example, HIV produces HIV protease,
a protein which in turn cuts up (cleaves)
other proteins
• This HIV protease has an active site
where it binds to other molecules
• So an HIV drug will go and bind to that
active site
– Easier said than done!
Discovery…
• Lead compounds are the molecules that
go and bind to the target protein’s active
site
• Traditionally this has been a trial and error
method
• Now this is being moved into the realm of
computers
Restriction Analysis of DNA
• Special enzymes termed restriction enzymes have been discovered in
many different bacteria and other single-celled organisms. These
enzymes act as chemical scissors to cut DNA (such as λ DNA) into pieces.
• They are able to scan along a length of DNA looking for a particular
sequence of bases that they recognize.
• This recognition site or sequence is generally from 4 to 6 base pairs in
length. Once it is located, the enzyme will attach to the DNA molecule
and cut each strand of the double helix, the first step in a process called
restriction mapping.
• The restriction enzyme will continue to do this along the full length of the
DNA molecule, which will then break into fragments. The size of these
fragments is measured in base pairs or kilobase pairs (1 kb = 1,000 base pairs).
• Since the recognition site or sequence of base pairs is known for each
restriction enzyme, we can use this to carry out a detailed analysis of the
sequence of bases in specific regions of the DNA in which we are
interested.
• This procedure is one of the most important in modern biology.
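Because each enzyme's recognition sequence and cut position are known, a complete digest of a sequenced, linear DNA molecule can be predicted by searching for the site and measuring the distances between cuts. The Python sketch below assumes a linear molecule and the well-known cut positions G^AATTC (EcoRI), A^AGCTT (HindIII) and G^GATCC (BamHI). Scanning only the top strand is enough here because these three sites are palindromic.

```python
# Recognition site and cut offset (number of site bases left of the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
}


def find_sites(seq, site):
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme):
    """Fragment lengths (bp, in order along the molecule) after a complete digest."""
    site, offset = ENZYMES[enzyme]
    cuts = [p + offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


demo = "AAGAATTCAAAAGAATTCAA"  # made-up 20 bp sequence with two EcoRI sites
print(find_sites(demo, "GAATTC"), digest(demo, "EcoRI"))
```

Given the full λ sequence as a string (for example, read from a FASTA file), len(find_sites(seq, "GAATTC")) would count the EcoRI sites, and digest would list the expected fragment sizes.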
.... Restriction analysis
• In the presence of specific DNA repair enzymes (such as DNA ligase), DNA
fragments will re-anneal, or stick themselves to other fragments
with cut ends that are complementary to their own end
sequence.
• It doesn't matter if the fragment that matches the cut end
comes from the same organism or from a different one.
• This ability to rejoin cut DNA fragments has been utilized by scientists
to introduce foreign DNA into an organism.
• This DNA may contain genes that allow the organism to exhibit
a new function or process. This would include transferring
genes that will result in a change in the nutritional quality of a
crop or perhaps allow a plant to grow in a region that is colder
than its usual preferred area.
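Whether two cut ends can stick together depends on their single-stranded overhangs. For a palindromic site cut at the same position on both strands, the overhang is the part of the site between the two cuts. The Python sketch below computes this and treats two enzymes as compatible when their overhangs match. It handles only enzymes that leave 5' overhangs. BglII (A^GATCT) is not mentioned in this lecture; it is added as an example of a different enzyme whose GATC overhang matches that of BamHI.

```python
# Recognition site and top-strand cut offset (number of site bases left of the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "BglII": ("AGATCT", 1),    # A^GATCT
}


def overhang(enzyme):
    """5' overhang left by a palindromic site cut symmetrically on both strands.

    Assumes offset < len(site) / 2; enzymes leaving 3' overhangs are not handled.
    """
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def compatible(enzyme_a, enzyme_b):
    """Ends can re-anneal if their overhangs match.

    Palindromic overhangs are their own reverse complement, so identical
    overhangs can base-pair with each other.
    """
    return overhang(enzyme_a) == overhang(enzyme_b)


print(overhang("EcoRI"), compatible("BamHI", "BglII"), compatible("EcoRI", "HindIII"))
```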
Example: Restriction Digestion and
Analysis of DNA from Bacteriophage λ
• The genome of this small virus is 48,502 base pairs long, which is very
small compared with the human genome of approximately 3
billion base pairs.
• Since the whole sequence of λ is already known we can predict
where each restriction enzyme will cut and thus the expected
size of the fragments that will be produced.
• If the virus DNA is exposed to the restriction enzyme for only a
short time, then not every restriction site will be cut by the
enzyme.
• This will result in fragments ranging in size from the smallest
possible (all sites are cut) to in-between lengths (some of the
sites are cut) to the longest (no sites are cut). This is termed a
partial restriction digestion.
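If the cut positions are known, the fragment sizes that can appear in a partial digest follow directly. Any fragment runs from one boundary (a molecule end or a cut site) to another, with every site in between left uncut. The Python sketch below lists all such sizes for a linear molecule. The length and cut positions in the example are made up; a real partial digest shows only some of these bands, depending on how long the enzyme acted.

```python
from itertools import combinations


def partial_digest_fragments(length, cut_positions):
    """All fragment sizes possible in a partial digest of a linear molecule.

    Boundaries are the two ends plus every cut position; a fragment can span
    any pair of boundaries. The smallest sizes are the complete-digest pieces
    and the largest (`length`) is the uncut molecule.
    """
    bounds = sorted({0, length, *cut_positions})
    return sorted(end - start for start, end in combinations(bounds, 2))


print(partial_digest_fragments(20, [3, 13]))
```

For a 20 bp molecule cut at positions 3 and 13, the possible sizes are 3, 7 and 10 (all sites cut), 13 and 17 (one site cut) and 20 (no sites cut).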
.....
• After overnight digestion, the reaction is
stopped by addition of a loading buffer.
• The DNA fragments are separated by
electrophoresis, a process that involves
application of an electric field to cause the
DNA fragments to migrate into an agarose
gel.
• The gel is then stained with a methylene
blue stain to visualize the DNA bands and
may be photographed.
.....
• The movement of the fragments during electrophoresis
will always be towards the positive electrode because
DNA is a negatively charged molecule.
• The fragments move through the gel at a rate that is
determined by their size and shape, with the smallest
moving the fastest.
• DNA cannot be seen as it moves through the gel. That is
why a loading dye must be added to each of the samples
before they are pipetted into the wells.
• The progress of the dye can be seen in the gel. It will
initially appear as a blue band, eventually resolving into
two bands of different colours.
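Since migration rate depends on size, the sizes of unknown bands are usually estimated from marker fragments of known size run on the same gel. A common approximation is that migration distance is roughly linear in the logarithm of fragment size over the gel's useful range. The Python sketch below fits that straight line by least squares and uses it to estimate a band's size. The marker distances and sizes are hypothetical numbers chosen for illustration.

```python
import math


def fit_standard_curve(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    xs = list(distances_mm)
    ys = [math.log10(size) for size in sizes_bp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm, slope, intercept):
    """Predict a fragment size (bp) from its migration distance."""
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical marker bands: larger fragments travel shorter distances.
slope, intercept = fit_standard_curve([10, 20, 30], [10_000, 1_000, 100])
print(round(estimate_size(15, slope, intercept)))
```

The negative slope reflects that smaller fragments move further. A band at 15 mm, halfway between the 10,000 bp and 1,000 bp markers, is estimated at about 3,162 bp, not 5,500 bp, because the relationship is logarithmic.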
......
• Restriction enzymes cut at specific sites along the DNA. These sites
are determined by the sequence of bases, which usually form
palindromes.
• Palindromes are groups of letters that read the same in both the
forward and backward orientation.
• In the case of DNA, the palindrome is read across both strands: the
forward strand read 5’ to 3’ matches the reverse strand read 5’ to 3’.
• For example, the 5’ to 3’ strand may have the sequence GAATTC.
The complementary bases on the opposite strand will be CTTAAG,
which is the same as reading the first strand backwards!
• Many enzymes recognize these types of sequences and will attach to
the DNA at this site and then cut the strand between two of the
bases. In this example, the DNA was digested with the BamHI,
EcoRI and HindIII restriction enzymes, whose recognition sequences are
as follows, with the cut site indicated by the arrow:
EcoRI: G↓AATTC
HindIII: A↓AGCTT
BamHI: G↓GATCC
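In programming terms, a DNA palindrome is a sequence that equals its own reverse complement. The Python sketch below checks this and scans a sequence for palindromic windows of a chosen length. The window length of 6 matches the sites named in the assignment (GAATTC, AAGCTT, GGATCC). The example sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(site):
    """True if the site reads the same 5' to 3' on both strands.

    Odd-length sequences can never qualify: the middle base would have to
    be its own complement.
    """
    return site.upper() == reverse_complement(site)


def find_palindromes(seq, length=6):
    """(position, sequence) of every palindromic window of the given length."""
    seq = seq.upper()
    return [(i, seq[i:i + length])
            for i in range(len(seq) - length + 1)
            if is_dna_palindrome(seq[i:i + length])]


for site in ["GAATTC", "AAGCTT", "GGATCC", "GAATTG"]:
    print(site, is_dna_palindrome(site))
print(find_palindromes("TTGAATTCAA"))
```

The three recognition sites are palindromic, GAATTG is not, and the only 6-base palindrome in TTGAATTCAA is the EcoRI site at position 2.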
λ cut with EcoRI λ cut with HindIII λ cut with BamHI
Restriction map
Assignment: Using the graph on the
next slide, address the following:
• Calculate the sizes of the fragments that will result from
digestion and write them on the map.
• How many fragments would you expect to see for each of the
maps A, B and C?
• Draw these fragments onto the graph in the next slide.
• Now compare the size of the fragments that you have
calculated with the bands shown in the photographs of the
gels and determine which of the enzymes, BamHI, EcoRI and
HindIII were used to cut A, B and C.
• How many times does the sequence GAATTC occur in the λ
DNA sequence? What about AAGCTT and GGATCC?
Lecture 1 Introduction to Bioinformatics BCH 433.ppt

  • 1. Introduction to Bioinformatics BCH 433 Lecture 1 Dr. J. Ikwebe
  • 3. Molecular Bioinformatics Molecular Bioinformatics involves the use of computational tools to discover new information in complex data sets (from the one-dimensional information of DNA through the two-dimensional information of RNA and the three-dimensional information of proteins, to the four-dimensional information of evolving living systems).
  • 4. Bioinformatics (Oxford English Dictionary): The branch of science concerned with information and information flow in biological systems, especially the use of computational methods in genetics and genomics.
  • 5. What is bioinformatics? • The application of computational tools on molecular data, including the means to acquire, analyse, or visualize such data. • Key tools to handle and analyze the large amount of data generated by large-scale DNA, RNA and protein characterization projects (genomics -transcriptomics - proteomics).
  • 6. Biologists collect molecular data: DNA & Protein sequences, gene expression, etc. Computer scientists (+Mathematicians, Statisticians, etc.) Develop tools, softwares, algorithms to store and analyze the data. Bioinformaticians Study biological questions by analyzing molecular data The field of science in which biology, computer science and information technology merge into a single discipline
  • 7. .... • Bioinformatics uses computers, computing technology and software to manage large amounts of biological data and enable their analysis. • At the end of this course students will be expected to: – understand biological data and data management and integration – have a broad knowledge of computing and biological methods in bioinformatics – understand genomes, genome sequencing, genomic structure and comparison – know about the technology used in modern post-genomic biology, the data produced and the software to manage it.
  • 8. Introduction Large databases that can be accessed and analyzed with sophisticated tools have become central to biological research and education. The information content in the genomes of organisms, in the molecular dynamics of proteins, and in population dynamics, to name but a few areas, is enormous.  Biologists are increasingly finding that the management of complex data sets is becoming a bottleneck for scientific advances. Therefore, bioinformatics is rapidly becoming a key technology in all fields of biology.
  • 9. The present bottlenecks in bioinformatics include; the education of biologists in the use of advanced computing tools, the recruitment of computer scientists into this evolving field, the limited availability of developed databases of biological information, the need for more efficient and intelligent search engines for complex databases. Bottlenecks
  • 10. The hereditary information of all living organisms, with the exception of some viruses, is carried by deoxyribonucleic acid (DNA) molecules. 2 purines: 2 pyrimidines: adenine (A) cytosine (C) guanine (G) thymine (T) two rings one ring
  • 11. Eukaryotes may have up to 3 subcellular genomes: 1. Nuclear 2. Mitochondrial 3. Plastid Bacteria have either circular or linear genomes and may also carry plasmids The entire complement of genetic material carried by an individual is called the genome. Human chromosomes Circular genome
  • 12. Central dogma: DNA makes RNA makes Protein Modified dogma: DNA makes DNA and RNA, RNA makes DNA, RNA an Protein
  • 13. Amino acids - The protein building blocks
  • 14.
  • 15. Any region of the DNA sequence can, in principle, code for six different amino acid sequences, because any one of three different reading frames can be used to interpret each of the two strands.
  • 16. Protein folding A human Haemoglobin
  • 17. Some basic definitions • Genomics---- Genome: The total genetic content contained in a haploid set of chromosomes in eukaryotes, in a single chromosome in bacteria, or in the DNA or RNA of viruses. • Transcriptomics---- Transcriptome: the complete set of genes encoded on a genome that can be transcribed. • Proteomics---- Proteome: the complete set of proteins encoded on a genome that can be expressed and modified by a cell, tissue, or organism (Etymology: Protein+genome). – Sub-cellular proteome: the complete set of proteins for a given membrane or organelle (e.g. mitochondrial proteome). – Membranome: the complete set of membranes from a cell. – Metabolome: The metabolic products of the cell, that is, all the metabolites – Secretome: The secreted proteins of a cell? – The phosphome:Total phosphorylated proteins of a cell?
  • 18. How does it all look like on a computer monitor?
  • 19. A cDNA sequence >gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCG CCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACC ACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGA CGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACA AGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCC GAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCG TTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGT ACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC
  • 20. A cDNA sequence (reading frame) A protein sequence >gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCC GCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCAC CACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG ACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCAC AAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGC CGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCC GTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC >gi|4504347|ref|NP_000549.1| alpha 1 globin [Homo sapiens] MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
  • 21. ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGG GGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCT ACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGC CGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTC AACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCT CCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCT TGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCG GCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCT GGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGAC CTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAAC GCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGG TCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGC CTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTT CTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGG CGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGC CTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAG ACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCA ACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCC GGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCAC GCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGC TTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTG GGCGGCGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGG ACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGT GCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCC 
ATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTG AGTGGGCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAA GGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACC ACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG... And, a whole genome…
  • 22. E. coli 4.6 x 106 nucleotides – Approx. 4,000 genes Yeast 15 x 106 nucleotides – Approx. 6,000 genes Human 3 x 109 nucleotides – Approx. 30,000 genes Smallest human chromosome 50 x 106 nucleotides How big are whole genomes?
  • 23. What do we actually do with bioinformatics?
  • 24. From DNA to Genome Watson and Crick DNA model Sanger sequences insulin protein Sanger dideoxy DNA sequencing PCR (Polymerase Chain Reaction) 1955 1960 1965 1970 1975 1980 1985 ARPANET (early Internet) PDB (Protein Data Bank) Sequence alignment GenBank database Dayhoff’s Atlas
  • 25. 1995 1990 2000 SWISS-PROT database NCBI World Wide Web BLAST FASTA EBI Human Genome Initiative First human genome draft First bacterial genome Yeast genome
  • 26. The first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues. Origin of bioinformatics and biological databases: Nearly a decade later, the first nucleic acid sequence was reported, that of yeast tRNAalanine with 77 bases.
  • 27. In 1965, Dayhoff gathered all the available sequence data to create the first bioinformatic database (Atlas of Protein Sequence and Structure). The Protein DataBank followed in 1972 with a collection of ten X-ray crystallographic protein structures. The SWISSPROT protein sequence database began in 1987.
  • 29. as of August 2011: Eukaryotes 37 Prokaryotes 1708 Total 1745 Complete Genomes
  • 31. CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ................................. .............. TGAAAAACGTA TF binding site promoter Ribosome binding Site ORF = Open Reading Frame CDS = Coding Sequence Transcription Start Site
  • 33. What is a Database? A structured collection of data held in computer storage; esp. one that incorporates software to make it accessible in a variety of ways; transf., any large collection of information. database management: the organization and manipulation of data in a database. database management system (DBMS): a software package that provides all the functions required for database management. database system: a database together with a database management system. Oxford Dictionary
  • 34. What is a database? • A collection of data – structured – searchable (index) -> table of contents – updated periodically (release) -> new edition – cross-referenced (hyperlinks) -> links with other db • Includes also associated tools (software) necessary for access, updating, information insertion, information deletion…. • Data storage management: flat files, relational databases…
  • 35. Database or databank? Initially • Databank (in UK) • Database (in the USA) Solution • The abbreviation db
  • 36. Why biological databases? • Exponential growth in biological data. • Data (genomic sequences, 3D structures, 2D gel analysis, MS analysis, Microarrays….) are no longer published in a conventional manner, but directly submitted to databases. • Essential tools for biological research. The only way to publish massive amounts of data without using all the paper in the world.
  • 37. Distribution of sequences • Books, articles 1968 -> 1985 • Computer tapes 1982 -> 1992 • Floppy disks 1984 -> 1990 • CD-ROM 1989 -> • FTP 1989 -> • On-line services 1982 -> 1994 • WWW 1993 -> • DVD 2001 ->
  • 38. Some statistics • More than 1000 different ‘biological’ databases • Variable size: <100Kb to >20Gb – DNA: > 20 Gb – Protein: 1 Gb – 3D structure: 5 Gb – Other: smaller • Update frequency: daily to annually to seldom to forget about it. • Usually accessible through the web (some free, some not)
  • 39. International nucleotide data banks EMBL Europe EMBL EBI GenBank USA NLM NCBI DDBJ Japan NIG CIB International Advisory Meeting Collaborative Meeting TrEMBL NRDB
  • 40. Databases • NCBI (National Centre for Biotechnology Information): http://www.ncbi.nlm.nih.gov/ • EBI: http://www.ebi.ac.uk/ • DDBJ: http://www.ddbj.nig.ac.jp/ • InterPro: http://www.ebi.ac.uk/interpro/ • InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences • b) Search and analytical tools • ORFFinder: http://www.ncbi.nlm.nih.gov/gorf/gorf.html • It is an analysis tool which finds all open reading frames in a user's sequence or in a sequence already in the database. • InterProScan server: http://www.ebi.ac.uk/InterProScan/ • InterProScan is used to search various protein domain/motifs/functional sites databases and can combine other analyses such as the identification of potential transmembrane domains and signal peptides.
  • 41. …… • PSORT: http://www.psort.org/ • This cite provides links to the PSORT family of programs for subcellular localization prediction as well as other datasets and resources relevant to localization prediction. • SignalP v3.0 Server: http://www.cbs.dtu.dk/services/SignalP/ • SignalP aims at identifying signal peptides in eukaryotes and bacteria query proteins. • TMHMM v2.0 server: http://www.cbs.dtu.dk/services/TMHMM/ • TMHMM aims at identifying trans-membrane domains in proteins (eukaryotic or prokaryotic).
  • 42.  Some databases in the field of molecular biology… AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, Biolmage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD4OLbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Picty_cDB, DIP, DOGS, DOMO, DPD, DPlnteract, ECDC, ECGC, EC02DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, Genbank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HlVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP5 Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, 0-lycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnaiRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene,Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS- MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc .................. !!!!
  • 43. Categories of databases for Life Sciences • Sequences (DNA, protein) • Genomics • Mutation/polymorphism • Protein domain/family • Proteomics (2D gel, Mass Spectrometry) • 3D structure • Metabolic networks • Regulatory networks • Bibliography • Expression (Microarrays,…) • Specialized
  • 44. Literature Databases: • Bookshelf: A collection of searchable biomedical books linked to PubMed. • PubMed: Allows searching by author names, journal titles, and a new Preview/Index option. The PubMed database provides access to over 12 million MEDLINE citations dating back to the mid-1960s. It includes History and Clipboard options, which may enhance your search session. • PubMed Central: The U.S. National Library of Medicine digital archive of life science journal literature. • OMIM: Online Mendelian Inheritance in Man, a database of human genes and genetic disorders (its counterpart for animals is OMIA).
  • 45. ..... • BLAST is… Basic Local Alignment Search Tool • NCBI's sequence similarity search tool • supports analysis of DNA and protein databases • 80,000 searches per day
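BLAST finds high-scoring local alignments quickly by using heuristics. To show what a "local alignment score" actually is, here is a minimal sketch of the exact Smith-Waterman dynamic-programming recurrence, which BLAST speeds up heuristically; it is not BLAST itself. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not BLAST's defaults, which use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Flooring at 0 lets an alignment start anywhere: this is what makes it "local".
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "ACGT"))
```

Because scores are floored at zero, unrelated flanking sequence (the TT on either side above) does not lower the score of the shared ACGT core.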
  • 46. Why use BLAST? • BLAST searching is fundamental to understanding the relatedness of any favourite query sequence to other known proteins or DNA sequences. • Applications include: – identifying orthologs and paralogs – discovering new genes or proteins – discovering variants of genes or proteins – investigating expressed sequence tags (ESTs) – exploring protein structure and function
  • 47. .... • TaxBrowser is… • browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses). • taxonomy information such as genetic codes. • molecular data on extinct organisms.
  • 50. What is an accession number? • An accession number is a label used to identify a sequence. It is a unique string of letters and/or numbers that corresponds to a given molecular sequence. • Examples: – DNA: AF492453, GenBank genomic sequence (same at EBI) – Protein: AAM97590, GenBank protein; Q8MV55, SwissProt protein; Non Protein Data Bank structure record – Publication: 12192407, PubMed ID - Williams et al. Nature 418: 865-9 (2002).
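Because each database uses its own accession format, the examples on this slide can often be told apart by shape alone. The sketch below is a simplified classifier, assuming only the classic formats seen here: GenBank nucleotide (1 letter + 5 digits, or 2 letters + 6 digits), GenBank protein (3 letters + 5 digits), the UniProt/Swiss-Prot pattern, and all-digit PubMed IDs. Newer and other formats (for example PDB IDs) are deliberately not covered.

```python
import re

# Simplified accession patterns (assumption: classic formats only; checked in this order).
PATTERNS = [
    ("PubMed ID", re.compile(r"\d+")),
    ("GenBank nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("GenBank protein", re.compile(r"[A-Z]{3}\d{5}")),
    ("UniProt/Swiss-Prot",
     re.compile(r"[OPQ]\d[A-Z0-9]{3}\d|[A-NR-Z]\d([A-Z][A-Z0-9]{2}\d){1,2}")),
]


def classify_accession(accession):
    """Guess which kind of record an accession string refers to."""
    core = accession.strip().upper().split(".")[0]  # drop a version suffix such as ".1"
    for label, pattern in PATTERNS:
        if pattern.fullmatch(core):
            return label
    return "unknown"


if __name__ == "__main__":
    for acc in ("AF492453", "AAM97590", "Q8MV55", "12192407"):
        print(acc, "->", classify_accession(acc))
```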
  • 51. PubMed (MEDLINE) • MEDLINE covers the fields of medicine, nursing, dentistry, veterinary medicine, public health, and preclinical sciences • Contains citations from approximately 5,200 journals worldwide in 37 languages (60 languages for older journals) • Contains over 20 million citations since 1948 • Contains links to biological databases and to some journals • New records are added to PreMEDLINE daily!
  • 52. Type in a Query term • Enter your search words in the query box and hit the “Go” button http://www.ncbi.nlm.nih.gov/entrez/query/static/help/helpdoc.html#Searching
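The query box is the interactive front end to Entrez; for scripted searches NCBI also offers the E-utilities web API. The sketch below only builds an ESearch request URL for a query term and does not send it. The endpoint and the db, term and retmax parameters belong to ESearch; the default values chosen here (PubMed, 20 IDs) are assumptions.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed", retmax=20):
    """Build (but do not send) an ESearch request for an Entrez query term."""
    # urlencode percent-encodes quotes, parentheses and '*', and turns spaces into '+'.
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    print(esearch_url('"public health" AND promoter*'))
```

Opening the printed URL (for example with `urllib.request.urlopen`) would return the matching record IDs as XML; that network call is left out here.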
  • 53. The Syntax … 1. Boolean operators: AND, OR, NOT must be entered in UPPERCASE (e.g., promoters OR response elements). The default is AND. 2. Entrez processes all Boolean operators in left-to-right order. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses; the terms inside the parentheses are processed first. For example, in the search statement g1p3 OR (response AND element AND promoter), the three terms in parentheses are combined first, and the result is then combined with g1p3. 3. Quotation marks: The terms inside quotation marks are read as one phrase (e.g., “public health” is different from public health, which will also retrieve articles on public latrines and their effect on health workers). 4. Asterisk: Extends the search to all terms that start with the letters before the asterisk. For example, dia* will include such terms as diaphragm, dial, and diameter.
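To make the four rules concrete, here is a toy evaluator over a hypothetical in-memory index that maps each term to a set of record IDs. It applies operators strictly left to right, evaluates parentheses first, ANDs adjacent terms by default, treats only UPPERCASE AND/OR/NOT as operators, and expands a trailing asterisk. It illustrates the rules only and is not how Entrez is implemented; in particular, a quoted phrase is simply looked up as a single index key.

```python
import re

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')
OPERATORS = {"AND", "OR", "NOT"}  # only uppercase words count as operators


def lookup(token, index):
    """Return the set of record IDs matching one term (hypothetical index: term -> IDs)."""
    term = token.strip('"').lower()
    if token.endswith("*") and not token.startswith('"'):
        prefix = term[:-1]
        matches = set()
        for key, ids in index.items():  # truncation: union over every term with the prefix
            if key.startswith(prefix):
                matches |= ids
        return matches
    return set(index.get(term, set()))


def evaluate(tokens, index, pos=0):
    """Combine terms strictly left to right; a parenthesised group is evaluated first."""
    result, op = None, "AND"
    while pos < len(tokens):
        token = tokens[pos]
        if token == ")":
            return result or set(), pos + 1
        if token in OPERATORS:
            op, pos = token, pos + 1
            continue
        if token == "(":
            operand, pos = evaluate(tokens, index, pos + 1)
        else:
            operand, pos = lookup(token, index), pos + 1
        if result is None:
            result = operand
        elif op == "AND":
            result = result & operand
        elif op == "OR":
            result = result | operand
        else:  # NOT
            result = result - operand
        op = "AND"  # default operator between adjacent terms
    return result or set(), pos


def search(query, index):
    result, _ = evaluate(TOKEN.findall(query), index)
    return result
```

With left-to-right processing, `a OR b AND c` means `(a OR b) AND c`, which is why parentheses matter in a statement such as g1p3 OR (response AND element AND promoter).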
  • 54. Refine the Query • Often a search finds too many (or too few) sequences, so you can go back and try again with more (or fewer) keywords in your query • The “History” feature allows you to combine any of your past queries. • The “Limits” feature allows you to limit a query to specific organisms, sequences submitted during a specific period of time, etc. • [Many other features are designed to search for literature in MEDLINE]
  • 55. OMIM (Online Mendelian Inheritance in Man) – Genes and genetic disorders – Edited by a team at Johns Hopkins – Updated daily
  • 56. MIM Number Prefixes • * – gene with known sequence • + – gene with known sequence and phenotype • # – phenotype description, molecular basis known • % – Mendelian phenotype or locus, molecular basis unknown • No prefix – other, mainly phenotypes with suspected Mendelian basis
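The prefix rules above amount to a small lookup table. A minimal sketch, assuming an entry is written as the prefix character immediately followed by the MIM number (the number in the example is hypothetical):

```python
# Meaning of each MIM number prefix, as listed on the slide.
MIM_PREFIXES = {
    "*": "gene with known sequence",
    "+": "gene with known sequence and phenotype",
    "#": "phenotype description, molecular basis known",
    "%": "Mendelian phenotype or locus, molecular basis unknown",
}
NO_PREFIX = "other, mainly phenotypes with suspected Mendelian basis"


def describe_mim_entry(entry):
    """Split an entry such as '#123456' (hypothetical) into (meaning, number)."""
    entry = entry.strip()
    if entry and entry[0] in MIM_PREFIXES:
        return MIM_PREFIXES[entry[0]], entry[1:]
    return NO_PREFIX, entry


if __name__ == "__main__":
    print(describe_mim_entry("#123456"))
```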
  • 57. Searching OMIM • Search Fields – Name of trait, e.g., hypertension – Cytogenetic location, e.g., 1p31.6 – Inheritance, e.g., autosomal dominant – Gene, e.g., coagulation factor VIII
  • 58. OMIM search tags • All Fields [ALL] • Allelic Variant [AV] or [VAR] • Chromosome [CH] or [CHR] • Clinical Synopsis [CS] or [CLIN] • Gene Map [GM] or [MAP] • Gene Name [GN] or [GENE] • Reference [RE] or [REF]
  • 60. Online Literature databases 1. Google Scholar 2. Google Books 3. Web of Science
  • 62. What is Google Scholar? • It enables you to search specifically for scholarly literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research.
  • 63. Use Google Scholar to find articles from a wide variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web.
  • 64. Google Scholar orders your search results by how relevant they are to your query, so the most useful references should appear at the top of the page. This relevance ranking takes into account the full text of each article, the article's author, the publication in which the article appeared, and how often it has been cited in the scholarly literature.
  • 65. What other DATA can we retrieve from the record?
  • 68. 2. Google Book Search
  • 70. 3. Web of Science http://apps.webofknowledge.com.ezproxy.lib.uh.edu/WOS_GeneralSearch_input.do?product=WOS&search_mode=GeneralSearch&SID=4FB7LbbLgDMhG9fDiLh&preferencesSaved=
  • 74. Genomics • In a multicellular organism, each cell type expresses its genes in a different way, although every cell has the same genetic content. • For example, all the information needed for a liver cell to be a liver cell is also present in a nose cell, so differences in gene expression are what distinguish the two.
  • 75. Genomics - Finding Genes • Finding a gene in sequence data is like finding a needle in a haystack • However, unlike a needle, which looks different from the hay, genes are not obviously different from the rest of the sequence data • Given a long string of nucleotides, we try to find a stretch of nucleotides and mark its borders as a gene • This is one of the challenges of bioinformatics • Neural networks and dynamic programming are among the methods being used
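A very simple stand-in for gene finding is to scan for open reading frames (ORFs): a start codon (ATG) followed, in the same reading frame, by a stop codon. This is a deliberately naive sketch (forward strand only, standard stop codons, a hypothetical `min_codons` length threshold, first ATG only); real gene finders rely on the statistical methods mentioned above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end) 0-based, end-exclusive coordinates of forward-strand ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three reading frames of the forward strand
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_codons=2))
```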
  • 76. Genome size and gene number by organism (1 Mb = 1,000,000 bp) • Yeast: 13.5 Mb, 6,241 genes, http://genome-www.stanford.edu/Saccharomyces • Fruit flies: 180 Mb, 13,601 genes, http://flybase.bio.indiana.edu • Homo sapiens: 3,000 Mb, 45,000 genes (an early estimate; current estimates are about 20,000 protein-coding genes), http://www.ncbi.nlm.nih.gov/genome/guide
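Dividing gene number by genome size shows how much gene density differs between these organisms. The sketch below does that arithmetic with the figures from the table (including its 45,000 human gene estimate); the "bp per gene" value is an average genomic span that includes non-coding DNA between genes.

```python
# Genome size (Mb) and gene number, taken from the table on this slide.
GENOMES = {
    "Yeast": (13.5, 6241),
    "Fruit fly": (180, 13601),
    "Homo sapiens": (3000, 45000),
}
BP_PER_MB = 1_000_000


def genes_per_mb(size_mb, gene_count):
    return gene_count / size_mb


def mean_bp_per_gene(size_mb, gene_count):
    # Average genomic span per gene, including intergenic DNA.
    return size_mb * BP_PER_MB / gene_count


if __name__ == "__main__":
    for name, (size_mb, genes) in GENOMES.items():
        print(f"{name}: {genes_per_mb(size_mb, genes):.1f} genes/Mb, "
              f"~{mean_bp_per_gene(size_mb, genes):,.0f} bp per gene")
```

On these figures yeast packs about 462 genes per Mb, the fruit fly about 76, and the human genome 15.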
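  • 77. Proteomics • The proteome is the sum total of an organism's proteins • Proteomics is more difficult than genomics – DNA is built from 4 nucleotides, proteins from 20 amino acids – DNA has a simple chemical makeup, proteins a complex one – DNA can be duplicated (e.g., by PCR), proteins cannot • We are entering the ‘post-genome era’ • This means that much has been done with genes, not that the work on them is over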
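One way to see why an alphabet of 20 makes proteins harder to handle than DNA's alphabet of 4 is to count the possible sequences of a given length, which grows as (alphabet size) to the power of (length). A minimal illustration of that arithmetic:

```python
def sequence_space(alphabet_size, length):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (5, 10):
        print(f"length {n}: DNA {sequence_space(4, n):,} vs protein {sequence_space(20, n):,}")
```

At length 10 there are about a million possible DNA sequences but about ten trillion possible peptides.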
  • 78. Proteomics….. • The relationship between an RNA and the protein it codes for is often not straightforward • Proteins change after translation – so the amino acid sequence tells us nothing about these post-translational changes • Many proteins are not active until they are combined into a larger complex or moved to the relevant location inside or outside the cell • So the amino acid sequence only hints at these things • Proteins must also be handled more carefully in the lab, as they tend to change when in contact with an inappropriate material
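  • 79. Protein Structure Prediction • Protein structure prediction is one of the biggest challenges of bioinformatics, and especially of biochemistry • Historically, no algorithm could consistently predict the structure of proteins; deep-learning methods such as AlphaFold2 (2020) have since made accurate prediction possible for many proteins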
  • 80. Structure Prediction methods • Comparative Modeling – The target protein's structure is modeled on the known structures of related proteins – Proteins with similar sequences and known structures are searched for, to serve as templates
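Choosing a template usually starts from how similar the target's sequence is to each candidate. The sketch below computes percent identity for sequences that are already aligned (gaps written as "-") and picks the most similar candidate. Two assumptions to note: identity is measured over ungapped columns only (other conventions exist), and the template names in the example are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned columns where the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def best_template(target_aligned, templates):
    """Name of the template (name -> aligned sequence) most identical to the target."""
    return max(templates, key=lambda name: percent_identity(target_aligned, templates[name]))


if __name__ == "__main__":
    candidates = {"template_1": "ACDQFG", "template_2": "AQDQFQ"}  # hypothetical
    print(best_template("ACDEFG", candidates))
```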
  • 81. Phylogenetics • The taxonomic system reflects evolutionary relationships • Phylogenetic trees represent these evolutionary relationships as a picture (graph) • Rooted trees have a single common ancestor at the root • Unrooted trees show only the relationships, without specifying an ancestor • Phylogenetic tree reconstruction algorithms are also an area of research
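As one concrete example of tree reconstruction, here is a minimal sketch of UPGMA, a classical distance-based clustering method that produces a rooted tree (it assumes a roughly constant rate of evolution). It repeatedly merges the two closest clusters and averages distances, weighted by cluster size. Branch lengths are omitted for brevity, and the taxa and distances in the example are hypothetical.

```python
from itertools import combinations


def upgma(names, pair_distances):
    """Rooted tree (nested tuples) from a dict {(taxon1, taxon2): distance}."""
    sizes = {name: 1 for name in names}  # cluster (subtree) -> number of leaves
    dist = {frozenset(pair): d for pair, d in pair_distances.items()}
    while len(sizes) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # New distance = size-weighted average of the distances from a and b.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                dist[frozenset((a, c))] * size_a + dist[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root


def to_newick(tree):
    """Newick topology string (without the trailing ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


if __name__ == "__main__":
    d = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
    print(to_newick(upgma(["A", "B", "C", "D"], d)) + ";")
```

Here A and B are joined first, then C, then D, giving a rooted tree in which D is the outgroup.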
  • 83. Medical Implications • Pharmacogenomics – Not all drugs work on all patients, and some good drugs cause death in some patients – By analyzing a patient's genes before treatment, harmful drugs can be avoided – Conversely, drugs that are dangerous for most patients may be usable for a minority whose genes suit that drug – volunteers wanted! – Customized treatment • Gene Therapy – Replace or supply the defective or missing gene – E.g., insulin, and Factor VIII for haemophilia • BioWeapons (??)
  • 84. Diagnosis of Disease • Identifying the genes that cause a disease helps detect the disease at an early stage, e.g., Huntington disease • Symptoms – uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment • Death within 10-15 years • The gene responsible for the disease has been identified • It contains an excessively repeated stretch of CAG triplets • So once a couple's DNA has been analyzed, they can be counseled
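Since the diagnostic feature is an expanded run of CAG triplets, a basic computational step is to measure the longest uninterrupted CAG run in a sequence. A minimal sketch follows; it only counts repeats and deliberately leaves out clinical cut-offs, which are not given on this slide. The input sequence in the example is made up.

```python
import re


def longest_cag_run(seq):
    """Length, in repeat units, of the longest uninterrupted run of CAG triplets."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


if __name__ == "__main__":
    print(longest_cag_run("ATG" + "CAG" * 5 + "TTT" + "CAG" * 2))
```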
  • 85. Drug Design • Developing a drug can take up to 15 years and cost $700 million • One of the goals of bioinformatics is to reduce the time and cost involved • The process – Discovery • Computational methods can improve this – Testing
  • 86. Discovery • Target identification – Identifying the molecule on which the germ relies for its survival – Then we develop another molecule, i.e., a drug, that binds to the target – So the germ is no longer able to interact with the target – Proteins are the most common targets
  • 87. Discovery… • For example, HIV produces HIV protease, an enzyme that cuts (cleaves) other proteins • HIV protease has an active site where it binds to other molecules • So an HIV drug is designed to bind to that active site – Easier said than done!
  • 88. Discovery… • Lead compounds are molecules that bind to the target protein's active site • Traditionally, finding them has been a trial-and-error process • Now this search is increasingly being done by computer
  • 89. Restriction Analysis of DNA • Special enzymes termed restriction enzymes have been discovered in many different bacteria and other single-celled organisms. These enzymes act as chemical scissors that cut DNA into pieces. • They scan along a length of DNA looking for a particular sequence of bases that they recognize. • This recognition site or sequence is generally 4 to 6 base pairs long. Once it is located, the enzyme attaches to the DNA molecule and cuts each strand of the double helix, the first step in a process called restriction mapping. • The restriction enzyme continues to do this along the full length of the DNA molecule, which then breaks into fragments. The size of these fragments is measured in base pairs (bp) or kilobase pairs (1 kb = 1,000 bp). • Since the recognition sequence is known for each restriction enzyme, we can use it to analyze in detail the sequence of bases in specific regions of the DNA in which we are interested. • This procedure is one of the most important in modern biology.
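Because a restriction enzyme recognizes a fixed sequence, locating its sites in a known DNA sequence is a string search. A minimal sketch, assuming the sequence is given as a plain string of bases (the short sequence in the example is made up):

```python
def find_sites(seq, site):
    """0-based start positions of every occurrence of `site` in `seq` (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)  # resume one base later so overlapping sites are found
    return positions


if __name__ == "__main__":
    print(find_sites("GAATTCAAGAATTC", "GAATTC"))
```

The number of sites is `len(find_sites(...))`, and the positions feed directly into fragment-size calculations.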
  • 90. .... Restriction analysis • DNA fragments will re-anneal, sticking to other fragments whose cut ends are complementary to their own end sequence, and DNA repair enzymes (DNA ligase) can then join them permanently. • It doesn’t matter whether the matching fragment comes from the same organism or from a different one. • Scientists have used this ability to rejoin cut DNA to introduce foreign DNA into an organism. • This DNA may contain genes that allow the organism to exhibit a new function or process, for example genes that change the nutritional quality of a crop or that allow a plant to grow in a region colder than its usual preferred area.
  • 91. Example: Restriction Digestion and Analysis of DNA from Bacteriophage λ • The genome of this small virus is 48,502 base pairs long, which is very small compared with the human genome of approximately 3 billion base pairs. • Since the whole sequence of λ is already known, we can predict where each restriction enzyme will cut and thus the expected sizes of the fragments produced. • If the viral DNA is exposed to the restriction enzyme for only a short time, not every restriction site will be cut. • This produces fragments ranging in size from the smallest possible (all sites cut) through intermediate lengths (some sites cut) to the longest (no sites cut). This is termed a partial restriction digestion.
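Once the cut positions are known, fragment sizes follow by subtraction. The sketch below treats the molecule as linear, gives cut positions as bp offsets from the left end, and ignores sticky-end overhangs. For a complete digest the fragments lie between consecutive boundaries; for a partial digest any two boundaries (ends or cut sites) can delimit a fragment. The 100 bp molecule in the example is hypothetical.

```python
from itertools import combinations


def _boundaries(length, cuts):
    # Cut positions are bp offsets from the left end of a linear molecule.
    if any(not 0 < c < length for c in cuts):
        raise ValueError("cut positions must lie strictly inside the molecule")
    return [0] + sorted(cuts) + [length]


def complete_digest(length, cuts):
    """Fragment sizes, left to right, when every site is cut."""
    bounds = _boundaries(length, cuts)
    return [right - left for left, right in zip(bounds, bounds[1:])]


def partial_digest(length, cuts):
    """Sizes of every fragment that can arise when only some sites are cut."""
    bounds = _boundaries(length, cuts)
    return sorted(right - left for left, right in combinations(bounds, 2))


if __name__ == "__main__":
    print(complete_digest(100, [30, 70]))  # hypothetical 100 bp molecule, two sites
    print(partial_digest(100, [30, 70]))
```

The partial-digest list includes the uncut molecule (100 bp) as its largest size, matching the range described on the slide.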
  • 92. ..... • After overnight digestion, the reaction is stopped by addition of a loading buffer. • The DNA fragments are separated by electrophoresis, a process that involves application of an electric field to cause the DNA fragments to migrate into an agarose gel. • The gel is then stained with a methylene blue stain to visualize the DNA bands and may be photographed.
  • 93. ..... • During electrophoresis the fragments always move towards the positive electrode, because DNA is a negatively charged molecule. • The fragments move through the gel at a rate determined by their size and shape, with the smallest moving fastest. • DNA cannot be seen as it moves through the gel, which is why a loading dye must be added to each sample before it is pipetted into the wells. • The progress of the dye can be seen in the gel. It initially appears as a blue band, eventually resolving into two bands of different colours.
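Because smaller fragments move faster, band position can be converted into size. A common approach, assumed here, is a standard curve: over the gel's useful range, log10(size) is roughly linear in migration distance, so one fits a line to marker bands of known size and reads unknown bands off it. The sketch below is that least-squares fit; the distances and sizes in the example are toy values, not measurements.

```python
import math


def fit_standard_curve(distances, sizes_bp):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    ys = [math.log10(s) for s in sizes_bp]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance, slope, intercept):
    """Predicted fragment size (bp) for a band at the given migration distance."""
    return 10 ** (slope * distance + intercept)


if __name__ == "__main__":
    # Toy marker bands: distance migrated (cm) and known size (bp).
    slope, intercept = fit_standard_curve([1, 2, 3, 4], [10000, 1000, 100, 10])
    print(round(estimate_size(2.5, slope, intercept)))
```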
  • 94. ...... • Restriction enzymes cut at specific sites along the DNA. These sites are determined by the sequence of bases, which usually forms a palindrome. • Palindromes are groups of letters that read the same in both the forward and backward orientation. • In the case of DNA, the sequence reads the same on the forward and the reverse strands, each read 5’ to 3’. • For example, the 5’ to 3’ strand may have the sequence GAATTC. The complementary bases on the opposite strand are CTTAAG (written 3’ to 5’), which is the same as reading the first strand backwards; read 5’ to 3’, the opposite strand is therefore also GAATTC. • Many enzymes recognize these types of sequences, attach to the DNA at the site and then cut each strand between two of the bases. In this example, the DNA was digested with the BamHI, EcoRI and HindIII restriction enzymes, whose recognition sequences are as follows (^ marks the cut site): BamHI G^GATCC, EcoRI G^AATTC, HindIII A^AGCTT.
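The palindrome property described above can be checked directly: a site is palindromic in the DNA sense when its reverse complement equals the site itself. A minimal sketch, assuming plain A/C/G/T strings:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site):
    """True if the site reads the same on both strands (each read 5' to 3')."""
    return reverse_complement(site) == site.upper()


if __name__ == "__main__":
    for site in ("GAATTC", "AAGCTT", "GGATCC", "GAATTA"):
        print(site, is_palindromic_site(site))
```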
  • 97. [Figure: λ cut with EcoRI; λ cut with HindIII; λ cut with BamHI]
  • 99. Assignment: Using the graph in the next slide, address the following • Calculate the sizes of the resulting fragments after digestion and write them on the map. • How many fragments would you expect to see for each of the maps A, B and C? • Draw these fragments onto the graph in the next slide. • Now compare the sizes of the fragments you have calculated with the bands shown in the photographs of the gels, and determine which of the enzymes, BamHI, EcoRI and HindIII, was used to cut A, B and C. • How many times does the sequence GAATTC occur in the λ DNA sequence? What about AAGCTT and GGATCC?