SlideShare a Scribd company logo
1 of 11
Lecture 1
BNFO 135
Usman Roshan
Course overview
• Python progamming language (and some Unix
basics)
– Unix basics
– Intro Python exercises
– Programs for comparing DNA and protein sequences
• Sequence analysis
– Pairwise and multiple sequence comparison
– Sequence alignments
– Application of alignments
– Heuristic alignment (BLAST)
Overview (contd)
• Grade: 15% programming assignments, two 25% mid-
terms and 25% final exam
• Recommended Texts:
– Bioinformatics Programming using Python by Mitchell L Model
Nothing in biology makes sense,
except in the light of evolution
AAGACTT -3 mil yrs
-2 mil yrs
-1 mil yrs
today
AAGACTT
T_GACTT
AAGGCTT
_GGGCTT TAGACCTT A_CACTT
ACCTT
(Cat)
ACACTTC
(Lion)
TAGCCCTTA
(Monkey)
TAGGCCTT
(Human)
GGCTT
(Mouse)
T_GACTT
AAGGCTT
AAGACTT
_GGGCTT TAGACCTT A_CACTT
AAGGCTT T_GACTT
AAGACTT
TAGGCCTT
(Human)
TAGCCCTTA
(Monkey)
A_C_CTT
(Cat)
A_CACTTC
(Lion)
_G_GCTT
(Mouse)
_GGGCTT TAGACCTT A_CACTT
AAGGCTT T_GACTT
AAGACTT
Representing DNA in a format
manipulatable by computers
• DNA is a double-helix molecule made
up of four nucleotides:
– Adenosine (A)
– Cytosine (C)
– Thymine (T)
– Guanine (G)
• Since A (adenosine) always pairs with
T (thymine) and C (cytosine) always
pairs with G (guanine) knowing only
one side of the ladder is enough
• We represent DNA as a sequence of
letters where each letter could be
A,C,G, or T.
• For example, for the helix shown here
we would represent this as CAGT.
Transcription and translation
Amino acids
Proteins are chains of
amino acids. There are
twenty different amino
acids that chain in
different ways to form
different proteins.
For example,
FLLVALCCRFGH
(this is how we could store
it in a file)
This sequence of amino
acids folds to form a 3-D
structure
Protein folding
Protein folding
• The protein folding
problem is to determine
the 3-D protein structure
from the sequence.
• Experimental techniques
are very expensive.
• Computational are cheap
but difficult to solve.
• By comparing sequences
we can deduce the
evolutionary conserved
portions which are also
functional (most of the time).
Protein
structure
• Primary structure: sequence of
amino acids.
• Secondary structure: parts of the
chain organizes itself into alpha
helices, beta sheets, and coils. Helices
and sheets are usually evolutionarily
conserved and can aid sequence
alignment.
• Tertiary structure: 3-D structure of
entire chain
• Quaternary structure: Complex of
several chains
Key points
• DNA can be represented as strings
consisting of four letters: A, C, G, and T.
They could be very long, e.g. thousands
and even millions of letters
• Proteins are also represented as strings of
20 letters (each letter is an amino acid).
Their 3-D structure determines the
function to a large extent.

More Related Content

Similar to bnfo135_lec1.ppt

Lecture3 intro to_proteins (1)
Lecture3 intro to_proteins (1)Lecture3 intro to_proteins (1)
Lecture3 intro to_proteins (1)SourabhKumar240
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformaticsjaumebp
 
Finding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docxFinding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docxernestc3
 
A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...CSCJournals
 
Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)Carla
 
Evolutionconnection translation
Evolutionconnection translationEvolutionconnection translation
Evolutionconnection translationElsa von Licy
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to functionAbhik Seal
 
Randomizing genome-scale metabolic networks
Randomizing genome-scale metabolic networksRandomizing genome-scale metabolic networks
Randomizing genome-scale metabolic networksAreejit Samal
 
Nucleotides & nucleic acids
Nucleotides  & nucleic acidsNucleotides  & nucleic acids
Nucleotides & nucleic acidsTeklu Tadesse
 
The Human Genome Project - Part I
The Human Genome Project - Part IThe Human Genome Project - Part I
The Human Genome Project - Part Ihhalhaddad
 
Protein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural AlignmentProtein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural AlignmentSaramita De Chakravarti
 
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATIONAMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATIONcscpconf
 
Representations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data MiningRepresentations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data MiningVijay Raghavan
 

Similar to bnfo135_lec1.ppt (20)

P7 2018 biopython3
P7 2018 biopython3P7 2018 biopython3
P7 2018 biopython3
 
Lecture3 intro to_proteins (1)
Lecture3 intro to_proteins (1)Lecture3 intro to_proteins (1)
Lecture3 intro to_proteins (1)
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Finding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docxFinding Km Values httpwww.brenda-enzymes.org F.docx
Finding Km Values httpwww.brenda-enzymes.org F.docx
 
Aadee Srivastava.pptx
Aadee Srivastava.pptxAadee Srivastava.pptx
Aadee Srivastava.pptx
 
A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...A Biological Sequence Compression Based on cross chromosomal similarities usi...
A Biological Sequence Compression Based on cross chromosomal similarities usi...
 
Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)
 
Evolutionconnection translation
Evolutionconnection translationEvolutionconnection translation
Evolutionconnection translation
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
 
Translation
TranslationTranslation
Translation
 
CADD meeting 08-30-2016
CADD meeting 08-30-2016CADD meeting 08-30-2016
CADD meeting 08-30-2016
 
Randomizing genome-scale metabolic networks
Randomizing genome-scale metabolic networksRandomizing genome-scale metabolic networks
Randomizing genome-scale metabolic networks
 
Nucleotides & nucleic acids
Nucleotides  & nucleic acidsNucleotides  & nucleic acids
Nucleotides & nucleic acids
 
The Human Genome Project - Part I
The Human Genome Project - Part IThe Human Genome Project - Part I
The Human Genome Project - Part I
 
protein.pptx
protein.pptxprotein.pptx
protein.pptx
 
Protein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural AlignmentProtein Structure, Databases and Structural Alignment
Protein Structure, Databases and Structural Alignment
 
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATIONAMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
AMINO ACID INTERACTION NETWORK PREDICTION USING MULTI-OBJECTIVE OPTIMIZATION
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
 
Representations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data MiningRepresentations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data Mining
 

Recently uploaded

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

bnfo135_lec1.ppt

  • 2. Course overview • Python progamming language (and some Unix basics) – Unix basics – Intro Python exercises – Programs for comparing DNA and protein sequences • Sequence analysis – Pairwise and multiple sequence comparison – Sequence alignments – Application of alignments – Heuristic alignment (BLAST)
  • 3. Overview (contd) • Grade: 15% programming assignments, two 25% mid- terms and 25% final exam • Recommended Texts: – Bioinformatics Programming using Python by Mitchell L Model
  • 4. Nothing in biology makes sense, except in the light of evolution AAGACTT -3 mil yrs -2 mil yrs -1 mil yrs today AAGACTT T_GACTT AAGGCTT _GGGCTT TAGACCTT A_CACTT ACCTT (Cat) ACACTTC (Lion) TAGCCCTTA (Monkey) TAGGCCTT (Human) GGCTT (Mouse) T_GACTT AAGGCTT AAGACTT _GGGCTT TAGACCTT A_CACTT AAGGCTT T_GACTT AAGACTT TAGGCCTT (Human) TAGCCCTTA (Monkey) A_C_CTT (Cat) A_CACTTC (Lion) _G_GCTT (Mouse) _GGGCTT TAGACCTT A_CACTT AAGGCTT T_GACTT AAGACTT
  • 5. Representing DNA in a format manipulatable by computers • DNA is a double-helix molecule made up of four nucleotides: – Adenosine (A) – Cytosine (C) – Thymine (T) – Guanine (G) • Since A (adenosine) always pairs with T (thymine) and C (cytosine) always pairs with G (guanine) knowing only one side of the ladder is enough • We represent DNA as a sequence of letters where each letter could be A,C,G, or T. • For example, for the helix shown here we would represent this as CAGT.
  • 7. Amino acids Proteins are chains of amino acids. There are twenty different amino acids that chain in different ways to form different proteins. For example, FLLVALCCRFGH (this is how we could store it in a file) This sequence of amino acids folds to form a 3-D structure
  • 9. Protein folding • The protein folding problem is to determine the 3-D protein structure from the sequence. • Experimental techniques are very expensive. • Computational are cheap but difficult to solve. • By comparing sequences we can deduce the evolutionary conserved portions which are also functional (most of the time).
  • 10. Protein structure • Primary structure: sequence of amino acids. • Secondary structure: parts of the chain organizes itself into alpha helices, beta sheets, and coils. Helices and sheets are usually evolutionarily conserved and can aid sequence alignment. • Tertiary structure: 3-D structure of entire chain • Quaternary structure: Complex of several chains
  • 11. Key points • DNA can be represented as strings consisting of four letters: A, C, G, and T. They could be very long, e.g. thousands and even millions of letters • Proteins are also represented as strings of 20 letters (each letter is an amino acid). Their 3-D structure determines the function to a large extent.