Genome Data Management
Shabeer Ismaeel
MSC IT II SEMESTER
Department Of Information Technology
Contents
• Biological Sciences.
• Genetics.
• Characteristics of Biological Data.
• What is Bioinformatics?
• Human Genome and availability of information.
• Existing Biological Databases.
• Various Branches Benefited.
Biological Sciences.
– The biological sciences encompass an enormous variety of
information.
• Environmental science gives us a view of how species live
and interact in a world filled with natural phenomena.
• Biology and ecology study particular species.
• Anatomy focuses on the overall structure of an organism,
documenting the physical aspects of individual bodies.
• Traditional medicine and physiology break the organism into
systems and tissues and strive to collect information on the
workings of these systems and the organism as a whole.
• Histology and cell biology delve into the tissue and
cellular levels and provide knowledge about the inner
structure and function of the cell.
- This wealth of information, generated, classified, and
stored for centuries, has only recently become a major
application of database technology.
Genetics.
• Genetics has emerged as an ideal field for the application
of information technology.
– In a broad sense, it can be thought of as the construction
of models based on information about genes and populations
and the seeking out of relationships in that information.
• Genes can be defined as units of heredity.
- The study of genetics can be divided into three branches:
• Mendelian genetics is the study of the transmission of
traits between generations.
• Molecular genetics is the study of the chemical structure
and function of genes at the molecular level.
• Population genetics is the study of how genetic information
varies across populations of organisms.
• The origins of molecular genetics can be traced to two
important discoveries:
- The first came in 1869, when Friedrich Miescher discovered
nuclein and its primary component, deoxyribonucleic acid
(DNA). In subsequent research, DNA and a related compound,
ribonucleic acid (RNA), were found to be composed of
nucleotides (a sugar, a phosphate, and a base combining to
form one unit of nucleic acid) linked into long polymers via
the sugar and phosphate.
- The second was the demonstration in 1944 by Oswald Avery
that DNA was indeed the molecular substance carrying genetic
information.
• Genes were shown to be composed of chains of nucleic acids
arranged linearly on chromosomes and to serve three primary
functions:
- Replicating genetic information between generations,
- Providing blueprints for the creation of polypeptides, and
- Accumulating changes, thereby allowing evolution to occur.
• Watson and Crick described the double-helix structure of
DNA in 1953, which gave molecular biology a new direction.
Characteristics of Biological Data
• Biological data exhibit many special characteristics that
make management of biological information a particularly
challenging problem.
• The field concerned with managing and analyzing biological
information is called bioinformatics.
What is Bioinformatics?
• Bioinformatics is the field of science in which
biology, computer science, and information
technology merge into a single discipline.
• The ultimate goal of the field is to enable the
discovery of new biological insights as well as to
create a global perspective from which unifying
principles in biology can be detected.
• There are three important sub-disciplines within
bioinformatics which include:
1. The development of new algorithms and statistics with
which to assess relationships among members of large data
sets.
2. The analysis and interpretation of various types of data
including nucleotide and amino acid sequences, protein
domains, and protein structures.
3. The development and implementation of tools that enable
efficient access and management of different types of
information.
Biological Data + Computer Calculations = Bioinformatics
[Figure: The Bioinformatics Spectrum]
Various characteristics
• Biological data is highly complex when compared with most
other domains or applications.
• The amount and range of variability in data is high.
• Schemas in biological databases change at a rapid pace.
• Representations of the same data by different biologists
will likely be different (even using the same system).
• Most users of biological data do not require write access
to the database; read-only access is adequate.
• Most biologists are not likely to have knowledge of the
internal structure of the database or about schema design.
• The context of data gives added meaning for its use in
biological applications.
• Defining and representing complex queries is extremely
important to the biologist.
• Users of biological information often require access to
“old” values of the data, particularly when verifying
previously reported results.
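The last characteristic above (access to "old" values) can be illustrated with an append-only store in which an update adds a new version instead of overwriting the previous one. This is only a minimal sketch: the class name `VersionedStore`, the key `gene_X`, and the annotation strings are hypothetical and are not taken from any of the databases discussed in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Append-only store: every update adds a version, nothing is overwritten."""

    _versions: dict = field(default_factory=dict)

    def put(self, key, value):
        """Record a new value for key and return its 1-based version number."""
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)

    def get(self, key, version=None):
        """Return the latest value, or the value stored as a given version."""
        history = self._versions[key]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{key!r} has no version {version}")
        return history[version - 1]


store = VersionedStore()
store.put("gene_X", "function unknown")   # hypothetical first annotation
store.put("gene_X", "putative kinase")    # hypothetical later revision

print(store.get("gene_X"))             # current value
print(store.get("gene_X", version=1))  # value a past report would have seen
```

Most readers would only ever call `get`, which matches the observation that read-only access is adequate for most users.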
What is the Human Genome?
- The term genome is defined as the total genetic
information that can be obtained about an entity.
E.g., the human genome generally refers to the
complete set of genes required to create a
human being.
- The number is estimated to be more than
30,000 genes spread over 23 pairs of
chromosomes, with an estimated 3 to 4
billion nucleotides.
- The goal of the Human Genome Project (HGP),
which began in 1990, is to obtain the complete
sequence, that is, the ordering of the bases, of
those nucleotides.
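To give a feel for the size of such a sequence, the arithmetic below estimates the raw storage for 3 to 4 billion nucleotides. It assumes the simplest possible encoding, 2 bits per base for the four bases A, C, G and T, and ignores ambiguity codes, annotations, and compression; the function name `packed_size_bytes` is made up for this illustration.

```python
import math

BASES = "ACGT"
# Two bits are enough to distinguish four bases.
BITS_PER_BASE = math.ceil(math.log2(len(BASES)))


def packed_size_bytes(n_bases):
    """Bytes needed to store n_bases at BITS_PER_BASE bits each."""
    return math.ceil(n_bases * BITS_PER_BASE / 8)


low = packed_size_bytes(3_000_000_000)
high = packed_size_bytes(4_000_000_000)
print(low / 1e6, "MB to", high / 1e6, "MB")
```

Under these assumptions the bare sequence fits in roughly 750 MB to 1 GB, so the sequence itself is small compared with the annotation and cross-reference data that surround it in a database.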
Existing Biological Databases.
• Some of the existing database systems that are supporting
or have grown out of the Human Genome Project include:
• GenBank
– The most notable DNA sequence database in the world today
is GenBank, maintained by the National Center for
Biotechnology Information (NCBI) of the National Library of
Medicine (NLM).
– Established in 1978 as a central repository for DNA
sequence data.
– Since then it has expanded to include expressed sequence
tag data, protein sequence data, three-dimensional protein
structure, taxonomy, and links to the medical literature
(MEDLINE).
– GenBank contains over 31 billion nucleotide bases of
more than 24 million sequences from over 100,000
species, with roughly 1,400 new organisms being added
each month.
– The database size in flat file format is over 100 GB
uncompressed and has been doubling every 15 months.
– The system is maintained as a combination of flat files,
relational databases, and files containing Abstract Syntax
Notation One (ASN.1), a notation for describing data
structures together with rules for encoding and decoding
them.
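The growth figure quoted for GenBank (over 100 GB, doubling every 15 months) implies exponential growth, size(t) = size(0) × 2^(t/15) with t in months. The sketch below simply evaluates that formula; it assumes the doubling rate stays constant, which is an extrapolation and not a claim from the slides, and `projected_size_gb` is a hypothetical helper name.

```python
def projected_size_gb(start_gb, months, doubling_months=15.0):
    """Size after `months`, assuming steady doubling every `doubling_months`."""
    return start_gb * 2 ** (months / doubling_months)


for months in (15, 30, 60):
    print(months, "months:", projected_size_gb(100, months), "GB")
```

At that rate a 100 GB flat-file release would reach 400 GB after 30 months and 1,600 GB after five years, which is one reason storage and schema decisions for such databases have to be revisited often.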
• The Genome Database (GDB)
- Created in 1989, GDB is a catalog of human gene mapping
data. Gene mapping is a process that associates a piece of
information with a particular location on the human genome.
- The GDB system is built around Sybase, a commercial
relational DBMS, and its data are modeled using standard
Entity-Relationship techniques.
- GDB distributes a Database Access Toolkit.
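As a rough illustration of what Entity-Relationship modeling of mapping data can look like once translated to relational tables, the sketch below defines two entities (gene, chromosome) and one relationship (a gene mapped to a region of a chromosome). It is not GDB's actual schema: every table, column, and value here is hypothetical, and Python's built-in `sqlite3` module stands in for Sybase purely so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chromosome (
    name TEXT PRIMARY KEY
);
CREATE TABLE gene (
    symbol      TEXT PRIMARY KEY,
    description TEXT
);
-- Relationship "gene is mapped to chromosome", with its own attributes.
CREATE TABLE mapping (
    gene_symbol TEXT    NOT NULL REFERENCES gene(symbol),
    chromosome  TEXT    NOT NULL REFERENCES chromosome(name),
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL,
    PRIMARY KEY (gene_symbol, chromosome, start_pos)
);
""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO chromosome VALUES ('7')")
conn.execute("INSERT INTO gene VALUES ('GENE_A', 'made-up example gene')")
conn.execute("INSERT INTO mapping VALUES ('GENE_A', '7', 1000, 5000)")


def genes_in_region(conn, chromosome, start, end):
    """Return symbols of genes whose mapped interval overlaps [start, end]."""
    rows = conn.execute(
        "SELECT gene_symbol FROM mapping "
        "WHERE chromosome = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY gene_symbol",
        (chromosome, end, start),
    )
    return [row[0] for row in rows]


print(genes_in_region(conn, "7", 4000, 6000))
```

Keeping the location in a separate `mapping` table, instead of as columns on `gene`, lets one gene carry several mappings, for example from different mapping methods.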
Online Mendelian Inheritance in Man
• Online Mendelian Inheritance in Man (OMIM) is an electronic
collection of information on the genetic basis of human
disease.
• In 1991 its administration was transferred from Johns
Hopkins University to the NCBI (National Center for
Biotechnology Information), and the entire database was
converted to NCBI’s GenBank format. Today it contains more
than 14,000 entries.
EcoCyc
– The Encyclopedia of Escherichia coli Genes and Metabolism
(EcoCyc) is a recent experiment in combining information
about the genome and the metabolism of E. coli K-12 (a
bacterium).
– The database was created in 1996 as a collaboration between
Stanford Research Institute and the Marine Biological
Laboratory.
Gene Ontology
– The Gene Ontology (GO) Consortium was formed in 1998 as a
collaboration among three model organism databases: FlyBase,
Mouse Genome Informatics (MGI), and the Saccharomyces (yeast)
Genome Database (SGD).
• The goal is to produce a structured, precisely defined,
common, controlled vocabulary for describing the roles of
genes and gene products in any organism.
• The latest release of the GO database has over 13,000 terms
and more than 18,000 relationships between terms.
• GO was implemented using MySQL, an open source relational
database, and a monthly database release is available in SQL
and XML (Extensible Markup Language) formats.
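A controlled vocabulary with relationships between terms can be pictured as a directed graph: each term points to the more general terms it is related to. The sketch below walks such a graph upward to collect every more general term. The identifiers `T1` to `T3` and the three-term hierarchy are illustrative assumptions, not an extract from a GO release; `is_a` is used as the relationship type because it is one of the relationship types GO defines.

```python
from collections import deque

# Hypothetical terms; real GO identifiers look like "GO:" plus seven digits.
terms = {
    "T1": "biological process",
    "T2": "metabolic process",
    "T3": "carbohydrate metabolic process",
}

# Each entry is (child term, relationship type, parent term).
relationships = [
    ("T2", "is_a", "T1"),
    ("T3", "is_a", "T2"),
]


def ancestors(term_id):
    """Return every term reachable by following relationships upward."""
    parents = {}
    for child, _rel, parent in relationships:
        parents.setdefault(child, []).append(parent)
    seen, queue = set(), deque([term_id])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(terms[t] for t in ancestors("T3")))
```

A query for gene products involved in a general term can then also return products annotated only with one of its more specific descendants, which is the practical benefit of storing the relationships alongside the terms.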
[Table: Summary of the Major Genome-Related Databases]
Various Branches Benefited.
• Medicine
• Pharmacogenomics
• Biotechnology
• Bioinformatics
• Proteomics
Genome data management

More Related Content

What's hot (20)

Rasmol
RasmolRasmol
Rasmol
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 
Pathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformaticsPathways and genomes databases in bioinformatics
Pathways and genomes databases in bioinformatics
 
Protein database
Protein  databaseProtein  database
Protein database
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
 
Gene prediction strategies
Gene prediction strategies Gene prediction strategies
Gene prediction strategies
 
Protein structure visualization tools-RASMOL
Protein structure visualization tools-RASMOLProtein structure visualization tools-RASMOL
Protein structure visualization tools-RASMOL
 
Protein Data Bank (PDB)
Protein Data Bank (PDB)Protein Data Bank (PDB)
Protein Data Bank (PDB)
 
Data retriveal ,srg and dbget
Data retriveal ,srg and dbgetData retriveal ,srg and dbget
Data retriveal ,srg and dbget
 
Protein Structure Prediction
Protein Structure PredictionProtein Structure Prediction
Protein Structure Prediction
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Swiss prot
Swiss protSwiss prot
Swiss prot
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Protein Data Bank
Protein Data BankProtein Data Bank
Protein Data Bank
 
Molecular modeling database
Molecular modeling database Molecular modeling database
Molecular modeling database
 
UniProt
UniProtUniProt
UniProt
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
clustal omega.pptx
clustal omega.pptxclustal omega.pptx
clustal omega.pptx
 
UPGMA
UPGMAUPGMA
UPGMA
 

Similar to Genome data management

Human genome project - Decoding the codes of life
Human genome project - Decoding the codes of lifeHuman genome project - Decoding the codes of life
Human genome project - Decoding the codes of lifearjunaa7
 
Genomics and Bioinformatics
Genomics and BioinformaticsGenomics and Bioinformatics
Genomics and BioinformaticsAmit Garg
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomicsNikhil Aggarwal
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSsandeshGM
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)jmoore89
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformaticsbiinoida
 
introduction to bioinfromatics.pptx
introduction to bioinfromatics.pptxintroduction to bioinfromatics.pptx
introduction to bioinfromatics.pptxAbelPhilipJoseph
 
History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)Madan Kumar Ca
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECTNusrat Gulbarga
 
Computer science history.pdf
Computer science history.pdfComputer science history.pdf
Computer science history.pdfsirwansleman
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahuKAUSHAL SAHU
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsAmna Jalil
 

Similar to Genome data management (20)

Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Basic of bioinformatics
Basic of bioinformaticsBasic of bioinformatics
Basic of bioinformatics
 
Human genome project - Decoding the codes of life
Human genome project - Decoding the codes of lifeHuman genome project - Decoding the codes of life
Human genome project - Decoding the codes of life
 
Genomics and Bioinformatics
Genomics and BioinformaticsGenomics and Bioinformatics
Genomics and Bioinformatics
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomics
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICS
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)TLSC Biotech 101 Noc 2010 (Moore)
TLSC Biotech 101 Noc 2010 (Moore)
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
introduction to bioinfromatics.pptx
introduction to bioinfromatics.pptxintroduction to bioinfromatics.pptx
introduction to bioinfromatics.pptx
 
History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)History and devolopment of bioinfomatics.ppt (1)
History and devolopment of bioinfomatics.ppt (1)
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECT
 
Computer science history.pdf
Computer science history.pdfComputer science history.pdf
Computer science history.pdf
 
Human genome
Human genomeHuman genome
Human genome
 
The Human Genome Project
The Human Genome Project The Human Genome Project
The Human Genome Project
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahu
 
Genomics types
Genomics typesGenomics types
Genomics types
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Genomics
GenomicsGenomics
Genomics
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 

More from Shareb Ismaeel

More from Shareb Ismaeel (9)

Cybercrimes
CybercrimesCybercrimes
Cybercrimes
 
Unions
UnionsUnions
Unions
 
Spanning trees
Spanning treesSpanning trees
Spanning trees
 
Multiprocessor structures
Multiprocessor structuresMultiprocessor structures
Multiprocessor structures
 
Installation testing
Installation testingInstallation testing
Installation testing
 
E mail systems
E mail systemsE mail systems
E mail systems
 
E commerce
E commerceE commerce
E commerce
 
Disk structure
Disk structureDisk structure
Disk structure
 
.Netframework
.Netframework.Netframework
.Netframework
 

Recently uploaded

Basics of Relay for Engineering Students
Basics of Relay for Engineering StudentsBasics of Relay for Engineering Students
Basics of Relay for Engineering Studentskannan348865
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...josephjonse
 
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfMadan Karki
 
AI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfAI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfmahaffeycheryld
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsMathias Magdowski
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalSwarnaSLcse
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxMustafa Ahmed
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)NareenAsad
 
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesLinux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesRashidFaridChishti
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxMANASINANDKISHORDEOR
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.benjamincojr
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualBalamuruganV28
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxRashidFaridChishti
 
Piping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfPiping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfAshrafRagab14
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...archanaece3
 
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...Nitin Sonavane
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docxrahulmanepalli02
 

Recently uploaded (20)

Basics of Relay for Engineering Students
Basics of Relay for Engineering StudentsBasics of Relay for Engineering Students
Basics of Relay for Engineering Students
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
 
AI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfAI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdf
 
Filters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility ApplicationsFilters for Electromagnetic Compatibility Applications
Filters for Electromagnetic Compatibility Applications
 
Seizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networksSeizure stage detection of epileptic seizure using convolutional neural networks
Seizure stage detection of epileptic seizure using convolutional neural networks
 
CLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference ModalCLOUD COMPUTING SERVICES - Cloud Reference Modal
CLOUD COMPUTING SERVICES - Cloud Reference Modal
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)
 
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesLinux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
 
The Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptxThe Entity-Relationship Model(ER Diagram).pptx
The Entity-Relationship Model(ER Diagram).pptx
 
electrical installation and maintenance.
electrical installation and maintenance.electrical installation and maintenance.
electrical installation and maintenance.
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docx
 
Piping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfPiping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdf
 
5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...5G and 6G refer to generations of mobile network technology, each representin...
5G and 6G refer to generations of mobile network technology, each representin...
 
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 
21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx21P35A0312 Internship eccccccReport.docx
21P35A0312 Internship eccccccReport.docx
 

Genome data management

  • 1. Genome Data ManagementGenome Data Management Shabeer Ismaeel MSC IT II SEMESTER Department Of Information Technology .
  • 2. • Biological Sciences.Biological Sciences. • Genetics.Genetics. • Characteristics of Biological Data.Characteristics of Biological Data. • What is Bioinformatics?What is Bioinformatics? • Human Genome and availability ofHuman Genome and availability of information .information . • Existing Biological Databases.Existing Biological Databases. • Various Branches Benefited.Various Branches Benefited. Contents
  • 3. Biological Sciences.Biological Sciences. – The biological sciences encompass an enormousThe biological sciences encompass an enormous variety of information.variety of information. • EnvironmentalEnvironmental sciencescience gives us a view of how speciesgives us a view of how species live and interact in a world filled with natural phenomena.live and interact in a world filled with natural phenomena. • BiologyBiology andand ecologyecology study particular species.study particular species. • AnatomyAnatomy focuses on the overall structure of an organism,focuses on the overall structure of an organism, documenting the physical aspects of individual bodies.documenting the physical aspects of individual bodies. • Traditional medicine and physiologyTraditional medicine and physiology break the organismbreak the organism into systems and tissues and strive to collect informationinto systems and tissues and strive to collect information on the workings of these systems and the organism as aon the workings of these systems and the organism as a wholewhole..
  • 4. • Histology and cell biologyHistology and cell biology delve into thedelve into the tissue and cellular levels and providetissue and cellular levels and provide knowledge about the inner structure andknowledge about the inner structure and function of the cell.function of the cell. -This wealth of information that has been-This wealth of information that has been generated, classified, and stored forgenerated, classified, and stored for centuries has only recently become acenturies has only recently become a major application of database technology.major application of database technology.
  • 5. Genetics. • Genetics has emerged as an ideal field for the application of information technology. – In a broad sense, it can be thought of as the construction of models based on information about genes and populations and the seeking out of relationships in that information. • Genes can be defined as units of heredity.
  • 6. The study of genetics can be divided into three branches: • Mendelian genetics is the study of the transmission of traits between generations. • Molecular genetics is the study of the chemical structure and function of genes at the molecular level. • Population genetics is the study of how genetic information varies across populations of organisms.
  • 7. • The origins of molecular genetics can be traced to two important discoveries: – In 1869, Friedrich Miescher discovered nuclein and its primary component, deoxyribonucleic acid (DNA). In subsequent research, DNA and a related compound, ribonucleic acid (RNA), were found to be composed of nucleotides (a sugar, a phosphate, and a base combining to form nucleic acid) linked into long polymers via the sugar and phosphate. – The second discovery was the demonstration in 1944 by Oswald Avery that DNA was indeed the molecular substance carrying genetic information.
  • 8. • Genes were shown to be composed of chains of nucleic acids arranged linearly on chromosomes and to serve three primary functions: – Replicating genetic information between generations, – Providing blueprints for the creation of polypeptides, and – Accumulating changes, thereby allowing evolution to occur. • Watson and Crick found the double-helix structure of DNA in 1953, which gave molecular biology a new direction.
  • 9. Characteristics of Biological Data • Biological data exhibit many special characteristics that make the management of biological information a particularly challenging problem. • The discipline concerned with managing and analyzing biological information is called bioinformatics.
  • 10. What is Bioinformatics? • Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. • The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be detected. • Bioinformatics has three important sub-disciplines:
  • 11. 1. The development of new algorithms and statistics with which to assess relationships among members of large data sets. 2. The analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. 3. The development and implementation of tools that enable efficient access and management of different types of information.
  • 12. Biological Data + Computer Calculations = Bioinformatics
  • 13. The Bioinformatics Spectrum
  • 14. Various characteristics • Biological data is highly complex when compared with most other domains or applications. • The amount and range of variability in data is high. • Schemas in biological databases change at a rapid pace. • Representations of the same data by different biologists will likely be different (even using the same system). • Most users of biological data do not require write access to the database; read-only access is adequate.
  • 15. • Most biologists are not likely to have knowledge of the internal structure of the database or about schema design. • The context of data gives added meaning for its use in biological applications. • Defining and representing complex queries is extremely important to the biologist. • Users of biological information often require access to “old” values of the data, particularly when verifying previously reported results.
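The last characteristic, access to "old" values, can be illustrated with a small append-only history. This is only a minimal sketch under stated assumptions: the class name `VersionedValue`, its methods, and the integer timestamps are hypothetical and are not taken from any of the databases discussed in these slides.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VersionedValue:
    """Append-only history of one data item (hypothetical helper)."""

    _times: list = field(default_factory=list)   # revision timestamps, ascending
    _values: list = field(default_factory=list)  # value recorded at each timestamp

    def record(self, timestamp: int, value: str) -> None:
        # Revisions must arrive in chronological order; nothing is overwritten.
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("revisions must be recorded in increasing time order")
        self._times.append(timestamp)
        self._values.append(value)

    def as_of(self, timestamp: int):
        # Value that was current at `timestamp`, or None if none existed yet.
        i = bisect_right(self._times, timestamp)
        return self._values[i - 1] if i else None

    def latest(self):
        # Most recent value, or None if nothing has been recorded.
        return self._values[-1] if self._values else None
```

A reader verifying an older result would call `as_of` with the date of the original report instead of `latest`, so the value the report relied on is still retrievable after later revisions.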
  • 16. What is the Human Genome? • The term genome is defined as the total genetic information that can be obtained about an entity. For example, the human genome generally refers to the complete set of genes required to create a human being. • The number of genes is estimated to be more than 30,000, spread over 23 pairs of chromosomes, with an estimated 3 to 4 billion nucleotides. • The goal of the Human Genome Project (HGP), which began in 1990, is to obtain the complete sequence (the ordering of the bases) of those nucleotides.
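Since a sequence is just the ordering of the bases, it can be handled as a string over a four-letter alphabet. The sketch below is illustrative only: the function names are hypothetical, and the 2-bits-per-base packing is an assumption made here to give a feel for storage size, not something stated in the slides.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def base_composition(sequence: str) -> dict:
    """Count how often each of the four bases occurs in a DNA sequence."""
    seq = sequence.upper()
    unknown = set(seq) - VALID_BASES
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in "ACGT"}


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed if every base is stored in 2 bits (assumed encoding)."""
    return (n_bases * 2 + 7) // 8  # round up to a whole byte
```

Under that assumed encoding, `packed_size_bytes(3_000_000_000)` gives 750,000,000 bytes, so the raw sequence itself is modest in size; the management difficulty comes mostly from annotations, versions, and cross-references.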
  • 17. Existing Biological Databases. • Some of the existing database systems that are supporting or have grown out of the Human Genome Project include: • GenBank – The most notable DNA sequence database in the world today is GenBank, maintained by the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). – Established in 1978 as a central repository for DNA sequence data. – Since 1978 it has expanded to include sequence tag data, protein sequence data, three-dimensional protein structure, taxonomy, and links to the medical literature (MEDLINE).
  • 18. • GenBank contains over 31 billion nucleotide bases of more than 24 million sequences from over 100,000 species, with roughly 1,400 new organisms being added each month. • The database size in flat file format is over 100 GB uncompressed and has been doubling every 15 months. • The system is maintained as a combination of flat files, relational databases, and files containing Abstract Syntax Notation One (ASN.1), a notation for defining data structures together with rules for encoding and decoding them.
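The growth figure on this slide implies exponential growth: with a starting size S0 and a doubling time of 15 months, the size after t months is S0 · 2^(t/15). A minimal sketch of that arithmetic follows; the function names are hypothetical, and the projection assumes the doubling rate stays constant, which the slide does not claim.

```python
import math


def projected_size_gb(initial_gb: float, months_elapsed: float,
                      doubling_months: float = 15.0) -> float:
    """Size after `months_elapsed`, assuming a constant doubling time."""
    return initial_gb * 2 ** (months_elapsed / doubling_months)


def months_to_reach(initial_gb: float, target_gb: float,
                    doubling_months: float = 15.0) -> float:
    """Months until `target_gb` is reached under the same assumption."""
    return doubling_months * math.log2(target_gb / initial_gb)
```

Starting from 100 GB, the size would be 200 GB after 15 months and 400 GB after 30 months, and a tenfold increase to 1,000 GB would take roughly 50 months.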
  • 19. • The Genome Database (GDB) – Created in 1989, GDB is a catalog of human gene mapping data. Gene mapping is a process that associates a piece of information with a particular location on the human genome. – The GDB system is built around Sybase, a commercial relational DBMS, and its data are modeled using standard Entity-Relationship techniques. – GDB distributes a Database Access Toolkit.
  • 20. Online Mendelian Inheritance in Man • Online Mendelian Inheritance in Man (OMIM) is an electronic collection of information on the genetic basis of human disease. • In 1991 its administration was transferred from Johns Hopkins University to the NCBI (National Center for Biotechnology Information), and the entire database was converted to NCBI’s GenBank format. Today it contains more than 14,000 entries.
  • 21. EcoCyc – The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) is a recent experiment in combining information about the genome and the metabolism of E. coli K-12 (a bacterium). – The database was created in 1996 as a collaboration between Stanford Research Institute and Marine Biological Laboratory.
  • 22. Gene Ontology – The Gene Ontology (GO) Consortium was formed in 1998 as a collaboration among three model organism databases: FlyBase, Mouse Genome Informatics (MGI), and the Saccharomyces (yeast) Genome Database (SGD). • The goal is to produce a structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products in any organism. • The latest release of the GO database has over 13,000 terms and more than 18,000 relationships between terms. • GO was implemented using MySQL, an open source relational database, and a monthly database release is available in SQL and XML (Extensible Markup Language) formats.
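A controlled vocabulary of terms plus relationships between terms maps naturally onto two relational tables. The sketch below uses Python's built-in `sqlite3` in place of MySQL so that it runs without a server. The table layout, column names, and sample rows are hypothetical and simplified for illustration; they are not the actual GO schema or data from a GO release.

```python
import sqlite3


def build_demo_vocabulary() -> sqlite3.Connection:
    """Create an in-memory database with a tiny illustrative vocabulary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE relationship (
            child_id  INTEGER NOT NULL REFERENCES term(id),
            parent_id INTEGER NOT NULL REFERENCES term(id),
            kind      TEXT NOT NULL,
            PRIMARY KEY (child_id, parent_id, kind)
        );
    """)
    # Illustrative rows only, not copied from a GO release.
    conn.executemany("INSERT INTO term (id, name) VALUES (?, ?)", [
        (1, "biological process"),
        (2, "metabolic process"),
        (3, "carbohydrate metabolic process"),
    ])
    conn.executemany("INSERT INTO relationship VALUES (?, ?, ?)", [
        (2, 1, "is_a"),
        (3, 2, "is_a"),
    ])
    conn.commit()
    return conn


def ancestors(conn: sqlite3.Connection, term_name: str) -> list:
    """Names of all terms reachable by following parent links, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE up(id, depth) AS (
            SELECT parent_id, 1 FROM relationship
            WHERE child_id = (SELECT id FROM term WHERE name = ?)
            UNION
            SELECT r.parent_id, up.depth + 1
            FROM relationship r JOIN up ON r.child_id = up.id
        )
        SELECT term.name FROM up JOIN term ON term.id = up.id
        ORDER BY up.depth
    """, (term_name,)).fetchall()
    return [name for (name,) in rows]
```

Keeping relationships in their own table is what lets one term have several parents, and the recursive query shows why complex queries matter to users: asking for everything a gene product's term "is a kind of" means walking the whole chain, not reading a single row.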
  • 23. Summary of the Major Genome-Related Databases
  • 24. Various Branches Benefited. • Medicine • Pharmacogenomics • Biotechnology • Bioinformatics • Proteomics