Bioinformatics involves using computational tools to analyze large datasets from molecular biology, including DNA, RNA, proteins, and genetic information. It combines biology, computer science, and information technology. Key applications include processing and visualizing data from genomics, transcriptomics, and proteomics projects. Challenges include educating biologists in computational tools, developing databases to store biological data, and creating efficient search engines for complex databases. The document provides examples of molecular data like DNA sequences, protein sequences, and describes genomic and proteomic analysis.
Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes.
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics, and statistics to analyze and interpret biological data.
3. Molecular Bioinformatics
Molecular Bioinformatics involves the use
of computational tools to discover new
information in complex data sets (from the
one-dimensional information of DNA through
the two-dimensional information of RNA and
the three-dimensional information of proteins,
to the four-dimensional information of
evolving living systems).
4. Bioinformatics (Oxford English Dictionary):
The branch of science concerned with
information and information flow in biological
systems, especially the use of computational
methods in genetics and genomics.
5. What is bioinformatics?
• The application of computational tools on
molecular data, including the means to
acquire, analyse, or visualize such data.
• Key tools to handle and analyze the large
amount of data generated by large-scale
DNA, RNA and protein characterization
projects (genomics, transcriptomics,
proteomics).
6. The field of science in which biology, computer science and
information technology merge into a single discipline
• Biologists collect molecular data: DNA and protein sequences,
gene expression, etc.
• Computer scientists (plus mathematicians, statisticians, etc.)
develop tools, software, and algorithms to store and analyze the data.
• Bioinformaticians study biological questions by analyzing
molecular data.
7. ....
• Bioinformatics uses computers, computing technology
and software to manage large amounts of biological data
and enable their analysis.
• At the end of this course students will be expected to:
– understand biological data and data management and
integration
– have a broad knowledge of computing and biological methods in
bioinformatics
– understand genomes, genome sequencing, genomic structure
and comparison
– know about the technology used in modern post-genomic
biology, the data produced and the software to manage it.
8. Introduction
Large databases that can be accessed and analyzed with
sophisticated tools have become central to biological
research and education.
The information content in the genomes of organisms,
in the molecular dynamics of proteins, and in population
dynamics, to name but a few areas, is enormous.
Biologists are increasingly finding that the management
of complex data sets is becoming a bottleneck for
scientific advances.
Therefore, bioinformatics is rapidly becoming a key
technology in all fields of biology.
9. Bottlenecks
The present bottlenecks in bioinformatics include:
• the education of biologists in the use of advanced computing tools,
• the recruitment of computer scientists into this evolving field,
• the limited availability of developed databases of biological information,
• the need for more efficient and intelligent search engines for complex databases.
10. The hereditary information of all living organisms, with
the exception of some viruses, is carried by
deoxyribonucleic acid (DNA) molecules.
• 2 purines (two rings): adenine (A), guanine (G)
• 2 pyrimidines (one ring): cytosine (C), thymine (T)
11. The entire complement of genetic material carried by
an individual is called the genome.
Eukaryotes may have up to 3 subcellular genomes:
1. Nuclear
2. Mitochondrial
3. Plastid
Bacteria have either circular or linear genomes and may
also carry plasmids.
[Figure labels: Human chromosomes; Circular genome]
12. Central dogma: DNA makes RNA makes Protein
Modified dogma: DNA makes DNA and RNA; RNA
makes DNA, RNA and Protein
15. Any region of the DNA sequence can, in principle,
code for six different amino acid sequences, because
any one of three different reading frames can be used
to interpret each of the two strands.
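The six reading frames can be illustrated with a short Python sketch. It assumes the standard genetic code and a sequence made only of A, C, G and T; the function names are illustrative and do not come from the slides.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translate both strands in each of the three reading frames."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# The first 24 coding bases of the HBA1 mRNA shown later in this deck.
for name, peptide in six_frames("ATGGTGCTGTCTCCTGCCGACAAG").items():
    print(name, peptide)
```

Only one of the six translations usually corresponds to a real protein; here frame +1 starts with MVLSPADK, the beginning of the alpha 1 globin sequence.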
17. Some basic definitions
• Genomics / Genome: the total genetic content contained in a
haploid set of chromosomes in eukaryotes, in a single
chromosome in bacteria, or in the DNA or RNA of viruses.
• Transcriptomics / Transcriptome: the complete set of RNA
transcripts produced from the genes encoded on a genome.
• Proteomics / Proteome: the complete set of proteins encoded
on a genome that can be expressed and modified by a cell,
tissue, or organism (Etymology: Protein+genome).
– Sub-cellular proteome: the complete set of proteins for a given
membrane or organelle (e.g. mitochondrial proteome).
– Membranome: the complete set of membranes from a cell.
– Metabolome: the metabolic products of the cell, that is, all the
metabolites.
– Secretome: the secreted proteins of a cell.
– Phosphoproteome ("phosphome"): the total phosphorylated proteins of a cell.
18. What does it all look like on a computer monitor?
19. A cDNA sequence
>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCG
CCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACC
ACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGA
CGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACA
AGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCC
GAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCG
TTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGT
ACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC
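The record above is in FASTA format: a header line starting with ">" followed by sequence lines. A minimal Python sketch of reading such a record is shown below. It assumes the old-style NCBI header layout seen here (gi|number|ref|accession| description); the function names are illustrative, and only the first two sequence lines are included to keep the example short.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split a header of the form 'gi|<id>|ref|<accession>| description'."""
    fields = header.split("|")
    if len(fields) < 5:
        raise ValueError("header does not follow the assumed gi|..|ref|..| layout")
    return {
        "gi": fields[1],
        "accession": fields[3],
        "description": fields[4].strip(),
    }


# Header and first two sequence lines of the HBA1 record (truncated on purpose).
EXAMPLE = """>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCG
CCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACC
"""

records = list(parse_fasta(EXAMPLE))
info = split_header(records[0][0])
print(info["accession"], len(records[0][1]), "bases read")
```

Joining the wrapped lines into one string matters because line breaks in a FASTA file carry no biological meaning.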
20. A cDNA sequence (reading frame)
A protein sequence
>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCC
GCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCAC
CACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG
ACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCAC
AAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGC
CGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC
GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCC
GTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC
>gi|4504347|ref|NP_000549.1| alpha 1 globin [Homo sapiens]
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH
VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
22. How big are whole genomes?
E. coli 4.6 × 10^6 nucleotides
– Approx. 4,000 genes
Yeast 15 × 10^6 nucleotides
– Approx. 6,000 genes
Human 3 × 10^9 nucleotides
– Approx. 30,000 genes
Smallest human chromosome 50 × 10^6 nucleotides
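As a rough worked example, the figures on this slide can be turned into an average amount of DNA per gene. The ratio is a crude illustration only: it lumps coding and non-coding DNA together, and the variable names are illustrative.

```python
# Genome size (nucleotides) and approximate gene count, as quoted on the slide.
GENOMES = {
    "E. coli": (4.6e6, 4_000),
    "Yeast": (15e6, 6_000),
    "Human": (3e9, 30_000),
}


def nucleotides_per_gene(size, genes):
    """Average number of nucleotides of genome per gene."""
    return size / genes


for name, (size, genes) in GENOMES.items():
    ratio = nucleotides_per_gene(size, genes)
    print(f"{name}: about {ratio:,.0f} nucleotides per gene")
```

With these numbers the human genome carries far more DNA per gene than the two microbial genomes, which reflects its much larger non-coding fraction.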
24. From DNA to Genome
[Timeline figure, axis from 1955 to 1985. Events shown: Watson and Crick DNA model; Sanger sequences insulin protein; Dayhoff’s Atlas; ARPANET (early Internet); PDB (Protein Data Bank); sequence alignment; Sanger dideoxy DNA sequencing; GenBank database; PCR (Polymerase Chain Reaction).]
26. Origin of bioinformatics and biological databases:
The first protein sequence reported was that of
bovine insulin in 1956, consisting of 51
residues.
Nearly a decade later, the first nucleic acid
sequence was reported, that of yeast
alanine tRNA, with 77 bases.
27. In 1965, Dayhoff gathered all the available
sequence data to create the first bioinformatic
database (Atlas of Protein Sequence and
Structure).
The Protein DataBank followed in 1972 with a
collection of ten X-ray crystallographic protein
structures. The SWISSPROT protein sequence
database began in 1987.
33. What is a Database?
A structured collection of data held in computer storage; esp. one
that incorporates software to make it accessible in a variety of ways;
transf., any large collection of information.
database management: the organization and manipulation of data in
a database.
database management system (DBMS): a software package that
provides all the functions required for database management.
database system: a database together with a database
management system.
Oxford Dictionary
34. What is a database?
• A collection of data
– structured
– searchable (index) -> table of contents
– updated periodically (release) -> new edition
– cross-referenced (hyperlinks) -> links with other db
• Includes also associated tools (software) necessary
for access, updating, information insertion,
information deletion….
• Data storage management: flat files, relational
databases…
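The properties listed above (structured, indexed, cross-referenced) can be sketched with a tiny relational database. The example uses Python's built-in sqlite3 module with an in-memory database. The table layout and the DEMO_ accession strings are hypothetical and chosen only for illustration; they are not real database records.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two records and one cross-reference."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sequence (
            accession TEXT PRIMARY KEY,  -- primary key is indexed for lookup
            kind      TEXT NOT NULL,     -- 'dna' or 'protein'
            residues  TEXT NOT NULL
        );
        CREATE TABLE xref (              -- cross-references between records
            source TEXT NOT NULL REFERENCES sequence(accession),
            target TEXT NOT NULL REFERENCES sequence(accession)
        );
        """
    )
    con.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?)",
        [
            ("DEMO_DNA_1", "dna", "ATGGTGCTGTCT"),   # hypothetical record
            ("DEMO_PROT_1", "protein", "MVLS"),      # hypothetical record
        ],
    )
    con.execute("INSERT INTO xref VALUES (?, ?)", ("DEMO_DNA_1", "DEMO_PROT_1"))
    con.commit()
    return con


def linked_records(con, accession):
    """Follow cross-references from one accession to the records it points to."""
    rows = con.execute(
        "SELECT s.accession, s.kind, s.residues "
        "FROM xref x JOIN sequence s ON s.accession = x.target "
        "WHERE x.source = ?",
        (accession,),
    )
    return rows.fetchall()


con = build_demo_db()
print(linked_records(con, "DEMO_DNA_1"))
```

A flat file would store the same records as plain text, but the search and the cross-reference lookup would then have to be programmed by hand.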
36. Why biological databases?
• Exponential growth in biological data.
• Data (genomic sequences, 3D structures, 2D
gel analysis, MS analysis, Microarrays….) are
no longer published in a conventional
manner, but directly submitted to databases.
• Essential tools for biological research. The
only way to publish massive amounts of data
without using all the paper in the world.
38. Some statistics
• More than 1000 different ‘biological’ databases
• Variable size: <100Kb to >20Gb
– DNA: > 20 Gb
– Protein: 1 Gb
– 3D structure: 5 Gb
– Other: smaller
• Update frequency: daily to annually to seldom to forget
about it.
• Usually accessible through the web (some free, some not)
39. International nucleotide data banks
• EMBL (Europe): EMBL, EBI
• GenBank (USA): NLM, NCBI
• DDBJ (Japan): NIG, CIB
[Diagram labels: International Advisory Meeting; Collaborative Meeting; TrEMBL; NRDB]
40. Databases
• NCBI (National Center for Biotechnology Information):
http://www.ncbi.nlm.nih.gov/
• EBI: http://www.ebi.ac.uk/
• DDBJ: http://www.ddbj.nig.ac.jp/
• InterPro: http://www.ebi.ac.uk/interpro/
• InterPro is a database of protein families, domains and functional sites in
which identifiable features found in known proteins can be applied to
unknown protein sequences
• b) Search and analytical tools
• ORFFinder: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
• It is an analysis tool which finds all open reading frames in a user's
sequence or in a sequence already in the database.
• InterProScan server: http://www.ebi.ac.uk/InterProScan/
• InterProScan is used to search various protein domain/motifs/functional
sites databases and can combine other analyses such as the identification
of potential transmembrane domains and signal peptides.
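To make the idea of an open reading frame search concrete, here is a minimal Python sketch. It is not the ORFFinder algorithm: it scans only the forward strand, accepts only ATG as a start codon, and the function name and the min_codons parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon,
    frame is 1, 2 or 3. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame + 1))
                start = None                   # look for the next start codon
    return sorted(orfs)


example = "CCATGGTGCTGTAAGG"
for start, end, frame in find_orfs(example):
    print(frame, example[start:end])
```

The reverse strand can be searched by passing the reverse complement of the sequence to the same function.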
41. ……
• PSORT: http://www.psort.org/
• This site provides links to the PSORT family of programs for
subcellular localization prediction as well as other datasets
and resources relevant to localization prediction.
• SignalP v3.0 Server:
http://www.cbs.dtu.dk/services/SignalP/
• SignalP aims at identifying signal peptides in eukaryotic
and bacterial query proteins.
• TMHMM v2.0 server:
http://www.cbs.dtu.dk/services/TMHMM/
• TMHMM aims at identifying trans-membrane domains in
proteins (eukaryotic or prokaryotic).
43. Categories of databases for Life
Sciences
• Sequences (DNA, protein)
• Genomics
• Mutation/polymorphism
• Protein domain/family
• Proteomics (2D gel, Mass Spectrometry)
• 3D structure
• Metabolic networks
• Regulatory networks
• Bibliography
• Expression (Microarrays,…)
• Specialized
44. Literature Databases:
Bookshelf: A collection of searchable biomedical books linked to
PubMed.
PubMed: Allows searching by author names, journal titles, and a
new Preview/Index option. PubMed database provides access to
over 12 million MEDLINE citations back to the mid-1960's. It
includes History and Clipboard options which may enhance your
search session.
PubMed Central: The U.S. National Library of Medicine digital
archive of life science journal literature.
OMIM: Online Mendelian Inheritance in Man is a database of
human genes and genetic disorders (also OMIA).
45. .....
• BLAST is…
Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein
databases
• 80,000 searches per day
46. Why use BLAST?
• BLAST searching is fundamental to understanding
the relatedness of any favourite query sequence to
other known proteins or DNA sequences.
• Applications include:
– identifying orthologs and paralogs
– discovering new genes or proteins
– discovering variants of genes or proteins
– investigating expressed sequence tags (ESTs)
– exploring protein structure and function
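The "local alignment" in BLAST's name can be illustrated with the Smith-Waterman algorithm, the exact dynamic-programming method for local alignment. This is not how BLAST works internally (BLAST uses a faster seeded heuristic), and the scoring values below are arbitrary assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero, which makes it local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

The two toy sequences share only the stretch ACGT, and the local alignment reports just that region instead of forcing the unrelated flanks to align.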
47. ....
• TaxBrowser is…
• browser for the major divisions of living
organisms (archaea, bacteria, eukaryota,
viruses).
• taxonomy information such as genetic
codes.
• molecular data on extinct organisms.
48.
49.
50. What is an accession number?
• An accession number is a label that is used to identify a
sequence. It is a unique string of letters and/or numbers
that corresponds to a given molecular sequence.
• Examples:
DNA
AF492453 GenBank genomic sequence (same at EBI)
Protein
AAM97590 GenBank protein
Q8MV55 SwissProt protein
Non Protein Data Bank structure record
Publication
12192407 PubMed ID - Williams et al. Nature 418: 865-9 (2002).
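Because accession numbers follow recognisable formats, a rough guess at the identifier type can be made from its shape alone. The sketch below is a heuristic under stated assumptions: the patterns reflect commonly documented GenBank and UniProt accession layouts, they overlap in some cases (the order of the list decides), and a purely numeric string could be something other than a PubMed ID. The function name is illustrative.

```python
import re

# Checked in order; earlier entries win when patterns overlap.
PATTERNS = [
    ("PubMed ID", re.compile(r"\d+")),
    ("UniProt/Swiss-Prot protein", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("GenBank protein", re.compile(r"[A-Z]{3}\d{5}(?:\d{2})?")),
    ("GenBank nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}(?:\d{2})?")),
]


def classify_accession(identifier):
    """Guess the identifier type from its format alone (heuristic)."""
    identifier = identifier.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


# The example identifiers from the slide.
for acc in ("AF492453", "AAM97590", "Q8MV55", "12192407"):
    print(acc, "->", classify_accession(acc))
```

A format check like this only says what an identifier looks like; whether the record exists still has to be confirmed by querying the database.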
51. PubMed (Medline)
• MEDLINE covers the fields of medicine, nursing, dentistry,
veterinary medicine, public health, and preclinical sciences
• Contains citations from approximately 5,200 worldwide journals in
37 languages; 60 languages for older journals.
• Contains over 20 million citations since 1948
• Contains links to biological db and to some journals
• New records are added to PreMEDLINE daily!
52. Type in a Query term
• Enter your search words in the
query box and hit the “Go” button
http://www.ncbi.nlm.nih.gov/entrez/query/static/help/helpdoc.html#Searching
53. The Syntax …
1. Boolean operators: AND, OR, NOT must be entered in
UPPERCASE (e.g., promoters OR response elements). The default
is AND.
2. Entrez processes all Boolean operators in a left-to-right sequence.
The order in which Entrez processes a search statement can be
changed by enclosing individual concepts in parentheses. The terms
inside the parentheses are processed first. For example, the search
statement: g1p3 OR (response AND element AND promoter).
3. Quotation marks: The term inside the quotation marks is read as one
phrase (e.g. “public health” is different than public health, which will
also include articles on public latrines and their effect on health
workers).
4. Asterisk: Extends the search to all terms that start with the letters
before the asterisk. For example, dia* will include such terms as
diaphragm, dial, and diameter.
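The effect of left-to-right processing versus parentheses can be sketched with plain sets. This is only an illustration of the rule stated above, not how Entrez is implemented; the toy index, the article IDs and the function names are all made up for the example.

```python
# Toy inverted index: search term -> set of hypothetical article IDs.
INDEX = {
    "g1p3": {1, 2},
    "response": {2, 3, 4},
    "element": {3, 4},
    "promoter": {4, 5},
}


def combine(op, left, right):
    """Apply one UPPERCASE Boolean operator to two result sets."""
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    if op == "NOT":
        return left - right
    raise ValueError(f"unknown operator: {op}")


def evaluate(tokens):
    """Evaluate terms strictly left to right.

    A nested list stands for a parenthesised group and is evaluated first.
    """
    def value(token):
        if isinstance(token, list):
            return evaluate(token)
        return INDEX.get(token, set())

    result = value(tokens[0])
    for i in range(1, len(tokens), 2):
        result = combine(tokens[i], result, value(tokens[i + 1]))
    return result


# g1p3 OR response AND element AND promoter (no parentheses)
flat = ["g1p3", "OR", "response", "AND", "element", "AND", "promoter"]
# g1p3 OR (response AND element AND promoter)
grouped = ["g1p3", "OR", ["response", "AND", "element", "AND", "promoter"]]

print(sorted(evaluate(flat)))
print(sorted(evaluate(grouped)))
```

With this toy index the two statements return different article sets, which is why the parentheses matter.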
54. Refine the Query
• Often a search finds too many (or too few) sequences, so you
can go back and try again with more (or fewer) keywords in
your query
• The “History” feature allows you to combine any of your past
queries.
• The “Limits” feature allows you to limit a query to specific
organisms, sequences submitted during a specific period of
time, etc.
• [Many other features are designed to search for literature in
MEDLINE]
55. The OMIM (Online Mendelian
Inheritance in Man)
– Genes and genetic disorders
– Edited by team at Johns Hopkins
– Updated daily
56. MIM Number Prefixes
* gene with known sequence
+ gene with known sequence and phenotype
# phenotype description, molecular basis known
% mendelian phenotype or locus, molecular basis unknown
no prefix: other, mainly phenotypes with suspected mendelian basis
57. Searching OMIM
• Search Fields
– Name of trait, e.g., hypertension
– Cytogenetic location, e.g., 1p31.6
– Inheritance, e.g., autosomal dominant
– Gene, e.g., coagulation factor VIII
58. OMIM search tags
All Fields [ALL]
Allelic Variant [AV] or [VAR]
Chromosome [CH] or [CHR]
Clinical Synopsis [CS] or [CLIN]
Gene Map [GM] or [MAP]
Gene Name [GN] or [GENE]
Reference [RE] or [REF]
62. What is Google Scholar?
Enables you to search specifically for scholarly
literature, including peer-reviewed papers,
theses, books, preprints, abstracts and technical
reports from all broad areas of research.
63. Use Google Scholar to find articles from a
wide variety of academic publishers,
professional societies, preprint repositories
and universities, as well as scholarly articles
available across the web.
64. Google Scholar orders your search results by how relevant they are to
your query, so the most useful references should appear at the top of the
page.
This relevance ranking takes into account the full text of each article,
the article's author, the publication in which the article appeared and how
often it has been cited in scholarly literature.
70. 6. Web of Science
http://apps.webofknowledge.com.ezproxy.lib.uh.edu/WOS_GeneralSearch_input.do?product=WOS&search_mode=GeneralSearch&SID=4FB7LbbLgDMhG9fDiLh&preferencesSaved=
74. Genomics
• In a multicellular organism, each cell type
expresses its genes in a different way, although every
cell has the same genetic content.
• I.e., all the information needed for a liver cell to be a
liver cell is also present in a nose cell, so gene expression
is the only thing that differentiates them
75. Genomics - Finding Genes
• Finding a gene in sequence data is like finding a needle in a haystack
• However, whereas a needle is different from the
haystack, genes are not different from the rest of the
sequence data
• Within a whole array of nucleotides (nt), we try to find
and mark the borders of a set of nt as a gene
• This is one of the challenges of bioinformatics
• Neural networks and dynamic programming are
being employed
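To make the border-marking problem concrete, here is a minimal sketch of a naive open reading frame (ORF) scan. It is a much simpler technique than the neural-network and dynamic-programming methods named above: it only looks for an ATG start codon followed by the first in-frame stop codon on the forward strand. The function name, the `min_codons` parameter and the test sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF here is ATG ... up to and including the first in-frame stop
    codon. `end` is exclusive, so seq[start:end] is the ORF itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # mark the left border
            elif start is not None and codon in STOP_CODONS:
                # count coding codons, excluding the stop codon
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # mark the right border
                start = None
    return sorted(orfs)


toy = "CCATGAAACCCGGGTAACC"  # made-up sequence
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

Real gene finders must also handle the reverse strand, introns and statistical signals, which is where the harder methods come in.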
76.
Organism       Genome size (Mb)   Gene number   Web site
Yeast          13.5               6,241         http://genome-www.stanford.edu/Saccharomyces
Fruit flies    180                13,601        http://flybase.bio.indiana.edu
Homo sapiens   3,000              45,000        http://www.ncbi.nlm.nih.gov/genome/guide
(1 Mb = 1,000,000 bp)
77. Proteomics
• The proteome is the sum total of an organism's
proteins
• More difficult than genomics
– DNA has 4 bases; proteins have 20 amino acids
– DNA has a simple chemical makeup; proteins are complex
– DNA can be duplicated; proteins can't
• We are entering the ‘post-genome era’
• Meaning much has been done with the genes,
not that the work is over
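One way to read the "4 versus 20" comparison is as alphabet sizes: 4 nucleotides for DNA against 20 amino acids for proteins. The short sketch below only does the arithmetic for the number of possible sequences of a given length; the function name and the chosen length are illustrative assumptions.

```python
def sequence_space(alphabet_size, length):
    """Number of distinct sequences of `length` over the given alphabet."""
    return alphabet_size ** length


length = 10  # arbitrary example length
dna = sequence_space(4, length)       # 4 nucleotides
protein = sequence_space(20, length)  # 20 amino acids
print(f"DNA sequences of length {length}:     {dna:,}")
print(f"Protein sequences of length {length}: {protein:,}")
print(f"Ratio: {protein // dna:,}")
```

Even for a length of 10 the protein sequence space is millions of times larger, which is one reason proteomics is harder to handle computationally.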
78. Proteomics…..
• The relationship between an RNA and the
protein it codes for is usually far from direct
• After translation proteins do change
– So the amino acid (aa) sequence does not tell us anything
about the post-translational changes
• Many proteins are not active until they are combined
into a larger complex or moved to a relevant
location inside or outside the cell
• So the aa sequence only hints at these things
• Also proteins must be handled more carefully in
labs as they tend to change when in contact with
an inappropriate material
79. Protein Structure Prediction
• Is one of the biggest challenges of
bioinformatics and especially biochemistry
• No algorithm exists yet that consistently
predicts the structure of proteins
80. Structure Prediction methods
• Comparative Modeling
– The target protein's structure is modeled by comparison with
related proteins
– Proteins with similar sequences are searched
for known structures
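The first step of comparative modeling, picking a related protein to use as a template, can be sketched as a percent-identity comparison. This assumes the sequences are already aligned to the same length; real pipelines align first and use more sensitive scores. The template names and sequences below are invented for the example.

```python
def percent_identity(a, b):
    """Percentage of aligned positions with identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_template(target, templates):
    """Return the name of the template most similar to the target."""
    return max(templates, key=lambda name: percent_identity(target, templates[name]))


target = "MKTAYIAK"  # hypothetical target sequence
templates = {        # hypothetical proteins with known structures
    "templateA": "MKTAYIGK",
    "templateB": "MRSAYLAK",
}
print(best_template(target, templates))
```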
81. Phylogenetics
• The taxonomical system reflects
evolutionary relationships
• Phylogenetic trees reflect the evolutionary relationships
in a picture/graph
• Rooted trees have a single common
ancestor
• Unrooted trees just show the
relationships
• Phylogenetic tree reconstruction
algorithms are also an area of research
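As one example of a tree reconstruction algorithm, here is a minimal UPGMA sketch that builds a rooted tree from pairwise distances. UPGMA is chosen only because it is short; the slides do not name a specific method. The sequences are made up, the distance is a simple fraction of differing positions, and the tree is returned as nested tuples without branch lengths.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster a dict of name -> aligned sequence into a rooted tree."""
    sizes = {name: 1 for name in seqs}  # cluster -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # join the two closest clusters
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # distance to the new cluster is the size-weighted average
        for c in sizes:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


toy = {  # hypothetical aligned sequences
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "AAAAAATTTT",
    "D": "TTTTTTTTTT",
}
print(upgma(toy))
```

The outermost tuple is the root, so the most distant sequence joins last.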
83. Medical Implications
• Pharmacogenomics
– Not all drugs work on all patients; some good drugs
cause death in some patients
– So by doing a gene analysis before the treatment, the
offending drugs can be avoided
– Also, drugs which cause death in most patients can be used
on the minority whose genes are well suited to that drug
– volunteers wanted!
– Customized treatment
• Gene Therapy
– Replace or supply the defective or missing gene
– E.g., insulin, or Factor VIII for haemophilia
• BioWeapons (??)
84. Diagnosis of Disease
• Diagnosis of disease
– Identification of genes which cause the
disease will help detect disease at early stage
e.g. Huntington disease
• Symptoms: uncontrollable dance-like
movements, mental disturbance, personality
changes and intellectual impairment
• Death in 10-15 years
• The gene responsible for the disease has been
identified
• Contains excessively repeated sections of CAG
• So once the gene has been analyzed, the couple can be counselled
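Counting the repeated CAG sections is easy to sketch in code. The function below reports the longest uninterrupted run of CAG in a DNA string; the function name and the test sequence are made up, and no clinical threshold is implied.

```python
import re


def longest_cag_run(seq):
    """Return the number of CAG units in the longest uninterrupted run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical fragment: 5 CAG repeats, an interruption, then 2 more.
toy = "ATG" + "CAG" * 5 + "CCG" + "CAG" * 2 + "TGA"
print(longest_cag_run(toy))
```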
85. Drug Design
• Can take up to 15 years and $700 million
• One of the goals of bioinformatics is to
reduce the time and cost involved.
• The process
– Discovery
• Computational methods can improve this
– Testing
86. Discovery
Target identification
– Identifying the molecule on which the
germ relies for its survival
– Then we develop another molecule, i.e. a
drug, which will bind to the target
– So the germ will not be able to interact
with the target.
– Proteins are the most common targets
87. Discovery…
• For example, HIV produces HIV protease,
which is a protein that in turn cleaves
other proteins
• HIV protease has an active site
where it binds to other molecules
• So an HIV drug will go and bind to that
active site
– Easier said than done!
88. Discovery…
• Lead compounds are the molecules that
go and bind to the target protein’s active
site
• Traditionally this has been a trial and error
method
• Now this is being moved into the realm of
computers
89. Restriction Analysis of DNA
• Special enzymes termed restriction enzymes have been discovered in
many different bacteria and other single-celled organisms. These
enzymes act as chemical scissors to cut λ DNA into pieces.
• They are able to scan along a length of DNA looking for a particular
sequence of bases that they recognize.
• This recognition site or sequence is generally from 4 to 6 base pairs in
length. Once it is located, the enzyme will attach to the DNA molecule
and cut each strand of the double helix. This is the first step in a process called
restriction mapping.
• The restriction enzyme will continue to do this along the full length of the
DNA molecule, which will then break into fragments. The size of these
fragments is measured in base pairs or kilobase pairs (1,000 base pairs).
• Since the recognition site or sequence of base pairs is known for each
restriction enzyme, we can use this to form a detailed analysis of the
sequence of bases in specific regions of the DNA in which we are
interested.
• This procedure is one of the most important in modern biology.
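Because each recognition sequence is known, the scan-and-cut behaviour described above can be sketched in a few lines. The example assumes a linear DNA molecule, looks only at the forward strand (enough for palindromic sites), and uses a made-up test sequence. The cut offset of 1 reflects the commonly cited cut positions G^AATTC, A^AGCTT and G^GATCC; the names are illustrative.

```python
# Enzyme -> (recognition sequence, cut offset within the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
}


def cut_positions(seq, site, offset):
    """Scan along the sequence and return every position that is cut."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions


def fragment_sizes(seq, enzyme):
    """Fragment lengths (in bp) after a complete digest of linear DNA."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 26 bp molecule with two EcoRI sites.
toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(fragment_sizes(toy, "EcoRI"))
print(fragment_sizes(toy, "HindIII"))
```

An enzyme with no site in the molecule leaves it as a single uncut fragment.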
90. .... Restriction analysis
• In the presence of specific DNA repair enzymes, DNA
fragments will re-anneal or stick themselves to other fragments
with cut ends that are complementary to their own end
sequence.
• It doesn’t matter if the fragment that matches the cut end
comes from the same organism or from a different one.
• This ability of DNA to repair itself has been utilized by scientists
to introduce foreign DNA into an organism.
• This DNA may contain genes that allow the organism to exhibit
a new function or process. This would include transferring
genes that will result in a change in the nutritional quality of a
crop or perhaps allow a plant to grow in a region that is colder
than its usual preferred area.
91. Example: Restriction Digestion and
Analysis of DNA from Bacteriophage λ
• The genome of this small virus is 48,502 base pairs long, which is very
small compared with the human genome of approximately 3
billion base pairs.
• Since the whole sequence of λ is already known we can predict
where each restriction enzyme will cut and thus the expected
size of the fragments that will be produced.
• If the virus DNA is exposed to the restriction enzyme for only a
short time, then not every restriction site will be cut by the
enzyme.
• This will result in fragments ranging in size from the smallest
possible (all sites are cut) to in-between lengths (some of the
sites are cut) to the longest (no sites are cut). This is termed a
partial restriction digestion.
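The range of fragment sizes in a partial digestion can be enumerated directly: every pair of boundaries (the two ends of the molecule plus any cut site) can delimit a fragment. The sketch below assumes linear DNA; the molecule length and cut positions are invented for illustration and are not real λ coordinates.

```python
from itertools import combinations


def complete_digest_sizes(length, cuts):
    """Fragment sizes when every site is cut."""
    bounds = [0] + sorted(cuts) + [length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def partial_digest_sizes(length, cuts):
    """All distinct fragment sizes possible when only some sites are cut."""
    bounds = [0] + sorted(cuts) + [length]
    return sorted({right - left for left, right in combinations(bounds, 2)})


length, cuts = 100, [20, 70]  # hypothetical molecule and sites
print(complete_digest_sizes(length, cuts))
print(partial_digest_sizes(length, cuts))
```

The smallest sizes come from the complete digest and the largest is the uncut molecule, matching the range described above.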
92. .....
• After overnight digestion, the reaction is
stopped by addition of a loading buffer.
• The DNA fragments are separated by
electrophoresis, a process that involves
application of an electric field to cause the
DNA fragments to migrate into an agarose
gel.
• The gel is then stained with a methylene
blue stain to visualize the DNA bands and
may be photographed.
93. .....
• The movement of the fragments during electrophoresis
will always be towards the positive electrode because
DNA is a negatively charged molecule.
• The fragments move through the gel at a rate that is
determined by their size and shape, with the smallest
moving the fastest.
• DNA cannot be seen as it moves through the gel. That is
why a loading dye must be added to each of the samples
before it is pipetted into the wells.
• The progress of the dye can be seen in the gel. It will
initially appear as a blue band, eventually resolving into
two bands of different colours.
94. ......
• Restriction enzymes cut at specific sites along the DNA. These sites
are determined by the sequence of bases which usually form
palindromes.
• Palindromes are groups of letters that read the same in both the
forward and backward orientation.
• In the case of DNA the letters are found on both the forward and the
reverse strands of the DNA.
• For example, the 5’ to 3’ strand may have the sequence GAATTC.
The complementary bases on the opposite strand will be CTTAAG,
which is the same as reading the first strand backwards!
• Many enzymes recognize these types of sequences and will attach to
the DNA at this site and then cut the strand between two of the
bases. In this example, the DNA was digested with BamHI,
EcoRI and HindIII restriction enzymes, and their sequences are
as follows, with the cut site indicated by the arrow.
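The palindrome property can be checked in code: a site is palindromic in the DNA sense when it equals its own reverse complement. This is a small sketch with illustrative function names; the three recognition sequences are the ones used for EcoRI, HindIII and BamHI in this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Bases on the opposite strand, written in the same left-to-right order."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """The opposite strand read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


def is_palindromic_site(seq):
    """True if both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


print(complement("GAATTC"))
for site in ("GAATTC", "AAGCTT", "GGATCC", "GAATTA"):
    print(site, is_palindromic_site(site))
```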
95.
96.
97. λ cut with EcoRI λ cut with HindIII λ cut with BamHI
99. Assignment: Using the graph in
next slide, address the following
• Calculate the sizes of the fragments that will result from
digestion and write them on the map.
• How many fragments would you expect to see for each of the
maps A, B and C?
• Draw these fragments onto the graph in the next slide.
• Now compare the size of the fragments that you have
calculated with the bands shown in the photographs of the
gels and determine which of the enzymes, BamHI, EcoRI and
HindIII were used to cut A, B and C.
• How many times does the sequence GAATTC occur in the λ
DNA sequence? What about AAGCTT and GGATCC?