This document provides an introduction to the field of bioinformatics. It discusses how bioinformatics originated from the need to collect, annotate, and analyze biological sequence data. One pioneering researcher, Margaret Dayhoff, collected all known protein structures and sequences in 1965 and developed early algorithms to compare sequences and study evolutionary relationships. The document outlines several subdisciplines of bioinformatics including sequence alignment, genome annotation, analysis of gene expression, pathway analysis, and literature analysis. It provides examples of using tools like BLAST for sequence alignment and the UCSC Genome Browser for comparative genomics analysis.
4. Margaret Dayhoff (1925-1983)
● Collected all known protein
structures & sequences
● Published Atlas in 1965
● Pioneered algorithm development
for:
○ Comparing protein sequences
○ Deriving evolutionary history from
alignments
“In this paper we shall describe a completed
computer program for the IBM 7090, which to
our knowledge is the first successful attempt
at aiding the analysis of the amino acid chain
structure of protein.”
6. “There is a tremendous amount of information
regarding evolutionary history and biochemical
function implicit in each sequence and the
number of known sequences is growing
explosively. We feel it is important to collect
this significant information, correlate it into a
unified whole and interpret it.”
M. Dayhoff, February 27, 1967
8. [Figure: timeline, 1960 to 2010. Computing milestones: ARPAnet invented, Internet invented, WWW. Bioinformatics milestones: Dayhoff Atlas, Sanger sequencing, GenBank, EBI-EMBL, next-gen sequencing.]
10. Definition
From Wikipedia: Bioinformatics is a branch of biological science which deals with the study of methods for storing,
retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and protein sequence, structure, function,
pathways and genetic interactions. It generates new knowledge that is useful in such fields as drug design and
development of new software tools to create that knowledge. Bioinformatics also deals with algorithms, databases and
information systems, web technologies, artificial intelligence and soft computing, information and computation theory,
structural biology, software engineering, data mining, image processing, modeling and simulation, discrete
mathematics, control and system theory, circuit theory, and statistics.
Our definition: using computer science and
statistics to answer biological questions.
12. Central Dogma
[Figure: DNA → RNA → Protein, annotated with reverse transcription, RNA silencing, prions, post-translational modification, and methylation.]
13. Protein folding determines
molecular function
DNA provides assembly
instructions for proteins
Networks of interacting
proteins determine
tissue/organ function
14. [Figure: three levels of biological organization, overlaid with bioinformatics analysis areas]
● DNA provides assembly instructions for proteins
● Protein folding determines molecular function
● Networks of interacting proteins determine tissue/organ function
● Analysis areas shown: DNA variant analysis, gene expression analysis, genome annotation, pathway analysis, epigenetics, systems biology, biomarker identification, miRNA analysis, quantitative MS, proteomics
25. Genetic Epidemiology
Epidemiology: the study of the patterns,
causes, and effects of health and disease
conditions in defined populations.
Genetic epidemiology: the study of genetic
factors in determining health and disease in
families and populations.
26. Protein folding determines
molecular function
DNA provides assembly
instructions for proteins
Networks of interacting
proteins determine
tissue/organ function
27. Genetic epidemiology
● Linkage: finding genetic loci that segregate
with the disease in families.
● Association: finding alleles that co-occur with
disease in populations.
○ Common disease - common variant hypothesis:
■ Common variants (e.g. >1-5% in the population)
contribute to common, complex disease.
○ Common disease - rare variant hypothesis:
■ Polymorphisms that cause disease are under
purifying selection, and will thus be rare.
○ Really, it's a mix of both
28. Candidate gene study
● Select candidate genes based on:
○ Known biology
○ Previous linkage/association evidence
○ Pathways
○ Evidence from model organisms
● Genotype variants (SNPs) in those genes
● Statistical association
[Figure: example genotypes at position rs12345: A/A, A/T, T/T]
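As a minimal sketch of the "statistical association" step, the snippet below runs an allelic chi-square test on genotype counts at a single SNP. The genotype counts, the function names, and the choice of test are illustrative assumptions; the slides do not specify which test to use.

```python
import math

def allele_counts(n_aa, n_at, n_tt):
    """Convert genotype counts (A/A, A/T, T/T) into allele counts (A, T)."""
    return 2 * n_aa + n_at, 2 * n_tt + n_at

def allelic_chi_square(cases, controls):
    """Chi-square test (1 degree of freedom) on the 2x2 table of allele counts.

    cases, controls: tuples of genotype counts in the order (A/A, A/T, T/T).
    Returns (chi-square statistic, p-value).
    """
    a, b = allele_counts(*cases)      # A and T alleles among cases
    c, d = allele_counts(*controls)   # A and T alleles among controls
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts at rs12345 (made up for illustration).
cases = (30, 50, 20)
controls = (50, 40, 10)
chi2, p = allelic_chi_square(cases, controls)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

A real study would also check genotyping quality and Hardy-Weinberg equilibrium and would usually adjust for covariates; this sketch covers only the core 2x2 comparison.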
29. Genome-wide association study
● Genotype >500,000 SNPs
● Statistical test at each one
● Manhattan plot of results
● GWAS does not inform:
○ Which gene affected
○ How gene function perturbed
○ How biological function altered
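Running a test at each of >500,000 SNPs creates a multiple-testing problem. The sketch below assumes a Bonferroni correction (one common choice, not named on the slide) and uses made-up SNP IDs and p-values. It also computes the -log10(p) values that a Manhattan plot places on its y-axis.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

n_snps = 500_000
threshold = bonferroni_threshold(0.05, n_snps)  # 0.05 / 500,000 = 1e-7

# Hypothetical per-SNP p-values (illustrative only).
p_values = {"rs0001": 3e-9, "rs0002": 4e-6, "rs0003": 0.02}

# SNPs that pass the genome-wide threshold.
hits = [snp for snp, p in p_values.items() if p < threshold]

# Manhattan plots show -log10(p) per SNP, ordered by genomic position.
manhattan_y = {snp: -math.log10(p) for snp, p in p_values.items()}

print(f"threshold = {threshold:.1e}, hits = {hits}")
```

Note that a p-value of 4e-6 looks small in isolation but does not survive correction at this scale.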
39. RNA-seq challenges
● Library construction
○ Size selection (messenger, small)
○ Strand specificity?
● Bioinformatic challenges
○ Spliced alignment
○ Transcript deconvolution
● Statistical Challenges
○ Highly variable abundance
○ Sample size: never, ever, plan n=1
● Normalization (RPKM)
○ Compare features of different lengths
○ Compare conditions with different
sequence depth
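The RPKM normalization mentioned above (reads per kilobase of feature per million mapped reads) can be sketched as follows. The function name and the example counts are assumptions chosen to keep the arithmetic easy to follow.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical example: two genes of different lengths in one library.
library_size = 10_000_000
gene_a = rpkm(read_count=1000, feature_length_bp=2000, total_mapped_reads=library_size)
gene_b = rpkm(read_count=500, feature_length_bp=1000, total_mapped_reads=library_size)

# Same gene A in a second library sequenced twice as deeply.
gene_a_deep = rpkm(read_count=2000, feature_length_bp=2000, total_mapped_reads=2 * library_size)

print(gene_a, gene_b, gene_a_deep)
```

Gene A has twice the raw reads of gene B only because it is twice as long, and the deeper library doubles the raw count without any change in expression. Dividing by length and by library size removes both effects.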
40. Common question #1: Depth
● Question: how much sequence do I need?
● Answer: it’s complicated.
● Depends on:
○ Size & complexity of transcriptome
○ Application: differential gene expression, transcript
discovery, aberrant splicing, etc.
○ Tissue type, RNA quality, library preparation
○ Sequencing type: length, single-/paired-end, etc.
● Find publication in your field w/ similar goals.
● Good news: 1 GA or ½ HiSeq lane is
sufficient for most applications
41. Common question #2: Sample Size
● Question: How many samples should I
sequence?
● Oversimplified Answer: At least 3 biological
replicates per condition.
● Depends on:
○ Sequencing depth
○ Application
○ Goals (prioritization, biomarker discovery, etc.)
○ Effect size, desired power, statistical significance
● Find a publication with similar goals
42. Common question #3: Workflow
● How do I analyze the data?
● No standards!
○ Unspliced aligners: BWA, Bowtie, Stampy, SHRiMP
○ Spliced aligners: Tophat, MapSplice, SpliceMap, GSNAP, QPALMA
○ Reference builds & annotations: UCSC, Entrez, Ensembl
○ Assembly: Cufflinks, Scripture, Trinity, G.Mor.Se, Velvet, TransABySS
○ Quantification: Cufflinks, RSEM, MISO, ERANGE, NEUMA, Alexa-Seq
○ Differential expression: Cuffdiff, DegSeq, DESeq, EdgeR, Myrna
● Like early microarray days: lots of excitement, lots of
tools, little knowledge of integrating tools in pipeline!
● Benchmarks
○ Microarray: Spike-ins (Irizarry)
○ RNA-Seq: ???, simulation, ???
43. Phases of NGS analysis
● Primary
○ Conversion of raw machine signal into sequence and qualities
● Secondary
○ Alignment of reads to reference genome or transcriptome
○ De novo assembly of reads into contigs
● Tertiary
○ SNP discovery/genotyping
○ Peak discovery/quantification (ChIP, MeDIP)
○ Transcript assembly/quantification (RNA-seq)
● Quaternary
○ Differential expression
○ Enrichment, pathways, correlation, clustering, visualization, etc.
44. Extra credit (not really): RNA-seq
http://bit.ly/galaxy-rnaseq
● #1: Learn to use Galaxy: bit.ly/uva-galaxy
● #2: Run through an RNA-seq exercise in 1 hour:
○ Read some background material on RNA-seq
○ Read the TopHat/Cufflinks method paper
○ Get some data (Illumina BodyMap)
○ QC / trim your reads
○ Map to hg19 with TopHat
○ Visualize where reads map
○ Assemble with Cufflinks
○ Differential expression with Cuffdiff
46. How are genes regulated?
● Transcription factors (ChIP-seq)
● Micro-RNAs (RNA-seq)
● Chromatin accessibility (DNase-seq)
● DNA Methylation (RRBS-seq, MeDIP-seq)
● RNA processing
● RNA transport
● Translation
● Post-translational modification
47. Importance of DNA methylation
● Occurs most frequently at CpG sites
● High methylation at promoters ≈ silencing
● Methylation perturbed in cancer
● Methylation associated with many other
complex diseases: neural, autoimmune,
response to environment.
● Mapping DNA methylation → new disease
genes & drug targets.
48. DNA Methylation Challenges
● Dynamic and tissue-specific
● DNA → Collection of cells which vary in
5meC patterns → 5meC pattern is complex.
● Further, uneven distribution of CpG targets
● Multiple classes of methods:
○ Bisulfite, sequence-based: Assay methylated target
sequences across individual DNAs.
○ Affinity enrichment, count-based: Assay methylation
level across many genomic loci.
● Many methods
● Many algorithms
49. Many methylation methods
Category | Method | Description
Gene expression | RNA-Seq | High-throughput cDNA sequencing
DNA methylation | BS-Seq | Whole-genome bisulfite sequencing
DNA methylation | RRBS-Seq | Reduced representation bisulfite sequencing
DNA methylation | BC-Seq | Bisulfite capture sequencing
DNA methylation | BSPP | Bisulfite specific padlock probes
DNA methylation | Methyl-Seq | Restriction enzyme based methyl-seq
DNA methylation | MSCC | Methyl sensitive cut counting
DNA methylation | HELP-Seq | HpaII fragment enrichment by ligation PCR
DNA methylation | MCA-Seq | Methylated CpG island amplification
DNA methylation | MeDIP-Seq | Methylated DNA immunoprecipitation
DNA methylation | MBP-Seq | Methyl-binding protein sequencing
DNA methylation | MethylCap-seq | Methylated DNA capture by affinity purification
DNA methylation | MIRA-Seq | Methylated CpG island recovery assay
51. Methylation: Bioinformatics Resources
Resource | Purpose | URL
Batman | MeDIP DNA methylation analysis tool | http://td-blade.gurdon.cam.ac.uk/software/batman
BDPC | DNA methylation analysis platform | http://biochem.jacobs-university.de/BDPC
BSMAP | Whole-genome bisulphite sequence mapping | http://code.google.com/p/bsmap
CpG Analyzer | Windows-based program for bisulphite DNA | -
CpGcluster | CpG island identification | http://bioinfo2.ugr.es/CpGcluster
CpGFinder | Online program for CpG island identification | http://linux1.softberry.com
CpG Island Explorer | Online program for CpG Island identification | http://bioinfo.hku.hk/cpgieintro.html
CpG Island Searcher | Online program for CpG Island identification | http://cpgislands.usc.edu
CpG PatternFinder | Windows-based program for bisulphite DNA | -
CpG Promoter | Large-scale promoter mapping using CpG islands | http://www.cshl.edu/OTT/html/cpg_promoter.html
CpG ratio and GC content Plotter | Online program for plotting the observed:expected ratio of CpG | http://mwsross.bms.ed.ac.uk/public/cgi-bin/cpg.pl
CpGviewer | Bisulphite DNA sequencing viewer | http://dna.leeds.ac.uk/cpgviewer
CyMATE | Bisulphite-based analysis of plant genomic DNA | http://www.gmi.oeaw.ac.at/en/cymate-index/
EMBOSS CpGPlot/CpGReport | Online program for plotting CpG-rich regions | http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html
Epigenomics Roadmap | NIH Epigenomics Roadmap Initiative homepage | http://nihroadmap.nih.gov/epigenomics
Epinexus | DNA methylation analysis tools | http://epinexus.net/home.html
MEDME | Software package (using R) for modelling MeDIP experimental data | http://espresso.med.yale.edu/medme
methBLAST | Similarity search program for bisulphite-modified DNA | http://medgen.ugent.be/methBLAST
MethDB | Database for DNA methylation data | http://www.methdb.de
MethPrimer | Primer design for bisulphite PCR | http://www.urogene.org/methprimer
methPrimerDB | PCR primers for DNA methylation analysis | http://medgen.ugent.be/methprimerdb
MethTools | Bisulphite sequence data analysis tool | http://www.methdb.de
MethyCancer Database | Database of cancer DNA methylation data | http://methycancer.psych.ac.cn
Methyl Primer Express | Primer design for bisulphite PCR | http://www.appliedbiosystems.com/
Methylumi | Bioconductor pkg for DNA methylation data from Illumina | http://www.bioconductor.org/packages/bioc/html/
Methylyzer | Bisulphite DNA sequence visualization tool | http://ubio.bioinfo.cnio.es/Methylyzer/main/index.html
mPod | DNA methylation viewer integrated w/ Ensembl genome browser | http://www.compbio.group.cam.ac.uk/Projects/
PubMeth | Database of DNA methylation literature | http://www.pubmeth.org
QUMA | Quantification tool for methylation analysis | http://quma.cdb.riken.jp
TCGA Data Portal | Database of TCGA DNA methylation data | http://cancergenome.nih.gov/dataportal
53. One gene, one enzyme, one function?
Zhu X. et al. (2007). Genes & Dev 21:1010-1024. Jeong, H. et al. (2001) Nature 411:41–42.
Ptacek, J. et al. (2005) Nature 438:679–684. Guimera and Amaral. (2005). Nature 433:895-900. Tong, A.H. et al. (2001). Science 294:2364-2368.
54. Distribution of disease genes
Diseases connected if same
gene implicated in both.
Genes connected if
implicated in the same
disorder.
Goh et al. (2007). PNAS 104:8685.
55. Distribution of disease genes
Overlay with PPI data
Genes contributing to a common
disease interact through protein-
protein interactions.
Genes connected if
implicated in the same
disorder.
Goh et al. (2007). PNAS 104:8685.
56. Distribution of disease genes
● k = degree = # interaction partners
● “Essential” genes
○ Encode hubs
○ Are expressed globally
● “Non-essential” disease genes
○ Do not encode hubs
○ Tissue specific expression
Seebacher and Gavin (2011). Cell 144:1000-1001.
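To make the degree definition concrete, here is a small sketch that computes k (the number of interaction partners) for each protein in a toy interaction list. The protein names, the edges, and the hub cutoff of 4 are all hypothetical; the slides do not define a numeric cutoff for hubs.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: k}, where k is the number of distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        partners[a].add(b)
        partners[b].add(a)
    return {node: len(p) for node, p in partners.items()}

# Hypothetical protein-protein interactions (undirected edges).
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
    ("P2", "P3"), ("P5", "P6"),
]
k = degrees(edges)

# Arbitrary cutoff for calling a node a hub, chosen only for illustration.
hubs = [node for node, degree in k.items() if degree >= 4]
print(k, hubs)
```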
57. Distribution of disease genes
● Disease genes at functional periphery of cellular networks (Goh PNAS 2007).
● Genes contributing to a common disease interact through protein-protein
interactions (Goh PNAS 2007).
● Diseasome analysis: a patient is 2x as likely to develop another disease if that
disease shares a gene with the patient's primary disease (Park et al. 2009. The Impact of Cellular
Networks on Disease Comorbidity. Mol Syst Biol 5:262).
● miRNA analysis: if diseases are connected when their associated genes are regulated by
a common miRNA, they segregate by disease class. E.g. cancers share similar
associations at the miRNA level (Lu et al. 2008. An analysis of human microRNA and disease associations.
PLoS ONE 3:e3420).
Nonrandom placement of
disease genes in interactome!
59. Distribution of disease genes
● Data is cheap and diverse.
○ Genetic variation: GWAS, next-gen sequencing
○ Gene expression: Microarray, RNA-seq
○ Proteomics: Y2H, CoAP/MS
● Cellular components interact in a network
with other cellular components.
● Disease is the result of an abnormality in
that network.
● Integrate multiple data types, understand
network, understand disease.
60. Pathway Analysis
● You’ve done your microarray/RNA-Seq experiment
○ You have a list of genes
○ Want to put these into functional context
○ What biological processes are perturbed?
○ What pathways are being dysregulated?
○ Data reduction: hundreds or thousands of genes can be reduced to
10s of pathways
○ Identifying active pathways = more explanatory power
● “Pathway analysis” encompasses many, many
techniques:
○ 1st Generation: Overrepresentation Analysis (E.g. GO ORA)
○ 2nd Generation: Functional Class Scoring (e.g. GSEA)
○ 3rd Generation (in development): Pathway Topology (E.g. SPIA)
● http://gettinggeneticsdone.com/2012/03/pathway-analysis-for-high-throughput.html
61. Pathway Analysis: Over-
representation analysis
● Many variations on the same theme:
statistically evaluates the fraction of genes in
particular pathway that show changes in
expression.
● Algorithm:
○ Create input list (e.g. “significant at p<0.05”)
○ For each gene set:
■ Count number of input genes
■ Count number of “background” genes (e.g. all genes on platform).
○ Test each pathway for over-representation of input
genes
● Gene Set: typically gene ontology (GO)
term.
62. Pathway analysis: over-
representation analysis
● Ontology = formal representation of a knowledge
domain.
● Gene ontology = cell biology.
● GO represented by directed acyclic graph (DAG).
○ Terms are nodes, relationships are edges.
○ Parent terms are more general than their child terms.
○ Unlike a simple tree, terms can have multiple parents.
Rhee, S. Y., Wood, V., Dolinski, K., & Draghici, S. (2008). Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7), 509-15.
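A rough sketch of the DAG structure described above: each term maps to its list of parents, and a term's ancestors are found by walking upward. The term names are placeholders, not real GO terms, and the function name is an assumption.

```python
def ancestors(term, parents):
    """Return every term reachable by following parent links upward from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical ontology fragment: child term -> list of parent terms.
parents = {
    "root": [],
    "process_A": ["root"],
    "process_B": ["root"],
    "process_C": ["process_A", "process_B"],  # two parents: not a simple tree
    "process_D": ["process_C"],
}

print(sorted(ancestors("process_D", parents)))
```

Because `process_C` has two parents, `process_D` inherits both branches. This is why gene counts for general terms near the root are larger than, and not independent of, counts for their child terms.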
63. Pathway analysis:
Over-representation analysis
● Algorithm:
○ Create input list (e.g. “significant at p<0.05”)
○ For each gene set:
■ Count number of input genes
■ Count number of “background” genes (e.g. all genes on platform).
○ Test each pathway for over-representation of input genes
● Ex: GO “Purine Ribonucleotide Biosynthetic Process”
○ 1% of input (significant) genes are annotated with this term.
○ 1% of genes on the chip are annotated with this term.
○ Not significantly overrepresented.
● Ex: GO “V(D)J Recombination”
○ 20% of input (significant) genes are annotated with this term.
○ 1% of genes on the chip are annotated with this term.
○ Highly significantly over-represented!
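The two examples above can be worked through with a one-sided hypergeometric test, which is one common way to test for over-representation (the slides do not name a specific test). The sketch assumes a chip with 10,000 genes and an input list of 200 significant genes; only the 1% and 20% proportions come from the slide.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in an input list
    of n genes drawn without replacement from N background genes, K of which
    carry the annotation."""
    total = comb(N, n)
    return sum(
        comb(K, i) * comb(N - K, n - i) / total
        for i in range(k, min(n, K) + 1)
    )

N = 10_000   # assumed number of genes on the chip (background)
K = 100      # 1% of background genes annotated with the term
n = 200      # assumed number of input (significant) genes

# "Purine Ribonucleotide Biosynthetic Process": 1% of input genes annotated.
p_purine = hypergeom_upper_tail(k=2, n=n, K=K, N=N)

# "V(D)J Recombination": 20% of input genes annotated.
p_vdj = hypergeom_upper_tail(k=40, n=n, K=K, N=N)

print(f"purine: p = {p_purine:.3g}; V(D)J: p = {p_vdj:.3g}")
```

In practice this test is repeated for every gene set, so the resulting p-values need a multiple-testing correction before interpretation.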
64. Pathway analysis
● Pathway analysis gives you more biological
insight than staring at lists of genes.
● Pathway analysis is complex, and has many
limitations.
● Pathway analysis is still more of an
exploratory procedure rather than a pure
statistical endpoint.
● The best conclusions are made by viewing
enrichment analysis results through the lens
of the investigator’s expert biological
knowledge.
66. Resources: Online community &
discussion forum
● SEQanswers:
○ http://SEQanswers.com
○ Twitter: @SEQquestions
○ Format: Forum
○ Li et al. SEQanswers: an open access community
for collaboratively decoding genomes. Bioinformatics
(2012).
● BioStar:
○ http://biostar.stackexchange.com
○ Twitter: @BioStarQuestion
○ Format: Q&A
○ Parnell et al. BioStar: an online question & answer
resource for the bioinformatics community. PLoS
Comp Bio (2011) 7:e1002216.
67. Resources: further education
stephenturner.us/p/edu
Regularly updated, comprehensive list of over 20 in-
person and free online workshops in bioinformatics,
programming, statistics, genetics, etc.
68. Publicly Available Data: NCBI
● Genbank: http://www.ncbi.nlm.nih.gov/genbank/
○ Collection of all publicly available DNA sequences.
○ Feb 2013: 150,141,354,858 bases from 162,886,727 sequences.
● NCBI Genomes: http://www.ncbi.nlm.nih.gov/genome/
○ Public repository for sequenced genomes.
○ March 2013: 3,005 eukaryotes, 19,125 prokaryotes, 3,570 viruses.
● NCBI Taxonomy: http://www.ncbi.nlm.nih.gov/taxonomy
○ Publicly available classification and nomenclature database for all organisms in the public
sequences database.
○ Phylogenetic lineages for >160,000 organisms (est. ~10% life on the planet)
● GEO: http://www.ncbi.nlm.nih.gov/geo/
○ Public repository of sequence- and array-based gene expression data, free for the taking.
○ 900,000+ samples, 3,200+ datasets.
● dbGaP: http://www.ncbi.nlm.nih.gov/gap
○ Public repository for genetic studies.
○ 2,500+ datasets, 100,000+ variables.
● SRA: http://www.ncbi.nlm.nih.gov/sra
○ Public repository for raw sequencing data from NGS platforms.
○ 3,500,000,000,000,000 bases sequenced.