Next Generation Sequencing Analysis Series
February 11, 2015
Andrew Oler, PhD
High-throughput Sequencing Bioinformatics Specialist
BCBB/OCICB/NIAID/NIH
Bioinformatics and Computational Biosciences Branch
§  “BCBB”
§  Group of ~30
§  Bioinformatics Software Developers
§  Computational Biologists
§  Project Managers & Analysts
http://www.niaid.nih.gov/about/organization/odoffices/omo/ocicb/Pages/bcbb.aspx
The plan…
§  Overview of ChIP-seq technology and concepts
§  Comparison of peak finding tools
§  Hands-on
–  USeq
§  Peak finding
§  Peak annotation
§  Visualization (in IGB)
§  More demos as time permits
Why are people interested in protein-DNA contacts?
§  Proteins called transcription factors (TFs) are involved in regulating gene activation.
§  The first step in gene activation is binding of the TF to regulatory DNA (e.g., a promoter or enhancer) of its target gene.
[Figure: a TF and RNA Polymerase at Gene X]
Chromatin Immunoprecipitation (ChIP)
See also: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012 Sep;22(9):1813-31.
Steps for ChIP-seq Analysis
1.  Perform ChIP and prepare the ChIP library.
2.  Prepare a separate library of input DNA from the same sonicate.
3.  Sequence both libraries (at least 20 million reads per library).
4.  Align reads to the reference genome.
5.  Remove duplicate alignments.
6.  Shift the data and call peaks.
7.  Perform downstream analysis.
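Step 5 can be illustrated with a minimal Python sketch. It assumes the common convention that alignments sharing the same chromosome, strand, and 5' coordinate are PCR duplicates of one original fragment; the `Alignment` record and `remove_duplicates` function are hypothetical names, not the implementation of USeq's FilterDuplicateReads or any other tool.

```python
from collections import namedtuple

# Minimal alignment record (hypothetical): chromosome, 5'-end coordinate, strand ('+' or '-').
# Real pipelines derive these fields from SAM/BAM records.
Alignment = namedtuple("Alignment", ["chrom", "five_prime", "strand"])


def remove_duplicates(alignments):
    """Keep only the first alignment for each (chrom, 5' position, strand) key.

    Reads sharing all three fields are treated as likely PCR duplicates.
    """
    seen = set()
    unique = []
    for aln in alignments:
        key = (aln.chrom, aln.five_prime, aln.strand)
        if key not in seen:
            seen.add(key)
            unique.append(aln)
    return unique


reads = [
    Alignment("chr6", 27373801, "+"),
    Alignment("chr6", 27373801, "+"),  # duplicate of the first read: dropped
    Alignment("chr6", 27373801, "-"),  # same position, other strand: kept
    Alignment("chr6", 27373950, "+"),
]
print(len(remove_duplicates(reads)))  # 3
```

Some tools allow a small number of duplicates per position instead of exactly one; the one-per-key rule here is the simplest variant.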
[Figure: the sonicate is divided into Input DNA and ChIP eluate DNA]
Short Read Alignment
[Figure: the read CTCTGCACGCGTGGGTTCGAATCCCACCTTCGTCGA aligned to the reference genome at coordinate chr6:27,373,801]
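A SAM alignment's POS field gives the 1-based leftmost reference position of a read, but strand-aware steps such as peak shifting need the read's 5' end, which for a reverse-strand read is its rightmost aligned base. The sketch below assumes an ungapped, unclipped alignment (aligned length equals read length) and, purely for illustration, treats chr6:27,373,801 as the leftmost base of the 36-base read shown; `five_prime_end` is a hypothetical helper.

```python
def five_prime_end(pos, aligned_length, is_reverse):
    """Return the 1-based 5' coordinate of an aligned read.

    pos: 1-based leftmost aligned position (as in the SAM POS field).
    aligned_length: reference bases covered (assumed equal to read length).
    is_reverse: True if the read aligned to the reverse (minus) strand.
    """
    if is_reverse:
        # Minus-strand reads start (5' end) at their rightmost reference base.
        return pos + aligned_length - 1
    return pos


read = "CTCTGCACGCGTGGGTTCGAATCCCACCTTCGTCGA"  # 36 bases
print(five_prime_end(27373801, len(read), is_reverse=False))  # 27373801
print(five_prime_end(27373801, len(read), is_reverse=True))   # 27373836
```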
Empirical Determination of Peak Shift
Valouev et al., Nature Methods 2008
•  A peak is determined separately for the alignments on each strand.
•  The distance between the two strand peaks is the “Peak Shift”.
•  Center-shift the data: move each read half the Peak Shift distance toward the peak center (e.g., USeq).
•  Alternatively, extend each read at its 3’ end to recreate the original fragment (e.g., MACS).
Define Regions of Enrichment
•  Determine the enrichment of reads in the ChIP library versus the control input library
•  Set a statistical false discovery rate (FDR) threshold
   •  e.g., 5% FDR: about 5 in every 100 called peaks are expected to be false positives
•  Output
   •  Interval data (e.g., chr, start, stop)
   •  Browser graph file
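The slide does not specify the statistics, so here is a minimal, generic sketch of the idea under stated assumptions: fixed genomic windows with precomputed ChIP and input read counts, a one-sided binomial test in which each read's chance of coming from the ChIP library equals the ChIP share of all sequenced reads, Benjamini-Hochberg FDR control, and merging of adjacent significant windows into intervals. This is not the method used by USeq's ScanSeqs, MultipleReplicaScanSeqs, or MACS, and all names are hypothetical. Coordinates are half-open, as in BED.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly (fine for small counts)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_pvalue(chip_count, input_count, chip_total, input_total):
    """One-sided test: does the window hold more ChIP reads than library size predicts?"""
    n = chip_count + input_count
    p0 = chip_total / (chip_total + input_total)
    return binomial_sf(chip_count, n, p0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_regions(windows, chip_total, input_total, fdr=0.05):
    """windows: (chrom, start, stop, chip_count, input_count), sorted by chrom and start.

    Returns merged (chrom, start, stop) intervals of windows passing the FDR threshold.
    """
    pvals = [window_pvalue(w[3], w[4], chip_total, input_total) for w in windows]
    qvals = benjamini_hochberg(pvals)
    regions = []
    for (chrom, start, stop, _, _), q in zip(windows, qvals):
        if q > fdr:
            continue
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlapping or abutting window: extend the current region.
            regions[-1] = (chrom, regions[-1][1], max(stop, regions[-1][2]))
        else:
            regions.append((chrom, start, stop))
    return regions


windows = [
    ("chr1", 0, 200, 5, 5),
    ("chr1", 200, 400, 30, 2),
    ("chr1", 400, 600, 25, 3),
    ("chr1", 600, 800, 4, 6),
]
print(call_regions(windows, 1_000_000, 1_000_000))  # [('chr1', 200, 600)]
```

Using the input library as the null is what controls for regions that are over-represented in any sonicate, such as “open” chromatin; for real data an established library routine for the binomial tail would be preferable to the exact sum.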
Control for non-specific peaks due to “open” chromatin
Park, Nat Rev Genet, 2009
ChIP-seq Downstream Analysis
Supplemental Table 2: D1 Histone-enriched loci (Illumina GAII, FDR < 0.0001)

| GO Category | Total Genes | Changed Genes | Enrichment | FDR |
|---|---|---|---|---|
| Cell fate commitment | 75 | 60 | 1.59848 | 0 |
| Sequence-specific DNA binding | 424 | 337 | 1.588112 | 0 |
| Cellular morphogenesis during differentiation | 125 | 99 | 1.582495 | 0 |
| Cell projection organization and biogenesis | 169 | 131 | 1.548823 | 0 |
| Cell part morphogenesis | 169 | 131 | 1.548823 | 0 |
| Embryonic morphogenesis | 88 | 68 | 1.543986 | 0 |
| Regionalization | 82 | 63 | 1.535126 | 0 |
| Neurogenesis | 221 | 168 | 1.518918 | 0 |
| Wnt receptor signaling pathway | 107 | 80 | 1.493907 | 0 |
| Regulation of cell differentiation | 119 | 88 | 1.477587 | 0 |
| Regulation of transcription from RNA polymerase II promoter | 99 | 72 | 1.453164 | 0 |
| Organ morphogenesis | 304 | 221 | 1.452566 | 0 |
| Embryonic development | 226 | 164 | 1.449949 | 0 |
| Regulation of developmental process | 191 | 138 | 1.443653 | 0 |
| Voltage-gated ion channel activity | 171 | 123 | 1.43723 | 0 |
| Nervous system development | 604 | 433 | 1.432413 | 0 |
| Cation channel activity | 228 | 162 | 1.419703 | 0 |
| Transcription factor activity | 791 | 552 | 1.394376 | 0 |
| Muscle development | 136 | 94 | 1.38104 | 0 |
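The table does not say how the Enrichment column is computed. Its values are consistent with the fraction of genes in a category that changed, divided by a background fraction of about 0.50; that background value is inferred here from the rows themselves (for example, (60/75)/1.59848 ≈ 0.5005) and is presumably the overall fraction of changed genes. The sketch below back-calculates the implied background for a few rows; treat it as a consistency check, not the authors' stated method.

```python
def fold_enrichment(changed, total, background_fraction):
    """Changed fraction within a GO category divided by the background changed fraction."""
    return (changed / total) / background_fraction


def implied_background(changed, total, enrichment):
    """Back-calculate the background fraction from a reported Enrichment value."""
    return (changed / total) / enrichment


# (category, total genes, changed genes, reported enrichment), copied from the table.
rows = [
    ("Cell fate commitment", 75, 60, 1.59848),
    ("Neurogenesis", 221, 168, 1.518918),
    ("Transcription factor activity", 791, 552, 1.394376),
    ("Muscle development", 136, 94, 1.38104),
]
for name, total, changed, enrichment in rows:
    print(f"{name}: implied background = {implied_background(changed, total, enrichment):.4f}")
```

Because every row implies nearly the same background, ranking by Enrichment here is equivalent to ranking by the within-category changed fraction.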
[Figure panels: number of peaks found in different tissues; allele-specific binding]
Oler et al., NSMB, 2010; Mikkelsen et al., Nature, 2007; Park, Nat Rev Genet, 2009; Barski et al., Cell, 2007; Hammoud et al., Nature, 2009
Peak finding tools
§  CisGenome
§  FindPeaks
§  PeakSeq
§  ChIPseqR
§  PICS
§  F-Seq
§  GLITR
§  MACS
§  QuEST
§  SISSRs
§  USeq
§  HPeak
§  SICER
§  ERANGE
§  ChromSig
§  Partek
§  Genomatix
§  CLC Bio
ChIP-seq Analysis with USeq
§  ChIPSeq wrapper
•  SamParser (converts SAM to PointData)
•  FilterDuplicateReads
•  ReadCoverage
•  PeakShiftFinder
•  MultipleReplicaScanSeqs or ScanSeqs (genome-wide windowed statistical analysis)
§  MultipleReplicaScanSeqs (MRSS) uses the DESeq2 package; ScanSeqs (SS) uses the qvalue package
•  EnrichedRegionMaker
§  Other Applications
•  DefinedRegionScanSeqs
•  MultipleReplicaDefinedRegionScanSeqs
•  AggregatePlotter
•  IntersectRegions
•  FindNeighboringGenes
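FindNeighboringGenes reports the distance from a region to the nearest gene, or all genes within a neighborhood. A minimal sketch of that idea, assuming genes are represented by their transcription start sites (TSSs) on a single chromosome and gene strand is ignored: a binary search over sorted TSS positions finds the nearest gene in O(log n) time per peak. Function names and gene labels are hypothetical; this is not USeq's implementation.

```python
from bisect import bisect_left


def nearest_tss(peak_center, tss_list):
    """Return (gene, signed_distance) for the TSS closest to peak_center.

    tss_list: non-empty list of (tss_position, gene_name), sorted by position.
    Signed distance = peak_center - tss_position; positive means the peak lies
    at a higher coordinate than the TSS.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, peak_center)
    # The nearest TSS is either just before or just after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_list)]
    best = min(candidates, key=lambda j: abs(peak_center - positions[j]))
    return tss_list[best][1], peak_center - positions[best]


def genes_within(peak_center, tss_list, radius):
    """Return all genes whose TSS lies within `radius` bp of peak_center."""
    return [gene for pos, gene in tss_list if abs(peak_center - pos) <= radius]


tss = [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]
print(nearest_tss(4200, tss))         # ('geneB', -800)
print(genes_within(3000, tss, 2000))  # ['geneA', 'geneB']
```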
ChIPSeq options
Options:
-s Save directory
-t Treatment directory containing aligned ChIP reads
-c Control directory containing aligned Input reads
-y Type of alignments (e.g., sam, bed, eland, novoalign)
-v Genome version (e.g., H_sapiens_Feb_2009); see http://genome.ucsc.edu/FAQ/FAQreleases
-r Full path to R with the DESeq and qvalue packages installed
-m Single replica analysis (run ScanSeqs instead of MultipleReplicaScanSeqs)
-f File to use to filter out known false positives (e.g., Satellites)
Command:
java -Xmx3G -jar ~/bio_apps/USeq/Apps/ChIPSeq -s ChIPSeq_out -t Pol2 -c Input \
    -y sam -v H_sapiens_Feb_2009 -r ~/bio_apps/R/bin/R -m \
    -f hg19_Satellites.bed.gz
Exercise 1:
cd ~/chipseq
qsub test_useq_chipseq.sh
Other Demos
1.  Run ChIPSeq in the USeq GUI
• No command line required
2.  Run FindNeighboringGenes in the USeq GUI
• Find the distance to the nearest gene, or to all genes within a neighborhood
3.  View peaks in IGB
4.  Run AggregatePlotter in the USeq GUI
• Make a class-average (aggregate) map, e.g., of a 1 kb window around transcription start sites
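An aggregate (class-average) map averages the signal over many aligned windows, for example ±500 bp around each TSS for a 1 kb window. The sketch below assumes per-base coverage stored in a dictionary and flips minus-strand windows so every profile runs from upstream to downstream of its gene; `aggregate_profile` is a hypothetical illustration, not AggregatePlotter's code.

```python
def aggregate_profile(coverage, tss_list, flank=500):
    """Average signal in a +/- flank window centered on each TSS.

    coverage: dict mapping position -> signal (missing positions count as 0).
    tss_list: non-empty list of (tss_position, strand), strand '+' or '-'.
    Returns a list of length 2 * flank + 1; index `flank` is the TSS and
    lower indices are upstream of the gene.
    """
    if not tss_list:
        raise ValueError("tss_list must not be empty")
    width = 2 * flank + 1
    totals = [0.0] * width
    for tss, strand in tss_list:
        for offset in range(-flank, flank + 1):
            value = coverage.get(tss + offset, 0.0)
            # Minus-strand genes run toward lower coordinates, so mirror them.
            idx = offset + flank if strand == "+" else flank - offset
            totals[idx] += value
    return [t / len(tss_list) for t in totals]


profile = aggregate_profile({100: 4.0, 302: 2.0}, [(100, "+"), (300, "-")], flank=2)
print(profile)  # [1.0, 0.0, 2.0, 0.0, 0.0]
```

Orienting windows by strand matters: without the flip, upstream signal at minus-strand genes would be averaged into the downstream half of the map.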
Thank You
For questions or comments please contact:
andrew.oler@nih.gov
ScienceApps@niaid.nih.gov


Recently uploaded (20)

Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 

ChIP-seq Theory

  • 1. Next Generation Sequencing Analysis Series, February 11, 2015. Andrew Oler, PhD, High-throughput Sequencing Bioinformatics Specialist, BCBB/OCICB/NIAID/NIH
  • 2. Bioinformatics and Computational Biosciences Branch (“BCBB”): a group of ~30 bioinformatics software developers, computational biologists, and project managers & analysts. http://www.niaid.nih.gov/about/organization/odoffices/omo/ocicb/Pages/bcbb.aspx
  • 3. The plan… §  Overview of ChIP-seq technology and concepts §  Comparison of peak finding tools §  Hands-on –  USeq §  Peak finding §  Peak annotation §  Visualization (in IGB) §  More demos as time permits 3
  • 4. Why are people interested in protein-DNA contacts? Proteins called transcription factors (TFs) help regulate gene activation, and the first step in gene activation is binding of the TF to DNA at its target gene. [Figure: TF, RNA Polymerase, and Gene X]
  • 5. Chromatin Immunoprecipitation (ChIP) See also: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012 Sep;22(9):1813-31
  • 6. Steps for ChIP-seq Analysis
    1. Perform ChIP and prepare the ChIP library.
    2. Prepare a separate library of input DNA from the same sonicate.
    3. Sequence both libraries (at least 20 million reads per library).
    4. Align reads to the reference genome.
    5. Remove duplicate alignments.
    6. Shift data and call peaks.
    7. Downstream analysis.
    [Figure: sonicate giving rise to input DNA and ChIP eluate DNA]
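A common way to perform step 5 (remove duplicate alignments) is to treat single-end reads that share chromosome, strand, and 5′ coordinate as PCR duplicates and keep only one of them. The minimal Python sketch below applies that rule. It assumes the alignments have already been parsed into a hypothetical `Alignment` record, which is not a USeq or SAM type. USeq's FilterDuplicateReads and other tools may use different or additional criteria.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Alignment(NamedTuple):
    """Minimal single-end alignment record (hypothetical, not a USeq or SAM type)."""
    chrom: str
    start: int   # 0-based leftmost aligned base
    end: int     # 0-based, one past the rightmost aligned base
    strand: str  # "+" or "-"


def five_prime(aln: Alignment) -> int:
    """5' end of the read: leftmost base on +, rightmost base on -."""
    return aln.start if aln.strand == "+" else aln.end - 1


def remove_duplicates(alns: Iterable[Alignment]) -> List[Alignment]:
    """Keep only the first alignment seen for each (chrom, strand, 5' end) key."""
    seen: Set[Tuple[str, str, int]] = set()
    kept: List[Alignment] = []
    for aln in alns:
        key = (aln.chrom, aln.strand, five_prime(aln))
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


if __name__ == "__main__":
    reads = [
        Alignment("chr6", 100, 136, "+"),
        Alignment("chr6", 100, 136, "+"),  # exact duplicate
        Alignment("chr6", 200, 236, "-"),
        Alignment("chr6", 205, 236, "-"),  # same 5' end (235) on -, so a duplicate
        Alignment("chr6", 100, 136, "-"),  # same coordinates, other strand: kept
    ]
    print(len(remove_duplicates(reads)))  # 3
```

For minus-strand reads the key uses the rightmost base, because that is the 5′ end. Two reads that differ only in how far they extend at the 3′ end are therefore still collapsed.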
  • 8. Empirical Determination of Peak Shift (Valouev et al., Nature Methods 2008)
    – A peak is determined for each strand's alignments.
    – The distance between the two strand peaks is the “Peak Shift”.
    – Center-shift the data by half the Peak Shift distance (e.g., USeq).
    – Alternatively, extend each read at its 3′ end to recreate the original fragment (e.g., MACS).
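The strand-peak idea on this slide can be shown at a single binding site. The 5′ ends of + strand reads pile up to the left of the bound site and the 5′ ends of − strand reads pile up to the right, so the gap between the two modes estimates the peak shift. The sketch below simplifies under stated assumptions: one site only, and a hypothetical 10 bp bin size for finding each mode. Real tools such as USeq's PeakShiftFinder or MACS estimate the shift from many candidate regions. The sketch covers both the center-shift option and the read-extension alternative.

```python
from collections import Counter
from typing import List, Tuple


def strand_peak(positions: List[int], bin_size: int = 10) -> int:
    """Midpoint of the most populated bin of 5' read positions (ties -> leftmost bin)."""
    counts = Counter(p // bin_size for p in positions)
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return best_bin * bin_size + bin_size // 2


def peak_shift(plus_5p: List[int], minus_5p: List[int], bin_size: int = 10) -> int:
    """Distance from the + strand peak to the - strand peak."""
    return strand_peak(minus_5p, bin_size) - strand_peak(plus_5p, bin_size)


def center_shift(pos: int, strand: str, shift: int) -> int:
    """Center-shift idea: move a 5' end half the peak shift toward the site."""
    half = shift // 2
    return pos + half if strand == "+" else pos - half


def extend_read(pos: int, strand: str, fragment_len: int) -> Tuple[int, int]:
    """Extension idea: rebuild the fragment 3'-ward from the 5' end.

    Returns a half-open [start, end) interval of length fragment_len.
    """
    if strand == "+":
        return pos, pos + fragment_len
    return pos - fragment_len + 1, pos + 1


if __name__ == "__main__":
    plus = [995, 1001, 1003, 1005, 1200]   # + strand 5' ends near one site
    minus = [1148, 1152, 1155, 1158, 900]  # - strand 5' ends near the same site
    shift = peak_shift(plus, minus)
    print(shift)  # 150
    print(center_shift(1005, "+", shift), center_shift(1155, "-", shift))  # 1080 1080
```

After center-shifting, both strand peaks land on the same coordinate (1080), which is the inferred binding site. Extending reads by the fragment length instead turns every read into a full-length fragment covering that site.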
  • 9. Define Regions of Enrichment
    – Determine enrichment of reads in the ChIP library versus the control input library.
    – Set a statistical false discovery rate (FDR) threshold, e.g., 5% FDR = 5 in 100 peaks are expected to be false positives.
    – Output: interval data (e.g., chr, start, stop) and a browser graph file.
    – The input control corrects for non-specific peaks due to “open” chromatin (Park, Nat Rev Genet, 2009).
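Here is one way to make "enrichment versus input" and "5% FDR" concrete. For each candidate region, test how surprising its ChIP read count is given the ChIP and input reads that fall there, then choose a cutoff with a false discovery rate procedure. The sketch below assumes a one-sided binomial test in which the expected ChIP share comes from the two library sizes. It swaps in the Benjamini-Hochberg procedure as a simple stand-in: the slide deck says USeq's ScanSeqs uses the Q-value package, and other peak callers use different statistical models. All function names are hypothetical.

```python
from math import comb
from typing import List


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def enrichment_pvalue(chip: int, inp: int, chip_total: int, input_total: int) -> float:
    """One-sided p-value for seeing at least `chip` ChIP reads among chip + inp
    reads in a region, if ChIP and input were sampled alike."""
    p = chip_total / (chip_total + input_total)  # expected ChIP share from library sizes
    return binomial_upper_tail(chip, chip + inp, p)


def bh_significant(pvalues: List[float], fdr: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up: flag the tests that pass at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            cutoff_rank = rank  # largest rank that meets its threshold
    passed = set(order[:cutoff_rank])
    return [i in passed for i in range(m)]


if __name__ == "__main__":
    # Hypothetical regions as (ChIP reads, input reads), with equal 20M-read libraries.
    regions = [(40, 5), (12, 10), (25, 3), (8, 9)]
    pvals = [enrichment_pvalue(c, i, 20_000_000, 20_000_000) for c, i in regions]
    print(bh_significant(pvals, fdr=0.05))  # [True, False, True, False]
```

At a 5% FDR, on average about 5 of every 100 regions flagged `True` are expected to be false positives. This matches the interpretation given on the slide.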
  • 10. ChIP-seq Downstream Analysis
    Supplemental Table 2: D1 Histone-enriched loci (Illumina GAII, FDR < 0.0001)
    GO Category | Total Genes | Changed Genes | Enrichment | FDR
    Cell fate commitment | 75 | 60 | 1.59848 | 0
    Sequence-specific DNA binding | 424 | 337 | 1.588112 | 0
    Cellular morphogenesis during differentiation | 125 | 99 | 1.582495 | 0
    Cell projection organization and biogenesis | 169 | 131 | 1.548823 | 0
    Cell part morphogenesis | 169 | 131 | 1.548823 | 0
    Embryonic morphogenesis | 88 | 68 | 1.543986 | 0
    Regionalization | 82 | 63 | 1.535126 | 0
    Neurogenesis | 221 | 168 | 1.518918 | 0
    Wnt receptor signaling pathway | 107 | 80 | 1.493907 | 0
    Regulation of cell differentiation | 119 | 88 | 1.477587 | 0
    Regulation of transcription from RNA polymerase II promoter | 99 | 72 | 1.453164 | 0
    Organ morphogenesis | 304 | 221 | 1.452566 | 0
    Embryonic development | 226 | 164 | 1.449949 | 0
    Regulation of developmental process | 191 | 138 | 1.443653 | 0
    Voltage-gated ion channel activity | 171 | 123 | 1.43723 | 0
    Nervous system development | 604 | 433 | 1.432413 | 0
    Cation channel activity | 228 | 162 | 1.419703 | 0
    Transcription factor activity | 791 | 552 | 1.394376 | 0
    Muscle development | 136 | 94 | 1.38104 | 0
    [Figure panels: # Peaks Found in Different Tissues; Allele-specific Binding]
    Oler et al., NSMB, 2010; Mikkelsen et al., Nature, 2007; Park, Nat Rev Genet, 2009; Barski et al., Cell, 2007; Hammoud et al., Nature, 2009
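The Enrichment column in Supplemental Table 2 looks like a fold-enrichment ratio: the fraction of a GO category's genes that are "changed" (near enriched loci), divided by a genome-wide background fraction. That reading is an assumption, because the background is not stated. It is consistent with the listed rows, which all imply the same background of about 0.50. For example, 60/75 = 0.80, and 0.80 / 1.59848 ≈ 0.50. The sketch below recomputes the implied background from a few rows copied from the table. The function names are hypothetical.

```python
from typing import List, Tuple


def fold_enrichment(changed: int, total: int, background_fraction: float) -> float:
    """(changed / total) for one category, divided by the genome-wide changed fraction."""
    return (changed / total) / background_fraction


def implied_background(changed: int, total: int, enrichment: float) -> float:
    """Invert fold_enrichment to recover the background fraction a row implies."""
    return (changed / total) / enrichment


# (GO category, total genes, changed genes, reported enrichment), from Supplemental Table 2
ROWS: List[Tuple[str, int, int, float]] = [
    ("Cell fate commitment", 75, 60, 1.59848),
    ("Sequence-specific DNA binding", 424, 337, 1.588112),
    ("Nervous system development", 604, 433, 1.432413),
    ("Muscle development", 136, 94, 1.38104),
]

if __name__ == "__main__":
    for name, total, changed, enr in ROWS:
        print(f"{name}: implied background = {implied_background(changed, total, enr):.4f}")
```

Because every row implies the same background, the rows are ranked by fold enrichment in the same order as by their raw changed/total fraction.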
  • 11. Peak finding tools §  CisGenome §  FindPeaks §  PeakSeq §  ChIPseqR §  PICS §  F-Seq §  GLITR §  MACS §  QuEST •  SISSRS •  USeq •  Hpeak •  SICER •  ERANGE •  ChromSig •  Partek •  Genomatix •  CLC Bio
  • 12. ChIP-seq Analysis with USeq §  ChIPSeq wrapper •  SamParser (converts SAM to PointData) •  FilterDuplicateReads •  ReadCoverage •  PeakShiftFinder •  MultipleReplicaScanSeqs or ScanSeqs (genome-wide windowed statistical analysis) §  MRSS uses DESeq2 package, SS uses Q-value package •  EnrichedRegionMaker §  Other Applications •  DefinedRegionScanSeqs •  MultipleReplicaDefinedRegionScanSeqs •  AggregatePlotter •  IntersectRegions •  FindNeighboringGenes 12
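The ScanSeqs and EnrichedRegionMaker steps listed above follow a general pattern:
1. Count shifted read positions in overlapping windows for both ChIP and input.
2. Flag windows that pass a threshold.
3. Merge overlapping flagged windows into enriched regions.

The sketch below illustrates that pattern only. It is not USeq's algorithm: the fixed fold-change and read-count cutoffs, the window and step sizes, and all function names are hypothetical stand-ins for the statistics USeq actually computes.

```python
from bisect import bisect_left
from typing import List, Tuple


def window_counts(sorted_pos: List[int], chrom_len: int,
                  window: int, step: int) -> List[Tuple[int, int, int]]:
    """(start, end, count) for half-open windows [start, end) placed every `step` bp."""
    out = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        end = start + window
        n = bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)
        out.append((start, end, n))
    return out


def enriched_regions(chip_pos: List[int], input_pos: List[int], chrom_len: int,
                     window: int = 200, step: int = 50,
                     min_fold: float = 4.0, min_reads: int = 5) -> List[Tuple[int, int]]:
    """Flag windows with enough ChIP reads over scaled input, then merge overlaps."""
    chip_sorted, input_sorted = sorted(chip_pos), sorted(input_pos)
    scale = len(chip_sorted) / max(len(input_sorted), 1)  # library-size normalization
    chip_w = window_counts(chip_sorted, chrom_len, window, step)
    input_w = window_counts(input_sorted, chrom_len, window, step)
    regions: List[Tuple[int, int]] = []
    for (start, end, c), (_, _, i) in zip(chip_w, input_w):
        expected = max(i * scale, 1.0)  # floor avoids dividing by ~0 in sparse input windows
        if c >= min_reads and c / expected >= min_fold:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # overlaps previous region: extend it
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    chip = [1000 + i for i in range(20)] + [5000, 8000]  # one pile-up plus two stray reads
    control = list(range(0, 10_000, 100))               # flat input background
    print(enriched_regions(chip, control, chrom_len=10_000))  # [(850, 1200)]
```

The output interval corresponds to the "chr, start, stop" interval data described on slide 9, for a single chromosome.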
  • 13. ChIPSeq options
    -s  Save directory
    -t  Treatment directory containing aligned ChIP reads
    -c  Control directory containing aligned input reads
    -y  Type of alignments (e.g., sam, bed, eland, novoalign)
    -v  Genome version (e.g., H_sapiens_Feb_2009); see http://genome.ucsc.edu/FAQ/FAQreleases
    -r  Full path to R containing the DESeq and Qvalue packages
    -m  Single replica analysis (run ScanSeqs instead of MultipleReplicaScanSeqs)
    -f  File used to filter out known false positives (e.g., Satellites)
    Command:
    java -Xmx3G -jar ~/bio_apps/USeq/Apps/ChIPSeq -s ChIPSeq_out -t Pol2 -c Input -y sam -v H_sapiens_Feb_2009 -r ~/bio_apps/R/bin/R -m -f hg19_Satellites.bed.gz
    Exercise 1:
    cd ~/chipseq
    qsub test_useq_chipseq.sh
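If the same ChIPSeq run is repeated for several samples, it can help to build the command line programmatically. The sketch below assembles exactly the flags listed on this slide into an argument list suitable for Python's `subprocess`. The function name, its defaults, and the choice of Python are assumptions for illustration and are not part of USeq. The sketch only builds and prints the command. Passing the list to `subprocess.run(cmd, check=True)` would launch Java; on a cluster, the printed line could instead go into a job script like the one submitted with qsub in Exercise 1.

```python
import os
import shlex
from typing import List, Optional


def chipseq_command(save_dir: str, treatment_dir: str, control_dir: str,
                    genome_version: str, r_path: str, aln_type: str = "sam",
                    single_replica: bool = True, filter_file: Optional[str] = None,
                    jar: str = "~/bio_apps/USeq/Apps/ChIPSeq",
                    max_heap: str = "3G") -> List[str]:
    """Build the USeq ChIPSeq argument list from the options on this slide."""
    cmd = [
        "java", f"-Xmx{max_heap}", "-jar", os.path.expanduser(jar),
        "-s", save_dir,                    # save directory
        "-t", treatment_dir,               # aligned ChIP reads
        "-c", control_dir,                 # aligned input reads
        "-y", aln_type,                    # alignment type, e.g. sam
        "-v", genome_version,              # e.g. H_sapiens_Feb_2009
        "-r", os.path.expanduser(r_path),  # R with DESeq and Qvalue installed
    ]
    if single_replica:
        cmd.append("-m")                   # ScanSeqs instead of MultipleReplicaScanSeqs
    if filter_file:
        cmd += ["-f", filter_file]         # known false positives, e.g. satellites
    return cmd


if __name__ == "__main__":
    cmd = chipseq_command("ChIPSeq_out", "Pol2", "Input", "H_sapiens_Feb_2009",
                          "~/bio_apps/R/bin/R", filter_file="hg19_Satellites.bed.gz")
    print(shlex.join(cmd))  # shell-quoted form of the slide's example command
```

Returning a list rather than a single string avoids shell-quoting problems when directory names contain spaces.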
  • 14. Other Demos
    1. Run ChIPSeq in the USeq GUI (no command line required).
    2. Run FindNeighboringGenes in the USeq GUI to find the distance to the nearest gene, or all genes within a neighborhood.
    3. View peaks in IGB.
    4. Run AggregatePlotter in the USeq GUI to make a class average map, e.g., of a 1 kb window around transcription start sites.
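Demos 2 and 4 rest on two simple computations. The first is the distance from a peak to the nearest gene, or the list of genes within some radius. The second is a class-average profile of read counts in fixed bins around a set of transcription start sites (TSSs). The sketch below assumes each gene is represented only by its TSS position and strand on a single chromosome. FindNeighboringGenes and AggregatePlotter have their own inputs and options, which are not reproduced here. The sketch flips windows for minus-strand genes so that bins run in the direction of transcription. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Tss = Tuple[int, str, str]  # (position, gene name, strand); hypothetical gene model


def nearest_gene(peak: int, tss: List[Tss]) -> Tuple[str, int]:
    """Closest TSS to `peak` (tss sorted by position): returns (gene, peak - TSS)."""
    if not tss:
        raise ValueError("no genes supplied")
    positions = [p for p, _, _ in tss]
    i = bisect_left(positions, peak)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    j = min(candidates, key=lambda k: abs(positions[k] - peak))
    return tss[j][1], peak - positions[j]


def genes_within(peak: int, tss: List[Tss], radius: int) -> List[str]:
    """All genes whose TSS lies within `radius` bp of `peak`."""
    positions = [p for p, _, _ in tss]
    lo = bisect_left(positions, peak - radius)
    hi = bisect_right(positions, peak + radius)
    return [g for _, g, _ in tss[lo:hi]]


def aggregate_profile(reads: List[int], tss: List[Tss],
                      flank: int = 1000, bin_size: int = 100) -> List[float]:
    """Mean reads per bin over windows of +/- flank bp, oriented by gene strand."""
    n_bins = 2 * flank // bin_size
    totals = [0] * n_bins
    sorted_reads = sorted(reads)
    for pos, _, strand in tss:
        for b in range(n_bins):
            lo = pos - flank + b * bin_size
            count = bisect_left(sorted_reads, lo + bin_size) - bisect_left(sorted_reads, lo)
            target = b if strand == "+" else n_bins - 1 - b  # flip minus-strand genes
            totals[target] += count
    return [t / len(tss) for t in totals] if tss else [0.0] * n_bins


if __name__ == "__main__":
    genes = [(1000, "geneA", "+"), (5000, "geneB", "-"), (9000, "geneC", "+")]
    print(nearest_gene(4900, genes))                   # ('geneB', -100)
    print(genes_within(3000, genes, 2000))             # ['geneA', 'geneB']
    print(aggregate_profile([1050, 4950], genes)[10])  # 0.6666666666666666
```

In the example, one read lies just downstream of the + strand geneA TSS and one just downstream (in transcription direction) of the − strand geneB TSS. Both land in the same bin, which is why strand orientation matters for a class average map.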
  • 15. Thank You. For questions or comments, please contact andrew.oler@nih.gov or ScienceApps@niaid.nih.gov.