COMPUTER DATA ANALYSIS OF GENOME SEQUENCING BY TECHNOLOGY ChIP-seq AND Hi-C
Adviser: Yuri Orlov, ICG SB RAS
Author: Kulakova Ekaterina, bachelor
Relevance
• Automated sequencing systems can decode DNA sequences up to whole genomes. Complete genome sequencing leads to avalanche-like growth of sequence information (megabytes to gigabytes of data).
• Methods based on chromatin immunoprecipitation (ChIP-seq, ChIA-PET) provide qualitatively new data.
• New problems arise in computational genomics, such as the analysis of spatial, non-linear chromosome structures.
Aim and scientific novelty
The aim of this work is to study chromosomal contacts in the cell nucleus using computer programs: statistical analysis of data on genes and chromosomal domains, and analysis of experimental ChIP-seq and Hi-C data.
• Integration of modern genome-wide ChIP-seq and Hi-C data, which became available only in the last two to three years.
• Use of a precision parameter for the position on the chromosome when analyzing the data.
• Compilation of a list of genes located at the boundaries of topological domains of chromosomes.
*ChIP-seq = Chromatin ImmunoPrecipitation sequencing
ChIA-PET = Chromatin Interaction Analysis by Paired-End-Tag sequencing
Hi-C and ChIA-PET methods*
Arrangement of chromosomes in the cell nucleus (reconstruction from Hi-C data). Source: Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 2009
Topological arrangement of chromosome domains and its mapping in the genome
Scheme of local chromosomal domains ("tangle" contacts)
*Hi-C = high-throughput chromosome conformation capture
Separate loops, «tangle» (Dixon et al., 2012)
Scheme of the arrangement of genes on a chromosome
Genomic data: genes, ChIP-seq peaks, ChIA-PET contact regions
[Figure: genes, plot of ChIA-PET chromosomal contacts, chromosomal domain, peaks of ChIP-seq profiles]
File formats and their presentation
BED file example
>track name=ER_E2 description=ER_E2
chr1 557112 558114
chr1 559459 560286
chr1 998864 999397
chr1 999399 999604
chr1 1004343 1005146
chr1 1070346 1071080
chr1 1305474 1306502
chr1 1358287 1358744
chr1 1776987 1777750
chr1 1820476 1821168
chr1 1922754 1923628
chr1 2131962 2132747
chr1 2325805 2326447
chr1 2368996 2369977
chr1 3119829 3120541
chr1 3244610 3245121
…
Domain data for mouse cells were obtained in the laboratory of O.L. Serov (ICG SB RAS) (Fib_domains, Sp_domains).
A single file with a genomic profile ranges in size from 100 MB to 2-3 GB.
The RefSeq annotation was taken from the UCSC Genome Browser:
http://genome.ucsc.edu/cgi-bin/hgTables
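A BED file like the example above could be read along the following lines. This is a minimal sketch, not the author's program: the class name `BedSketch`, the `Interval` type, and the rule for skipping header lines are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of reading BED-like lines; all names here are illustrative. */
final class BedSketch {

    /** One interval: chromosome name plus start and end coordinates. */
    static final class Interval {
        final String chrom;
        final long start;
        final long end;

        Interval(String chrom, long start, long end) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }
    }

    /** Parses lines, skipping blanks, comments and track/browser header lines. */
    static List<Interval> parse(List<String> lines) {
        List<Interval> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Assumption: header lines start with "track", "browser", "#" or ">".
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(">")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            String[] f = line.split("\\s+");
            if (f.length < 3) {
                continue; // not a complete interval record
            }
            out.add(new Interval(f[0], Long.parseLong(f[1]), Long.parseLong(f[2])));
        }
        return out;
    }
}
```

For files of 100 MB and larger, a real reader would more likely stream lines through a `BufferedReader` than hold them all in a list; the list is used here only to keep the sketch short.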
Calculating the position of genes relative to domain boundaries
• A1 – left coordinate of the gene, B1 – right coordinate of the gene.
• A2 – left coordinate of the domain, B2 – right coordinate of the domain.
• E – accuracy (tolerance), user-defined.
• If |A1 - A2| <= E and B1 < A2 + (B2 - A2)/2, the gene is assumed to lie close to the left boundary of the domain. A similar condition applies to the right boundary.
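[Diagram: a domain spanning A2 to B2, a gene spanning A1 to B1 near the left end of the domain, and the tolerance E around the domain boundary]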
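The boundary condition can be sketched in code as follows. The left-boundary test follows the stated rule. The right-boundary test is an assumed mirror image, since the slide only says the condition is similar. The class and method names are hypothetical.

```java
/** Sketch of the boundary test; the right-boundary rule is an assumed mirror image. */
final class BoundarySketch {

    /** Gene [a1, b1] lies near the left boundary of domain [a2, b2] within tolerance e. */
    static boolean nearLeftBoundary(long a1, long b1, long a2, long b2, long e) {
        double mid = a2 + (b2 - a2) / 2.0; // midpoint of the domain
        return Math.abs(a1 - a2) <= e && b1 < mid;
    }

    /** Assumption: right end within e of b2 and left end beyond the domain midpoint. */
    static boolean nearRightBoundary(long a1, long b1, long a2, long b2, long e) {
        double mid = a2 + (b2 - a2) / 2.0;
        return Math.abs(b1 - b2) <= e && a1 > mid;
    }
}
```

The midpoint term keeps a gene that starts near the left boundary but extends past the middle of the domain from being counted as a left-boundary gene.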
Example of the location of chromosomal domains and genes for mouse chromosome 10
Linear arrangement of genes in the domain
Table of gene location types in chromosomal domains
Other – other genes 
Inside – genes that lie within the domains 
onBorder – genes lying on the domain 
boundaries.
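One way the three location types could be assigned is sketched below. The precedence (border first, then inside, otherwise other) and the containment test for "Inside" are assumptions, as are all names. Domains are given as {start, end} pairs on a single chromosome.

```java
/** Sketch of assigning one of the three location types; the rules are assumptions. */
final class GeneLocationSketch {

    enum Location { INSIDE, ON_BORDER, OTHER }

    /** True if gene [a1, b1] is within tolerance e of either end of domain [a2, b2]. */
    static boolean onBorder(long a1, long b1, long a2, long b2, long e) {
        double mid = a2 + (b2 - a2) / 2.0;
        boolean left = Math.abs(a1 - a2) <= e && b1 < mid;
        boolean right = Math.abs(b1 - b2) <= e && a1 > mid;
        return left || right;
    }

    /** Classifies a gene against all domains; a border hit takes precedence. */
    static Location classify(long a1, long b1, long[][] domains, long e) {
        boolean inside = false;
        for (long[] d : domains) {
            if (onBorder(a1, b1, d[0], d[1], e)) {
                return Location.ON_BORDER;
            }
            // Assumption: "inside" means the gene is fully contained in the domain.
            if (d[0] <= a1 && b1 <= d[1]) {
                inside = true;
            }
        }
        return inside ? Location.INSIDE : Location.OTHER;
    }
}
```

Counting the results of `classify` over a gene list and dividing by the list size would give percentages of the kind reported for each cell type.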
Analysis of the location of a gene set relative to domains in different cell types
• The user specifies a list of genes. It is also possible to analyze all genes in the genome (20,000 genes).
• Cell types: mouse embryonic stem cells (fibroblasts, Fib) and sperm (Sp). Hi-C experiment, ICG SB RAS.
Sp (densely packed structure)
92.5% of genes within domains
1.4% on border
6.1% other
Fib (open chromatin)
72.6% of genes within domains
3.2% on border
24% other
Experimental data. Gene Ontology categories
Genes lying on the domain boundaries were taken for analysis.
The results were sorted by the number of genes sharing a common biological process category.
Online resource used: DAVID, http://david.abcc.ncifcrf.gov/
Analysis of the co-expression of genes lying on the borders of spatial domains
Genes located on the domain boundaries were taken for analysis.
Online resource used: STRING, http://string-db.org/
The main result: gene network graphs with different degrees of connectivity for the two cell types
Fib
698 – total number of genes on the domain boundaries
88 – genes involved in connections
160 connection pairs
12% of the total number of genes
Sp
314 – total number of genes on the domain boundaries
13 – genes involved in connections
10 connection pairs
4% of the total number of genes
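The reported shares of connected genes can be checked with simple arithmetic. The helper below is a hypothetical illustration, not part of the author's program.

```java
/** Sketch: share of boundary genes that take part in at least one connection. */
final class ConnectivitySketch {

    /** Returns the percentage connectedGenes / totalGenes * 100. */
    static double percentConnected(int connectedGenes, int totalGenes) {
        return 100.0 * connectedGenes / totalGenes;
    }
}
```

With the counts above, `percentConnected(88, 698)` gives about 12.6 for Fib and `percentConnected(13, 314)` gives about 4.1 for Sp, in line with the 12% and 4% on the slide.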
Conclusion
• A Java program was implemented.
• The program was applied to experimental data (ICG SB RAS and databases of chromosome contacts).
• The location of a gene set in chromosomal domains was analyzed (with a control computer simulation).
Next Steps
• Identify domains that include pluripotency genes in the mouse genome (Dixon et al., 2012).
• Make the developed project compatible with other programs developed at ICG SB RAS for microarray data, written in Java and C/C++.
• Integrate the program with microarray gene expression data from the BioGPS database for the human genome.
Thank you for your attention!
Publications (Theses)
• Safronova N.S., Kulakova E.V., Orlov Yu.L. (2013) Applications of text complexity measures to genome sequences analysis. // Proceedings of GIW-2013, National University of Singapore, 16-18 Dec 2013. P. 42.
• Medvedeva I.V., Vishnevsky O.V., Kulakova E.V., Spitsina A.M., Afonnikov D.A., Kochetov A.V., Orlov Yu.L. (2014) Genomic organization and context characteristics of genes with elevated expression in brain cells // XVI All-Russian Scientific and Technical Conference "Neuroinformatics-2014": Collection of Scientific Papers. Moscow: NRNU MEPhI. Part 2, pp. 32-42. (In Russian)
• Kulakova E.V., Bryzgalov L.O., Orlov Y.L., Li G., Ruan Y. Computer analysis of chromosome contacts revealed by sequencing // BGRS\SB-2014 conference (Bioinformatics of Genome Regulation and Structure\Systems Biology).
• Kulakova E.V., Podkolodnaya O.A., Serov O.L., Orlov Y.L. Computer data analysis of genome sequencing by technology ChIP-seq and Hi-C. // BGRS\SB-2014 conference (Bioinformatics of Genome Regulation and Structure\Systems Biology). P. 90.
• Kulakova E.V. Computer data analysis of genome sequencing by ChIP-seq and Hi-C technology. // MNSK-2014 conference (International Scientific Student Conference). P. 207. (In Russian)
• Spitsina A., Kulakova E.V., Safronova N., Orlova N.G. Statistical analysis of gene expression data by rank correlation coefficients. // BGRS\SB-2014 conference (Bioinformatics of Genome Regulation and Structure\Systems Biology). P. 91.

More Related Content

What's hot

RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
Bioinformatics and Computational Biosciences Branch
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical NotebookNaima Tahsin
 
Ensembl annotation
Ensembl annotationEnsembl annotation
Ensembl annotation
Genome Reference Consortium
 
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Functional Genomics Data Society
 
Aug2013 illumina platinum genomes
Aug2013 illumina platinum genomesAug2013 illumina platinum genomes
Aug2013 illumina platinum genomesGenomeInABottle
 
Introduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEASTIntroduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEAST
Bioinformatics and Computational Biosciences Branch
 
NetBioSIG2014-Talk by Traver Hart
NetBioSIG2014-Talk by Traver HartNetBioSIG2014-Talk by Traver Hart
NetBioSIG2014-Talk by Traver Hart
Alexander Pico
 
Hail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton Seed
Hail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton SeedHail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton Seed
Hail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton Seed
Spark Summit
 
Analytical Study of Hexapod miRNAs using Phylogenetic Methods
Analytical Study of Hexapod miRNAs using Phylogenetic MethodsAnalytical Study of Hexapod miRNAs using Phylogenetic Methods
Analytical Study of Hexapod miRNAs using Phylogenetic Methods
cscpconf
 
2015 bioinformatics wim_vancriekinge
2015 bioinformatics wim_vancriekinge2015 bioinformatics wim_vancriekinge
2015 bioinformatics wim_vancriekinge
Prof. Wim Van Criekinge
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821
GenomeInABottle
 
Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878GenomeInABottle
 
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Functional Genomics Data Society
 
7 nucleic acids syllabus statements
7 nucleic acids syllabus statements7 nucleic acids syllabus statements
7 nucleic acids syllabus statementscartlidge
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
GenomeInABottle
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
mikaelhuss
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
GenomeInABottle
 

What's hot (20)

RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
RNA-Seq with R-Bioconductor
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical Notebook
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
Ensembl annotation
Ensembl annotationEnsembl annotation
Ensembl annotation
 
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013Michael Reich, GenomeSpace Workshop, fged_seattle_2013
Michael Reich, GenomeSpace Workshop, fged_seattle_2013
 
Aug2013 illumina platinum genomes
Aug2013 illumina platinum genomesAug2013 illumina platinum genomes
Aug2013 illumina platinum genomes
 
Introduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEASTIntroduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEAST
 
NetBioSIG2014-Talk by Traver Hart
NetBioSIG2014-Talk by Traver HartNetBioSIG2014-Talk by Traver Hart
NetBioSIG2014-Talk by Traver Hart
 
Hail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton Seed
Hail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton SeedHail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton Seed
Hail: SCALING GENETIC DATA ANALYSIS WITH APACHE SPARK: Keynote by Cotton Seed
 
10.1.1.80.2149
10.1.1.80.214910.1.1.80.2149
10.1.1.80.2149
 
Analytical Study of Hexapod miRNAs using Phylogenetic Methods
Analytical Study of Hexapod miRNAs using Phylogenetic MethodsAnalytical Study of Hexapod miRNAs using Phylogenetic Methods
Analytical Study of Hexapod miRNAs using Phylogenetic Methods
 
2015 bioinformatics wim_vancriekinge
2015 bioinformatics wim_vancriekinge2015 bioinformatics wim_vancriekinge
2015 bioinformatics wim_vancriekinge
 
Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821Genome in a bottle for next gen dx v2 180821
Genome in a bottle for next gen dx v2 180821
 
Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878Aug2013 NIST highly confident genotype calls for NA12878
Aug2013 NIST highly confident genotype calls for NA12878
 
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
 
7 nucleic acids syllabus statements
7 nucleic acids syllabus statements7 nucleic acids syllabus statements
7 nucleic acids syllabus statements
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...Genome in a Bottle- reference materials to benchmark challenging variants and...
Genome in a Bottle- reference materials to benchmark challenging variants and...
 

Similar to Kulakova sbb2014

Impact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGImpact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGLong Pei
 
Bioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmmBioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmm
Prof. Wim Van Criekinge
 
Lecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generationLecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generation
MohamedHasan816582
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination Network
Monica Munoz-Torres
 
Genome comparision
Genome comparisionGenome comparision
Genome comparision
Pinky Vincent
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research Community
Monica Munoz-Torres
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
IJERD Editor
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
Monica Munoz-Torres
 
RNA Sequencing Research
RNA Sequencing ResearchRNA Sequencing Research
RNA Sequencing ResearchTanmay Ghai
 
Gene prediction strategies
Gene prediction strategies Gene prediction strategies
Gene prediction strategies
Amity university, Noida
 
RT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationRT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferation
IJAEMSJORNAL
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
Monica Munoz-Torres
 
New generation Sequencing
New generation Sequencing New generation Sequencing
New generation Sequencing
Vijay Raj Yanamala
 
Comparative genomics to the rescue: How complete is your plant genome sequence?
Comparative genomics to the rescue: How complete is your plant genome sequence?Comparative genomics to the rescue: How complete is your plant genome sequence?
Comparative genomics to the rescue: How complete is your plant genome sequence?
Klaas Vandepoele
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
The UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overviewThe UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overview
Victoria Perreau
 
Unilag workshop complex genome analysis
Unilag workshop   complex genome analysisUnilag workshop   complex genome analysis
Unilag workshop complex genome analysisDr. Olusoji Adewumi
 
Next generation seqencing tecnologies and application vegetable crops
Next generation seqencing tecnologies and application vegetable cropsNext generation seqencing tecnologies and application vegetable crops
Next generation seqencing tecnologies and application vegetable cropsPulipati Gangadhara Rao
 

Similar to Kulakova sbb2014 (20)

Impact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGImpact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEG
 
Bioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmmBioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmm
 
Lecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generationLecture bioinformatics Part2.next generation
Lecture bioinformatics Part2.next generation
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination Network
 
Genome comparision
Genome comparisionGenome comparision
Genome comparision
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research Community
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
 
RNA Sequencing Research
RNA Sequencing ResearchRNA Sequencing Research
RNA Sequencing Research
 
Gene prediction strategies
Gene prediction strategies Gene prediction strategies
Gene prediction strategies
 
RT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationRT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferation
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
 
New generation Sequencing
New generation Sequencing New generation Sequencing
New generation Sequencing
 
Comparative genomics to the rescue: How complete is your plant genome sequence?
Comparative genomics to the rescue: How complete is your plant genome sequence?Comparative genomics to the rescue: How complete is your plant genome sequence?
Comparative genomics to the rescue: How complete is your plant genome sequence?
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
The UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overviewThe UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overview
 
Unilag workshop complex genome analysis
Unilag workshop   complex genome analysisUnilag workshop   complex genome analysis
Unilag workshop complex genome analysis
 
Dna microarray mehran- u of toronto
Dna microarray  mehran- u of torontoDna microarray  mehran- u of toronto
Dna microarray mehran- u of toronto
 
Next generation seqencing tecnologies and application vegetable crops
Next generation seqencing tecnologies and application vegetable cropsNext generation seqencing tecnologies and application vegetable crops
Next generation seqencing tecnologies and application vegetable crops
 
GBI2016_Cantone
GBI2016_CantoneGBI2016_Cantone
GBI2016_Cantone
 

Recently uploaded

What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 

Recently uploaded (20)

What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...

Kulakova sbb2014

  • 1. Computer data analysis of genome sequencing by technology ChIP-seq and Hi-C. Adviser: Yuri Orlov, ICG SB RAS. Author: Ekaterina Kulakova, bachelor.
  • 2. Topicality. Automated systems can decode DNA and genomic sequences up to whole genomes; complete genome sequencing leads to avalanche-like growth in sequence information (megabytes and gigabytes of data). Methods based on chromatin immunoprecipitation (ChIP-seq, ChIA-PET) yield qualitatively new data. New tasks arise in computational genomics: analysis of the spatial, non-linear structure of chromosomes. Aim and scientific novelty. The aim of this work is to study chromosomal contacts in the cell nucleus using computer programs for statistical processing of data on genes and chromosomal domains, and to analyze experimental ChIP-seq and Hi-C data. The novelty lies in: integrating modern genome-wide ChIP-seq and Hi-C data, which became available only in the last two or three years; using a parameter for the precision of location on the chromosome with which the data are analyzed; and establishing a list of genes located on the boundaries of topological chromosomal domains. *ChIP-seq = Chromatin ImmunoPrecipitation sequencing; ChIA-PET = Chromatin Interaction Analysis by Paired-End-Tag sequencing.
  • 3. Methods: Hi-C and ChIA-PET*. Figure captions: arrangement of chromosomes in the cell nucleus (reconstruction from Hi-C data; "Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome", Science, 2009); topological arrangement of chromosomal domains and its mapping onto the genome; scheme of local chromosomal domains ("tangle" contacts); scheme of the arrangement of genes on a chromosome. Figure labels: separate loops; "tangle" (Dixon et al., 2012). *ChIP-seq = Chromatin ImmunoPrecipitation sequencing; ChIA-PET = Chromatin Interaction Analysis by Paired-End-Tag sequencing; Hi-C = high-throughput chromosome conformation capture.
  • 4. Genomic data: genes, ChIP-seq peaks, ChIA-PET contact regions. Figure labels: genes; plot of ChIA-PET chromosomal contacts; chromosomal domain; peaks of ChIP-seq profiles.
  • 5. File formats and their presentation. BED file example:
      >track name=ER_E2 description=ER_E2
      chr1 557112 558114
      chr1 559459 560286
      chr1 998864 999397
      chr1 999399 999604
      chr1 1004343 1005146
      chr1 1070346 1071080
      chr1 1305474 1306502
      chr1 1358287 1358744
      chr1 1776987 1777750
      chr1 1820476 1821168
      chr1 1922754 1923628
      chr1 2131962 2132747
      chr1 2325805 2326447
      chr1 2368996 2369977
      chr1 3119829 3120541
      chr1 3244610 3245121
      …
    Domain data for mouse cells (Fib_domains, Sp_domains) were obtained in the laboratory of O. L. Serov (ICG SB RAS). One file with a genomic profile is 100 MB to 2-3 GB in size. The RefSeq annotation was taken from the UCSC Genome Browser: http://genome.ucsc.edu/cgi-bin/hgTables
  • 6. Calculation of the position of genes relative to domain boundaries. A1 and B1 are the left and right coordinates of the gene; A2 and B2 are the left and right coordinates of the domain; E is the accuracy, defined by the user. If (|A1 − A2| <= E) & (B1 < A2 + (B2 − A2)/2) is true, the gene is assumed to lie close to the left boundary of the domain. Similar conditions apply for the right boundary. Figure labels: E, A2, A1, B1, B2, domain, gene. Figure captions: example of the location of chromosomal domains and genes for mouse chromosome 10; the linear arrangement of genes in a domain.
  • 7. Table of gene location types in chromosomal domains. Inside: genes that lie within the domains. onBorder: genes lying on the domain boundaries. Other: other genes.
  • 8. Analysis of the location of a gene set relative to domains in different cell types. The user specifies a list of genes; all genes in the genome (20,000 genes) can also be analyzed. Cell types (mouse): fibroblasts (Fib) and sperm (Sp); Hi-C experiment, ICG SB RAS. Sp (densely packed structure): 92.5% of genes within domains, 1.4% on borders, 6.1% other. Fib (open chromatin): 72.6% of genes within domains, 3.2% on borders, 24% other.
  • 9. Experimental data: Gene Ontology categories. Genes lying on the domain boundaries were taken for analysis. The result was sorted by the number of genes sharing a biological process category. Online resource used: DAVID (http://david.abcc.ncifcrf.gov/).
  • 10. Analysis of the co-expression of genes lying on the boundaries of spatial domains. Genes located on the domain boundaries were taken for analysis. Online resource used: STRING (http://string-db.org/). The main result is graphs of gene networks with differing degrees of connectivity for the two cell types. Fib: 698 genes in total on the domain boundaries; 88 genes involved in connections; 160 connected pairs; 12% of the total. Sp: 314 genes in total on the domain boundaries; 13 genes involved in connections; 10 connected pairs; 4% of the total.
  • 11. Conclusion. A program was implemented in Java. The program was applied to experimental data (ICG SB RAS data and databases on chromosome contacts). The location of a gene set in chromosomal domains was analyzed (with a control computer simulation).
  • 12. Next steps. Define domains that include pluripotency genes in the mouse genome (Dixon et al., 2012). Make the developed project compatible with other programs developed at ICG SB RAS for microarray data in Java and C/C++. Integrate the program with microarray gene expression data for the human genome from the BioGPS database. Thank you for your attention!
  • 13. Publications (theses):
      Safronova N.S., Kulakova E.V., Orlov Yu.L. (2013) Applications of text complexity measures to genome sequences analysis // Proceedings of GIW-2013, National University of Singapore, 16-18 Dec 2013. P. 42.
      Medvedeva I.V., Vishnevsky O.V., Kulakova E.V., Spitsina A.M., Afonnikov D.A., Kochetov A.V., Orlov Yu.L. (2014) Genomic organization and context characteristics of genes with elevated expression in brain cells // XVI All-Russian Scientific and Technical Conference "Neuroinformatics-2014": Collection of Scientific Papers. Moscow: NRNU MEPhI. Part 2, pp. 32-42 (in Russian).
      Kulakova E.V., Bryzgalov L.O., Orlov Y.L., Li G., Ruan Y. Computer analysis of chromosome contacts revealed by sequencing // BGRS\SB-2014 conference (Bioinformatics of Genome Regulation and Structure\Systems Biology).
      Kulakova E.V., Podkolodnaya O.A., Serov O.L., Orlov Y.L. Computer data analysis of genome sequencing by technology ChIP-seq and Hi-C // BGRS\SB-2014 conference (Bioinformatics of Genome Regulation and Structure\Systems Biology). P. 90.
      Kulakova E.V. Computer data analysis of genome sequencing by ChIP-seq and Hi-C technology // ISSC-2014 conference (International Scientific Student Conference). P. 207 (in Russian).
      Spitsina A., Kulakova E.V., Safronova N., Orlova N.G. Statistical analysis of gene expression data by rank correlation coefficients // BGRS\SB-2014 conference (Bioinformatics of Genome Regulation and Structure\Systems Biology). P. 91.
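
The boundary rule on slide 6 and the three categories on slide 7 can be sketched as follows. This is an illustrative Python sketch, not the author's Java program: the names `Interval`, `parse_bed`, `near_left`, `near_right`, and `classify` are hypothetical, the right-boundary test is assumed to mirror the stated left-boundary test, checking onBorder before Inside is an assumption, and the demo coordinates are made up.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # left coordinate (A1 for a gene, A2 for a domain)
    end: int    # right coordinate (B1 for a gene, B2 for a domain)


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read chrom/start/end from BED lines, skipping track and other non-data lines."""
    intervals = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # e.g. ">track name=ER_E2 description=ER_E2"
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2])))
    return intervals


def near_left(gene: Interval, dom: Interval, e: int) -> bool:
    """Slide 6 rule: |A1 - A2| <= E and B1 < A2 + (B2 - A2) / 2."""
    midpoint = dom.start + (dom.end - dom.start) / 2
    return abs(gene.start - dom.start) <= e and gene.end < midpoint


def near_right(gene: Interval, dom: Interval, e: int) -> bool:
    """Assumed mirror of near_left; the slide only says 'similar conditions'."""
    midpoint = dom.start + (dom.end - dom.start) / 2
    return abs(gene.end - dom.end) <= e and gene.start > midpoint


def classify(gene: Interval, domains: List[Interval], e: int) -> str:
    """Return 'onBorder', 'Inside', or 'Other' for one gene."""
    same_chrom = [d for d in domains if d.chrom == gene.chrom]
    if any(near_left(gene, d, e) or near_right(gene, d, e) for d in same_chrom):
        return "onBorder"
    if any(d.start <= gene.start and gene.end <= d.end for d in same_chrom):
        return "Inside"
    return "Other"


# Made-up demo data: one domain and four genes on chr10.
domains = parse_bed([">track name=demo", "chr10 1000 5000"])
genes = parse_bed([
    "chr10 1050 1500",
    "chr10 2000 2500",
    "chr10 4200 4950",
    "chr10 6000 7000",
])
counts = Counter(classify(g, domains, e=100) for g in genes)
print(dict(counts))
```

Dividing each count by the number of genes gives per-category percentages of the kind reported on slide 8.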

Editor's Notes

  1. The topicality of this subject rests on the fact that automated systems for determining DNA base sequences make it possible to decode DNA and genomic sequences up to whole genomes. Complete genome sequencing leads to avalanche-like growth in the volume of nucleotide sequence information (megabytes and gigabytes of data). The development of chromatin immunoprecipitation and sequencing methods (ChIP-seq, ChIA-PET; pronounced "chip-seek" and "chia-pet", with the English expansions of these abbreviations shown here) for studying regulatory regions of the genome yields qualitatively new data. New tasks in computational genomics are emerging (analysis of spatial rather than linear chromosome structures). The aim of this work is to study chromosomal contacts in the cell nucleus using computer programs for statistical processing of data on the location of genes and chromosomal domains, and to analyze experimental ChIP-seq and Hi-C data. The scientific novelty lies in integrating modern genome-wide ChIP-seq and Hi-C data, which became available only in the last two or three years; in using a parameter for the precision of location on the chromosome with which the data are to be analyzed; and in establishing a list of genes located on the boundaries of topological chromosomal domains.
  2. Chromosomes in the cell nucleus are compacted into tangles and knots. The Hi-C method (high-throughput chromosome conformation capture) gives an idea of the spatial arrangement of chromosomes in the cell nucleus. The figure on the left shows the result of a reconstruction from Hi-C data. The tangles that chromosomes form are divided into domains: one tangle is one domain, as shown in the central figure. On the right is a result obtained with the ChIA-PET method (Chromatin Interaction Analysis by Paired-End-Tag sequencing). The method identifies transcription factor binding sites and interacting chromatin regions located far apart from each other in the genome.
  3. This slide shows a mapping that integrates ChIP-seq and ChIA-PET data. Chromosomal domains are identified from the contact data (purple contact lines and the red triangle at the top). Such data make it possible to study the location of genes relative to domains (shown by arrows).
  4. Domain data are stored in a BED file in the form shown on the slide: the chromosome on which the domain is located and the coordinates of its start and end. Domain data for two cell types, fibroblasts and spermatozoa, were provided by the laboratory of Oleg Leonidovich Serov, ICG SB RAS. Gene data were taken from the UCSC Genome Browser database (pronounced "U-C-S-C Genome Browser"); an example is given in the lower part of the slide. One file with a genomic profile is 100 MB to 2-3 GB in size. The file contains the fields gene identifier, gene name, chromosome, start and end coordinates, gene symbol, and others.
  5. One of the tasks in my work was to extract lists of genes located on the boundaries of spatial domains. These lists included genes of two categories: those that directly cross a boundary and those that lie "close" to it. The notion of "close" is based on an accuracy parameter entered by the user. The picture shows that the distance between the left coordinates of the domain and the gene must be within the accuracy, and the right coordinate of the gene must not exceed the midpoint of the domain.
  6. The slide shows the resulting table. It contains the gene name and its category: inside, on the border, or "other". By "other" I mean genes that are longer than the domain or genes belonging to a specific chromosome. The calculation can also be run for random groups of genes, where the gene list is drawn with a random number generator, to estimate the frequency distribution of genes across these groups for the domain organization under study.
  7. Another task was to analyze the location of a gene set. The user specifies the gene list. All genes in the genome can be analyzed. The location of all genes was analyzed against the domains of two cell types, fibroblasts and spermatozoa, using experimental data from Serov's laboratory, Institute of Cytology and Genetics. In spermatozoa practically all genes lie inside domains. In fibroblasts a fairly large percentage of genes fall into the "other" category.
  8. The lists of genes on domain boundaries were analyzed for Gene Ontology categories using the online resource DAVID. The charts show the biological processes, the number of genes responsible for them, the significance coefficient, and the observed and expected numbers of genes. Both charts show that the largest number of genes is associated with PROTEINS, specifically phosphoproteins. The significant processes for fibroblasts are functions related to the plasma membrane; for spermatozoa they are also membrane-related, which indicates dense chromatin packing of the genome. Thus, a biological result was obtained with the help of the program.
  9. The same gene list was analyzed for co-expression links using STRING, an online resource for gene network reconstruction. The web address is shown, along with statistics on the number of genes and the percentage of links formed in the network. In fibroblasts the gene network is much more connected (figure on the left): 12 percent of genes versus 4 percent. (If asked: in spermatozoa the chromatin is closed and the DNA is in a very compact state.)
  10. Conclusion. A program was implemented in Java. The program was applied to experimental data. The location of a gene set in chromosomal domains was analyzed (genes from gene networks and a control computer simulation).
  11. Next steps include studying the location of individual genes; developing a package compatible with programs previously developed at ICG SB RAS for microarray data in Java and C/C++; and integrating the developed program with microarray gene expression data from the BioGPS database. Thank you for your attention!
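
The random-group control mentioned in note 6 can be sketched as follows. This is a hedged Python illustration rather than the author's Java code: `classify` and `random_control` are hypothetical names, intervals are plain (chrom, start, end) tuples, the right-boundary test is an assumed mirror of the stated left-boundary rule, and the sample size, number of rounds, seed, and coordinates are arbitrary.

```python
import random
from collections import Counter

CATEGORIES = ("Inside", "onBorder", "Other")


def classify(gene, domains, e):
    """Assign a (chrom, start, end) gene to Inside, onBorder, or Other."""
    chrom, a1, b1 = gene
    inside = False
    for d_chrom, a2, b2 in domains:
        if d_chrom != chrom:
            continue
        midpoint = a2 + (b2 - a2) / 2
        near_left = abs(a1 - a2) <= e and b1 < midpoint
        near_right = abs(b1 - b2) <= e and a1 > midpoint  # assumed mirror rule
        if near_left or near_right:
            return "onBorder"
        inside = inside or (a2 <= a1 and b1 <= b2)
    return "Inside" if inside else "Other"


def random_control(genes, domains, e, sample_size, rounds, seed=0):
    """Average category frequencies over randomly drawn gene groups."""
    rng = random.Random(seed)  # seeded so the control is reproducible
    totals = Counter()
    for _ in range(rounds):
        # Draw one random gene group without replacement and classify it.
        for gene in rng.sample(genes, sample_size):
            totals[classify(gene, domains, e)] += 1
    drawn = sample_size * rounds
    return {cat: totals[cat] / drawn for cat in CATEGORIES}


# Made-up demo data: one domain and four genes on chr10.
domains = [("chr10", 1000, 5000)]
genes = [
    ("chr10", 1050, 1500),
    ("chr10", 2000, 2500),
    ("chr10", 4200, 4950),
    ("chr10", 6000, 7000),
]
freqs = random_control(genes, domains, e=100, sample_size=2, rounds=1000)
print(freqs)
```

Comparing the category frequencies of a user-supplied gene list against such a baseline indicates whether that list sits on domain boundaries more often than randomly chosen genes do.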