Effective Cancer Detection Using
Higher-Order Genome Architecture
and Chromatin Interactions
My (Angela) Chung
Master of Science in Bioinformatics
Conceptual question
How do higher-order genomic
structures and epigenetic regulations
influence the chromatin interactions
in normal and cancerous cell lines?
Roadmap
1. Scientific Background
2. Contact Matrices & Regulatory Tracks
3. Data Engineering & Visualization
4. Detection Model
5. Model Evaluation
6. Pipeline Summary
Background – clinical & scientific!
Chromatin structure
Chang et al. (2018)
Figure: primary-order vs. higher-order genomic structure
TAD structure (Rowley et al., 2018) and super-enhancers (Li et al., 2021)
Super-enhancer: accumulation of multiple enhancers at TAD domains
CTCF: a key regulator of chromatin organization
Cohesin: a structural maintenance of chromosomes (SMC) protein complex
Genomic structure: loop extrusion
Epigenetic marks & chromatin states
Epigenetics
H3K27ac is an ACTIVATOR mark
ACTIVATOR marks identify super-enhancers
H3K27me3 is a REPRESSOR mark
Martire et al. (2017) & Chen et al. (2020)
Chromatin state in oncogenic signaling pathways
Hnisz et al. (2015) & Li et al. (2021)
Panels: epigenetics (histone modifications); pathway effects; transcriptional super-enhancers
• Chromatin structure
• Epigenetic regulation
• Effects on cancer pathways
Processing Pipeline & Contact Matrices
Cell morphology
• IMR90: lung fibroblast (normal cell)
• K562: chronic myeloid leukemia (cancer cell)
• GM12878: lymphoblastoid (normal cell)
• A549: human lung adenocarcinoma (cancer cell)
High-throughput chromosome conformation capture (Hi-C)
• Comprehensive mapping of chromatin contacts
• Detects loops across the entire genome
• Combines DNA-DNA proximity ligation with high-throughput sequencing
• Reveals how chromosomes are folded and positioned in 3D space
Rao et al. (2014)
In situ Hi-C
Chromatin contacts
Panels (zooming in): chr1: 20–80 Mb; chr1: 60–80 Mb; chr1: 60–70 Mb; chr1: 62–67 Mb
• Cell line: A549
• Chromosome: 1
• Anatomy of contact matrices
Dynamic chromatin contacts
• Cell lines and chromosomes
• Resolution
• Variations in genome architecture
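To make the "Resolution" point concrete, here is a minimal Python sketch of how intra-chromosomal read pairs can be binned into a contact matrix at a chosen bin size. It assumes pairs are given as 0-based base-pair positions on one chromosome; `build_contact_matrix` is a hypothetical helper for illustration, not the tool used in this project.

```python
import numpy as np

def build_contact_matrix(pairs, chrom_length, bin_size):
    """Bin intra-chromosomal read pairs into a symmetric contact matrix.

    pairs: iterable of (pos1, pos2) 0-based base-pair positions on one chromosome.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # mirror off-diagonal contacts to keep the matrix symmetric
    return matrix

# Toy example on a hypothetical 100 kb chromosome
pairs = [(5_000, 12_000), (15_000, 95_000), (99_000, 99_500)]
fine = build_contact_matrix(pairs, chrom_length=100_000, bin_size=10_000)    # 10 x 10
coarse = build_contact_matrix(pairs, chrom_length=100_000, bin_size=50_000)  # 2 x 2
print(coarse)
```

Rebinning the same pairs at 50 kb instead of 10 kb collapses them into a 2 × 2 matrix: coarser bins collect more counts each, at the cost of spatial detail.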
Data Engineering & Visualization
Engineer and organize the data for visualization and for building a detection model.
Flowchart for Data Engineering (Key Components → Analysis Features → Structure Effect)
• Key components: TADs, CTCF, loops, RAD21, histone marks
• Tools: Arrowhead (TADs), HICCUPS (loops), FANC (insulation, directionality index), ChIP-Seq (CTCF, RAD21, histone marks)
• Analysis features: insulation score and sum of insulation scores; directionality index (DI) at TAD start and at TAD end; start and end boundary scores; CTCF at both ends, at TAD start, at TAD end, or no CTCF; number and average location of histone marks (or histone marks not present), within or outside the loop window
• Structure effect: CTCF loop, cohesin-associated loop, cohesin-independent loop, ordinary domain, loop extrusion, extrusion loss, CTCF loop loss
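The insulation and directionality-index features in the flowchart can be approximated as below. This is a simplified sketch, not FANC's implementation: the insulation score here is the log2 ratio of the mean contacts crossing each bin (a square window of `window` bins on each side) to the average over all scored bins, and the directionality index uses the signed chi-square-like contrast between downstream (B) and upstream (A) contacts. The window size and the toy two-TAD matrix are assumptions.

```python
import numpy as np

def insulation_score(matrix, window):
    """log2(mean contacts crossing each bin / average over scored bins); NaN near the edges."""
    n = matrix.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window, n - window):
        # Contacts between the `window` bins upstream and downstream of bin i
        raw[i] = matrix[i - window:i, i + 1:i + window + 1].mean()
    valid = ~np.isnan(raw)
    scores = np.full(n, np.nan)
    scores[valid] = np.log2(raw[valid] / raw[valid].mean())  # assumes no all-zero windows
    return scores

def directionality_index(matrix, window):
    """Signed contrast of downstream (B) vs upstream (A) contacts for each bin."""
    n = matrix.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = matrix[i, max(0, i - window):i].sum()   # upstream contacts
        b = matrix[i, i + 1:i + window + 1].sum()   # downstream contacts
        e = (a + b) / 2                             # expected count under no bias
        if e == 0 or a == b:
            continue  # no contacts or no directional bias
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di

# Hypothetical toy matrix: background of 1 plus two 5-bin TADs (bins 0-4 and 5-9)
toy = np.ones((10, 10))
toy[:5, :5] += 10
toy[5:, 5:] += 10
ins = insulation_score(toy, window=2)
di = directionality_index(toy, window=2)
print(np.nanargmin(ins), di[4], di[5])
```

The insulation minimum falls at the TAD border, while DI turns negative at the last bin of the first TAD and positive at the first bin of the second, which is why DI at TAD start and DI at TAD end are useful boundary features.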
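The CTCF boundary categories in the flowchart (CTCF at both ends, at TAD start, at TAD end, no CTCF) could be assigned with a lookup like the one below. It is a sketch under stated assumptions: CTCF ChIP-Seq peaks are reduced to sorted summit positions, and a peak counts as "at" a boundary if it lies within a hypothetical 10 kb tolerance (one bin at 10 kb resolution).

```python
from bisect import bisect_left

def has_peak_near(sorted_peaks, position, tolerance):
    """True if any peak position lies within +/- tolerance bp of `position`."""
    i = bisect_left(sorted_peaks, position - tolerance)
    return i < len(sorted_peaks) and sorted_peaks[i] <= position + tolerance

def classify_tad_ctcf(start, end, sorted_peaks, tolerance=10_000):
    """Label a TAD by CTCF presence at its start and end boundaries."""
    at_start = has_peak_near(sorted_peaks, start, tolerance)
    at_end = has_peak_near(sorted_peaks, end, tolerance)
    if at_start and at_end:
        return "CTCF at both ends"
    if at_start:
        return "CTCF at TAD start"
    if at_end:
        return "CTCF at TAD end"
    return "No CTCF"

# Hypothetical CTCF peak summits (bp), sorted ascending
ctcf_peaks = [100_000, 505_000, 900_000]
print(classify_tad_ctcf(100_000, 500_000, ctcf_peaks))
```

Binary search keeps each lookup logarithmic in the number of peaks, which matters when thousands of TADs are checked against genome-wide ChIP-Seq peak sets.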
Chromatin state distribution
ACTIVATOR = H3K27ac
Chromatin State vs
Boundary Strength Distribution
ACTIVATOR = H3K27ac
Enhancers and promoters are potentially protected at strong TAD boundaries.
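One way to produce the kind of comparison behind this observation is to split TAD boundaries into weak and strong groups and count activator (H3K27ac) peaks near each. The sketch below assumes boundary strength is a score where higher means stronger, and uses the median as a hypothetical cutoff with a hypothetical ±50 kb window; the thresholds actually used for these plots are not stated.

```python
from bisect import bisect_left, bisect_right
from statistics import mean, median

def count_peaks_near(sorted_peaks, position, window):
    """Number of peaks within +/- window bp of position."""
    return bisect_right(sorted_peaks, position + window) - bisect_left(sorted_peaks, position - window)

def mark_density_by_strength(boundaries, sorted_peaks, window=50_000):
    """Mean peak count near weak vs strong boundaries.

    boundaries: list of (position, strength); a higher strength means a stronger boundary.
    """
    cutoff = median(strength for _, strength in boundaries)
    counts = {"weak": [], "strong": []}
    for position, strength in boundaries:
        group = "strong" if strength > cutoff else "weak"
        counts[group].append(count_peaks_near(sorted_peaks, position, window))
    return {group: mean(values) for group, values in counts.items() if values}

# Hypothetical boundaries and H3K27ac peak positions (sorted)
boundaries = [(100_000, 0.2), (300_000, 0.9), (500_000, 0.1), (700_000, 1.2)]
h3k27ac_peaks = [290_000, 310_000, 320_000, 510_000, 690_000]
print(mark_density_by_strength(boundaries, h3k27ac_peaks))
```

The same function works for the repressor track by passing H3K27me3 peaks instead.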
Chromatin State vs
Boundary Strength Distribution
REPRESSOR = H3K27me3
Normal cell lines
Cancer cell lines
Cancer cell lines show a lower density of weak boundaries and negative super-enhancers.
Detection Model
Cell type classifier using machine learning algorithms
Machine learning models (accuracy)
• Random Forest: 73.76%
• TabNet: 70.00%
• XGBoost: 80.90%
Random Forest performance evaluation
• Model accuracy: 73.76%
• Best performance with max depth left unset (no depth limit)
• About 200 tree estimators
• Numerical values in the figure show feature importance
• K562 is classified best, at 83%
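A minimal scikit-learn sketch of the Random Forest setup described above: `n_estimators=200` and `max_depth=None` (no depth limit), with per-cell-line accuracy read off the confusion-matrix diagonal and feature importance from `feature_importances_`. The feature matrix here is synthetic and the 75/25 stratified split is an assumption; the real features come from the data-engineering step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

CELL_LINES = ["GM12878", "IMR90", "K562", "A549"]

def train_cell_type_classifier(features, labels, seed=0):
    """Fit a Random Forest and report overall and per-cell-line accuracy on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=seed)
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    cm = confusion_matrix(y_test, predicted, labels=CELL_LINES)
    per_class = cm.diagonal() / cm.sum(axis=1)  # fraction of each cell line predicted correctly
    return model, accuracy_score(y_test, predicted), dict(zip(CELL_LINES, per_class))

# Synthetic stand-in for the engineered features (6 features, 50 samples per cell line)
rng = np.random.default_rng(0)
labels = np.repeat(CELL_LINES, 50)
features = rng.normal(size=(200, 6)) + np.repeat(np.arange(4), 50)[:, None] * 0.5
model, accuracy, per_class = train_cell_type_classifier(features, labels)
print(f"accuracy={accuracy:.2%}", per_class)
print("feature importances:", model.feature_importances_.round(3))
```

Swapping in `xgboost.XGBClassifier` with a depth of 8 to 12 and about 300 estimators would mirror the XGBoost configuration, though XGBoost expects integer-encoded class labels.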
XGBoost performance evaluation
• Model accuracy: 80.90%
• Best performance with max depth 8–12
• About 300 estimators
• IMR90 and K562 are classified best, at 83%
Future directions
• Try different scalable, memory-efficient tools to remove biases from and normalize Hi-C contact maps.
• Consider gene expression profiles (RNA-Seq) for chromatin state categorization.
• Consider new visualization tools to improve image quality, especially for triangular heatmaps.
• Improve feature engineering to enhance model performance.
• Consider Hi-C image classification to distinguish cell lines.
References
Chang, P., Gohain, M., et al. (2018). Computational methods for assessing chromatin hierarchy. Computational and Structural Biotechnology Journal, 16.
Rowley, M.J., & Corces, V.G. (2018). Organizational principles of 3D genome architecture. Nature Reviews Genetics, 19.
Li, G.H., Qu, Q., et al. (2021). Super-enhancers: a new frontier for epigenetic modifiers in cancer chemoresistance. Journal of Experimental & Clinical Cancer Research, 40.
Fudenberg, G., Imakaev, M., et al. (2016). Formation of chromosomal domains by loop extrusion. Cell Reports, 15, 2038–2049.
Kyrchanova, O., & Georgiev, P. (2021). Mechanisms of enhancer-promoter interactions in higher eukaryotes. International Journal of Molecular Sciences, 22(2).
Rao, S.S.P., Huang, S.C., et al. (2017). Cohesin loss eliminates all loop domains. Cell, 171(2).
Ong, C.T., & Corces, V.G. (2011). Enhancer function: new insights into the regulation of tissue-specific gene expression. Nature Reviews Genetics, 12(4).
Ernst, J., Kheradpour, P., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature.
Hnisz, D., Schuijers, J., et al. (2015). Convergence of developmental and oncogenic signaling pathways at transcriptional super-enhancers. Molecular Cell, 58(2).
Rao, S.S.P., Huntley, M.H., Durand, N.C., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159.
Project Committee Members:
Wendy Lee, Ph.D., Department of Computer Science
Carlos Rojas, Ph.D., Department of Computer Engineering
William Andreopoulos, Ph.D., Department of Computer Science
Thank you for your time and attention!
Correlation Diagrams (figure): links among TADs, cohesin-associated and cohesin-independent loops, CTCF, insulation score, chromatin states, histone modifications, RNA Pol II, inter-chromosomal links, super-enhancer colocalization, and gene expression (up-regulated vs. down-regulated)
CTCF and histone marks in contact domains
• Cell lines: A549 and IMR90
• Chromosome: 8
• Window: 80–83 Mb
• CTCF and epigenetic factors in genomic architecture
Mapping & Fragmentation
Mapping generates aligned reads; fragmentation assigns them to restriction fragments.
Basic mapping:
• Split paired-end reads at ligation junctions created by the MboI or HindIII restriction enzyme.
• Map reads to the hg19 reference genome with Bowtie2, keeping alignments with MAPQ > 30.
Iterative mapping:
• Eliminate unaligned reads.
Fragmentation:
• Assign mapped reads to restriction fragments (MboI or HindIII).
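Fragment assignment can be sketched with a sorted list of cut sites and a binary search. The recognition sequences and cut offsets are standard (MboI cuts before GATC; HindIII cuts A^AGCTT); everything else, including the helper names, is illustrative, and a real pipeline would scan the full hg19 sequence rather than a toy string.

```python
from bisect import bisect_right

# Recognition site and cut offset within it: MboI cuts ^GATC, HindIII cuts A^AGCTT
ENZYMES = {"MboI": ("GATC", 0), "HindIII": ("AAGCTT", 1)}

def find_cut_sites(sequence, enzyme):
    """0-based positions of the first base after each cut, in ascending order."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    hit = sequence.find(site)
    while hit != -1:
        cuts.append(hit + offset)
        hit = sequence.find(site, hit + 1)
    return cuts

def fragment_index(cut_sites, position):
    """Index of the restriction fragment containing a 0-based read position."""
    return bisect_right(cut_sites, position)

# Toy sequence with two MboI sites
sequence = "AAAGATCTTTTGATCAA"
cuts = find_cut_sites(sequence, "MboI")
print(cuts, [fragment_index(cuts, p) for p in (0, 5, 12)])
```

Fragment 0 is everything before the first cut; a read starting exactly at a cut position belongs to the fragment that begins there.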
Filtering Read Pairs
Eliminate invalid read pairs introduced by experimental procedures.
Distance filter: remove self-ligation products (< 25 kb)
Restriction distance filter: remove reads mapping > 1000 kb from the nearest restriction site
Strand filters (inward and outward ligations):
• Inward-facing pairs (un-ligated) < 1 kb apart
• Outward-facing pairs (self-ligated) < 1 kb apart
Features of invalid reads:
• Unmapped
• Multi-mapping
• PCR duplicates
• Low map quality
• Ligation errors
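The strand and map-quality filters above can be expressed as a per-pair check. This sketch assumes each pair carries chromosome, position, strand, and MAPQ for both mates; `ReadPair`, `orientation`, and `filter_reason` are hypothetical names, the 1 kb cutoff comes from the strand filters, and the MAPQ threshold mirrors the MAPQ > 30 mapping criterion.

```python
from dataclasses import dataclass

@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    mapq1: int
    chrom2: str
    pos2: int
    strand2: str
    mapq2: int

def orientation(pair):
    """'inward' (+ then -), 'outward' (- then +), or 'same', ordered by genomic position."""
    (_, first_strand), (_, second_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if first_strand == second_strand:
        return "same"
    return "inward" if first_strand == "+" else "outward"

def filter_reason(pair, min_mapq=30, ligation_cutoff=1_000):
    """Return why a pair is invalid, or None if it passes these filters."""
    if pair.mapq1 <= min_mapq or pair.mapq2 <= min_mapq:
        return "low map quality"
    if pair.chrom1 != pair.chrom2:
        return None  # inter-chromosomal pairs skip the distance-based strand filters
    distance = abs(pair.pos2 - pair.pos1)
    kind = orientation(pair)
    if kind == "inward" and distance < ligation_cutoff:
        return "un-ligated (inward)"
    if kind == "outward" and distance < ligation_cutoff:
        return "self-ligated (outward)"
    return None

print(filter_reason(ReadPair("chr1", 1_000, "+", 40, "chr1", 1_500, "-", 40)))
```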
Sample       Total reads  Valid pairs  % Valid pairs  Sum pairs filtered
GM12878_577  87,569,310   75,182,513   85.9           16,519,508
GM12878_580  63,592,550   54,573,765   85.8           12,121,642
GM12878_581  31,678,266   27,537,842   86.9           5,570,781
GM12878_583  69,592,394   57,017,845   81.9           16,266,949
GM12878_586  59,628,563   49,030,434   82.2           13,838,160
IMR90_672    99,266,806   94,680,468   95.4           25,551,020
IMR90_673    107,407,597  90,914,417   84.6           21,623,779
IMR90_674    14,526,316   12,731,251   87.6           2,345,638
IMR90_675    66,649,091   56,971,226   85.5           26,162,938
IMR90_676    133,365,970  116,341,749  87.2           22,032,033
IMR90_677    118,965,610  94,694,521   79.6           34,048,741
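The "% Valid pairs" column is valid pairs divided by total reads. A quick Python check against rows of the table:

```python
# (total reads, valid pairs) copied from rows of the table above
samples = {
    "GM12878_577": (87_569_310, 75_182_513),
    "IMR90_672": (99_266_806, 94_680_468),
    "IMR90_677": (118_965_610, 94_694_521),
}

def percent_valid(total_reads, valid_pairs):
    """Valid pairs as a percentage of total reads, rounded to one decimal place."""
    return round(100 * valid_pairs / total_reads, 1)

for name, (total, valid) in samples.items():
    print(name, percent_valid(total, valid))
```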
Data processing summary
Four cell lines: GM12878, IMR90, K562, and A549
34 samples in total, including primary samples and replicates
Matrix balancing
Goals:
• Correct known and potentially unknown systematic biases
• Generate corrected contact frequencies that preserve the underlying architecture of the matrix
• Ensure equal visibility for all loci
Bin sizes: 10 kb, 100 kb, and larger
Normalization (per chromosome):
• Knight-Ruiz (KR): scale the non-negative contact matrix A with diagonal matrices D1 and D2 so that P = D1·A·D2 is doubly stochastic
• ICE: model the expected contact frequency from per-locus biases and relative contact probabilities, O_ij = B_i · B_j · T_ij (B = bias, T = relative contact probability)
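The ICE model (observed contacts = bias of locus i × bias of locus j × relative contact probability) can be illustrated with a simplified iterative correction: repeatedly divide each entry by the coverage ratios of its row and column until every non-empty bin has the same total. This is a sketch on a hypothetical 3-bin matrix, not the code of any particular Hi-C package. In principle, for a symmetric matrix, Knight-Ruiz converges to the same balanced matrix up to a constant factor, but uses a faster Newton-type solver.

```python
import numpy as np

def ice_balance(matrix, max_iter=200, tol=1e-6):
    """Iteratively divide out per-bin biases until all non-empty bins have equal coverage.

    Returns the corrected matrix and the cumulative bias vector B,
    so that corrected = matrix / outer(B, B).
    """
    corrected = np.array(matrix, dtype=float)
    bias = np.ones(corrected.shape[0])
    for _ in range(max_iter):
        coverage = corrected.sum(axis=1)
        nonzero = coverage > 0
        factor = np.ones_like(coverage)
        # Relative coverage of each non-empty bin; empty bins are left untouched
        factor[nonzero] = coverage[nonzero] / coverage[nonzero].mean()
        corrected /= np.outer(factor, factor)
        bias *= factor
        if np.abs(factor - 1).max() < tol:
            break
    return corrected, bias

# Hypothetical 3-bin symmetric contact matrix with uneven coverage
observed = np.array([[4, 2, 1], [2, 6, 3], [1, 3, 8]])
corrected, bias = ice_balance(observed)
print(corrected.sum(axis=1))
```

Real pipelines usually mask low-coverage bins before balancing, since a few sparse rows can otherwise receive very large correction factors.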
Data Con LA 2022 - Early cancer detection using higher-order genome architecture

More Related Content

Similar to Data Con LA 2022 - Early cancer detection using higher-order genome architecture

Presentation july 28_2015
Presentation july 28_2015Presentation july 28_2015
Presentation july 28_2015
gkoytiger
 
Presentation july 31_2015
Presentation july 31_2015Presentation july 31_2015
Presentation july 31_2015
gkoytiger
 
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics WorkshopLopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Nuria Lopez-Bigas
 
Q biomarkersomaticmutation
Q biomarkersomaticmutationQ biomarkersomaticmutation
Q biomarkersomaticmutation
Elsa von Licy
 

Similar to Data Con LA 2022 - Early cancer detection using higher-order genome architecture (20)

Presentation july 28_2015
Presentation july 28_2015Presentation july 28_2015
Presentation july 28_2015
 
Shape Signatures Light
Shape Signatures LightShape Signatures Light
Shape Signatures Light
 
Microarray biotechnologg ppy dna microarrays
Microarray biotechnologg ppy dna microarraysMicroarray biotechnologg ppy dna microarrays
Microarray biotechnologg ppy dna microarrays
 
CDAC 2018 Gonzales-Perez interpretation of cancer genomes
CDAC 2018 Gonzales-Perez interpretation of cancer genomesCDAC 2018 Gonzales-Perez interpretation of cancer genomes
CDAC 2018 Gonzales-Perez interpretation of cancer genomes
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
Presentation july 31_2015
Presentation july 31_2015Presentation july 31_2015
Presentation july 31_2015
 
Jiankang Wang. Principle of QTL mapping and inclusive composite interval mapp...
Jiankang Wang. Principle of QTL mapping and inclusive composite interval mapp...Jiankang Wang. Principle of QTL mapping and inclusive composite interval mapp...
Jiankang Wang. Principle of QTL mapping and inclusive composite interval mapp...
 
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics WorkshopLopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
Lopez-Bigas talk at the EBI/EMBL Cancer Genomics Workshop
 
Dr. Lee Cooper: Integrated Morphologic Analysis for Identification and Charac...
Dr. Lee Cooper: Integrated Morphologic Analysis for Identification and Charac...Dr. Lee Cooper: Integrated Morphologic Analysis for Identification and Charac...
Dr. Lee Cooper: Integrated Morphologic Analysis for Identification and Charac...
 
CDAC 2018 Boeva discovery
CDAC 2018 Boeva discoveryCDAC 2018 Boeva discovery
CDAC 2018 Boeva discovery
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Strips.blogged
Strips.bloggedStrips.blogged
Strips.blogged
 
Liangqun ms defense.pptx
Liangqun ms defense.pptxLiangqun ms defense.pptx
Liangqun ms defense.pptx
 
Mechanisms of Plaque Rupture in Advanced Atherosclerosis
Mechanisms of Plaque Rupture in Advanced AtherosclerosisMechanisms of Plaque Rupture in Advanced Atherosclerosis
Mechanisms of Plaque Rupture in Advanced Atherosclerosis
 
Tfpcr array poster
Tfpcr array posterTfpcr array poster
Tfpcr array poster
 
Q biomarkersomaticmutation
Q biomarkersomaticmutationQ biomarkersomaticmutation
Q biomarkersomaticmutation
 
SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)SAGE (Serial analysis of Gene Expression)
SAGE (Serial analysis of Gene Expression)
 
2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai
 
Abrf poster2007
Abrf poster2007Abrf poster2007
Abrf poster2007
 
Khanal beltwide conference_2021
Khanal beltwide conference_2021Khanal beltwide conference_2021
Khanal beltwide conference_2021
 

More from Data Con LA

Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 

Data Con LA 2022 - Early cancer detection using higher-order genome architecture

  • 1. Effective Cancer Detection Using Higher-Order Genome Architecture and Chromatin Interactions My (Angela) Chung Master of Science in Bioinformatics
  • 2. Conceptual question How do higher-order genomic structures and epigenetic regulations influence the chromatin interactions in normal and cancerous cell lines? 2
  • 3. Roadmap 1 3 5 6 4 2 PIPELINE SUMMARY DECTECTION MODEL CONTACT MATRICES & REGULATORY TRACKS SCIENTIFIC BACKGROUND DATA ENGINEERING & VISUALIZATION MODEL EVALUATION 3
  • 4. Background – clinical & scientific! 4
  • 5. Chromatin structure Chang et al. (2018) 5 Higher-order Primary-order Genomic Structure
  • 6. TAD structure Super-enhancer Rowley et al. (2018) Li et al. (2021) Accumulation of multiple enhancers at TAD domains 6 CTCF: the regulator of chromatin organization Cohesin: the structural maintenance of chromosome cohesin complex Genomic Structure Loop extrusion
  • 7. Epigenetic marks & chromatin states 7 Epigenetics H3K27ac is ACTIVATOR ACTIVATORS for super-enhancer H3K27me3 is REPRESSOR Martire et al. (2017) & Chen et al. (2020)
  • 8. Chromatin state in oncogenic signaling pathways 8 Hniz et al. (2015) & Li et al. (2021) Epigenetics – Histone modifications Pathway effects Transcriptional super-enhancers q Chromatin structure q Epigenetic regulation q Effects on cancer pathway
  • 10. Cell morphology IMR90 Lung fibroblast Normal cell K562 Chronic myeloid leukemia Cancer cell GM12878 Lymphoblastoid Normal cell A549 Human lung adenocarcinoma Cancer cell 10
  • 11. High-throughput chromosome conformation capture (Hi-C) • Comprehensive mapping of chromatin contacts • Detect loops across the entire genome • DNA-DNA proximity ligation is combined with high-throughput sequencing • Chromosome spatially positions into 3D conformation Rao et al. (2014) 11 In situ Hi-C
  • 12. 12
  • 13. Chromatin contacts chr1: 20mb – 80mb chr1: 60mb – 80mb chr1: 60mb – 70mb chr1: 62mb – 67mb 13 q Cell line: A549 q Chromosome: 1 q Anatomy of contact matrices
  • 14. Dynamic Chromatin contacts 14 q Cell lines and chromosomes q Resolution q Variations in genome architecture
  • 15. Dynamic Chromatin contacts 15 q Cell lines and chromosomes q Resolution q Variations in genome architecture
  • 16. Dynamic Chromatin contacts 16 q Cell lines and chromosomes q Resolution q Variations in genome architecture
  • 17. Dynamic Chromatin contacts 17 q Cell lines and chromosomes q Resolution q Variations in genome architecture
  • 18. Engineer and organize data for visualization and building a detection model. Data Engineering & Visualization
  • 19. Flowchart for Data Engineering TADs CTCF Loops RAD21 Histone marks Insulation Directionality Index CTCF at both ends CTCF at TAD start CTCF at TAD end DI at TAD end DI at TAD start Sum of insulation scores No CTCF Number of histone marks Average location Histone marks are not present Start boundary score End boundary score CTCF loop Cohesin-independent loop Cohesin-associated loop Ordinary domain Loop extrusion Extrusion loss ChIP-Seq FANC Arrowhead HICCUPS ChIP-Seq FANC ChIP-Seq Within loop window Outside loop window CTCF loop loss Structure Effect Analysis Features Key Components 19
  • 22. 22 Chromatin State vs Boundary Strength Distribution ACTIVATOR = H3K27ac
  • 23. 23 Chromatin State vs Boundary Strength Distribution ACTIVATOR = H3K27ac
  • 24. 24 Enhancers and promoters are potentially protected at strong TAD boundaries. Chromatin State vs Boundary Strength Distribution ACTIVATOR = H3K27ac
  • 25. 25 Chromatin State vs Boundary Strength Distribution REPRESSOR = H3K27me3
  • 26. 26 Normal cell Normal cell Chromatin State vs Boundary Strength Distribution REPRESSOR = H3K27me3
  • 27. 27 Cancer cell Cancer cell Chromatin State vs Boundary Strength Distribution REPRESSOR = H3K27me3
  • 28. 28 Lower density of weak boundaries and negative super-enhancers detected in cancerous cell lines. Chromatin State vs Boundary Strength Distribution REPRESSOR = H3K27me3
  • 29. Detection Model Cell type classifer using machine learning algorithms
  • 30. Machine learning models Random Forest 73.76% accuracy TabNet 70.00% XGBoost 80.90% accuracy
  • 31. 31 Random Forest performance evaluation Model accuracy is 73.76% Max depth is unassigned for the highest performance Number of tree estimators is around 200 Numerical attributes represent feature importance K562 is best classified at 83%
  • 32. 32 XGBoost performance evaluation Model accuracy is 80.90% Max depth is 8-12 for the highest performance Number of estimators is around 300 IMR90 and K562 are best classified at 83%
  • 34. Future direction
    • Try different scalable and memory-efficient tools to remove biases and normalize Hi-C contact maps.
    • Consider gene expression profiles (RNA-Seq) for chromatin state categorization.
    • Consider new visualization tools to further enhance image quality, especially for triangular heatmaps.
    • Improve feature engineering to enhance model performance.
    • Consider Hi-C image classification to distinguish cell lines.
  • 35. References
    Chang, P., Gohain, M., et al. (2018). Computational methods for assessing chromatin hierarchy. Computational and Structural Biotechnology Journal, 16.
    Rowley, M.J., & Corces, V.G. (2018). Organizational principles of 3D genome architecture. Nature Reviews Genetics, 19.
    Li, G.H., Qu, Q., et al. (2021). Super-enhancers: a new frontier for epigenetic modifiers in cancer chemoresistance. Journal of Experimental & Clinical Cancer Research, 40.
    Fudenberg, G., Imakaev, M., et al. (2016). Formation of chromosomal domains by loop extrusion. Cell Reports, 15, 2038–2049.
    Kyrchanova, O., & Georgiev, P. (2021). Mechanisms of enhancer-promoter interactions in higher eukaryotes. International Journal of Molecular Sciences, 22(2).
    Rao, S.S.P., Huang, S.C., et al. (2017). Cohesin loss eliminates all loop domains. Cell, 171(2).
    Ong, C.T., & Corces, V.G. (2011). Enhancer function: new insights into the regulation of tissue-specific gene expression. Nature Reviews Genetics, 12(4).
    Ernst, J., Kheradpour, P., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature.
    Hnisz, D., Schuijers, J., et al. (2015). Convergence of developmental and oncogenic signaling pathways at transcriptional super-enhancers. Molecular Cell, 58(2).
    Rao, S.S.P., Huntley, M.H., Durand, N.C., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159.
  • 36. Project Committee Members: Wendy Lee, Ph.D., Department of Computer Science; Carlos Rojas, Ph.D., Department of Computer Engineering; William Andreopoulos, Ph.D., Department of Computer Science. Thank you for your time and attention!
  • 39. Correlation Diagrams (figure). Elements: TADs, cohesin-associated loops, cohesin-independent loops, inter-chromosomal links, chromatin states, histone modification, RNA Pol II, CTCF, insulation score, super-enhancer colocalization, and gene expression (up-regulated and down-regulated).
  • 40. CTCF and histone marks in contact domains
    • Cell lines: A549 and IMR90
    • Chromosome: 8
    • Region: 80 Mb – 83 Mb
    • CTCF and epigenetic factors in genomic architecture
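The CTCF-based features in the data engineering flowchart (CTCF at both ends, at TAD start, at TAD end, or no CTCF) amount to checking whether a ChIP-Seq peak lies near each TAD boundary. A minimal sketch, assuming peaks are represented by single summit coordinates and a hypothetical ±10 kb tolerance (one 10 kb bin); the function names and coordinates are made up for illustration.

```python
from bisect import bisect_left


def has_peak_near(sorted_peaks, position, tolerance):
    """True if any peak summit lies within +/- tolerance bp of position (peaks must be sorted)."""
    k = bisect_left(sorted_peaks, position - tolerance)  # first peak not left of the window
    return k < len(sorted_peaks) and sorted_peaks[k] <= position + tolerance


def classify_tad_ctcf(tad_start, tad_end, ctcf_peaks, tolerance=10_000):
    """Label a TAD by where CTCF peaks fall relative to its two boundaries."""
    peaks = sorted(ctcf_peaks)
    at_start = has_peak_near(peaks, tad_start, tolerance)
    at_end = has_peak_near(peaks, tad_end, tolerance)
    if at_start and at_end:
        return "CTCF at both ends"
    if at_start:
        return "CTCF at TAD start"
    if at_end:
        return "CTCF at TAD end"
    return "No CTCF"


# Hypothetical chr8 TADs and CTCF summits (bp), for illustration only.
peaks = [80_004_000, 80_950_000, 81_495_000]
print(classify_tad_ctcf(80_000_000, 81_500_000, peaks))  # both boundaries have a peak
print(classify_tad_ctcf(81_500_000, 82_300_000, peaks))  # only the start boundary
```

Sorting the peaks once and using binary search keeps each boundary lookup logarithmic, which matters when thousands of TADs are checked against genome-wide peak sets.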
  • 41. Mapping to generate aligned reads and fragmentation to assign them to restriction fragments
    Basic mapping:
    • Paired-end reads were scanned for ligation junctions created by the MboI or HindIII restriction enzyme and split at those junctions.
    • Reads were mapped to the hg19 reference genome with Bowtie2, keeping alignments with MAPQ > 30.
    Iterative mapping:
    • Eliminate unaligned reads.
    Fragmentation:
    • Assign mapped reads to restriction fragments (MboI or HindIII).
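The splitting and fragmentation steps can be sketched in plain Python. This assumes in situ Hi-C with fill-in and blunt-end ligation, so the MboI junction is GATCGATC and the HindIII junction is AAGCTAGCTT; the function names are hypothetical, alignment itself (Bowtie2) is not reproduced, and the fragment lookup treats each site's start as the fragment boundary, which ignores the enzyme-specific cut offset.

```python
from bisect import bisect_right

# Junctions left by fill-in and blunt-end ligation (assumed in situ Hi-C protocol).
LIGATION_JUNCTIONS = {"MboI": "GATCGATC", "HindIII": "AAGCTAGCTT"}
RECOGNITION_SITES = {"MboI": "GATC", "HindIII": "AAGCTT"}


def split_at_junctions(read, enzyme):
    """Split a read at every ligation junction, cutting in the middle of the junction."""
    junction = LIGATION_JUNCTIONS[enzyme]
    half = len(junction) // 2
    segments, start = [], 0
    idx = read.find(junction, start)
    while idx != -1:
        segments.append(read[start:idx + half])  # left piece keeps the first half of the junction
        start = idx + half
        idx = read.find(junction, start)
    segments.append(read[start:])
    return segments


def restriction_sites(sequence, enzyme):
    """0-based start positions of every recognition site in a reference sequence."""
    site = RECOGNITION_SITES[enzyme]
    positions, idx = [], sequence.find(site)
    while idx != -1:
        positions.append(idx)
        idx = sequence.find(site, idx + 1)
    return positions


def fragment_index(sites, position):
    """Index of the restriction fragment containing a mapped position (sites sorted)."""
    return bisect_right(sites, position)


def passes_mapq(mapq, threshold=30):
    """Keep only alignments above the MAPQ cutoff (MAPQ > 30)."""
    return mapq > threshold


print(split_at_junctions("AAAAGATCGATCTTTT", "MboI"))  # two genomic pieces
print(fragment_index([100, 250, 400], 300))            # position falls in fragment 2
```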
  • 42. Filtering read pairs to eliminate invalid pairs arising from the experimental procedure
    Features of invalid reads: unmapped, multi-mapping, PCR duplicates, low mapping quality, ligation errors.
    Distance filters: filter out self-ligation products < 25 kb.
    Restriction distance: remove reads mapping > 1000 kb.
    Strand filters (inward and outward ligations):
    • Inward-facing pairs (un-ligated) < 1 kb
    • Outward-facing pairs (self-ligated) < 1 kb
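The strand filters depend on read-pair orientation: for two reads on the same chromosome, ordered by position, "inward" means the left read is on the plus (+) strand and the right read on the minus (-) strand, and "outward" is the reverse. A minimal sketch of those filters plus PCR duplicate removal, assuming the 1 kb cutoffs listed above; the `ReadPair` type and all function names are hypothetical and are not FAN-C's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def orientation(pair):
    """Classify an intra-chromosomal pair as 'inward', 'outward', or 'same' strand."""
    # Order the two ends so that 'left' is the one with the smaller coordinate.
    (_, l_strand), (_, r_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if l_strand == r_strand:
        return "same"
    return "inward" if l_strand == "+" else "outward"


def is_valid(pair, inward_min=1_000, outward_min=1_000):
    """Apply the strand/distance filters: close inward or outward pairs are removed."""
    if pair.chrom1 != pair.chrom2:
        return True  # inter-chromosomal pairs are not subject to these filters
    distance = abs(pair.pos2 - pair.pos1)
    kind = orientation(pair)
    if kind == "inward" and distance < inward_min:
        return False  # likely an un-ligated fragment (dangling end)
    if kind == "outward" and distance < outward_min:
        return False  # likely a self-ligated circle
    return True


def remove_duplicates(pairs):
    """Drop PCR duplicates: pairs with identical ends, regardless of read order."""
    seen, unique = set(), []
    for p in pairs:
        key = tuple(sorted([(p.chrom1, p.pos1, p.strand1), (p.chrom2, p.pos2, p.strand2)]))
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


pair = ReadPair("chr1", 1_000, "+", "chr1", 1_500, "-")
print(orientation(pair), is_valid(pair))  # inward pair 500 bp apart is filtered out
```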
  • 43. Data processing summary
    Four cell lines: GM12878, IMR90, K562, and A549; 34 samples in total, including both primary and replicate samples.

| Sample | Total reads | Valid pairs | % valid pairs | Sum of pairs filtered |
|---|---|---|---|---|
| GM12878_577 | 87,569,310 | 75,182,513 | 85.9 | 16,519,508 |
| GM12878_580 | 63,592,550 | 54,573,765 | 85.8 | 12,121,642 |
| GM12878_581 | 31,678,266 | 27,537,842 | 86.9 | 5,570,781 |
| GM12878_583 | 69,592,394 | 57,017,845 | 81.9 | 16,266,949 |
| GM12878_586 | 59,628,563 | 49,030,434 | 82.2 | 13,838,160 |
| IMR90_672 | 99,266,806 | 94,680,468 | 95.4 | 25,551,020 |
| IMR90_673 | 107,407,597 | 90,914,417 | 84.6 | 21,623,779 |
| IMR90_674 | 14,526,316 | 12,731,251 | 87.6 | 2,345,638 |
| IMR90_675 | 66,649,091 | 56,971,226 | 85.5 | 26,162,938 |
| IMR90_676 | 133,365,970 | 116,341,749 | 87.2 | 22,032,033 |
| IMR90_677 | 118,965,610 | 94,694,521 | 79.6 | 34,048,741 |
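The % valid pairs column is valid pairs divided by total reads. A quick sketch that recomputes it for the samples listed in the data processing summary and compares against the reported values (the helper name is hypothetical):

```python
# (sample, total reads, valid pairs, reported % valid pairs) from the data processing summary
SAMPLES = [
    ("GM12878_577", 87_569_310, 75_182_513, 85.9),
    ("GM12878_580", 63_592_550, 54_573_765, 85.8),
    ("GM12878_581", 31_678_266, 27_537_842, 86.9),
    ("GM12878_583", 69_592_394, 57_017_845, 81.9),
    ("GM12878_586", 59_628_563, 49_030_434, 82.2),
    ("IMR90_672", 99_266_806, 94_680_468, 95.4),
    ("IMR90_673", 107_407_597, 90_914_417, 84.6),
    ("IMR90_674", 14_526_316, 12_731_251, 87.6),
    ("IMR90_675", 66_649_091, 56_971_226, 85.5),
    ("IMR90_676", 133_365_970, 116_341_749, 87.2),
    ("IMR90_677", 118_965_610, 94_694_521, 79.6),
]


def valid_pair_percent(total_reads, valid_pairs):
    """Percentage of sequenced read pairs that survive all filters."""
    return 100.0 * valid_pairs / total_reads


for name, total, valid, reported in SAMPLES:
    computed = valid_pair_percent(total, valid)
    status = "ok" if round(computed, 1) == reported else "MISMATCH"
    print(f"{name}: {computed:.2f}% (reported {reported}%) {status}")
```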
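  • 44. Matrix balancing
    Goals:
    • Correct known and potentially unknown systematic biases.
    • Recover true contact frequencies that preserve the underlying architecture of the matrix.
    • Ensure visibility for all loci.
    Bin sizes: 10 kb, 100 kb, and larger.
    Normalization (per chromosome):
    • Knight-Ruiz (KR): find diagonal matrices $D_1$ and $D_2$ such that $P = D_1 A D_2$ is doubly stochastic, where $A$ is the non-negative contact matrix (for a symmetric Hi-C matrix, $D_1 = D_2$).
    • ICE (iterative correction): model the observed contacts as $O_{ij} = B_i B_j T_{ij}$, i.e. the expected contact frequency is the product of the bin biases $B_i$, $B_j$ and the relative contact probability $T_{ij}$.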
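The ICE model $O_{ij} = B_i B_j T_{ij}$ can be sketched in plain Python on a small dense matrix. This is a minimal illustration of iterative correction, not the implementation used by FANC or any other tool; the function name, tolerance, and iteration cap are assumptions, and real pipelines also mask low-coverage bins and often ignore the main diagonal. Dividing the balanced matrix by its common row sum gives a doubly stochastic matrix, which is the target of Knight-Ruiz balancing, although KR reaches it with a different (Newton-type) algorithm.

```python
def ice_balance(matrix, max_iter=500, tol=1e-10):
    """Iterative correction: returns (balanced T, biases B) with O_ij = B_i * B_j * T_ij."""
    n = len(matrix)
    t = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        coverage = [sum(row) for row in t]
        nonzero = [c for c in coverage if c > 0]
        mean_cov = sum(nonzero) / len(nonzero)
        # Relative coverage; empty bins get a factor of 1 so they are left untouched.
        scale = [c / mean_cov if c > 0 else 1.0 for c in coverage]
        for i in range(n):
            for j in range(n):
                t[i][j] /= scale[i] * scale[j]  # remove the bias of both interacting bins
        bias = [b * s for b, s in zip(bias, scale)]  # accumulate the total bias per bin
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return t, bias


observed = [[4, 2, 1], [2, 5, 3], [1, 3, 4]]
balanced, biases = ice_balance(observed)
print([round(sum(row), 6) for row in balanced])  # rows now have (nearly) equal coverage
```

Because every division is recorded in `biases`, multiplying back by $B_i B_j$ recovers the observed matrix, which is a convenient sanity check on any balancing run.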