Budget friendly sample sizes for genomics research
Biostatistician, bioinformatician
Ognjen Milicevic, MD
Why do you need a biostatistician?
Common biostatistics tasks
● Cleaning and transforming data
● Data description
● Statistical testing
● Tabulation and visualization
● Bioinformatics (applied statistics for genomics)
● Post-hoc power calculations
● Complain they weren't consulted earlier
Post-hoc sample size / power analysis
● For convenience, we justify choices already made
● Find a similar effect size in the literature
● Use the posterior distribution as the prior
● Set the desired power (80-100%)
● Adjust as needed for dropout, loss, margin-of-error
● Obtain the sample size you already have
Make a wish, biostatistician
Dear bioinformatician, how many samples do we need to sequence to investigate...
NO CONVENIENCE!
● Not routinely done
● Effect size unknown
● Literature not helpful
● Multiple unknown genes
● Distribution is complex
● ...
RNA sequencing around the internet
DATA SCIENCE OF RNA SEQUENCING
Natural variability of RNA per gene
De Torrente et al. (2020)
"Surprisingly, the expression of less than 50% of all genes was Normally-distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented."
Liu et al. (2019)
"Based on the analysis of a group of real gene expression profiles, this study reveal that the primary density distributions of the real profiles are normal/log-normal and t distributions, accounting for 80% and 19% respectively."
20K+ genes
Representing RNAs with fragments
Gamma-Poisson distribution
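A minimal sketch of what the gamma-Poisson model implies for read counts, assuming an illustrative mean of 20 reads and a dispersion of 0.5 (both hypothetical, not taken from the talk). Each sample's expression rate is drawn from a gamma distribution and the observed count from a Poisson with that rate, which yields the negative binomial variance mean + dispersion × mean². The helper names are made up for illustration.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson count with Knuth's multiplication method.

    Adequate for the modest rates used in this sketch.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def gamma_poisson(mean, dispersion, rng):
    """Draw one count: gamma-distributed rate, then Poisson sampling."""
    shape = 1.0 / dispersion      # gamma shape
    scale = mean * dispersion     # gamma scale, so shape * scale == mean
    rate = rng.gammavariate(shape, scale)
    return poisson(rate, rng)


# Hypothetical gene: mean 20 reads, dispersion 0.5.
rng = random.Random(0)
sim_counts = [gamma_poisson(20.0, 0.5, rng) for _ in range(20000)]

sample_mean = statistics.fmean(sim_counts)
sample_var = statistics.variance(sim_counts)
expected_var = 20.0 + 0.5 * 20.0 ** 2  # mean + dispersion * mean^2

print(f"mean={sample_mean:.1f} variance={sample_var:.1f} expected={expected_var:.1f}")
```

The variance is far above the mean, which is why a plain Poisson model would understate the variability between biological replicates.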
Count and normalize to quantify (TPM)
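Counting and normalizing to TPM (transcripts per million) can be sketched as follows. The counts and gene lengths are made-up values, and `tpm` is a hypothetical helper rather than a function from any package.

```python
def tpm(counts, lengths_kb):
    """Convert raw fragment counts to transcripts per million.

    counts: fragments assigned to each gene in one sample
    lengths_kb: gene (transcript) lengths in kilobases, same order
    """
    # Step 1: length-normalize, giving fragments per kilobase.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no counted fragments")
    # Step 2: rescale so the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical sample with three genes.
example_counts = [100, 200, 300]
example_lengths_kb = [1.0, 2.0, 1.5]
tpm_values = tpm(example_counts, example_lengths_kb)
print(tpm_values)
```

The first two genes end up with the same TPM even though the second has twice the raw count, because it is also twice as long.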
Overview of the pipeline
● Pipeline stages: tissue sample, chemical preparation, sequencing
● Sources of variability: effect between groups, inter-individual variation in RNA, batch effects, representation variability
Count matrix and metadata
Each gene is an independent outcome
LAYERS UPON LAYERS OF VARIABILITY
So, what about those sample sizes?
COVID-19 RNA characterization
Example project
RNA characterization of COVID-19 (2021) - Plan
● Total RNA – virus and host (human)
● Nasopharyngeal swabs and blood samples
● Paired design (at admission to and at discharge from hospital)
● 18 individuals × 2 sample types × 2 time points = 72 samples in total
● Which biological pathways are affected? (differentially expressed genes, DEG)
● What can we say about the viral load? (metagenomics)
Estimating sample size for RNA
● Theoretical models with assumed distributions
● Parameters inferred from previous datasets
● R packages: RNASeqDesign, PROPER, powsimR, ssizeRNA
● Web tool: RNASeqSampleSize
● Results vary between tools
● If cost is not relevant, choose the most conservative (largest) estimate
Proposed approach
● Perform one estimate and use it
● Remove unwanted variability (batch effect)
● Reduce variability with paired design
● Use meaningful metadata
● Filter the genes
Remove unwanted variability
A number of SVD-based methods remove high-level batch effects without tracing them to specific interpretable variables.
Housekeeping or control genes can be used as markers.
● SVA
● RUVSeq
These methods produce new surrogate variables.
Colleague quote:
"Once I see batch effects, I can correct them mathematically, but I
never trust that dataset again."
Batch effects work against collaborative science!
Paired design
Control samples are taken from the same patients, after resolution or before the event.
● Increases power
● Not all analysis frameworks can take advantage of it
● Sometimes biologically difficult
● Halves the degrees of freedom (DF)
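The trade-off in the list above (more power, half the degrees of freedom) can be illustrated with a small sketch that compares a pooled two-sample t statistic with a paired t statistic on the same data. The six before/after values are hypothetical log-expression values for one gene, chosen so that individuals differ a lot from each other but shift consistently; this is a plain t-test illustration, not the GLM used in the study.

```python
import math
import statistics


def unpaired_t(before, after):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(before), len(after)
    pooled_var = ((n1 - 1) * statistics.variance(before)
                  + (n2 - 1) * statistics.variance(after)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(after) - statistics.fmean(before)) / se
    return t, n1 + n2 - 2


def paired_t(before, after):
    """Paired t statistic on within-individual differences and its DF."""
    diffs = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se, len(diffs) - 1


# Hypothetical values: large spread between individuals, small consistent shift.
before = [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
after = [5.6, 7.4, 9.7, 11.5, 13.3, 15.5]

t_unpaired, df_unpaired = unpaired_t(before, after)
t_paired, df_paired = paired_t(before, after)
print(f"unpaired: t={t_unpaired:.2f}, df={df_unpaired}")
print(f"paired:   t={t_paired:.2f}, df={df_paired}")
```

Pairing removes the between-individual spread from the error term, so the statistic grows substantially even though the degrees of freedom drop from 2n − 2 to n − 1.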
Meaningful metadata
Gender and age can always be relevant.
Collect metrics of sample quality (before and after sequencing).
Disease subtypes can be a covariate or a group variable.
Metadata helps choose which samples to sequence when only a subset can be sequenced.
Filter genes
Multiple testing correction for 20K+ genes.
Remove mostly unexpressed genes.
A priori removal is allowed.
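A sketch of why filtering helps: removing mostly unexpressed genes before testing shrinks the number of hypotheses, so the multiple testing correction is less severe. The example below assumes a Benjamini-Hochberg correction (the talk does not say which correction was used) and hypothetical filter thresholds of at least 10 reads in at least 3 samples; both helper functions are illustrative.

```python
def filter_expressed(counts_by_gene, min_count=10, min_samples=3):
    """Keep genes with at least min_count reads in at least min_samples samples.

    The thresholds are hypothetical defaults, chosen before looking at results.
    """
    return {gene: counts for gene, counts in counts_by_gene.items()
            if sum(c >= min_count for c in counts) >= min_samples}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three genes across four samples.
raw = {
    "geneA": [0, 1, 0, 2],      # mostly unexpressed, removed
    "geneB": [50, 60, 40, 55],
    "geneC": [12, 0, 15, 11],
}
kept = filter_expressed(raw)

# Hypothetical raw p-values for four tested genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(sorted(kept), adjusted)
```

The filter uses only expression level, not the group labels, which is what makes removing genes up front defensible.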
Results
● edgeR GLM
● Nasal DEG (p<0.05): 40 (paired) / 51 (unpaired)
● Blood DEG (p<0.05): 76 (paired) / 2 (unpaired)
● Every parameter choice changes the results
● Validation?
Annotation representation testing – Panther.db
● An annotation is a subset of genes
● Multiple annotation sets are available (structure, function, pathway...)
● We only use significant genes
● Overrepresentation test – chi-square to compare observed and expected frequencies
● Enrichment test – Mann-Whitney to test the randomness of ranks
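The overrepresentation test above can be sketched as a Pearson chi-square test on a 2×2 table (significant or not, in the annotation or not). This is a plain textbook version without continuity correction, not necessarily the exact procedure the annotation tool runs, and all gene counts are hypothetical.

```python
import math


def overrepresentation_chi2(sig_in, sig_out, bg_in, bg_out):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table.

    sig_in / sig_out: significant genes inside / outside the annotation
    bg_in / bg_out:   remaining genes inside / outside the annotation
    Returns (expected count of significant genes in the annotation,
             chi-square statistic, p-value).
    """
    n = sig_in + sig_out + bg_in + bg_out
    row_sig, row_bg = sig_in + sig_out, bg_in + bg_out
    col_in, col_out = sig_in + bg_in, sig_out + bg_out
    expected_sig_in = row_sig * col_in / n
    chi2 = (n * (sig_in * bg_out - sig_out * bg_in) ** 2
            / (row_sig * row_bg * col_in * col_out))
    # Upper tail of a chi-square with 1 DF, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected_sig_in, chi2, p_value


# Hypothetical numbers: 100 significant genes, 20 of them in the annotation;
# 9,900 other genes, 500 of them in the annotation.
expected_in, chi2_stat, p_val = overrepresentation_chi2(20, 80, 500, 9400)
print(f"expected={expected_in:.1f} observed=20 chi2={chi2_stat:.1f} p={p_val:.2e}")
```

With very small expected counts an exact test would be the safer choice; the chi-square form is shown because it is the one named on the slide.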
Molecular function in blood (PAIRED)
● Increased immunoglobulin binding
● Reduced smell (in blood!)
● Reduced oxygen binding and carrier activity
● We consider the result validated
Takeaways of the study
● Study rescued by pairing
● No batch to correct
● Almost no metadata
● Smaller signal in blood
● Specific tissue (nasal) more robust
WHAT HAPPENED?
Data science implications
Reduced individual variation
[Figure: pipeline diagram as in the overview, with the label "Intra" added]
Reduced batch effects
[Figure: pipeline diagram as in the overview, with the label "Intra" added]
Easier to control for batches
● Pairing absorbs a proportion of batch effects
● Usually 8 lanes in a flowcell
● Focus on pairs instead of whole samples
● Aggregation of datasets easier
Technical downsides of pairing
● Loss of half the degrees of freedom (DF)
● Many frameworks cannot use it as easily as GLM-based ones
● RNA is used for other analyses:
○ SUPPA2 for alternative splicing
○ Building an empirical distribution from all pairs of samples
○ Implementing pairing would drastically reduce the number of observations
SHOULD WE ALWAYS PAIR?
Medical implications
Tissue implications
● Specific tissues have robust signatures without pairing
● Blood reflects many tissues:
○ Weaker signal
○ Local changes reflected
● Systemic effects are found only in blood
● Always available for sampling (minimally invasive)
● Blood analysis benefits from pairing
Utility implications
● Paired designs are easier to aggregate into meta-studies (robust to batch effects)
● Blood controls can be used as unpaired controls for other studies (if healthy enough)
● Solves the problem of finding controls
● If controls are taken after resolution, their health is questionable (long COVID)
● Some chronic diseases cannot be caught early or are never resolved, so pairing is impossible
Example – cardiovascular events
● We are interested in markers of plaque progression/instability
● Patient checkup and sampling every X months
● Sequencing is expensive, sampling and storing is not
● Sequence only the previous two samples before the event
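The selection rule in the last bullet can be sketched as a small helper that picks, from all stored samples of one patient, the two most recent ones taken before the event. The function name, the dates, and the `n_last` parameter are hypothetical.

```python
from datetime import date


def samples_to_sequence(sample_dates, event_date, n_last=2):
    """Return the n_last most recent stored samples taken before the event."""
    if n_last < 1:
        raise ValueError("n_last must be at least 1")
    earlier = sorted(d for d in sample_dates if d < event_date)
    return earlier[-n_last:]


# Hypothetical patient sampled at checkups, with an event on 2021-11-01.
stored = [
    date(2021, 1, 10),
    date(2021, 4, 12),
    date(2021, 7, 9),
    date(2021, 10, 11),
    date(2022, 1, 14),  # taken after the event, so never selected
]
chosen = samples_to_sequence(stored, date(2021, 11, 1))
print(chosen)
```

Everything else stays in storage, so the sequencing budget is spent only on the pair closest to the event.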
Example – neurodegenerative disease (ALS)
● We cannot predict the disease (10% familial)
● Patients are available for sampling only once diseased
● Sequence samples taken sufficiently far apart in time
● We cannot find the root cause of ALS, as we are not catching the initial event
● We can find signatures of neuronal suffering and death, which is an actionable point
● Generalizes to all chronic diseases
Example – cancer
● For DNA, tumor is matched with blood sample control
● For RNA, we need the normal surrounding tissue
● Sampling the healthy normal target tissue may be problematic
● Tissue margin – potential normal sample
● Admixture of tumor in normal reduces the signal (but not critically for RNA)
Many thanks to...
● Institute for Biocides and Medical Ecology for providing the samples and sequencing
● HTEC Group for providing computational resources and support
● School of Medicine, University of Belgrade for supporting research
● Thanks to DSC organizers for the invite
● Last but not least...
...THANK YOU FOR LISTENING!
ognjen.milicevic@med.bg.ac.rs
ognjen.milicevic@htecgroup.com
ognjen011@gmail.com

Tips for Pet Care in winters How to take care of pets.Tips for Pet Care in winters How to take care of pets.
Tips for Pet Care in winters How to take care of pets.
 
ABDOMINAL COMPARTMENT SYSNDROME
ABDOMINAL COMPARTMENT SYSNDROMEABDOMINAL COMPARTMENT SYSNDROME
ABDOMINAL COMPARTMENT SYSNDROME
 
Dimensions of Healthcare Quality
Dimensions of Healthcare QualityDimensions of Healthcare Quality
Dimensions of Healthcare Quality
 
Haridwar ❤CALL Girls 🔝 89011★83002 🔝 ❤ℂall Girls IN Haridwar ESCORT SERVICE❤
Haridwar ❤CALL Girls 🔝 89011★83002 🔝 ❤ℂall Girls IN Haridwar ESCORT SERVICE❤Haridwar ❤CALL Girls 🔝 89011★83002 🔝 ❤ℂall Girls IN Haridwar ESCORT SERVICE❤
Haridwar ❤CALL Girls 🔝 89011★83002 🔝 ❤ℂall Girls IN Haridwar ESCORT SERVICE❤
 
How many patients does case series should have In comparison to case reports.pdf
How many patients does case series should have In comparison to case reports.pdfHow many patients does case series should have In comparison to case reports.pdf
How many patients does case series should have In comparison to case reports.pdf
 
CONSTRUCTION OF TEST IN MANAGEMENT .docx
CONSTRUCTION OF TEST IN MANAGEMENT .docxCONSTRUCTION OF TEST IN MANAGEMENT .docx
CONSTRUCTION OF TEST IN MANAGEMENT .docx
 
The Importance of Community Nursing Care.pdf
The Importance of Community Nursing Care.pdfThe Importance of Community Nursing Care.pdf
The Importance of Community Nursing Care.pdf
 
The Impact of Meeting: How It Can Change Your Life
The Impact of Meeting: How It Can Change Your LifeThe Impact of Meeting: How It Can Change Your Life
The Impact of Meeting: How It Can Change Your Life
 
ventilator, child on ventilator, newborn
ventilator, child on ventilator, newbornventilator, child on ventilator, newborn
ventilator, child on ventilator, newborn
 
10 Ideas for Enhancing Your Meeting Experience
10 Ideas for Enhancing Your Meeting Experience10 Ideas for Enhancing Your Meeting Experience
10 Ideas for Enhancing Your Meeting Experience
 
Empowering ACOs: Leveraging Quality Management Tools for MIPS and Beyond
Empowering ACOs: Leveraging Quality Management Tools for MIPS and BeyondEmpowering ACOs: Leveraging Quality Management Tools for MIPS and Beyond
Empowering ACOs: Leveraging Quality Management Tools for MIPS and Beyond
 
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
一比一原版纽约大学毕业证(NYU毕业证)成绩单留信认证
 
Introduction to Forensic Pathology course
Introduction to Forensic Pathology courseIntroduction to Forensic Pathology course
Introduction to Forensic Pathology course
 
India Clinical Trials Market: Industry Size and Growth Trends [2030] Analyzed...
India Clinical Trials Market: Industry Size and Growth Trends [2030] Analyzed...India Clinical Trials Market: Industry Size and Growth Trends [2030] Analyzed...
India Clinical Trials Market: Industry Size and Growth Trends [2030] Analyzed...
 

[DigiHealth 22] Budget friendly sample sizes for genomics research - Ognjen Milicevic

  • 1. Budget friendly sample sizes for genomics research Biostatistician, bioinformatician Ognjen Milicevic, MD
  • 2. Why do you need a biostatistician?
  • 3. Common biostatistics tasks
      ● Cleaning and transforming data
      ● Data description
      ● Statistical testing
      ● Tabulation and visualization
      ● Bioinformatics (applied statistics for genomics)
      ● Post-hoc power calculations
      ● ...
  • 4. Common biostatistics tasks
      ● Cleaning and transforming data
      ● Data description
      ● Statistical testing
      ● Tabulation and visualization
      ● Bioinformatics (applied statistics for genomics)
      ● Post-hoc power calculations
      ● Complain they weren't consulted earlier
  • 5.
  • 6. Post-hoc sample size / power analysis
      ● Due to convenience, we justify choices already made
      ● Find a similar effect size in the literature
      ● Use the posterior distribution as prior
      ● Set the desired power (80-100%)
      ● Adjust as needed for dropout, loss, margin of error
      ● Obtain the sample size you already have
  • 8. Dear bioinformatician, how many samples do we need to sequence to investigate...
  • 9. NO CONVENIENCE!
      ● Not routinely done
      ● Effect size unknown
      ● Literature not helpful
      ● Multiple unknown genes
      ● Distribution is complex
      ● ...
  • 10. RNA sequencing around the internet
  • 11. DATA SCIENCE OF RNA SEQUENCING
  • 12. Natural variability of RNA per gene
      ● De Torrente et al. (2020): "Surprisingly, the expression of less than 50% of all genes was Normally-distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented."
      ● Liu et al. (2019): "Based on the analysis of a group of real gene expression profiles, this study reveal that the primary density distributions of the real profiles are normal/log-normal and t distributions, accounting for 80% and 19% respectively."
      ● 20K+ genes
  • 13. Representing RNAs with fragments
      ● Gamma-Poisson distribution
      ● Count and normalize to quantify (TPM)
  • 14. Overview of the pipeline
      [Diagram labels: Effect between groups; Inter-individual variation in RNA; Batch effects; Representation variability; Tissue sample; Chemical preparation; Sequencing]
  • 15. Count matrix and metadata
      ● Each gene is an independent outcome
  • 16. LAYERS UPON LAYERS OF VARIABILITY
      ● So, what about those sample sizes?
  • 18. RNA characterization of COVID-19 (2021) - Plan
      ● Total RNA: virus and host (human)
      ● Nasopharyngeal swabs and blood samples
      ● Paired design (at hospital admission and at discharge)
      ● 18 individuals, 72 samples in total
      ● Which biological pathways are affected? (DEG)
      ● What can we say about the viral load? (metagenomics)
  • 19. Estimating sample size for RNA
      ● Theoretical models with assumed distributions
      ● Parameters inferred from previous datasets
      ● R packages: RNASeqDesign, PROPER, powsimR, ssizeRNA
      ● Web tool: RNASeqSampleSize
      ● Results vary between tools
      ● If cost is not relevant, choose the most conservative (largest) estimate
  • 20. Proposed approach
      ● Perform one estimate and use it
      ● Remove unwanted variability (batch effect)
      ● Reduce variability with paired design
      ● Use meaningful metadata
      ● Filter the genes
  • 21. Remove unwanted variability
      ● A number of SVD-based methods remove high-level batch effects without tracing them to specific interpretable variables.
      ● Housekeeping or control genes can be used as markers.
      ● Examples: SVA, RUVSeq
      ● These methods produce new surrogate variables.
      ● Colleague quote: "Once I see batch effects, I can correct them mathematically, but I never trust that dataset again."
  • 22. Batch effects work against collaborative science!
  • 23. Paired design
      ● Control samples are taken from the same patients, after resolution or before the event.
      ● Increases power
      ● Not all analysis frameworks can take advantage of it
      ● Sometimes biologically difficult
      ● Halves the degrees of freedom (DF)
  • 24. Meaningful metadata
      ● Gender and age can always be relevant.
      ● Collect metrics of sample quality (before and after sequencing).
      ● Disease subtypes can be a covariate or a group variable.
      ● Helps in choosing which samples to sequence when only a subset can be sequenced.
  • 25. Filter genes
      ● Multiple testing correction is needed for 20K+ genes.
      ● Remove mostly unexpressed genes.
      ● A priori removal is allowed.
  • 26. Results
      ● edgeR GLM
      ● Nasal DEG at p<0.05: 40 (paired) / 51 (unpaired)
      ● Blood DEG at p<0.05: 76 (paired) / 2 (unpaired)
      ● Every parameter choice changes the results
      ● Validation?
  • 27. Annotation representation testing – Panther.db
      ● An annotation is a subset of genes
      ● Multiple annotation sets are available (structure, function, pathway...)
      ● We only use significant genes
      ● Overrepresentation test: chi-square to compare observed and expected frequencies
      ● Enrichment test: Mann-Whitney to test randomness of ranks
  • 28. Molecular function in blood (PAIRED)
      ● Increased immunoglobulin binding
      ● Reduced smell (in blood!)
      ● Reduced oxygen binding and carrier activity
      ● We consider the result validated
  • 29. Takeaways of the study
      ● Study rescued by pairing
      ● No batch to correct
      ● Almost no metadata
      ● Smaller signal in blood
      ● Specific tissue (nasal) more robust
  • 31. Reduced individual variation
      [Diagram labels: Effect between groups; Inter-individual variation in RNA; Batch effects; Representation variability; Tissue sample; Chemical preparation; Sequencing; Intra]
  • 32. Reduced batch effects
      [Diagram labels: Effect between groups; Inter-individual variation in RNA; Batch effects; Representation variability; Tissue sample; Chemical preparation; Sequencing; Intra]
  • 33. Easier to control for batches
      ● Pairing absorbs a proportion of batch effects
      ● Usually 8 lanes in a flowcell
      ● Focus on pairs instead of whole samples
      ● Aggregation of datasets is easier
  • 34. Technical downsides of pairing
      ● Loss of half the DF
      ● Many frameworks cannot use it as easily as GLM-based ones
      ● RNA is used for other analyses:
          ○ SUPPA2 for alternative splicing
          ○ Building an empirical distribution from all pairs of samples
          ○ If pairing were applied, it would drastically reduce the number of observations
  • 35. SHOULD WE ALWAYS PAIR?
      ● Medical implications
  • 36. Tissue implications
      ● Specific tissues have robust signatures without pairing
      ● Blood reflects many tissues:
          ○ Weaker signal
          ○ Local changes reflected
      ● Systemic effects are found only in blood
      ● Always available for sampling (minimally invasive)
      ● Blood analysis benefits from pairing
  • 37. Utility implications
      ● Paired designs are easier to aggregate into meta-studies (robust to batch effects)
      ● Blood controls can be used as unpaired controls for other studies (if healthy enough)
      ● Solves the problem of finding controls
      ● If controls are taken after resolution, their health is questionable (long COVID)
      ● Some chronic diseases cannot be caught early or are never resolved, so pairing is impossible
  • 38. Example – cardiovascular events
      ● We are interested in markers of plaque progression/instability
      ● Patient checkup and sampling every X months
      ● Sequencing is expensive; sampling and storing are not
      ● Sequence only the previous two samples before the event
  • 39. Example – neurodegenerative disease (ALS)
      ● We cannot predict the disease (10% familial)
      ● Patient available for sampling once diseased
      ● Sequence patients sufficiently apart
      ● We cannot find the root cause of ALS, as we are not catching the initial event
      ● We can find signatures of neuronal suffering and death, which is an actionable point
      ● Generalizes to all chronic diseases
  • 40. Example – cancer
      ● For DNA, the tumor is matched with a blood sample control
      ● For RNA, we need the normal surrounding tissue
      ● Sampling the healthy normal target tissue may be problematic
      ● Tissue margin: a potential normal sample
      ● Admixture of tumor in normal reduces the signal (but not critically for RNA)
  • 41. Many thanks to...
      ● Institute for Biocides and Medical Ecology for providing the samples and sequencing
      ● HTEC Group for providing computational resources and support
      ● School of Medicine, University of Belgrade for supporting research
      ● Thanks to DSC organizers for the invite
      ● Last but not least...
  • 42. ...THANK YOU FOR LISTENING!
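
Slide 25 says that testing 20K+ genes calls for multiple testing correction and that mostly unexpressed genes can be removed a priori. The sketch below illustrates both steps on toy data. It assumes Benjamini-Hochberg as the correction (the slides do not name a method). The function names, the thresholds `min_count` and `min_samples`, the gene names, and all numbers are hypothetical.

```python
def filter_low_expression(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps gene name -> list of per-sample read counts.
    Both thresholds are illustrative assumptions, not values from the talk.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(v >= min_count for v in values) >= min_samples
    }


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy data: hypothetical genes, read counts, and raw p-values.
counts = {
    "geneA": [120, 98, 143, 110],
    "geneB": [0, 1, 0, 2],  # mostly unexpressed, removed before testing
    "geneC": [35, 40, 12, 28],
    "geneD": [15, 3, 22, 0],
}
raw_p = {"geneA": 0.001, "geneB": 0.04, "geneC": 0.03, "geneD": 0.02}

kept = filter_low_expression(counts)
genes = sorted(kept)
adj = benjamini_hochberg([raw_p[g] for g in genes])
for gene, p in zip(genes, adj):
    print(gene, round(p, 4))
```

Because geneB is dropped before testing, the correction is applied over three tests instead of four. This is why filtering on expression level, decided independently of the group comparison, can help a small study.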

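Slides 23 and 34 state that a paired design increases power while halving the degrees of freedom. The sketch below illustrates that trade-off for a single gene with invented numbers. It uses plain t statistics as a simplifying assumption; the study itself used an edgeR GLM, which is not reproduced here. The sample names and values are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom (n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def unpaired_t(group1, group2):
    """Pooled-variance two-sample t statistic and degrees of freedom (n1 + n2 - 2)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group2) - statistics.mean(group1)) / se, df


# Hypothetical expression of one gene in six patients:
# large differences between patients, small consistent shift within each patient.
admission = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
discharge = [12.1, 21.9, 32.2, 41.8, 52.0, 62.1]

t_paired, df_paired = paired_t(admission, discharge)
t_unpaired, df_unpaired = unpaired_t(admission, discharge)

print(f"paired:   t = {t_paired:.2f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.2f}, df = {df_unpaired}")
```

In this toy case the paired statistic is far larger even though it has half the degrees of freedom (5 versus 10), because differencing within each patient removes the between-patient spread that dominates the unpaired comparison.
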
Editor's Notes

  1. Hello, my name is Ognjen Milicevic from Belgrade, Serbia. Because of my mixed medical and engineering background, today I chose to tackle an interdisciplinary subject -