ACCOST is a method for differential analysis of Hi-C data between two conditions with replicates. It models Hi-C interaction counts with a negative binomial distribution that accounts for distance effects between loci through an offset term. ACCOST normalizes counts with ICE and estimates model parameters to obtain a p-value for each bin pair comparing the two conditions. It was validated on several datasets and shown to identify more differential contacts than other methods like diffHic and FIND, particularly at short genomic distances.
1. ‘ACCOST’ for differential HiC analysis
Nathalie Vialaneix, INRAE/MIAT
Chrocogen, July 10th, 2020
3. First of all... in the previous episodes...
4. Topic
(What is this presentation about?)
When two sets of Hi-C matrices have been collected in two different
conditions, what are the available methods to compare the matrices and
identify regions that are significantly different between the conditions?
Comparison usually means: at a bin pair level.
5. Notations and formal definition of the problem
Hi-C matrices: for $t = 1, \ldots, T$, Hi-C matrices $H^t$
Conditions: 2 conditions $C_1$ and $C_2$ such that $C_1 \cup C_2 = \{1, \ldots, T\}$ and $C_1 \cap C_2 = \emptyset$
Interactions: $h^t_{ij}$ is the interaction frequency (in $\mathbb{N}_+$) for bin pair $(i, j)$, where $i$ and $j$ are two genomic loci, in the matrix $H^t$
Question: for all pairs $(i, j)$, test the assumption
$$H_0: N^{C_1}_{ij} = N^{C_2}_{ij}$$
in which $N^{C_r}_{ij}$ is the random variable that represents the number of contacts (interaction frequency) between loci $i$ and $j$ in condition $C_r$.
6. 1. Prior differential analysis
Most methods start by correcting sequencing bias (between-matrix normalization):
Standard sequencing-depth normalization [Anders & Huber, 2010] to obtain an equal total number of counts between the different samples (R package edgeR)
MA plot correction [Lun & Smyth, 2015], improved by [Stansfield et al, 2019], to correct the trend in MA (mean versus difference) plots for every pair of samples (R packages diffHic/csaw and multiHiCcompare)
MD plot correction [Stansfield et al, 2018] to correct the trend in MD (distance versus difference) plots for every pair of samples (R package HiCcompare)
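To make the first of these corrections concrete, here is a minimal numpy sketch of sequencing-depth normalization on hypothetical matrices; it only illustrates the idea, not edgeR's actual size-factor estimation.

```python
# Minimal sketch: rescale each Hi-C matrix so all samples share the same
# total count (sequencing-depth normalization). Data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
matrices = [rng.poisson(lam, size=(100, 100)).astype(float)
            for lam in (5.0, 8.0, 6.5)]         # 3 samples, different depths

totals = np.array([m.sum() for m in matrices])
target = np.exp(np.log(totals).mean())          # geometric mean as common depth
scaled = [m * (target / tot) for m, tot in zip(matrices, totals)]

assert np.allclose([m.sum() for m in scaled], target)
```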
8. 2. Compute a p-value per bin pair
Z-score computation [Stansfield et al, 2018], based on quantiles of scaled and centered M values (R package HiCcompare):
is used when there is no replicate ($T = 2$: one sample per condition)
is very fast and easy to use
but is a bit weak on the theoretical side (no strong justification)
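As a toy illustration of this Z-score idea (not HiCcompare's actual implementation), the sketch below computes M values as log-ratios of two hypothetical matrices, centers and scales them within each genomic-distance stratum, and converts them to two-sided normal p-values.

```python
# Toy sketch of a distance-stratified Z-score test on M values.
# The matrices and the +1 pseudo-count are hypothetical choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
h1 = rng.poisson(10, size=(n, n)) + 1.0         # +1 avoids log(0)
h2 = rng.poisson(10, size=(n, n)) + 1.0

i, j = np.triu_indices(n)
d = j - i                                       # genomic distance per bin pair
m = np.log2(h1[i, j] / h2[i, j])                # M values

z = np.empty_like(m)
for dist in np.unique(d):                       # center/scale per distance stratum
    sel = d == dist
    z[sel] = (m[sel] - m[sel].mean()) / (m[sel].std() + 1e-12)

pvals = 2.0 * stats.norm.sf(np.abs(z))          # two-sided p-value per bin pair
```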
9. 2. Compute a p-value per bin pair
Z-score computation [Stansfield et al, 2018] (R package HiCcompare)
NB models [Lun & Smyth, 2015], based on Negative Binomial GLMs and statistical tests (R package diffHic):
need at least 3 replicates per condition to be used
are not restricted to two conditions and can include various covariates
but are statistically better justified than the Z-score approach
[Stansfield et al., 2019] (R package multiHiCcompare) also does this, with small changes (normalization, ...)
[Zaborowski and Wilczyński, 2020] also use this distribution, but within distance pools, and counts are explained by the counts in the other condition rather than by the condition itself
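To illustrate the testing idea for a single bin pair (diffHic itself runs in R over all bin pairs at once, with empirical-Bayes dispersion estimation), here is a hedged statsmodels sketch with a hand-fixed dispersion and hypothetical counts.

```python
# Sketch: negative binomial GLM of counts on condition for ONE bin pair,
# Wald test on the condition effect. All numbers are hypothetical.
import numpy as np
import statsmodels.api as sm

counts = np.array([52, 61, 48, 95, 88, 102])    # 3 replicates per condition
condition = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
log_depth = np.log([1.0, 1.1, 0.9, 1.0, 1.2, 0.95])  # size factors as offset

X = sm.add_constant(condition)
model = sm.GLM(counts, X,
               family=sm.families.NegativeBinomial(alpha=0.05),  # fixed dispersion
               offset=log_depth)
res = model.fit()
print(res.params[1], res.pvalues[1])            # log fold-change and p-value
```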
10. 2. Compute a p-value per bin pair
Z-score computation [Stansfield et al, 2018] (R package HiCcompare)
NB models [Lun & Smyth, 2015] (R package diffHic)
In both approaches, a p-value is computed for every bin pair and p-values are corrected with multiple-testing procedures (not described here).
But spatial dependencies between pairs of bins are not taken into account by these methods!!
11. 2. Compute a p-value per bin pair taking spatial dependencies into account
Using an analogy with neuroimaging and spatial Poisson processes [Djekidel et al, 2018] (R package FIND):
needs at least 2 replicates per condition to be used
seems to be restricted to two conditions (but could maybe be easily extended to more) and can include various covariates
is statistically (more or less) justified (from previous work on image analysis)
uses tests at the bin pair level with multiple-testing correction, but those tests are based on the value of the bin pair and its neighbors
is shown to work well for high-resolution differential analysis (seems to provide better results for 5kb bins)
12. 2. Compute a p-value per bin pair taking spatial dependencies into account
Using an analogy with neuroimaging and spatial Poisson processes [Djekidel et al, 2018] (R package FIND)
Using distance-based correction and Gaussian filter comparison [Ardakany et al, 2019] (Python/Matlab scripts selfish, available on GitHub):
not sensitive to sequencing bias and does not require (between-matrix) normalization
only suited to $T = 2$ (no replicates) for 2 conditions
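A rough sketch of the Gaussian-filter comparison idea follows; the real selfish method self-normalizes the matrices and calibrates p-values properly, so the data and threshold here are purely hypothetical.

```python
# Sketch: smooth both matrices at several scales and flag bin pairs where
# the smoothed difference is extreme. Data and threshold are hypothetical.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)
a = rng.poisson(10, size=(200, 200)).astype(float)
b = rng.poisson(10, size=(200, 200)).astype(float)

for sigma in (1, 2, 4, 8):                      # multi-scale comparison
    diff = gaussian_filter(a, sigma) - gaussian_filter(b, sigma)
    z = (diff - diff.mean()) / diff.std()       # crude standardization
    print(sigma, int((np.abs(z) > 3).sum()))    # extreme bin pairs per scale
```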
13. Other tools (not reviewed for the moment... but mentioned in the article)
HOMER (binomial-based test between two samples)
ChromoR (transformation into Gaussian measurements and Bayesian factor analysis)
HiBrowse (based on edgeR, like diffHic and others)
14. Now... coming back to ACCOST!
15. ACCOST overview
suited only to 2 conditions with replicates (even though the computations may work even without replicates)
based on DESeq (very similar to diffHic or multiHiCcompare)
first: ICE normalization (within-matrix normalization)
accounts for the distance effect in the matrix with the addition of an offset in the model (does not require within-matrix normalization, nor distance-based correction)
Python scripts available on Bitbucket
16. Main hypotheses of ACCOST
$N^t_{ij} \sim NB(\mu^t_{ij}, \sigma^t_{ij})$, with mean $\mu^t_{ij}$ and standard deviation $\sigma^t_{ij}$ such that:
$$\mu^t_{ij} = \beta^t_i \beta^t_j s^t_{|i-j|} q^{k(t)}_{ij}$$
where $k(t)$ is the condition for sample $t$ and:
$\beta^t_i$ is an experiment-specific locus bias for locus $i$ in sample $t$ (one bias vector per sample)
$s^t_{|i-j|}$ is a distance-specific size factor that accounts for the genomic distance effect
$q^{k(t)}_{ij}$ is the true (unknown) number of interactions between $i$ and $j$ in condition $k(t)$ (on which the test is based)
A similar decomposition holds for $\sigma^t_{ij}$: it depends on the parametric estimation of a function $\nu_{k(t)}$, which models the dispersion as a smooth non-negative function of the interaction $q^{k(t)}_{ij}$.
17. How are ACCOST parameters obtained?
$$\mu^t_{ij} = \beta^t_i \beta^t_j s^t_{|i-j|} q^{k(t)}_{ij}$$
$\beta^t_i$ is set as the ICE normalization factor for locus $i$ in sample $t$
$s^t_{|i-j|}$ is obtained as the median of ICE-normalized counts for pairs of loci at distance $d = |i - j|$:
$$s^t_d = \operatorname{median}_{|i'-j'|=d} \frac{N^t_{i'j'}}{\beta^t_{i'} \beta^t_{j'}}$$
$q^{k}_{ij}$ is then obtained by averaging corrected counts across the replicates of the same condition:
$$q^{k}_{ij} = \frac{1}{|C_k|} \sum_{t \in C_k} \frac{N^t_{ij}}{\beta^t_i \beta^t_j s^k_{|i-j|}}$$
$\nu_{k(t)}$ is finally estimated by a polynomial regression (details skipped for the sake of clarity, but basically very similar to what is performed in DESeq, using the distance-corrected counts $q^{k(t)}_{ij}$)
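Under simplifying assumptions (ICE bias vectors taken as given, a single condition, and each replicate's own distance factor used in place of the condition-level one), these estimation steps can be sketched as follows; all names and data are hypothetical.

```python
# Sketch of the slide-17 estimation steps: distance-specific size factors
# s^t_d as medians of ICE-normalized counts, then q as the average of
# corrected counts across the replicates of one condition.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 3
counts = rng.poisson(20, size=(reps, n, n)).astype(float)  # N^t_{ij}
betas = [np.ones(n) for _ in range(reps)]       # ICE bias factors (given)

i, j = np.triu_indices(n)
d = j - i                                       # distance of each bin pair

def distance_size_factors(N, beta):
    """s^t_d: median ICE-normalized count over pairs at distance d."""
    norm = N[i, j] / (beta[i] * beta[j])
    return np.array([np.median(norm[d == dist]) for dist in range(n)])

s = [distance_size_factors(counts[t], betas[t]) for t in range(reps)]

# q_{ij}: corrected counts averaged across the replicates of the condition
q = np.mean([counts[t][i, j] / (betas[t][i] * betas[t][j] * s[t][d])
             for t in range(reps)], axis=0)
```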
18. Validation of ACCOST
datasets: two human cell lines from [Rao et al, 2014], two mouse datasets from [Dixon et al, 2012] and [Shen et al, 2012], and a Plasmodium dataset with two distinct stages of the parasite from [Ay et al, 2014]
compared methods: diffHic and FIND
22. Significant results locations
The increase of significant contacts at 50 kb corresponds to a threshold related to LOESS normalization.
23. References
Ardakany, A.R., Ay, F., and Lonardi, S. (2019). Selfish: discovery of differential chromatin interactions via a self-similarity measure. Bioinformatics, 35(14):i145--i153.
Ay, F., Bunnik, E.M., Varoquaux, N., Bol, S.M., Prudhomme, J., Vert, J.P., Noble, W.S., Le Roch, K.G. (2014). Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Research, 24:974--988.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485:376--380.
Djekidel, M.N., Chen, Y., and Zhang, M.Q. (2018). FIND: difFerential chromatin INteractions Detection using a spatial Poisson process. Genome Research, 28:412--422.
Lun, A. and Smyth, G. (2015). diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics, 16:258.
Rao, S.S.P. et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159:1665--1680.
Shen, Y., Yue, F., McCleary, D.F., Ye, Z., Edsall, L., Kuan, S., Wagner, U., Dixon, J., Lee, L., Lobanenkov, V.V. et al. (2012). A map of the cis-regulatory sequences in the mouse genome. Nature, 488:116--120.
24. References
Stansfield, J.C., Cresswell, K.G., Vladimirov, V.I., and Dozmorov, M.G. (2018). HiCcompare: an R package for joint normalization and comparison of Hi-C datasets. BMC Bioinformatics, 19:279.
Stansfield, J.C., Cresswell, K.G., and Dozmorov, M.G. (2019). multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics, 35(17):2916--2923.
Zaborowski, R. and Wilczyński, B. (2020). DiADeM: differential analysis via dependency modelling of chromatin interactions with robust generalized linear models. bioRxiv preprint.