The document presents a quantum-inspired heuristic for the multiple sequence alignment problem in bioinformatics. It models sequence similarity as a traveling salesman problem instance using a normalized similarity matrix, then applies a quantum-inspired generalized variable neighborhood search (qGVNS) metaheuristic to approximate the shortest Hamiltonian path and generate an initial alignment. Evaluation on real biological sequences shows that it outperforms the UPGMA and single-neighbor progressive baselines, producing alignments with good sum-of-pairs scores, especially for large sequence sets.
Knowledge of cause-effect relationships is central to the field of climate science, supporting mechanistic understanding, observational sampling strategies, experimental design, model development and model prediction. While the major causal connections in our planet's climate system are already known, there is still potential for new discoveries in some areas. The purpose of this talk is to make this community familiar with a variety of available tools to discover potential cause-effect relationships from observed or simulation data. Some of these tools are already in use in climate science, others are just emerging in recent years. None of them are miracle solutions, but many can provide important pieces of information to climate scientists. An important way to use such methods is to generate cause-effect hypotheses that climate experts can then study further. In this talk we will (1) introduce key concepts important for causal analysis; (2) discuss some methods based on the concepts of Granger causality and Pearl causality; (3) point out some strengths and limitations of these approaches; and (4) illustrate such methods using a few real-world examples from climate science.
Multi Model Ensemble (MME) predictions are a popular ad-hoc technique for improving predictions of high-dimensional, multi-scale dynamical systems. The heuristic idea behind the MME framework is simple: given a collection of models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious if this is a viable strategy and which models should be included in the MME forecast in order to achieve the best predictive performance. I will present an information-theoretic approach to this problem which allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi-model ensembles. Time permitting, the role and validity of “fluctuation-dissipation” arguments for improving imperfect predictions of externally perturbed non-autonomous systems - with possible applications to climate change considerations - will also be addressed.
Covariance matrices are central to many adaptive filtering and optimisation problems. In practice, they have to be estimated from a finite number of samples; on this, I will review some known results from spectrum estimation and multiple-input multiple-output communications systems, and how properties that are assumed to be inherent in covariance and power spectral densities can easily be lost in the estimation process. I will discuss new results on space-time covariance estimation, and how the estimation from finite sample sets will impact on factorisations such as the eigenvalue decomposition, which is often key to solving the introductory optimisation problems. The purpose of the presentation is to give you some insight into estimating statistics as well as to provide a glimpse on classical signal processing challenges such as the separation of sources from a mixture of signals.
Polynomial matrices can help to elegantly formulate many broadband multi-sensor / multi-channel processing problems, and represent a direct extension of well-established narrowband techniques which typically involve eigen- (EVD) and singular value decompositions (SVD) for optimisation. Polynomial matrix decompositions extend the utility of the EVD to polynomial parahermitian matrices, and this talk presents a brief overview of such polynomial matrices, characteristics of the polynomial EVD (PEVD) and iterative algorithms for its solution. The presentation concludes with some surprising results when applying the PEVD to subband coding and broadband beamforming.
Conventional tools in array signal processing have traditionally relied on the availability of a large number of samples acquired at each sensor or array element (antenna, hydrophone, microphone, etc.). Large sample size assumptions typically guarantee the consistency of estimators, detectors, classifiers and multiple other widely used signal processing procedures. However, practical scenarios and array mobility conditions, together with the need for low latency and reduced scanning times, impose strong limits on the total number of observations that can be effectively processed. When the number of collected samples per sensor is small, conventional large sample asymptotic approaches are not relevant anymore. Recently, large random matrix theory tools have been proposed in order to address the small sample support problem in array signal processing. In fact, it has been shown that the most important and longstanding problems in this field can be reformulated and studied according to this asymptotic paradigm. By exploiting the latest advances in large random matrix theory and high dimensional statistics, a novel and unconventional methodology can be established, which provides an unprecedented treatment of the finite sample-per-sensor regime. In this talk, we will see that random matrix theory establishes a unifying framework for the study of array signal processing techniques under the constraint of a small number of observations per sensor, which has radically changed the way in which array processing methodologies have been traditionally established. We will show how this unconventional way of revisiting classical array processing has led to major advances in the design and analysis of signal processing techniques for multidimensional observations.
The variational Gaussian process (VGP) is a Bayesian nonparametric model which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over these random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
Sequential quasi-Monte Carlo (SQMC) is a quasi-Monte Carlo (QMC) version of sequential Monte Carlo (or particle filtering), a popular class of Monte Carlo techniques used to carry out inference in state space models. In this talk I will first review the SQMC methodology as well as some theoretical results. Although SQMC converges faster than the usual Monte Carlo error rate, its performance deteriorates quickly as the dimension of the hidden variable increases. However, I will show with an example that SQMC may perform well for some "high" dimensional problems. I will conclude this talk with some open problems and potential applications of SQMC in complicated settings.
Kernel methods and variable selection for exploratory analysis and multi-omic...
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
There is now a huge literature on Bayesian methods for variable selection that use spike-and-slab priors. Such methods, in particular, have been quite successful for applications in a variety of different fields; high-throughput genomics and neuroimaging are two such examples. There, novel methodological questions are being generated, requiring the integration of different concepts, methods, tools and data types. These have in particular motivated the development of variable selection priors that go beyond the independence assumptions of a simple Bernoulli prior on the variable inclusion indicators. In this talk I will describe various prior constructions that incorporate information about structural dependencies among the variables. I will also address extensions of the models to the analysis of count data. I will motivate the development of the models using specific applications from neuroimaging and from studies that use microbiome data.
Mini useR! in Melbourne https://www.meetup.com/fr-FR/MelbURN-Melbourne-Users-of-R-Network/events/251933078/
MelbURN (Melbourne useR group) https://www.meetup.com/fr-FR/MelbURN-Melbourne-Users-of-R-Network
July 16th, 2018
Melbourne, Australia
We will describe and analyze accurate and efficient numerical algorithms to interpolate and approximate the integral of multivariate functions. The algorithms can be applied when we are given function values at an arbitrarily positioned, usually small, existing sparse set of samples, and additional samples are impossible or difficult (e.g. expensive) to obtain. The methods are based on local, and global, tensor-product sparse quasi-interpolation methods that are exact for a class of sparse multivariate orthogonal polynomials.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
“Statistical Physics Studies of Machine Learning Problems” by Lenka Zdeborova, Researcher @CNRS
Abstract: We will offer some insight into the following questions: What makes problems studied in machine learning and statistical physics related? How can this relation be used to better understand the performance and limitations of machine learning systems? What happens when a phase transition is found in a computational problem? How do phase transitions influence algorithmic hardness?
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV... (IJCSEA Journal)
A hybrid learning automata–genetic algorithm (HLGA) is proposed to solve the QoS routing optimization problem of next generation networks, an NP-complete problem. The algorithm combines the advantages of the Learning Automata (LA) algorithm and the Genetic Algorithm (GA): it first uses the good global search capability of LA to generate the initial population needed by GA, and then uses GA with new crossover and mutation operators to improve the Quality of Service (QoS) and obtain the optimized routing tree. In the proposed algorithm, the connectivity matrix of edges is used for genotype representation. Some novel heuristics are also proposed for mutation, crossover, and the creation of random individuals. We evaluate the performance and efficiency of the proposed HLGA-based algorithm in comparison with other existing heuristic and GA-based algorithms through simulation. Simulation results demonstrate that the proposed algorithm not only has fast calculation speed and high accuracy but also improves the efficiency of QoS routing in Next Generation Networks, outperforming the previous algorithms in the literature.
K-Means clustering uses an iterative procedure which is very sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering also changes with the initial centroids. This paper tries to overcome this problem of random selection of centroids, and hence of changing clusters, with a premeditated selection of initial centroids. We use the iris, abalone and wine data sets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves the clustering performance. The clustering also remains the same in every run, as the initial centroids are not selected randomly but through the premeditated method.
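The abstract does not spell out how the premeditated centroids are chosen, so the sketch below only illustrates the surrounding claim: with an explicit initial-centroid array, k-means gives identical clusters on every run. A simple per-feature quantile heuristic stands in for the paper's centroid-selection step; it is an assumption, not the authors' method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
k = 3

# Stand-in for a premeditated (non-random) centroid choice:
# spread the initial centroids along per-feature quantiles.
quantiles = np.linspace(0.1, 0.9, k)
init_centroids = np.quantile(X, quantiles, axis=0)   # shape (k, n_features)

# With an explicit init array and n_init=1, repeated runs give identical clusters.
km = KMeans(n_clusters=k, init=init_centroids, n_init=1)
labels = km.fit_predict(X)
print(labels[:10], km.inertia_)
```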
Point Placement Algorithms: An Experimental Study (CSCJournals)
The point placement problem is to determine the positions of n distinct points on a line, up to translation and reflection, using the fewest possible pairwise (adversarial) distance queries. In this paper we report on an experimental study of a number of deterministic point placement algorithms and an incremental randomized algorithm, with the goal of obtaining greater insight into the practical utility of these algorithms, particularly of the randomized one.
International Journal of Managing Information Technology (IJMIT)
An improved SPFA algorithm for single source shortest path problem using forw...
We present an improved SPFA algorithm for the single source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex is relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that it is efficient.
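For context, a minimal sketch of the baseline SPFA scheme the abstract builds on (queue-based Bellman–Ford: a vertex is enqueued only when its distance label has just been relaxed). The MinPoP queue-ordering refinement of the paper is not reproduced here.

```python
from collections import deque

def spfa(n, edges, source):
    """Single-source shortest paths on a directed graph.
    n: number of vertices (0..n-1); edges: list of (u, v, w)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))

    INF = float("inf")
    dist = [INF] * n
    in_queue = [False] * n
    dist[source] = 0
    queue = deque([source])
    in_queue[source] = True

    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:        # relaxation
                dist[v] = dist[u] + w
                if not in_queue[v]:          # enqueue only relaxed vertices
                    queue.append(v)
                    in_queue[v] = True
    return dist

# Example: distances from vertex 0
print(spfa(4, [(0, 1, 2), (1, 2, 3), (0, 2, 7), (2, 3, 1)], 0))  # [0, 2, 5, 6]
```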
An Efficient Clustering Method for Aggregation on Data Fragments (IJMER)
Clustering is an important step in the process of data analysis, with applications in numerous fields. Clustering ensembles have emerged as a powerful technique for combining different clustering results to obtain a quality clustering. Existing clustering aggregation algorithms are applied directly to a large number of data points and are inefficient when the number of data points is large. This project defines an efficient approach for clustering aggregation based on data fragments, where a data fragment is any subset of the data. To increase the efficiency of the proposed approach, the clustering aggregation is performed directly on data fragments; enhanced clustering aggregation algorithms (Agglomerative, Furthest, and Local Search) under the comparison measure and normalized mutual information measures are described, which reduce computational complexity while increasing accuracy.
In this talk we will describe a methodology to handle causality to make inference on common-cause failure (CCF) in a situation of missing data. The data are collected in the form of a contingency table, but the only available information is the number of CCFs of different orders and the number of failures due to a given cause. Therefore only the margins of the contingency table are observed; the frequencies in each cell are unknown. Assuming a Poisson model for the counts, we suggest a Bayesian approach and we use the inverse Bayes formula (IBF) combined with a Metropolis-Hastings algorithm to make inference on the rate of occurrence for the different combinations of cause and order. The performance of the resulting algorithm is evaluated through simulations. A comparison is made with results obtained from the _-composition approach to deal with causality suggested by Zheng et al. (2013).
Presentation of "Quantum automata for infinite periodic words" for the 6th International Conference on Information, Intelligence, Systems and Applications (IISA 2015)
User Requirements for Gamifying Sports Software- in 3rd International Workshop on Games and Software Engineering (GAS 2013) in ICSE 2013 May 18, 2013 San Francisco, California, U.S.A.
This presentation gives a brief overview of the structural and functional attributes of nucleotides, the structure and function of genetic materials, and the impact of UV rays and pH upon them.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making. They monitor common gases, weather parameters, and particulates.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and capacity to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
What are greenhouse gases and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space as they occur in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
A quantum-inspired optimization heuristic for the multiple sequence alignment problem in bio-computing
1. A quantum-inspired optimization heuristic for the multiple sequence alignment problem in bio-computing
10th International Conference on Information, Intelligence, Systems and Applications (IISA 2019)
Patras, Greece
Konstantinos Giannakis, Christos Papalitsas, Georgia Theocharopoulou, Sofia Fanarioti, and Theodore Andronikos
July 15-17, 2019
Dept. of Informatics, Ionian University, Greece
kgiann@ionio.gr
2. Acknowledgment
This research is funded in the context of the project “Investigating alternative computational methods and their use in computational problems related to optimization and game theory” (MIS 5007500) under the call for proposals “Supporting researchers with an emphasis on young researchers” (EDULL34). The project is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme Human Resources Development, Education and Lifelong Learning 2014-2020.
3. Project info
∙ Natural and biomolecular computing
∙ Modeling with Membrane automata
∙ Unconventional Computational Methods
∙ Quantum Optimization Algorithms and Strategies
→ http://di.ionio.gr/alcomopt/ ←
4. An overview
◎ Biological Sequences
◎ Alignment
∙ Single and multiple alignment (MSA)
◎ NP-hard for most cases
◎ Bio-inspired techniques
◎ Translate the MSA problem into a TSP instance
∙ Proposal of a quantum-inspired method
∙ To progressively solve this problem
∙ Evaluated through simulations against other aligning methods
∙ Provide a good initial alignment, especially for large sets of sequences
6. Introduction
∙ Biological sequences play an important role in various fields
✓ The multiple sequence alignment (MSA) problem considers the alignment of a set of sequences
∙ They usually represent proteins or nucleotides
∙ Pairwise alignment → 2 sequences
∙ Related to non-biological fields, e.g., information-related fields, natural language processing, finance, etc.
∙ NP-complete with the SP-score and NP-hard for most metrics
∙ Global and local alignments
∙ Usually dynamic programming approaches
- Heuristic and/or probabilistic methods for larger instances
7. Motivation
∙ Arranging sequences of DNA, RNA, or proteins in order to study properties and relationships among them
∙ Useful in many fields → it can reveal interesting properties about the sequences (e.g., phylogenetic trees)
∙ Sequences are usually large and numerous
∙ Need for good alignments within acceptable time frames, at least for an initial solution
8. Pairwise and multiple alignment
∙ Examples: sequences S1 = ACTTAA and S2 = CAGTAC:
A C T - T A A -
- C A G T A - C
∙ If you add an extra sequence S3 = CGGACA:
A C T - T A A - -
- C A G T A - C -
- C G G - A - C A
9. Solutions and methods
∙ Progressive, iterative, motif-based, and hybrid methods
✓ Progressive: the most similar sequences are aligned first and then the rest of them are aligned in decreasing order of similarity
∙ Various scoring functions (e.g., the sum-of-pairs metric, the weighted sum-of-pairs, etc.)
∙ Quantum- and bio-inspired approaches [1,2,3]
∙ Trying to take advantage of the superiority of quantum computing
[1] A. Perdomo-Ortiz et al. “Finding low-energy conformations of lattice protein models by quantum annealing”. In: Scientific Reports 2 (2012), p. 571.
[2] D-Wave Initiates Open Quantum Software Environment. D-Wave Systems.
[3] L. Abdesslem et al. “Multiple sequence alignment by quantum genetic algorithm”. In: Proceedings 20th International Parallel & Distributed Processing Symposium. IEEE. 2006, p. 360.
10. Other approaches
∙ Needleman–Wunsch algorithm for global pairwise alignment (dynamic programming; O(n²) complexity)
∙ Smith–Waterman algorithm for local pairwise alignment
∙ Platforms: MUSCLE, Multalin, CLUSTALs (Omega)
∙ Multalin: a progressive approach based on UPGMA
∙ CLUSTAL W: a progressive approach based upon the neighbor-joining (NJ) algorithm
∙ Both UPGMA and NJ require the construction of a complete distance matrix of the sequences
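To make the dynamic-programming baseline concrete, here is a minimal sketch of the Needleman–Wunsch recurrence for the global pairwise alignment score (O(n²) table for sequences of comparable length). The match/mismatch/gap values are illustrative defaults, not the ones used in the talk.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b via the NW DP recurrence."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # (mis)match
                           dp[i - 1][j] + gap,       # gap in b
                           dp[i][j - 1] + gap)       # gap in a
    return dp[n][m]

print(needleman_wunsch_score("ACTTAA", "CAGTAC"))
```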
11. Related literature and quantum computation
∙ Quantum computing [4]
∙ D-WAVE
∙ Neven’s law
∙ Quantum algorithms for pattern-matching [5] and sequence comparison problems [6]
∙ A quantum-inspired genetic algorithm for MSA [7]
∙ Combining TSP with MSA [8,9]
[4] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
[5] A. Sarkar. “Quantum Algorithms: for pattern-matching in genomic sequences”. In: (2018).
[6] L. C. Hollenberg. “Fast quantum search algorithms in protein sequence comparisons: Quantum bioinformatics”. In: Physical Review E 62.5 (2000), p. 7532.
[7] Hong-Wei Huo et al. “A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment”. In: Journal of Bioinformatics and Computational Biology 8.01 (2010), pp. 59–75.
[8] J.D. Banda-Tapia et al. “Optimizing multiple sequence alignments using traveling salesman problem and order-based evolutionary algorithms”. In: Proceedings of CIBB 2 (2012).
[9] C. Korostensky and G. Gonnet. “Using traveling salesman problem algorithms for evolutionary tree construction”. In: Bioinformatics 16.7 (2000), pp. 619–627.
13. Notation
∙ Σ = {s1, s2, ..., sm} is a finite alphabet; a sequence is a string over Σ
∙ Two sequences S′1 and S′2 are alignments of two sequences S1 and S2 if each Si can be obtained from S′i by removing gaps
∙ This holds for more sequences, too
∙ m: the maximum length of sequences S′1 and S′2
∙ The total cost for this alignment can be expressed as ∑_{i=1}^{m} d(S′1(i), S′2(i))
∙ d is the chosen scoring function
∙ S′j(i) is the i-th character of sequence S′j
14. Scoring functions
∙ A standard score scheme: d(S′1(i), S′2(i)) = 1 if match, 0 if mismatch
∙ An optimal alignment is the one with the maximum score
∙ The matrix of size n × m represents an alignment of the n sequences:
A C T - T A A - -
- C A G T A - C -
- C G G - A - C A
∙ Several scoring schemes
∙ The most widely used → sum of pairs
15. Sum of pairs
∙ (mn choose 2) total pairs
  ∑_{i=1}^{n-1} ∑_{j=i+1}^{n} d(Si, Sj)
∙ Can “punish” mismatches, e.g., by giving -1 to any mismatch
∙ Examples: sequences S1 = ACTTAA and S2 = CAGTAC:
A C T - T A A -
- C A G T A - C
∙ The score for this alignment is 3
A C T - T A A - -
- C A G T A - C -
- C G G - A - C A
∙ The score for the above alignment is 9
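A minimal sketch of the sum-of-pairs computation under the match-only scheme above (match = 1, everything else 0, with gap columns contributing nothing); it reproduces the scores 3 and 9 quoted for the two example alignments.

```python
def pair_score(s1, s2, match=1, mismatch=0):
    """Column-wise score of two equal-length aligned rows ('-' = gap)."""
    total = 0
    for a, b in zip(s1, s2):
        if a == "-" or b == "-":
            continue                      # gaps contribute nothing here
        total += match if a == b else mismatch
    return total

def sum_of_pairs(alignment, **kw):
    """Sum of pairwise scores over all pairs of rows in the alignment."""
    n = len(alignment)
    return sum(pair_score(alignment[i], alignment[j], **kw)
               for i in range(n - 1) for j in range(i + 1, n))

print(sum_of_pairs(["ACT-TAA-", "-CAGTA-C"]))                 # 3
print(sum_of_pairs(["ACT-TAA--", "-CAGTA-C-", "-CGG-A-CA"]))  # 9
```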
16. The idea
∙ Similarity among sequences is modeled as a TSP instance
∙ The similarity matrix is used as the neighboring matrix
∙ Proper normalization in order to have a minimization problem
∙ The underlying structure is a complete, undirected graph G = (V, A)
∙ Each edge is associated with a weight wij
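The slides do not specify the normalization, so the sketch below makes a simple assumption: rescale similarities to [0, 1] and take d = 1 − s, so that more similar sequences become closer in the resulting minimization (shortest Hamiltonian path) instance.

```python
import numpy as np

def similarity_to_distance(sim):
    """Turn a symmetric similarity matrix into a normalized distance
    (neighboring) matrix suitable for a minimization problem."""
    sim = np.asarray(sim, dtype=float)
    lo, hi = sim.min(), sim.max()
    scaled = (sim - lo) / (hi - lo) if hi > lo else np.zeros_like(sim)
    dist = 1.0 - scaled             # high similarity -> small distance
    np.fill_diagonal(dist, 0.0)     # no self-loops in the graph G = (V, A)
    return dist

sim = np.array([[9, 5, 2],
                [5, 9, 6],
                [2, 6, 9]], dtype=float)
print(similarity_to_distance(sim))
```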
18. Our approach
∙ We apply the qGVNS quantum-inspired metaheuristic [10]
∙ It consists of a VND local search, a diversification procedure, and a neighborhood change step
∙ VNS and GVNS have bio-inspired origins
∙ Novelty: in qGVNS, a modified diversification phase successfully resolves local minima traps by exploiting quantum computation techniques
∙ A simulated quantum register generates a complex n-dimensional unit vector during each shaking call
∙ The complex n-dimensional vector is taken as input and a real n-dimensional vector is output
∙ The i-th component of the real vector is equal to the modulus squared of the i-th component of the complex vector
[10] Christos Papalitsas et al. “Combinatorial GVNS (General Variable Neighborhood Search) Optimization for Dynamic Garbage Collection”. In: Algorithms 11.4 (2018), p. 38.
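The shaking step described above boils down to sampling a complex unit vector and taking squared moduli; here is a minimal NumPy rendering of exactly that transformation. How these probabilities then drive the perturbation is a qGVNS design detail, only hinted at in the sketch after the next slide.

```python
import numpy as np

def quantum_shake_probabilities(n, rng=None):
    """Simulate an n-dimensional quantum register state and return the
    real vector of squared moduli (a probability vector summing to 1)."""
    rng = np.random.default_rng() if rng is None else rng
    # Random complex amplitudes, normalized to a unit vector.
    amps = rng.normal(size=n) + 1j * rng.normal(size=n)
    amps /= np.linalg.norm(amps)
    probs = np.abs(amps) ** 2       # |amplitude_i|^2 for each component
    return probs

p = quantum_shake_probabilities(8)
print(p, p.sum())                   # sums to ~1.0
```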
19. The qGVNS routine (Algorithm 1)
Data: an initial solution
Result: an optimized solution
Initialization of the feasibility distance matrix
begin
    X ← Nearest Neighbor heuristic
    repeat
        X′ ← Quantum-Perturbation(X)
        X′′ ← pipeVND(X′)
        if X′′ is better than X′ then
            X ← X′′
        end
    until optimal solution is found or time limit is met
end
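To make the structure of the routine concrete, a simplified sketch under stated assumptions: nearest-neighbor construction, a quantum-guided segment reversal standing in for Quantum-Perturbation, and a plain 2-opt descent standing in for pipeVND. Helper names and the perturbation details are illustrative, not the authors' implementation; `dist` is a NumPy distance matrix.

```python
import numpy as np

def path_length(order, dist):
    """Open-path (Hamiltonian path) length under distance matrix dist."""
    return sum(dist[order[i], order[i + 1]] for i in range(len(order) - 1))

def nearest_neighbor(dist):
    """Greedy nearest-neighbor construction of an initial ordering."""
    n = len(dist)
    order, seen = [0], {0}
    while len(order) < n:
        last = order[-1]
        nxt = min((j for j in range(n) if j not in seen), key=lambda j: dist[last, j])
        order.append(nxt)
        seen.add(nxt)
    return order

def two_opt(order, dist):
    """A plain 2-opt descent standing in for the pipeVND local search (naive, fine for small n)."""
    order, improved = order[:], True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(cand, dist) < path_length(order, dist):
                    order, improved = cand, True
    return order

def quantum_perturbation(order, dist, rng):
    """Shake: reverse a segment whose endpoints are drawn from the
    squared-modulus probabilities of a simulated quantum register."""
    n = len(order)
    amps = rng.normal(size=n) + 1j * rng.normal(size=n)
    probs = np.abs(amps / np.linalg.norm(amps)) ** 2
    i, j = sorted(rng.choice(n, size=2, replace=False, p=probs))
    new = order[:]
    new[i:j + 1] = reversed(new[i:j + 1])
    return new

def qgvns(dist, iters=200, seed=0):
    """Shake + local search, accept if better (simplified acceptance rule)."""
    rng = np.random.default_rng(seed)
    x = two_opt(nearest_neighbor(dist), dist)
    for _ in range(iters):
        x2 = two_opt(quantum_perturbation(x, dist, rng), dist)
        if path_length(x2, dist) < path_length(x, dist):
            x = x2
    return x
```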
20. The proposed method
Data: a set of sequences
Result: a set of aligned sequences
Disti,j: the pairwise distance matrix of the sequences
begin
    i ← the i-th sequence
    j ← the j-th sequence
    repeat
        Disti,j ← calculate the pairwise distance of the i-th and j-th sequences
    until all pairs are examined
end
Sol ← approximation of the shortest Hamiltonian path of Dist  ▷ derived using the qGVNS of Algorithm 1
Score_matrix  ▷ choose a score matrix of your choice
Gap_penalty  ▷ choose the value of the gap penalty
Align sequences following the Sol matrix as a guide
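An end-to-end skeleton of the method under stated assumptions: a normalized edit distance stands in for whatever score matrix and gap penalty one chooses, a greedy ordering stands in for the qGVNS of the previous sketch, and the final progressive merge is only indicated in a comment.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance; a simple stand-in for a score-matrix /
    gap-penalty based pairwise distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def distance_matrix(seqs):
    """Pairwise distance matrix Dist over all sequence pairs."""
    n = len(seqs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(seqs[i], seqs[j]) / max(len(seqs[i]), len(seqs[j]))
            dist[i, j] = dist[j, i] = d
    return dist

def nn_order(dist):
    """Greedy stand-in for the qGVNS ordering of the previous sketch."""
    n = len(dist)
    order, seen = [0], {0}
    while len(order) < n:
        last = order[-1]
        nxt = min((j for j in range(n) if j not in seen), key=lambda j: dist[last, j])
        order.append(nxt)
        seen.add(nxt)
    return order

seqs = ["ACTTAA", "CAGTAC", "CGGACA", "ACTTGA"]
dist = distance_matrix(seqs)
order = nn_order(dist)     # in the full method, qGVNS refines this ordering
print([seqs[i] for i in order])
# The progressive alignment then merges sequences in this order,
# using the chosen score matrix and gap penalty for each pairwise step.
```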
21. In a nutshell
∙ Initial calculation of the distance matrix among the sequences
∙ This is a trivial part for most MSA algorithms
∙ The choices for a score matrix and a gap penalty do not affect the main algorithm
✓ They can be separately chosen
∙ It enhances its applicability
22. Analysis
∙ Need for the calculation of pairwise distances of the sequences
∙ Exhaustive comparisons
∙ Calculation of the complete distance matrix
∙ For n sequences, this calculation requires O(m(n² − n)) comparisons (m is the maximum length)
∙ The calculation of the shortest Hamiltonian path is a known NP-hard problem
∙ It may seem that the proposed algorithm simply hides the “hard” computational part
✓ This is not true, though, since the GVNS-based heuristic only approximates the solution in a specific time frame
∙ After the qGVNS, the actual alignment is straightforward through a simple (binary) guide tree built from the resulting ordering
23. Guide tree
[Figure omitted: binary guide tree over Seq. 1 – Seq. 8]
A depiction of the tree produced after calculating the shortest path of the distance matrix.
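As a toy illustration of how an ordering becomes a simple binary guide, here is a left-deep tree built from the path order (nested tuples; the slides present this only as a figure, so the representation below is purely illustrative).

```python
def left_deep_guide_tree(order):
    """((((s1, s2), s3), s4), ...): merge neighbors along the path first."""
    tree = order[0]
    for s in order[1:]:
        tree = (tree, s)
    return tree

print(left_deep_guide_tree(["Seq. 1", "Seq. 2", "Seq. 3", "Seq. 4"]))
# ((('Seq. 1', 'Seq. 2'), 'Seq. 3'), 'Seq. 4')
```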
25. Evaluation
∙ Benchmarked on 33 real sequences retrieved from NCBI
∙ Samples from different kinds of organisms were chosen (archaea, invertebrates, plants, and plasmids)
∙ MatLab
∙ For the first set, each match is scored with 1, whereas any mismatch is scored with -1
∙ For the second set, no negative score for mismatches
∙ Compared against UPGMA and the single neighbor algorithm
26. Sum of pairs score with negative mismatches (1/2)
[Charts omitted: “Alignment score of actual sequences (negative mismatches)”, benchmark sets 1/3 and 2/3. Y-axis: sum of pairs; X-axis: benchmark number; series: UPGMA, single neighbor, proposed algorithm.]
27. Sum of pairs score with negative mismatches (2/2)
[Chart omitted: “Alignment score of actual sequences (negative mismatches)”, benchmark set 3/3. Y-axis: sum of pairs; X-axis: benchmark number; series: UPGMA, single neighbor, proposed algorithm.]
28. Sum of pairs score with only matches (1/2)
[Charts omitted: “Alignment score of actual sequences (only matches)”, benchmark sets 1/3 and 2/3. Y-axis: sum of pairs; X-axis: benchmark number; series: UPGMA, single neighbor, proposed algorithm.]
29. Sum of pairs score with only matches (2/2)
[Chart omitted: “Alignment score of actual sequences (only matches)”, benchmark set 3/3. Y-axis: sum of pairs; X-axis: benchmark number; series: UPGMA, single neighbor, proposed algorithm.]
30. Result overview
∙ Better suited for larger sequences
∙ All algorithms operate under the progressive alignment scheme
∙ They, too, use the full distance matrix
∙ All use the same method for pairwise alignment
✓ The proposed algorithm outperforms the other two
∙ Especially when larger sets of sequences are used
∙ No prior knowledge on the relation among the sequences was assumed
32. Conclusion
∙ Biology and bioinformatics pose demanding problems
∙ MSA is an indicative example
∙ Many approaches and scoring methods exist
∙ We followed the progressive approach
∙ A quantum-inspired optimization heuristic
∙ Evaluated on actual sets of sequences from NCBI
∙ We provide alignments with good scores, especially when the sets of sequences are large
∙ Useful for constructing an initial alignment
∙ If prior knowledge is available → decrease the cost of computing the distance matrix
33. Key References I
L. Abdesslem et al. “Multiple sequence alignment by quantum genetic algorithm”. In: Proceedings 20th International Parallel & Distributed Processing Symposium. IEEE. 2006, p. 360.
J.D. Banda-Tapia et al. “Optimizing multiple sequence alignments using traveling salesman problem and order-based evolutionary algorithms”. In: Proceedings of CIBB 2 (2012).
D-Wave Initiates Open Quantum Software Environment. D-Wave Systems.
L. C. Hollenberg. “Fast quantum search algorithms in protein sequence comparisons: Quantum bioinformatics”. In: Physical Review E 62.5 (2000), p. 7532.
Hong-Wei Huo et al. “A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment”. In: Journal of Bioinformatics and Computational Biology 8.01 (2010), pp. 59–75.
C. Korostensky and G. Gonnet. “Using traveling salesman problem algorithms for evolutionary tree construction”. In: Bioinformatics 16.7 (2000), pp. 619–627.
M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
34. Key References II
Christos Papalitsas et al. “Combinatorial GVNS (General Variable Neighborhood Search) Optimization for Dynamic Garbage Collection”. In: Algorithms 11.4 (2018), p. 38.
A. Perdomo-Ortiz et al. “Finding low-energy conformations of lattice protein models by quantum annealing”. In: Scientific Reports 2 (2012), p. 571.
A. Sarkar. “Quantum Algorithms: for pattern-matching in genomic sequences”. In: (2018).