Truncated Singular Value Decomposition (SVD) has always been a key algorithm in modern machine learning.
Scientists and researchers use this applied-mathematics method in many fields. Despite its long history and prevalence, how to choose the best truncation level remains an open challenge. In this paper, we describe a new algorithm, akin to a discrete optimization method, that relies on computing Receiver Operating Characteristic (ROC) Areas Under the Curve (AUCs). We explore a concrete application of the algorithm to a bioinformatics problem, i.e., the prediction of biomolecular annotations. We applied the algorithm to nine different datasets, and the results obtained demonstrate the effectiveness of our technique.
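Concretely, truncating an SVD means keeping only the top-k singular triplets. A minimal NumPy sketch on a synthetic matrix (a generic illustration of truncation at a chosen level k, not the paper's ROC-AUC selection algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))

# Full SVD, then keep only the top-k singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, A_k is the best rank-k approximation in
# Frobenius norm, with error equal to the energy of the discarded
# singular values.
err = np.linalg.norm(A - A_k, "fro")
expected = np.sqrt(np.sum(s[k:] ** 2))
```

The open question the paper addresses is precisely how to pick k; here it is simply fixed at 10.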
Bounded model checking encodes executions of a system up to a bounded length k as a propositional formula along with a property violation. If the formula is satisfiable, a counterexample is found; if unsatisfiable, no counterexample up to length k exists. The technique leverages efficient SAT solvers but is incomplete. Modern SAT solvers use conflict-driven learning and backtracking based on the Davis-Putnam-Logemann-Loveland algorithm to efficiently solve the propositional formulas generated by bounded model checking.
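The bounded search can be illustrated without a SAT solver: a toy bounded model checker that explicitly enumerates all executions of a small nondeterministic transition system up to depth k (a real BMC tool would encode this search as a propositional formula instead; the counter system here is invented for illustration):

```python
from itertools import product

# Toy nondeterministic transition system: an integer counter starting
# at 0 where each step either increments (choice 0) or doubles (choice 1).
def step(state, choice):
    return state + 1 if choice == 0 else 2 * state

def bmc(bad, k):
    """Search every execution of length <= k for a state satisfying `bad`.
    Returns a counterexample trace, or None if no violation exists up to
    the bound k (saying nothing about longer executions: BMC is incomplete)."""
    for depth in range(k + 1):
        for choices in product([0, 1], repeat=depth):
            state, trace = 0, [0]
            for c in choices:
                state = step(state, c)
                trace.append(state)
            if any(bad(s) for s in trace):
                return trace
    return None

cex = bmc(lambda s: s == 6, k=5)   # counterexample: 0 -> 1 -> 2 -> 3 -> 6
```

`bmc(lambda s: s == 7, k=3)` returns None: no execution of length at most 3 reaches 7, but that says nothing about longer runs, which is exactly the incompleteness mentioned above.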
This paper proposes an approach to bounded model checking using classical symbolic execution. It encodes a program and property to check as a formula that can be checked by an SMT solver. This formula representation allows for parallelization by splitting it into independent subformulas. An evaluation shows the proposed technique, implemented in a tool called JCBMC, outperforms other bounded model checkers on examples. Future work includes automatically tuning the technique's parameters.
Matrix Factorization in Recommender Systems (Yong Zheng)
The document discusses matrix factorization techniques for recommender systems. It begins with an overview of recommender systems and their use of matrix factorization for dimensionality reduction. Principal component analysis and singular value decomposition are described as early linear algebra techniques used for this purpose. The document then focuses on how these techniques evolved into basic and extended matrix factorization methods in recommender systems, using the Netflix Prize competition as an example.
This document discusses using SVD (singular value decomposition) as a filtering technique prior to clustering temporal usage data. It describes applying SVD to reduce noise and dimensionality before performing k-means clustering. SVD is used to decompose the data matrix and filter out components associated with the smallest singular values. Then k-means clustering is applied to the correlation between observations and the remaining right singular vectors. This approach provides a robust way to cluster high-dimensional temporal data and identify distinct customer usage patterns over time.
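A sketch of that pipeline on synthetic data (the dataset, the retained rank, and the minimal k-means loop are all illustrative assumptions, not the document's actual setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic temporal usage data: 100 customers x 24 hourly observations,
# drawn from two underlying usage patterns plus noise.
t = np.linspace(0, 2 * np.pi, 24)
patterns = np.vstack([np.sin(t), np.cos(t)])
labels_true = rng.integers(0, 2, 100)
X = patterns[labels_true] + 0.3 * rng.standard_normal((100, 24))

# SVD filtering: project observations onto the leading right singular
# vectors, discarding components tied to the smallest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 2
Z = X @ Vt[:r].T

# Minimal k-means on the filtered coordinates.
k = 2
centers = Z[rng.choice(len(Z), k, replace=False)]
for _ in range(20):
    dists = np.linalg.norm(Z[:, None] - centers[None], axis=2)
    assign = dists.argmin(axis=1)
    centers = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(k)])
```

With the noise components filtered out, the two planted usage patterns separate cleanly in the 2-D projected space.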
Investigation of repeated blasts at Aitik mine using waveform cross correlation (Ivan Kitov)
We present results of signal detection from repeated events at the Aitik and Kiruna mines in Sweden, based on waveform cross correlation. Several advanced methods based on tensor Singular Value Decomposition are applied to waveforms measured at the ARCES seismic array, which consists of three-component sensors.
We consider the problem of finding anomalies in high-dimensional data using popular PCA-based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix, which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space linear or sublinear in the dimension. We prove general results showing that any sketch of a matrix satisfying a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate on real-world data sets. Our main technical contribution is to prove matrix perturbation inequalities for operators arising in the computation of these measures.
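For example, the Frequent Directions sketch maintains a small ell x d matrix B whose Gram matrix approximates A^T A within the operator-norm bound ||A^T A - B^T B||_2 <= ||A||_F^2 / ell. A minimal sketch of the standard streaming algorithm (not the paper's specific anomaly-score pipeline):

```python
import numpy as np

def frequent_directions(A, ell):
    """Stream rows of A into an ell x d sketch B with the guarantee
    ||A^T A - B^T B||_2 <= ||A||_F^2 / ell."""
    B = np.zeros((ell, A.shape[1]))
    for row in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # Sketch is full: shrink every direction by the smallest
            # singular value, which zeroes out at least one row.
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
            B = s[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = row
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30))
ell = 10
B = frequent_directions(A, ell)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / ell
```

The sketch uses only O(ell * d) space regardless of the number of rows streamed, which is what makes the linear-in-dimension algorithms above possible.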
-Proceedings: https://arxiv.org/abs/1804.03065
Understanding High-dimensional Networks for Continuous Variables Using ECL (HPCC Systems)
Syed Rahman & Kshitij Khare, University of Florida, present at the 2016 HPCC Systems Engineering Summit Community Day.
The availability of high dimensional data (or “big data”) has touched almost every field of science and industry. Such data, where the number of variables (features) is often much higher than the number of samples, is now more pervasive than it has ever been. Discovering meaningful relationships between the variables in such data is one of the major challenges that modern day data scientists have to contend with.
The covariance matrix of the variables is the most fundamental quantity for understanding the complex multivariate relationships in the data. In addition to estimating the inverse covariance matrix, CSCS can be used to detect the edges of a directed acyclic graph, as opposed to the edges of an undirected graph, which CONCORD (presented at the 2015 summit) was used for.
Similar to the CONCORD algorithm, the CSCS algorithm works by minimizing a convex objective function through a cyclic coordinate minimization approach, and it is theoretically guaranteed to converge to a global minimum of the objective function. One of the main advantages of CSCS is that each row can be calculated independently of the other rows, so we are able to harness the power of distributed computing.
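The general flavor of cyclic coordinate minimization can be shown on a simple convex quadratic (a stand-in objective for illustration only; the actual CSCS objective and its row-wise decomposition are not reproduced here):

```python
import numpy as np

# Convex quadratic f(x) = 0.5 x^T Q x - b^T x with Q positive definite,
# so the coordinate-wise updates converge to the unique global minimum.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Q = M @ M.T + 5 * np.eye(5)
b = rng.standard_normal(5)

x = np.zeros(5)
for sweep in range(200):          # cyclic sweeps over the coordinates
    for i in range(5):
        # Exact minimization over x_i with the others fixed:
        # set df/dx_i = (Q x)_i - b_i = 0 and solve for x_i.
        x[i] = (b[i] - Q[i] @ x + Q[i, i] * x[i]) / Q[i, i]

x_star = np.linalg.solve(Q, b)    # global minimizer, for comparison
```

Convexity is what guarantees that these one-coordinate-at-a-time updates reach the global minimum rather than a stationary point.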
Syed Rahman
Syed Rahman is a PhD student in the Statistics department at the University of Florida working under the supervision of Dr. Kshitij Khare. He is interested in high-dimensional covariance estimation. In 2015, Syed programmed the CONCORD algorithm in ECL and presented this at the HPCC Systems Engineering Summit.
Kshitij Khare
Kshitij Khare is an Associate Professor of Statistics at the University of Florida. He earned his Ph.D. in Statistics from Stanford University in 2009. He has a variety of interests, which include covariance/network estimation in high-dimensional datasets, and Bayesian inference using Markov chain Monte Carlo methods. One of Dr. Khare's major research focuses is the development of novel statistical methods and algorithms for "big data" or high-dimensional data.
Recent developments in the field of reduced-order modeling - and in particular, active subspace construction - have made it possible to efficiently approximate complex models by constructing low-order response surfaces based upon a small subspace of the original high-dimensional parameter space. These methods rely upon the fact that the response tends to vary most prominently in a few dominant directions defined by linear combinations of the original inputs, allowing for a rotation of the coordinate axes and a consequent transformation of the parameters. In this talk, we discuss a gradient-free active subspace algorithm that is feasible for high-dimensional parameter spaces where finite-difference techniques are impractical. We illustrate an initialized gradient-free active subspace algorithm for a neutronics example implemented with SCALE6.1.
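For reference, the standard gradient-based construction that gradient-free variants approximate: estimate C = E[grad f grad f^T] by Monte Carlo and take its dominant eigenvectors (the toy model with a planted 1-D active direction is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model with a planted 1-D active direction w: f(x) = sin(w . x),
# so grad f(x) = cos(w . x) * w is always parallel to w.
d = 10
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
grad_f = lambda x: np.cos(w @ x) * w

# Monte Carlo estimate of C = E[grad f grad f^T]; the dominant
# eigenvectors of C span the active subspace.
N = 500
C = np.zeros((d, d))
for _ in range(N):
    g = grad_f(rng.standard_normal(d))
    C += np.outer(g, g) / N
evals, evecs = np.linalg.eigh(C)

# The leading eigenvector recovers the planted direction (up to sign).
align = abs(evecs[:, -1] @ w)
```

A response surface would then be fit over the low-dimensional coordinates evecs[:, -1].T @ x instead of the full 10-dimensional input.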
Slides of my doctoral thesis dissertation talk, given on 20 March 2014 at Politecnico di Milano. Title: "Computational prediction of gene functions through machine learning methods and multiple validation procedures"
The document describes the sequence-to-sequence (seq2seq) model with an encoder-decoder architecture. It explains that the seq2seq model uses two recurrent neural networks - an encoder RNN that processes the input sequence into a fixed-length context vector, and a decoder RNN that generates the output sequence from the context vector. It provides details on how the encoder, decoder, and training process work in the seq2seq model.
The International Journal of Engineering and Science (The IJES)
This document summarizes a research paper that reviews techniques for optimal design and placement of pilot symbols for channel estimation in OFDM systems operating under rapidly time-varying channels. It discusses how particle swarm optimization, the Cramér–Rao Bound, and Bayesian Cramér–Rao Bound techniques are commonly used to optimize pilot sequence design to improve channel estimation performance and reduce intercarrier interference. Grouping pilot tones into clusters rather than evenly spacing each pilot tone can provide better channel estimation against doubly selective channels. The optimal clustered pilot sequence is derived using maximum likelihood estimation and is independent of signal-to-noise ratio or Doppler rate.
This document provides an overview of simulation software for modeling multibody systems. It discusses different modeling approaches, such as using Cartesian or relative coordinates, and different solution methods in dynamics simulation, including Lagrange multipliers and velocity transformations. Examples of computer implementation for kinematics and dynamics simulation are presented. The document also briefly discusses using web technologies for simulation and collaboration.
MODIFIED VORTEX SEARCH ALGORITHM FOR REAL PARAMETER OPTIMIZATION (cscpconf)
The Vortex Search (VS) algorithm is a recently proposed metaheuristic inspired by the vortical flow of stirred fluids. Although the VS algorithm has been shown to be a good candidate for solving certain optimization problems, it also has some drawbacks. In the VS algorithm, candidate solutions are generated around the current best solution using a Gaussian distribution at each iteration. This keeps the algorithm simple, but it also causes problems: for functions with many local minima, generating candidate solutions around a single point can trap the algorithm in a local minimum. Because of the adaptive step-size adjustment scheme used in VS, the candidate solutions become increasingly local at each iteration, so if the algorithm cannot escape a local minimum quickly, escaping it in later iterations becomes much harder. In this study, a modified Vortex Search algorithm (MVS) is proposed to overcome this drawback of the existing VS algorithm. In the MVS algorithm, candidate solutions are generated around a number of points at each iteration. Computational results show that this modification improves the global search ability of the existing VS algorithm, and that MVS outperforms the existing VS algorithm, PSO2011, and ABC on the benchmark numerical function set.
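A minimal sketch of the single-centre VS loop described above, on a shifted sphere function (the geometric radius decay stands in for the inverse incomplete gamma schedule of the published algorithm; bounds and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 3.0) ** 2)   # shifted sphere, optimum at x = 3

lo, hi = -10.0, 10.0
dim, n_cand = 5, 50
mu = np.full(dim, (lo + hi) / 2)       # initial centre of the vortex
radius = (hi - lo) / 2
best_x, best_f = mu.copy(), f(mu)

for it in range(200):
    # Candidates from a Gaussian centred on the current best solution.
    cand = rng.normal(mu, radius, size=(n_cand, dim)).clip(lo, hi)
    vals = np.array([f(c) for c in cand])
    i = int(vals.argmin())
    if vals[i] < best_f:
        best_x, best_f = cand[i].copy(), vals[i]
    mu = best_x          # next vortex centres on the best so far
    radius *= 0.97       # adaptive shrinking of the vortex
```

On a unimodal function this works well; the single shrinking centre is exactly what traps VS on multimodal functions, and MVS replaces `mu` with several centres explored in parallel.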
Modified Vortex Search Algorithm for Real Parameter Optimization (csandit)
The document presents a modified version of the Vortex Search (VS) algorithm called the Modified Vortex Search (MVS) algorithm. The MVS algorithm aims to overcome the drawback of the VS algorithm getting trapped in local minima for functions with multiple local minima. In the MVS algorithm, candidate solutions are generated around multiple centers at each iteration rather than a single center. This allows the algorithm to explore different regions simultaneously and avoid getting stuck in local minima. Computational results showed the MVS algorithm outperformed the original VS algorithm as well as PSO, ABC algorithms on benchmark test functions prone to getting trapped in local minima.
I studied at the Indian Institute of Technology, Kharagpur, India, where I did my B.Tech and M.Tech in the Department of Electronics and Electrical Communication Engineering as a student of the 2018 batch. After that, I joined Schneider Electric Systems India Private Limited as a Software Design Engineer, and I am currently designated as a Senior Firmware Engineer at the same company, with 4+ years of work experience. The uploaded ppt is my MTP thesis, on "temperature-aware application mapping onto mesh-based network-on-chip using a Genetic Algorithm".
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION (ijaia)
This document presents a modified version of the Vortex Search (VS) algorithm called the Modified Vortex Search (MVS) algorithm for numerical function optimization. The VS algorithm has the drawback that it can get trapped in local minima for functions with multiple local minima. The MVS algorithm addresses this by generating candidate solutions around multiple points at each iteration rather than a single point, allowing it to escape local minima more easily. Computational results on benchmark functions showed the MVS algorithm outperformed the original VS algorithm, as well as PSO2011 and ABC algorithms.
This document discusses evaluating classifiers' performance when additional constraints are present. It proposes using linear programming to optimize classifiers by minimizing cost while meeting constraints. Specifically, it examines finding the optimal classifier for predicting customer attrition at a bank, where constraints include limited resources for customer outreach. Linear programming allows incorporating constraints to locate the classifier that minimizes total misclassification costs.
Regression models the relationship between continuous variables by fitting a line or curve to the data points. Logistic regression performs nonlinear regression by first transforming the dependent variable values to logits (log odds) and then fitting a linear regression line to the transformed data. This results in a sigmoid curve that models the probability of an output variable given continuous input variables. The sigmoid curve bounds the predicted probabilities between 0 and 1, allowing logistic regression to be used for binary classification problems.
The document discusses linear regression and logistic regression. Linear regression finds the best-fitting linear relationship between independent and dependent variables. Logistic regression applies a sigmoid function to the linear combination of inputs to output a probability between 0 and 1, fitting a logistic curve rather than a straight line. It works by first transforming the probabilities into log-odds (logits) and then performing linear regression on the transformed data. This allows predicting probabilities while ensuring outputs remain between 0 and 1.
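A minimal sketch of the fitting step on synthetic 1-D data (fit here via gradient descent on the log-loss rather than the logit-transform-then-linear-regression description above; the true parameters and all settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D binary data whose class probability follows a true
# logistic curve with slope 1.5 and intercept -0.5.
n = 2000
x = rng.uniform(-4, 4, n)
p_true = 1 / (1 + np.exp(-(1.5 * x - 0.5)))
y = (rng.uniform(size=n) < p_true).astype(float)

# Fit (w, b) by gradient descent on the log-loss; the prediction
# sigmoid(w*x + b) is always strictly between 0 and 1, which is what
# makes the model usable for binary classification.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * x + b)))
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)
```

The recovered (w, b) should land close to the true (1.5, -0.5), and every predicted probability stays inside (0, 1) by construction of the sigmoid.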
This document discusses the design of minimum cost, fault tolerant adder circuits in reversible logic for quantum computing. It aims to minimize quantum cost, reduce critical path delay and number of gates, and optimize garbage outputs. The document provides an overview of reversible and quantum computing principles. It then proposes designs for reversible fault tolerant full adders and carry skip/lookahead adders. Performance is analyzed in terms of gates, garbage outputs, delay and quantum cost, showing improvements over existing designs. The document concludes the reversible circuit designs are preferable for quantum computing due to their lower quantum costs.
This document discusses second order statistics for vehicle-to-infrastructure (V2I) communications using macro diversity systems over composite fading channels. It introduces V2V and V2I communications and macro diversity, then discusses modeling fading channels for vehicular networks using statistical distributions like Rayleigh, Rice, and Nakagami-m. It outlines evaluating the performance of V2I communications using macro diversity through analyzing the cumulative distribution function (CDF), level crossing rate (LCR), and average fade duration (AFD) of the system. Numerical results are presented on the CDF, LCR, and AFD for V2I communications using macro diversity selection combining reception.
An Optimized Parallel Algorithm for Longest Common Subsequence Using OpenMP –... (IRJET Journal)
This document summarizes research on developing parallel algorithms to optimize solving the longest common subsequence (LCS) problem. LCS is commonly used for sequence comparison in bioinformatics. Traditional sequential dynamic programming algorithms have complexity of O(mn) for sequences of lengths m and n. The document reviews parallel algorithms developed using tools like OpenMP and GPUs like CUDA to reduce computation time. It proposes the authors' own optimized parallel algorithm for multi-core CPUs using OpenMP.
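The O(mn) sequential dynamic program being parallelized looks like this; cells on the same anti-diagonal are independent, which is what OpenMP/CUDA versions exploit. A plain Python sketch:

```python
def lcs_length(a, b):
    """Classic O(m*n) dynamic program: dp[i][j] holds the LCS length
    of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1      # extend a common match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```

Each dp[i][j] depends only on cells above and to the left, so all cells with the same i + j can be computed in parallel, one wavefront at a time.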
1) Randomized numerical linear algebra (RandNLA) algorithms can be used to solve large-scale least-squares problems by computing a randomized sketch of the design matrix in two steps and then obtaining approximate solutions.
2) The document implements and evaluates these RandNLA algorithms in Apache Spark on datasets up to terabytes in size, finding that Spark is well-suited due to the algorithms' parallelism and Spark's ability to cache data in memory.
3) The evaluation compares the performance of low-precision solvers that directly use the sketch and high-precision solvers that employ the sketch as a preconditioner, finding that both approaches can efficiently solve least-squares problems on large datasets.
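A minimal single-machine sketch of the low-precision "sketch-and-solve" approach with a Gaussian random projection (sizes and data are synthetic; a high-precision variant would instead use the sketch to build a preconditioner for an iterative solver such as LSQR):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined least-squares problem min_x ||A x - b||_2.
n, d = 5000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Step 1: compress with a Gaussian random projection S (s x n, s << n).
s = 200
S = rng.standard_normal((s, n)) / np.sqrt(s)

# Step 2: solve the small sketched problem min_x ||S A x - S b||_2.
x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]

# Exact solution, for comparing residuals.
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
r_sketch = np.linalg.norm(A @ x_sketch - b)
r_exact = np.linalg.norm(A @ x_exact - b)
```

The sketched solve touches only an s x d matrix, so the expensive part parallelizes over rows of A, which is the property that makes these methods a natural fit for Spark.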
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun... (EduSkills OECD)
Andreas Schleicher, Director of Education and Skills at the OECD, presents at the launch of PISA 2022 Volume III - Creative Minds, Creative Schools on 18 June 2024.
Similar to A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves (20)
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...EduSkills OECD
Andreas Schleicher, Director of Education and Skills at the OECD presents at the launch of PISA 2022 Volume III - Creative Minds, Creative Schools on 18 June 2024.
How Barcodes Can Be Leveraged Within Odoo 17Celine George
In this presentation, we will explore how barcodes can be leveraged within Odoo 17 to streamline our manufacturing processes. We will cover the configuration steps, how to utilize barcodes in different manufacturing scenarios, and the overall benefits of implementing this technology.
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder. Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumMJDuyan
(𝐓𝐋𝐄 𝟏𝟎𝟎) (𝐋𝐞𝐬𝐬𝐨𝐧 𝟏)-𝐏𝐫𝐞𝐥𝐢𝐦𝐬
𝐃𝐢𝐬𝐜𝐮𝐬𝐬 𝐭𝐡𝐞 𝐄𝐏𝐏 𝐂𝐮𝐫𝐫𝐢𝐜𝐮𝐥𝐮𝐦 𝐢𝐧 𝐭𝐡𝐞 𝐏𝐡𝐢𝐥𝐢𝐩𝐩𝐢𝐧𝐞𝐬:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
𝐄𝐱𝐩𝐥𝐚𝐢𝐧 𝐭𝐡𝐞 𝐍𝐚𝐭𝐮𝐫𝐞 𝐚𝐧𝐝 𝐒𝐜𝐨𝐩𝐞 𝐨𝐟 𝐚𝐧 𝐄𝐧𝐭𝐫𝐞𝐩𝐫𝐞𝐧𝐞𝐮𝐫:
-Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves
1. IEEE BIBE 2013
13th IEEE International Conference on
Bioinformatics and Bioengineering,
11th November, Chania, Greece, EU
A Discrete Optimization Approach
for SVD Best Truncation Choice based
on ROC Curves
Davide Chicco, Marco Masseroli
davide.chicco@elet.polimi.it
2. Summary
1. The context & the problem
• Biomolecular annotations
• Prediction of biomolecular annotations
• SVD (Singular Value Decomposition)
• SVD Truncation
2. The proposed solution
• ROC Area Under the Curve comparison
• Truncation level choices
3. Evaluation
• Evaluation data set & results
4. Conclusions
“A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves”
2
3. Biomolecular annotations
• The concept of annotation: association of nucleotide or amino
acid sequences with useful information describing their features
• This information is expressed through controlled vocabularies,
sometimes structured as ontologies, where every controlled
term of the vocabulary is associated with a unique
alphanumeric code
• The association of such a code with a gene or protein ID
constitutes an annotation
[Diagram: a gene / protein ID linked by an annotation (gene2bff) to a biological function feature]
4. Biomolecular annotations (2)
• The association of an information/feature with a gene or
protein ID constitutes an annotation
• Annotation example:
• gene: GD4
• feature: “is present in the mitochondrial membrane”
[Diagram: a gene / protein ID linked by an annotation (gene2bff) to a biological function feature]
5. Prediction of biomolecular annotations
• Many available annotations in different databanks
• However, available annotations are incomplete
• Only a few of them represent highly reliable, human–curated
information
• To support and quicken the time–consuming curation process,
prioritized lists of computationally predicted annotations
are extremely useful
• These lists can be generated by software tools that implement Machine Learning algorithms
6. Annotation prediction through
Singular Value Decomposition – SVD
• Annotation matrix A ∈ {0, 1}^(m × n)
− m rows: genes / proteins
− n columns: annotation terms
• A(i,j) = 1 if gene / protein i is annotated to term j or to any descendant of j in the considered ontology structure (true path rule)
• A(i,j) = 0 otherwise (it is unknown)

         term01  term02  term03  term04  …  termN
gene01      0       0       0       0    …    0
gene02      0       1       1       0    …    1
…           …       …       …       …    …    …
geneM       0       0       0       0    …    0
8. Singular Value Decomposition – SVD
Compute the SVD:
A = U Σ V^T
Compute the reduced rank approximation:
A_k = U_k Σ_k V_k^T
• An annotation prediction is performed by computing a reduced rank approximation Ak of the annotation matrix A (where 0 < k < r, with r the number of non-zero singular values of A, i.e. the rank of A)
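The reduced rank approximation above can be sketched in a few lines of NumPy; the toy matrix values and the helper name `truncated_svd` are illustrative choices of ours, not from the paper:

```python
import numpy as np

# Toy annotation matrix A (genes x terms); values are illustrative only.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)

def truncated_svd(A, k):
    """Rank-k approximation A_k = U_k Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A2 = truncated_svd(A, k=2)  # real-valued scores, same shape as A
```

With k equal to the rank of A, the approximation reconstructs A exactly; smaller k yields the real-valued entries used for prediction.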
9. Singular Value Decomposition – SVD (2)
• Ak contains real-valued entries related to the likelihood that gene i should be annotated to term j
• For a certain real threshold τ: if Ak(i,j) > τ, gene i is predicted to be annotated to term j
− The threshold τ can be chosen in order to obtain the best predicted annotations [Khatri et al., 2005]
10. Singular Value Decomposition – SVD (3)
• It is possible to rewrite the SVD decomposition in an equivalent
form, such that the predicted annotation profile is given by:
ak,iT = aiT Vk VkT
where ak,iT is a row vector containing the predictions for gene i
• Note that Vk depends on the whole set of genes
• Indeed, the columns of Vk are a set of eigenvectors of the
global term-to-term correlation matrix T = ATA, estimated from
the whole set of available annotations
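This equivalence can be checked numerically with a small NumPy sketch (the matrix is an illustrative toy, not a dataset from the paper):

```python
import numpy as np

A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T                       # n x k: eigenvectors of T = A^T A
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

i = 0
ak_i = A[i, :] @ Vk @ Vk.T             # predicted annotation profile of gene i
assert np.allclose(ak_i, Ak[i, :])     # equals row i of the rank-k approximation
```

The identity holds because A Vk = U_k Σ_k (the other columns of V are orthogonal to Vk), so A Vk Vkᵀ = U_k Σ_k Vkᵀ = A_k.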
11. Evaluation of the prediction
To evaluate the prediction, we compare each A(i,j) element to its corresponding Ak(i,j), for each real threshold τ, with 0 ≤ τ ≤ 1.0:
• if A(i,j) = 1 & Ak(i,j) > τ: AC, Annotation Confirmed (AC ← AC + 1)
• if A(i,j) = 1 & Ak(i,j) ≤ τ: AR, Annotation to be Reviewed (AR ← AR + 1)
• if A(i,j) = 0 & Ak(i,j) ≤ τ: NAC, No Annotation Confirmed (NAC ← NAC + 1)
• if A(i,j) = 0 & Ak(i,j) > τ: AP, Annotation Predicted (AP ← AP + 1)
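The four counters can be accumulated directly with boolean masks; this is a sketch of ours (function name and toy values are illustrative, not from the paper):

```python
import numpy as np

def confusion_counts(A, Ak, tau):
    """Compare the known annotations A to the predictions Ak at threshold tau."""
    pred = Ak > tau
    AC  = int(np.sum((A == 1) & pred))    # Annotation Confirmed
    AR  = int(np.sum((A == 1) & ~pred))   # Annotation to be Reviewed
    NAC = int(np.sum((A == 0) & ~pred))   # No Annotation Confirmed
    AP  = int(np.sum((A == 0) & pred))    # Annotation Predicted (new candidate)
    return AC, AR, NAC, AP

A  = np.array([[1, 0], [0, 1]])
Ak = np.array([[0.9, 0.2], [0.1, 0.8]])
print(confusion_counts(A, Ak, tau=0.5))   # (2, 0, 2, 0)
```

The four counts always sum to the number of matrix entries, since every (i, j) pair falls into exactly one case.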
12. SVD truncation
• The main problem of truncated SVD: how to choose the
truncation?
• Where to truncate?
How to choose the k here?
13. New concept: Receiver Operating Characteristic
(ROC) curve
Starting from the annotation prediction evaluation factors we just introduced:
AC: Annotation Confirmed (input Yes, output Yes)
AR: Annotation to be Reviewed (input Yes, output No)
NAC: No Annotation Confirmed (input No, output No)
AP: Annotation Predicted (input No, output Yes)
We can design the Receiver Operating Characteristic curve for every prediction:
On the x axis, the annotation to be reviewed rate: AR / (AC + AR)
On the y axis, the annotation predicted rate: AP / (AP + NAC)
14. New concept: Receiver Operating Characteristic
(ROC) curve (2)
On the y axis, the annotation confirmed rate: AC / (AC + AR)
On the x axis, the annotation predicted rate: AP / (AP + NAC)
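Sweeping the two rates over a grid of thresholds gives one ROC point per τ, and the area can be estimated with the trapezoidal rule. This is a sketch of ours; helper names and toy values are illustrative:

```python
import numpy as np

def roc_points(A, Ak, thresholds):
    """One (x, y) = (AP rate, AC rate) point per threshold tau."""
    xs, ys = [], []
    for tau in thresholds:
        pred = Ak > tau
        AC  = np.sum((A == 1) & pred)
        AR  = np.sum((A == 1) & ~pred)
        NAC = np.sum((A == 0) & ~pred)
        AP  = np.sum((A == 0) & pred)
        xs.append(AP / (AP + NAC))   # annotation predicted rate (x axis)
        ys.append(AC / (AC + AR))    # annotation confirmed rate (y axis)
    return np.array(xs), np.array(ys)

def roc_auc(xs, ys):
    """Trapezoidal area under the ROC points, sorted by (x, y)."""
    order = np.lexsort((ys, xs))
    x, y = xs[order], ys[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

A  = np.array([[1, 0], [0, 1]])
Ak = np.array([[0.95, 0.05], [0.05, 0.95]])     # a "perfect" toy prediction
xs, ys = roc_points(A, Ak, np.linspace(0, 1, 11))
```

Note that AP + NAC is always the number of zeros in A, and AC + AR the number of ones, so both denominators are constant across thresholds.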
15. SVD truncation choice
Algorithm:
1) Choose some possible truncation levels
2) Compute the Receiver Operating Characteristic for each
SVD prediction of those truncation levels
3) Compute the Area Under the Curve (AUC) of each ROC
4) Choose the truncation level of the ROC that has
maximum AUC
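The four steps can be sketched end to end. Hedged: the candidate levels, the threshold grid, and the plain trapezoidal AUC below are illustrative choices of ours, not the paper's exact implementation:

```python
import numpy as np

def roc_auc(A, Ak, thresholds):
    """ROC AUC of the prediction Ak against A over a threshold grid."""
    xs, ys = [], []
    for tau in thresholds:
        pred = Ak > tau
        xs.append(np.sum((A == 0) & pred) / np.sum(A == 0))   # AP rate
        ys.append(np.sum((A == 1) & pred) / np.sum(A == 1))   # AC rate
    xs, ys = np.array(xs), np.array(ys)
    order = np.lexsort((ys, xs))
    x, y = xs[order], ys[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

def best_truncation(A, candidate_ks, thresholds=np.linspace(0, 1, 21)):
    """Steps 1-4: one ROC AUC per truncation level, keep the maximum."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    aucs = {}
    for k in candidate_ks:
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        aucs[k] = roc_auc(A, Ak, thresholds)
    return max(aucs, key=aucs.get), aucs

A = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
best_k, aucs = best_truncation(A, candidate_ks=[1, 2, 3])
```

The AP and AC rates here use the simplified constant denominators (number of zeros and ones in A), which is equivalent to the AP/(AP+NAC) and AC/(AC+AR) forms on the previous slides.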
16. SVD truncation choice (2)
Algorithm:
1) Choose some possible truncation levels
2) Compute the Receiver Operating Characteristic for each SVD prediction of those truncation levels
3) Compute the Area Under the Curve (AUC) of each ROC
4) Choose the truncation level of the ROC that has maximum AUC
Steps 2) to 4) are quite easy; step 1), choosing the possible truncation levels, is quite challenging.
18. Minimum AUC between all the ROCs of various
truncation levels
1) Choose some possible truncation levels
We cannot compute the SVD, its ROC and its AUC for every truncation value, because it would be too expensive (in time and resources).
Algorithm:
1) Since the matrix A has m rows (genes) and n columns (annotation terms), we take p = min(m, n)
2) Since r ≤ p is the number of non-zero singular values along the diagonal of Σ, the best truncation value is in the interval [1; r]
3) newInterval = {1, r}
4) k = firstElement(newInterval)
5) step = length(newInterval) / numStep
19. Minimum AUC between all the ROCs of various
truncation levels (2)
4. We sample all the N non-null singular values at constant intervals of size step (step = 10% of N)
5. For every sampled singular value, we compute the SVD and its corresponding ROC AUC, for an AC rate in [0%, 100%] and an AP rate in [0%, 1%]
20. Minimum AUC between all the ROCs of various
truncation levels (3)
Starting from the first AUC, if the AUCs of the three subsequent samples all decrease, we take that index (the local best index) for the next zoom step.
This means we have found a local maximum.
21. Minimum AUC between all the ROCs of various truncation levels (4)
If the AUC differences of the last three singular values are lower than gamma = 10%, we take that index (the chosen index) for the next zoom step.
This means that the AUCs do not grow enough.
22. Minimum AUC between all the ROCs of various truncation levels (5)
Once we have chosen the index where to zoom, we re-run the algorithm in the sub-interval,
until one of the previously described conditions is satisfied,
or the maximum number of zooms (numZoom = 4) is reached.
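Putting the sampling and zoom steps together, the search might look like the following sketch. Hedged: the slides describe the stopping rules loosely, so the plateau test and the interval update below are our reading, and `auc_at` is an assumed callable that computes the rank-k prediction and returns its ROC AUC:

```python
import numpy as np

def zoom_search(auc_at, r, num_step=10, gamma=0.10, num_zoom=4):
    """Sampling-and-zoom search over truncation levels in [1, r].
    auc_at(k) is assumed to compute the rank-k SVD prediction and
    return its ROC AUC; here it is an abstract callable."""
    lo, hi = 1, r
    best_k = lo
    for _ in range(num_zoom):
        # Sample the current interval at constant steps (~10% of its length).
        ks = sorted(set(np.linspace(lo, hi, num_step + 1).astype(int)))
        aucs = [auc_at(k) for k in ks]
        i = int(np.argmax(aucs))
        best_k = int(ks[i])
        # Plateau test (our reading of the gamma = 10% condition):
        # stop if the last three AUCs barely differ.
        tail = aucs[-3:]
        if max(tail) - min(tail) < gamma:
            break
        # Otherwise zoom into the sub-interval around the best sample.
        lo, hi = int(ks[max(i - 1, 0)]), int(ks[min(i + 1, len(ks) - 1)])
        if hi - lo < 2:
            break
    return best_k

# Toy AUC profile peaking at k = 137 (illustrative only):
k_star = zoom_search(lambda k: 1.0 - abs(k - 137) / 500.0, r=500)
```

Each zoom shrinks the interval to the neighbours of the current best sample, so the number of SVD computations stays near num_step times num_zoom rather than r.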
23. Example
Dataset: annotations of Gallus gallus genes with Biological Process Gene Ontology terms
24. Results
• To evaluate the performance of our method, we used annotations of Bos taurus, Danio rerio, and Gallus gallus genes with Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) Gene Ontology terms
• Available in July 2009 in an old version of the Gene Ontology
25. Results (2)
We then compared the percentage of annotations predicted by our SVD method with our optimized truncation level against the percentage predicted by the SVD method with the fixed truncation level (k = 500) used by Draghici et al. in the paper "A semantic analysis of the annotations of the human genome" (2005).
26. Conclusions
Problem: SVD truncation in the context of genomic annotation prediction
Proposed solution: finding the truncation level corresponding to the maximum AUC of the ROC curve, computed in the region near zero
27. Conclusions (2)
• To avoid computing the SVD for all the possible truncation levels (too expensive!), we proposed an algorithm that searches for local and global maxima by zooming into sub-intervals
• The best SVD truncation levels suggested by this algorithm for our datasets (annotations of Bos taurus, Danio rerio, and Gallus gallus genes, and GO terms) gave better results than other truncation levels, in a reasonable time
28. Future developments
• To obtain the best sampling, we could study the gradient
variations in the distribution of the AUC values for different
truncation levels and the histogram of the eigenvalues
• Our approach is not limited to the Gene Ontology and can be applied to annotations from any controlled vocabulary
29. A Discrete Optimization Approach for SVD Best
Truncation Choice based on ROC Curves
Thanks for your attention!!!
www.DavideChicco.it
davide.chicco@elet.polimi.it