This document presents a supervised planetary unmixing method using optimal transport. It proposes the Wasserstein distance as a metric for comparing spectral signatures: defined over probability distributions, it can account for shifts in frequency. The method formulates unmixing as an optimization problem that matches spectra to a dictionary under the Wasserstein distance while incorporating an abundance prior. Preliminary experiments on Vesta asteroid data show the abundance maps produced with this optimal-transport approach.
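As a point of reference for the metric used above (a generic illustration, not the paper's implementation), the 1-D Wasserstein-1 distance between two normalized spectra on a shared wavelength grid can be computed from their cumulative distributions; the Gaussian "spectra", grid, and 20 nm shift below are invented for the example:

```python
import numpy as np

def wasserstein_1d(p, q, grid):
    """W1 distance between two discrete distributions p, q on a sorted 1-D grid.

    For 1-D distributions, W1 equals the integral of |CDF_p - CDF_q|,
    approximated here by a left Riemann sum over the grid spacing.
    """
    p = np.asarray(p, dtype=float) / np.sum(p)   # normalize to a probability
    q = np.asarray(q, dtype=float) / np.sum(q)
    cdf_diff = np.abs(np.cumsum(p) - np.cumsum(q))
    return float(np.sum(cdf_diff[:-1] * np.diff(grid)))

# Two synthetic "spectra": identical Gaussian bumps shifted in wavelength.
grid = np.linspace(400.0, 700.0, 301)            # wavelengths in nm
a = np.exp(-0.5 * ((grid - 500.0) / 20.0) ** 2)
b = np.exp(-0.5 * ((grid - 520.0) / 20.0) ** 2)

print(wasserstein_1d(a, b, grid))                # ≈ 20, the shift in nm
```

Unlike a bin-wise Euclidean distance, which saturates once the bumps no longer overlap, W1 grows with the spectral shift, which is what makes it attractive for comparing shifted signatures.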
In this article we consider macrocanonical models for texture synthesis. In these models, samples are generated given an input texture image and a set of features which should be matched in expectation. It is known that if the images are quantized, macrocanonical models are given by Gibbs measures, via the maximum entropy principle. We study conditions under which this result extends to real-valued images. If these conditions hold, finding a macrocanonical model amounts to minimizing a convex function and sampling from an associated Gibbs measure. We analyze an algorithm which alternates between sampling and minimizing. We present experiments with neural network features and study the drawbacks and advantages of using this sampling scheme.
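The alternating scheme described above can be sketched on a toy problem. The block below is a minimal, hypothetical illustration (not the paper's neural-network features): the features are simply the mean and mean-square of a 1-D "image", for which the maximum-entropy Gibbs model is a Gaussian, so convergence is easy to check. Sampling uses unadjusted Langevin dynamics; the parameter update is a dual gradient step on the moment-matching constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 32                       # pixels in each toy 1-D "image"
f0 = np.array([0.0, 1.0])    # target expectations: mean 0, mean-square 1

def features(x):
    # Batch-averaged features; the maxent model matching f0 is N(0, 1) per pixel.
    return np.array([x.mean(), (x * x).mean()])

# Gibbs model p_theta(x) ~ exp(-(theta[0]*x_i + theta[1]*x_i**2), summed over i).
theta = np.array([0.5, 1.0])           # deliberately misspecified start
x = rng.normal(size=(64, d))           # persistent sampling chains

eps, lr = 0.01, 0.1
for _ in range(1500):
    for _ in range(5):                 # Langevin steps at the current theta
        grad = theta[0] + 2.0 * theta[1] * x          # d(energy)/dx, per pixel
        x = x - eps * grad + np.sqrt(2 * eps) * rng.normal(size=x.shape)
    theta = theta + lr * (features(x) - f0)           # dual (moment-matching) step
    theta[1] = max(theta[1], 1e-2)     # keep the model normalizable

print(features(x))   # ≈ [0, 1]: the feature expectations are matched
```

At convergence theta approaches (0, 0.5), the Gibbs model whose feature expectations equal f0, mirroring the convex minimization described in the abstract.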
"Statistical Physics Studies of Machine Learning Problems" by Lenka Zdeborova, Researcher @CNRS
Abstract: We will discuss insights into the following questions: What makes problems studied in machine learning and statistical physics related? How can this relation be used to better understand the performance and limitations of machine learning systems? What happens when a phase transition is found in a computational problem? How do phase transitions influence algorithmic hardness?
Big Data and Small Devices by Katharina MorikBigMine
How can we learn from the data of small ubiquitous systems? Do we need to send the data to a server or cloud and do all learning there? Or can we learn on some small devices directly? Are smartphones small? Are navigation systems small? How complex is learning allowed to be in times of big data? What about graphical models? Can they be applied on small devices or even learned on restricted processors?
Big data are produced by various sources. Most often, they are stored in distributed fashion at computing farms or clouds. Analytics on the Hadoop Distributed File System (HDFS) then follows the MapReduce programming model. In the Lambda architecture of Nathan Marz and James Warren, this is the batch layer. It is complemented by the speed layer, which aggregates and integrates incoming data streams in real time. When considering big data and small devices, we naturally imagine the small devices hosting only the speed layer. Analytics on the small devices is restricted by memory and computation resources.
The interplay of streaming and batch analytics offers a multitude of configurations. In this talk, we discuss opportunities for learning sophisticated spatio-temporal models. In particular, we investigate graphical models, which generate the probabilities for connected (sensor) nodes. First, we present spatio-temporal random fields that take as input data from small devices, are computed at a server, and send results to (possibly different) small devices. Second, we go even further: the Integer Markov Random Field approximates the likelihood estimates such that it can be computed on small devices. We illustrate our learning models with applications from traffic management.
Noise Resilience of Variational Quantum CompilingKunalSharma515
APS March meeting 2020
Variational hybrid quantum-classical algorithms (VHQCAs) are near-term algorithms that leverage classical optimization to minimize a cost function, which is efficiently evaluated on a quantum computer. Recently VHQCAs have been proposed for quantum compiling, where a target unitary U is compiled into a short-depth gate sequence V. In this work, we report on a surprising form of noise resilience for these algorithms. Namely, we find one often learns the correct gate sequence V (i.e., the correct variational parameters) despite various sources of incoherent noise acting during the cost-evaluation circuit. Our main results are rigorous theorems stating that the optimal variational parameters are unaffected by a broad class of noise models, such as measurement noise, gate noise, and Pauli channel noise. Furthermore, our numerical implementations on IBM's noisy simulator demonstrate resilience when compiling the quantum Fourier transform, Toffoli gate, and W-state preparation. Hence, variational quantum compiling, due to its robustness, could be practically useful for noisy intermediate-scale quantum devices. Finally, we speculate that this noise resilience may be a general phenomenon that applies to other VHQCAs such as the variational quantum eigensolver.
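To make the cost being minimized concrete, here is a toy, fully classical stand-in (invented for illustration; no quantum hardware, noise model, or the paper's ansatz): compiling a single-qubit RZ target into an RZ(theta) "gate sequence" by minimizing the Hilbert-Schmidt-test cost, with a simple grid scan playing the role of the classical optimizer.

```python
import numpy as np

def rz(theta):
    # Single-qubit RZ(theta) = diag(e^{-i theta/2}, e^{+i theta/2}).
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def hs_cost(V, U):
    # Hilbert-Schmidt test cost: 0 iff V equals U up to a global phase.
    d = U.shape[0]
    return 1.0 - np.abs(np.trace(V.conj().T @ U)) ** 2 / d ** 2

target = rz(1.2)                       # the "unknown" target unitary

# Crude classical optimizer standing in for the hybrid quantum-classical loop.
thetas = np.linspace(-np.pi, np.pi, 2001)
costs = [hs_cost(rz(t), target) for t in thetas]
best = thetas[int(np.argmin(costs))]
print(round(best, 2))                  # 1.2: the target angle is recovered
```

The paper's point is that the minimizer of this kind of cost (the variational parameters) can survive even when the cost values themselves are corrupted by incoherent noise.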
Adaptive Noise Cancellation using Multirate TechniquesIJERD Editor
International Journal of Engineering Research and Development is a premier international peer-reviewed open-access engineering and technology journal promoting the discovery, innovation, advancement and dissemination of basic and transitional knowledge in engineering, technology and related disciplines.
In this paper, a new algorithm for high-resolution Direction of Arrival (DOA) estimation of multiple wideband signals is proposed. The method proceeds in two steps. In the first step, the received signal data is arranged in Toeplitz form using first-order statistics. In the second step, QR decomposition is applied to the constructed Toeplitz matrix. Compared with existing schemes, the proposed scheme provides several advantages. First, it requires computing only the triangular matrix R or the orthogonal matrix Q to find the DOA; these matrices can be computed with O(n^2) operations, whereas most existing schemes require an eigenvalue decomposition (EVD) of the covariance matrix or a singular value decomposition (SVD) of the data matrix, both of which cost O(n^3) operations. Second, the proposed scheme is more suitable for high-speed communication since it requires only first-order statistics and a single snapshot. Third, it can estimate correlated wideband signals without spatial smoothing techniques, which existing schemes cannot. The accuracy of the proposed wideband DOA estimation method is evaluated through computer simulation in comparison with a conventional method.
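A schematic of the two steps, with invented toy data: a single array snapshot is arranged into a Toeplitz matrix and QR-factorized. Note that NumPy's general-purpose QR is used here; the O(n^2) operation count claimed above relies on exploiting the Toeplitz structure, which this sketch does not do.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)

# Single-snapshot data from an M-element uniform linear array (toy model):
# two narrowband sources with half-wavelength element spacing.
M = 8
angles = np.deg2rad([10.0, 35.0])
steering = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(angles)))
snapshot = steering @ np.array([1.0, 0.8]) + 0.01 * rng.normal(size=M)

# Step 1: arrange the first-order data into a Toeplitz matrix
# (scipy uses the snapshot as first column, its conjugate as first row).
T = toeplitz(snapshot)

# Step 2: QR factorization of the Toeplitz matrix; Q and R then feed the
# subspace-style DOA search described in the abstract.
Q, R = np.linalg.qr(T)

# Sanity checks: exact factorization and orthonormal columns of Q.
print(np.allclose(Q @ R, T), np.allclose(Q.conj().T @ Q, np.eye(M)))
```

The array model, angles, and amplitudes above are assumptions for the sketch, not the paper's simulation setup.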
International Journal of Managing Information Technology (IJMIT)IJMIT JOURNAL
We present an improved SPFA algorithm for the single source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex has just been relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that it is efficient.
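For reference, the baseline SPFA that the abstract builds on can be written in a few lines (the MinPoP queue-ordering refinement is not specified in the abstract, so it is not reproduced here):

```python
from collections import deque

def spfa(n, edges, source):
    """Bellman-Ford with a candidate queue (SPFA): shortest paths from source.

    n: number of vertices (0..n-1); edges: dict u -> list of (v, weight).
    A vertex enters the queue only when its distance has just been relaxed.
    """
    INF = float("inf")
    dist = [INF] * n
    in_queue = [False] * n
    dist[source] = 0
    queue = deque([source])
    in_queue[source] = True
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in edges.get(u, []):
            if dist[u] + w < dist[v]:          # relaxation
                dist[v] = dist[u] + w
                if not in_queue[v]:            # enqueue each relaxed vertex once
                    queue.append(v)
                    in_queue[v] = True
    return dist

edges = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
print(spfa(4, edges, 0))   # [0, 3, 1, 4]
```

The O(|E|) figure quoted above is an empirical average on random graphs; SPFA's worst case remains O(|V||E|), like Bellman-Ford.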
An improved spfa algorithm for single source shortest path problem using forw...IJMIT JOURNAL
C. Guyon, T. Bouwmans, E. Zahzah, "Foreground Detection via Robust Low Rank Matrix Factorization including Spatial Constraint with Iterative Reweighted Regression", International Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, November 2012.
Representation Learning for Structural Music Similarity MeasurementsVahndi Minah
The project was carried out to determine whether representation learning can improve the performance of a state-of-the-art structural music similarity system. Representation learning was performed using denoising autoencoder neural networks on a number of common audio features. Relevant components of existing Matlab toolkits written by members of the MIR community were also adapted to the Python 2.7 programming language. Finally, new types of features were developed combining representation learning with energy-normalised statistics calculations. These features have been shown to perform favourably in comparison with highly engineered features derived from chroma vectors. Future work is identified to further investigate the development of these features using higher-level representation learning and to improve overall system performance.
genetic algorithm based music recommender systemneha pevekar
The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them. Many of the largest e-commerce websites already use recommender systems to help their customers find products to purchase or download.
International Conference on Monte Carlo Techniques
Closing conference of the thematic cycle
Paris, July 5-8, 2016
Campus les Cordeliers
Slides of Richard Everitt's presentation
An introduction to the theory behind Bayesian Deep Learning, a topic of growing recent interest, and its recent applications. We briefly explain the theory of Bayesian inference and then introduce the theory and applications of Yarin Gal's Monte Carlo Dropout.
A quantum-inspired optimization heuristic for the multiple sequence alignment...Konstantinos Giannakis
Slides from the presentation of "A quantum-inspired optimization heuristic for the multiple sequence alignment problem in bio-computing" in IISA 2019 conference.
Scalable MAP inference in Bayesian networks based on a Map-Reduce approach (P...AMIDST Toolbox
Maximum a posteriori (MAP) inference is a particularly complex type of probabilistic inference in Bayesian networks. It consists of finding the most probable configuration of a set of variables of interest given observations on a collection of other variables. In this paper we study scalable solutions to the MAP problem in hybrid Bayesian networks parameterized using conditional linear Gaussian distributions. We propose scalable solutions based on hill climbing and simulated annealing, built on the Apache Flink framework for big data processing. We analyze the scalability of the solution through a series of experiments on large synthetic networks.
Full text paper: http://www.jmlr.org/proceedings/papers/v52/ramos-lopez16.pdf
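The hill-climbing core is easy to sketch on a toy network (a hypothetical three-variable discrete network rather than the paper's conditional linear Gaussian networks, and without the Flink distribution layer). Starting from several assignments guards against the local optima that single-start hill climbing can fall into:

```python
import itertools
import math

# Toy discrete network: P(R) P(S|R) P(W|R,S); evidence W=1 (grass is wet).
pR = {0: 0.8, 1: 0.2}
pS_given_R = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.99, 1: 0.01}}    # pS_given_R[r][s]
pW1_given = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}  # P(W=1|r,s)

def log_joint(r, s):
    pw = pW1_given[(r, s)]
    if pw == 0.0:
        return -math.inf
    return math.log(pR[r]) + math.log(pS_given_R[r][s]) + math.log(pw)

def hill_climb(start):
    # Greedy local search: flip the single variable with the largest gain.
    assign = dict(start)
    while True:
        cur = log_joint(assign["R"], assign["S"])
        best_gain, best_var = 0.0, None
        for var in ("R", "S"):
            flipped = dict(assign, **{var: 1 - assign[var]})
            gain = log_joint(flipped["R"], flipped["S"]) - cur
            if gain > best_gain:
                best_gain, best_var = gain, var
        if best_var is None:
            return assign
        assign[best_var] = 1 - assign[best_var]

# Restarts from every assignment (feasible only on a toy problem).
best = None
for r, s in itertools.product([0, 1], repeat=2):
    cand = hill_climb({"R": r, "S": s})
    if best is None or log_joint(cand["R"], cand["S"]) > log_joint(best["R"], best["S"]):
        best = cand

# Exact MAP by enumeration, for comparison on this tiny network.
exact = max(itertools.product([0, 1], repeat=2), key=lambda rs: log_joint(*rs))
print(best, exact)   # MAP given W=1 is R=0, S=1 (sprinkler on, no rain)
```

The scalable versions in the paper distribute the scoring of candidate moves; the local-search skeleton is the same.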
Motivated by a real-world financial dataset, we propose a distributed variational message passing scheme for learning conjugate exponential models. We show that the method can be seen as a projected natural gradient ascent algorithm, and it therefore has good convergence properties. This is supported experimentally, where we show that the approach is robust with respect to common problems like imbalanced data, heavy-tailed empirical distributions, and a high degree of missing values. The scheme is based on map-reduce operations and utilizes the memory management of modern big data frameworks like Apache Flink to obtain a time-efficient and scalable implementation. The proposed algorithm compares favourably to stochastic variational inference both in terms of speed and quality of the learned models. For the scalability analysis, we evaluate our approach over a network with more than one billion nodes (and approx. 75% latent variables) using a computer cluster with 128 processing units.
Full paper link: http://www.jmlr.org/proceedings/papers/v52/masegosa16.pdf
Multi Model Ensemble (MME) predictions are a popular ad-hoc technique for improving predictions of high-dimensional, multi-scale dynamical systems. The heuristic idea behind MME framework is simple: given a collection of models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious if this is a viable strategy and which models should be included in the MME forecast in order to achieve the best predictive performance. I will present an information-theoretic approach to this problem which allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi-model ensembles. Time permitting, the role and validity of “fluctuation-dissipation” arguments for improving imperfect predictions of externally perturbed non-autonomous systems - with possible applications to climate change considerations - will also be addressed.
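A minimal illustration of the convex-superposition idea with invented numbers: two biased Gaussian forecast densities are mixed, and the weight is chosen by scanning the KL divergence from the true density, a crude stand-in for the information-theoretic criterion described above.

```python
import numpy as np

grid = np.linspace(-6, 6, 2001)
dx = grid[1] - grid[0]

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

truth = gauss(grid, 0.0, 1.0)            # the (unknown) system statistics
f1 = gauss(grid, 0.8, 1.0)               # model 1: mean biased high
f2 = gauss(grid, -0.9, 1.1)              # model 2: biased the other way

def kl(p, q):
    # KL(truth || forecast): the information-theoretic prediction error.
    return float(np.sum(p * np.log(p / q)) * dx)

# Scan convex weights alpha*f1 + (1-alpha)*f2 for the best MME forecast.
alphas = np.linspace(0, 1, 101)
kls = [kl(truth, a * f1 + (1 - a) * f2) for a in alphas]
best = alphas[int(np.argmin(kls))]
print(kl(truth, f1), kl(truth, f2), min(kls))  # the mixture beats either model
```

Here the convex mixture improves on both individual forecasts, but the talk's point is precisely that this is not automatic: the sufficient condition determines when, and with which members, the MME forecast actually helps.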
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
The GraphNet (aka S-Lasso), as well as other "sparsity + structure" priors like TV (Total Variation), TV-L1, etc., are not easily applicable to brain data because of technical problems relating to the selection of the regularization parameters. Also, in their own right, such models lead to challenging high-dimensional optimization problems. In this manuscript, we present some heuristics for speeding up the overall optimization process: (a) early stopping, whereby one halts the optimization process when the test score (performance on left-out data) for the internal cross-validation for model selection stops improving, and (b) univariate feature screening, whereby irrelevant (non-predictive) voxels are detected and eliminated before the optimization problem is entered, thus reducing the size of the problem. Empirical results with GraphNet on real MRI (Magnetic Resonance Imaging) datasets indicate that these heuristics are a win-win strategy, as they add speed without sacrificing the quality of the predictions. We expect the proposed heuristics to work on other models like TV-L1.
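Heuristic (b) is straightforward to sketch. The synthetic data below are invented: 1000 "voxels", of which only 5 carry signal; screening keeps the features most correlated with the target before any heavy optimization is run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "voxels": 1000 features, only the first 5 carry signal.
n, p, k_informative = 300, 1000, 5
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:k_informative] = 1.0
y = X @ w_true + 0.1 * rng.normal(size=n)

# Univariate screening: keep the features with the largest |correlation| with y.
Xc = X - X.mean(0)
yc = y - y.mean()
corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
keep = np.argsort(corr)[-50:]          # screen 1000 voxels down to 50

# The reduced problem is 20x smaller, and the signal features survive.
print(np.sum(keep < k_informative))    # 5: all informative features retained
```

The full GraphNet/TV-L1 solver then runs only on the retained columns, which is where the speedup comes from; in the paper the screening statistic is computed per voxel in the same univariate spirit.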
The main machine learning algorithms are built upon various mathematical foundations such as statistics, optimization, and probability. Will this also hold true for Artificial Intelligence? In this presentation, I will showcase some recent examples of interactions between machine learning and mathematics.
Colloquium @ CEREMADE (October 3, 2023)
This talk will report briefly on some findings from the problem of picking the weights for a weighted function space in QMC. Then it will be mostly about importance sampling. We want to estimate the probability μ of a union of J rare events. The method uses n samples, each of which picks one of the rare events at random, samples conditionally on that rare event happening, and counts the total number of rare events that happen. It was used by Naiman and Priebe for scan statistics, by Shi, Siegmund and Yakir for genomic scans, and by Adler, Blanchet and Liu for extrema of Gaussian processes. We call it ALOE, for 'at least one event'. The ALOE estimate is unbiased, and we find that it has a coefficient of variation no larger than sqrt((J + 1/J - 2)/(4n)). The coefficient of variation is also no larger than sqrt((μ̄/μ - 1)/n), where μ̄ is the union bound. Our motivating problem comes from power system reliability, where the phase differences between connected nodes have a joint Gaussian distribution and the J rare events arise from unacceptably large phase differences. In the grid reliability problems, even some events defined by 5772 constraints in 326 dimensions, with probability below 10^-22, are estimated with a coefficient of variation of about 0.0024 with only n = 10,000 sample values. In a genomic context, the rare events become false discoveries. There we are interested in the possibility of a large number of simultaneous events, not just one or more. Some work with Kenneth Tay will be presented on that problem.
Joint work with Yury Maximov and Michael Chertkov (Los Alamos National Laboratory) and Kenneth Tay (Stanford).
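The mechanics of the ALOE estimator can be shown on an invented toy problem (uniform X and interval events rather than the Gaussian power-grid setting): pick an event with probability proportional to its marginal probability, sample conditionally on it, count how many events the sample hits, and average the reciprocals.

```python
import numpy as np

rng = np.random.default_rng(0)

# J overlapping interval events for X ~ Uniform(0,1); the union is easy to check.
intervals = [(0.00, 0.02), (0.01, 0.03), (0.05, 0.06)]   # (a_j, b_j)
probs = np.array([b - a for a, b in intervals])           # P(A_j)
mu_bar = probs.sum()                                      # union bound
exact = 0.03 + 0.01                                       # measure of the union

n = 10_000
# Pick event j proportionally to P(A_j), then sample X conditionally on A_j.
js = rng.choice(len(intervals), size=n, p=probs / mu_bar)
xs = np.array([rng.uniform(*intervals[j]) for j in js])
# S(x): how many of the J events each conditional sample actually hits.
S = np.array([sum(a <= x < b for a, b in intervals) for x in xs])

aloe = mu_bar * np.mean(1.0 / S)     # unbiased 'at least one event' estimate
print(round(aloe, 3), exact)         # both ≈ 0.04
```

Dividing by S exactly cancels the overcounting of points covered by several events, which is why the estimate is unbiased; the quoted coefficient-of-variation bound sqrt((μ̄/μ - 1)/n) is about 0.005 here.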
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...SYRTO Project
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time Series Models. Andre Lucas, Amsterdam, June 25, 2015. European Financial Management Association 2015 Annual Meetings.
Similar to Supervised Planetary Unmixing with Optimal Transport (20)
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
1. Supervised Planetary Unmixing with Optimal Transport

August 23, 2016
Sina Nakhostin, Nicolas Courty, Rémi Flamary and Thomas Corpetti
Contact: sina.nakhostin@irisa.fr
IRISA, Université de Bretagne-Sud, France
2. Agenda
Whispers 2016

Problem Definition
Optimal Transport (OT)
Unmixing with OT
Experiments and results
3. Supervised Unmixing

It is essentially a projection.
Given:
A multi/hyperspectral dataset.
A dictionary of reference signatures.
Goal:
Produce a set of abundance maps representing the distribution of the different materials within the scene.
4. Predicament: Endmember Variability

The signature of a single material is usually characterized by more than one spectral profile, due to:
Sensing-device accuracy
Reflectance angle
Shading effects
etc.
5. Predicament: Endmember Variability (continued)

Exploiting overcomplete dictionaries is one way to account for endmember variability.
6. Predicament: Choice of Distance

What is the best distance measure for comparing dictionary atoms?
7. Predicament: Choice of Distance (continued)

Conventional distance measures:
Euclidean Distance
Spectral Angle Mapper
Spectral Information Divergence
8. Predicament: Choice of Distance (continued)

Proposed measure: a distance based on Optimal Transport (OT), namely the Wasserstein distance (a.k.a. Earth Mover's Distance):
Defined between probability distributions.
Can be designed to be mostly sensitive to shifts in the frequency domain.
9. Why Optimal Transport after all?

To see spectra as probability distributions, each spectrum is normalized along its spectral values.
Normalization makes the analysis less sensitive to the total power of the spectrum in each pixel.
This improves robustness against shadows and other large radiance changes, and thus can prevent degenerate solutions.
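The normalization step above can be sketched in a few lines; the helper name `to_histogram` is ours, not from the paper.

```python
import numpy as np

def to_histogram(spectrum):
    """Normalize a nonnegative spectrum to unit mass so it can be treated
    as a probability distribution over the spectral bands."""
    s = np.asarray(spectrum, dtype=float)
    return s / s.sum()
```

Because the result is invariant to a global rescaling of the spectrum, a shadow that dims every band by the same factor leaves the histogram unchanged.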
10. Contributions

Figure: transporting 2D probability distributions (courtesy of M. Cuturi).
In this work we:
Introduce an original unmixing algorithm based on optimal transport theory.
Use an efficient optimization scheme based on iterative Bregman projections to solve the underlying problem.
Allow an optional prior on the abundances to be incorporated into the formulation.
Give preliminary results on the challenging asteroid 4-Vesta dataset.
11. What is Optimal Transport?

Let $\mu_s$ and $\mu_t$ be two discrete probability distributions in $\mathbb{R}^+$.
A transport plan is an association (a coupling) between the bins of $\mu_s$ and $\mu_t$.
The Kantorovich formulation of OT looks for an optimal coupling between the two probability distributions with respect to a given metric (see figure).
12. Discrete Optimal Transport

Since the distributions are available through a finite number of bins (i.e. spectral bands) in $\mathbb{R}^+$, we can write them as
$\mu_s = \sum_{i=1}^{n_s} p_i^s \, \delta_{x_i}$, $\mu_t = \sum_{i=1}^{n_t} p_i^t \, \delta_{x_i}$,
where $\delta_{x_i}$ is the Dirac mass at location $x_i \in \mathbb{R}^+$, and $p_i^s$ and $p_i^t$ are the probability masses associated with the $i$-th bins.
The set of probability couplings (joint probability distributions) between $\mu_s$ and $\mu_t$ is defined as
$\Pi = \{\gamma \in (\mathbb{R}^+)^{n_s \times n_t} \mid \gamma \mathbf{1}_{n_t} = \mu_s,\; \gamma^\top \mathbf{1}_{n_s} = \mu_t\}$,
where $n_s$ and $n_t$ are the numbers of bins in $\mu_s$ and $\mu_t$.
13. Wasserstein Distance

OT seeks the coupling $\gamma$ minimizing the quantity
$W_C(\mu_s, \mu_t) = \min_{\gamma \in \Pi(\mu_s, \mu_t)} \langle \gamma, C \rangle_F$,   (1)
where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product and $C_{(d \times d)} \geq 0$ is the cost matrix (pairwise distances with respect to a given metric).
$W_C(\mu_s, \mu_t)$ is called the Wasserstein distance.
What about scalability? Problem (1) is a linear program with equality constraints, and its resolution can be very time-consuming.
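For intuition, the Kantorovich linear program in (1) can be solved directly with a generic LP solver on toy-sized histograms; the brute-force sketch below (the function name `exact_ot` is ours) also illustrates why this does not scale to hundreds of spectral bands.

```python
import numpy as np
from scipy.optimize import linprog

def exact_ot(mu_s, mu_t, C):
    """Exact Kantorovich OT solved as a linear program (small problems only).
    Variables are the ns*nt entries of the coupling gamma, flattened row-wise."""
    ns, nt = C.shape
    # Equality constraints: row sums of gamma = mu_s, column sums = mu_t.
    A_rows = np.kron(np.eye(ns), np.ones(nt))   # picks out each row of gamma
    A_cols = np.kron(np.ones(ns), np.eye(nt))   # picks out each column of gamma
    A_eq = np.vstack([A_rows, A_cols[:-1]])     # drop one redundant constraint
    b_eq = np.concatenate([mu_s, mu_t[:-1]])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    gamma = res.x.reshape(ns, nt)
    return gamma, res.fun
```

With $n_s = n_t = 383$ bands the LP has over 146,000 variables per pixel pair, which motivates the entropic regularization introduced next in the deck.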
14. Entropic Regularization

To control the smoothness of the coupling, [Cuturi, 2013] proposes an entropy-based regularization term over $\gamma$:
$W_{C,\varepsilon}(\mu_s, \mu_t) = \min_{\gamma \in \Pi(\mu_s, \mu_t)} \langle \gamma, C \rangle_F - \varepsilon\, h(\gamma)$,   (2)
where $h(\gamma)$ is the entropy of the coupling. This allows one to draw a parallel between OT and a Bregman projection:
$\gamma^\star = \arg\min_{\gamma \in \Pi(\mu_s, \mu_t)} \mathrm{KL}(\gamma, \zeta)$,   (3)
where $\zeta = \exp(-C/\varepsilon)$.
This version of OT admits a simpler resolution method, based on successive projections onto the two marginal constraints. We use this closed-form projection to solve the unmixing problem.
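The successive projections onto the two marginal constraints are the classical Sinkhorn scaling iterations; a minimal NumPy sketch, assuming two histograms of equal total mass:

```python
import numpy as np

def sinkhorn(mu_s, mu_t, C, eps=0.05, n_iter=2000):
    """Entropic-regularized OT via Sinkhorn iterations: alternate Bregman
    projections onto the row-marginal and column-marginal constraints."""
    K = np.exp(-C / eps)          # Gibbs kernel zeta = exp(-C / eps)
    u = np.ones_like(mu_s)
    for _ in range(n_iter):
        v = mu_t / (K.T @ u)      # enforce the column marginal
        u = mu_s / (K @ v)        # enforce the row marginal
    gamma = u[:, None] * K * v[None, :]
    return gamma, np.sum(gamma * C)
```

Each iteration costs only two matrix-vector products, which is what makes the entropic formulation attractive for spectra with hundreds of bands.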
15. Unmixing of the Spectrum µ

Assume a linear mixture $\mu = E\alpha$, where $E_{(d \times q)}$ is the overcomplete dictionary and $\alpha \geq 0$ is a $q$-vector of abundance values with $\|\alpha\|_1 = 1$.
We seek $p$ abundance values for each pixel, with $p \leq q$ (endmember variability).
We also assume prior knowledge $\alpha_{0\,(p \times 1)}$ over the abundances.
The unmixing of $\mu$ is then the solution of the optimization
$\alpha^\star = \arg\min_\alpha \underbrace{W_{C_0, \varepsilon_0}(\mu, E\alpha)}_{\text{data fitting}} + \tau\, \underbrace{W_{C_1, \varepsilon_1}(\alpha, \alpha_0)}_{\text{prior}}$.   (4)
The data-fitting term searches for the best decomposition of the observations; the regularization term enforces compliance of the solution with the prior, balanced by the parameter $\tau \in \mathbb{R}^+$.
16. Unmixing of the Spectrum µ (continued)

In objective (4), $C_{0\,(d \times d)}$ and $C_{1\,(q \times p)}$ are, respectively, the cost matrix in the spectral domain and the cost matrix encoding information about the endmember groups.
The resolution of the optimization is also an algorithm based on iterative Bregman projections; see details in the paper.
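A toy version of objective (4) can be minimized with generic tools to get a feel for the two OT terms. The sketch below is NOT the paper's iterative Bregman projection scheme: it keeps $\alpha$ on the simplex through a softmax parameterization, uses a derivative-free optimizer, and assumes the prior lives at endmember level (p = q); all function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def sinkhorn_cost(a, b, C, eps=0.05, n_iter=300):
    """Transport cost <gamma, C> of the entropic OT plan between histograms."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    gamma = u[:, None] * K * v[None, :]
    return float(np.sum(gamma * C))

def unmix_ot(mu, E, alpha0, C0, C1, tau=0.05):
    """Toy solver for objective (4): OT data fitting plus OT prior term."""
    q = E.shape[1]

    def objective(theta):
        alpha = np.exp(theta - theta.max())
        alpha /= alpha.sum()                 # simplex constraint via softmax
        recon = E @ alpha
        recon = recon / recon.sum()          # compare spectra as histograms
        return (sinkhorn_cost(mu, recon, C0)
                + tau * sinkhorn_cost(alpha, alpha0, C1))

    res = minimize(objective, np.zeros(q), method="Nelder-Mead",
                   options={"maxiter": 500})
    alpha = np.exp(res.x - res.x.max())
    return alpha / alpha.sum(), res.fun
```

The optimizer starts from the uniform abundance vector, so the returned objective value can only improve on it; in the paper the same structure is exploited far more efficiently through the closed-form Bregman projections.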
17. The 4-Vesta Dataset

We perform unmixing on a portion of the 4-Vesta northern hemisphere.
The VIR image has 383 bands covering the ranges:
0.55–1.05 µm with a spectral sampling of 1.8 nm;
1.0–2.5 µm with a spectral sampling of 9.8 nm.
We look for three main lithologies: Eucrite, Orthopyroxene and Olivine.
A dictionary of 10 atoms, formed by the signatures of the different lithologies found in meteorites, was used.
18. Cost Metric for the Sensor ($C_0$)

To tailor the cost matrix $C_0$ to the characteristics of the dataset, we build $C_{0\,(383 \times 383)}$ as the squared Euclidean distance over the spectral band positions.
This reflects the characteristics of the spectra and the level of (dis)similarity among them.
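As a sketch, such a cost matrix can be built from the band-centre wavelengths. The sampling grid below only mimics the VIR ranges quoted earlier; it is not the instrument's actual band table, so its size will not be exactly 383.

```python
import numpy as np

# Hypothetical band-centre wavelengths (in um) mimicking the VIR sampling:
# 0.55-1.05 um at 1.8 nm, then 1.0-2.5 um at 9.8 nm.
wl = np.concatenate([np.arange(0.55, 1.05, 0.0018),
                     np.arange(1.0, 2.5, 0.0098)])
# Squared Euclidean cost between every pair of spectral bands.
C0 = (wl[:, None] - wl[None, :]) ** 2
```

With this choice, moving spectral mass between nearby bands is cheap while large spectral shifts are heavily penalized, which is what makes the resulting Wasserstein distance sensitive to frequency shifts.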
19. Cost Metric for the Materials ($C_1$)

We manually construct $C_{1\,(10 \times 3)}$ to reflect the grouping of endmembers belonging to the same material: an endmember shares a very low cost with its corresponding material in $\alpha_0$, i.e. $C_{1(i,j)} = 0$.
Priors over material groups: we can also encode prior knowledge about the dominance of one material or another through the vector $\alpha_{0\,(3 \times 1)}$. When no such prior knowledge is available, we set all priors to the same value, e.g. $1/3$ here.
20. Comparison with Another Method

Figures: abundance maps by OT; abundance maps by constrained least squares.
Unmixing based on OT reveals interesting patterns in the distribution of each material.
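The constrained least-squares baseline can be reproduced, under assumptions, with NNLS plus a heavily weighted sum-to-one row (a standard fully-constrained least-squares trick; the paper does not specify its exact solver, and the function name `fcls` is ours).

```python
import numpy as np
from scipy.optimize import nnls

def fcls(E, mu, delta=1e3):
    """Fully-constrained least-squares unmixing baseline.
    Nonnegativity is handled by NNLS; the sum-to-one constraint is enforced
    (softly) by appending a heavily weighted row of ones to the dictionary."""
    A = np.vstack([E, delta * np.ones((1, E.shape[1]))])
    b = np.append(mu, delta)
    alpha, _ = nnls(A, b)
    return alpha
```

Unlike the OT formulation, this baseline compares spectra band by band with a Euclidean loss, so it cannot account for shifts in the frequency domain.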
21. Abundance Maps with Varying Priors

More extensive tests should be conducted to find the best parametrization.
22. Conclusion / Perspectives

Conclusion:
An unmixing algorithm based on optimal transport.
The metric, devoted to distributions, is mostly sensitive to shifts in the frequency domain.
Endmember variability is addressed through the use of an overcomplete dictionary.
The cost function is optimized through iterative Bregman projections.
Perspectives:
Introduce a new regularization term accounting for sparsity in the groupings; possible candidates are the sparse Group Lasso and the Fused Lasso.