This document proposes a method for scaling up dictionary learning for massive matrix factorization. It presents an online algorithm that can handle large datasets in both dimensions (many samples and many features) by introducing subsampling. The key steps are:
1) Computing codes on a random subset of each sample's features rather than the full sample, reducing the cost of this step from O(p) to O(s), where s is the subsample size.
2) Partially updating the surrogate function used for the dictionary update, instead of performing a full update, again at O(s) cost.
3) Performing cautious dictionary updates that leave values unchanged for unseen features, so the surrogate is minimized in O(s) time.
Validation on fMRI and collaborative filtering datasets shows the method achieves speed-ups of up to an order of magnitude over the original online algorithm while matching its prediction quality.
Dictionary Learning for Massive Matrix Factorization
1. Dictionary Learning for Massive Matrix Factorization
Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion
Inria/CEA Parietal, Inria Thoth
June 20, 2016
2. Matrix factorization
$X \in \mathbb{R}^{p \times n}$, factored as $X = DA$ with $D \in \mathbb{R}^{p \times k}$, $A \in \mathbb{R}^{k \times n}$
Flexible tool for unsupervised data analysis
Dataset has lower underlying complexity than its apparent size
How to scale it to very large datasets? (Brain imaging, 2 TB)
3. Matrix factorization
Low-rank factorization: $k < p$
$X \in \mathbb{R}^{p \times n} = DA$, $D \in \mathbb{R}^{p \times k}$, $A \in \mathbb{R}^{k \times n}$
...with optional sparse factors
→ interpretable data (fMRI, genetics, topic modeling)
Overcomplete dictionary learning: $k \geq p$, sparse $A$ [Olshausen and Field, 1997]
4. Formalism and methods
Non-convex formulation:
$\min_{D \in \mathcal{C},\, A \in \mathbb{R}^{k \times n}} \|X - DA\|_2^2 + \lambda \Omega(A)$
Constraints on $D$, penalty on $A$ ($\ell_1$, $\ell_2$)
Naive resolution: alternated minimization, using the full $X$ at each iteration
Very slow: a single iteration costs $O(pn)$
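To make the cost concrete, here is a minimal numpy sketch of this naive alternated minimization, assuming $\Omega = \|\cdot\|_2^2$ and columns of $D$ constrained to the unit $\ell_2$ ball; names and defaults are illustrative, not the authors' code.

```python
import numpy as np

def alternating_mf(X, k, lam=0.1, n_iter=50, seed=0):
    """Naive alternated minimization of ||X - DA||^2 + lam ||A||^2."""
    p, n = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((p, k))
    D /= np.linalg.norm(D, axis=0)                    # start inside C
    for _ in range(n_iter):
        # Code step: ridge regression for all n samples at once -- O(p n)
        A = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)
        # Dictionary step: least squares, then project columns back onto C
        D = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D, A
```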
5. Online matrix factorization
Stream samples $(x_t)$, update $D$ at each step $t$ [Mairal et al., 2010]
Single iteration in $O(p)$, a few epochs
Large $n$, regular $p$, e.g. image patches: $p = 256$, $n \approx 10^6$, 1 GB
Both (sparse) low-rank factorization / sparse coding
6. Scaling-up for massive matrices
Functional MRI (HCP dataset)
Brain "movies": space × time
Extract $k$ sparse networks
$p = 2 \cdot 10^5$, $n = 2 \cdot 10^6$, 2 TB
Way larger than vision problems
Unusual setting: data is large in both directions
Also useful in collaborative filtering
$X$ (voxels × time) $= D$ ($k$ spatial maps) $\times\, A$ (time)
7. Scaling-up for massive matrices
Out-of-the-box online algorithm? Limited time budget?
Need to accommodate large $p$: 235 h of run time for 1 full epoch; a 10 h run time covers only 1/24 of an epoch
8. Scaling-up in both directions
Batch → online: stream samples $x_t$ (streaming), handle large $n$
Online → double online: stream subsampled samples $M_t x_t$ (streaming + subsampling), handle large $p$
Online learning + partial random access to samples
9. Scaling-up in both directions
Online → double online: stream subsampled samples $M_t x_t$ (streaming + subsampling) to handle large $p$
Low-distortion lemma [Johnson and Lindenstrauss, 1984]
Randomized linear algebra [Halko et al., 2009]
Sketching for data reduction [Pilanci and Wainwright, 2014]
10. Algorithm design
Online dictionary learning [Mairal et al., 2010]:
1. Compute code, $O(p)$:
$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$
2. Update surrogate, $O(p)$:
$g_t(D) = \frac{1}{t} \sum_{i=1}^{t} \|x_i - D\alpha_i\|_2^2$
3. Minimize surrogate, $O(p)$:
$D_t = \operatorname{argmin}_{D \in \mathcal{C}} g_t(D) = \operatorname{argmin}_{D \in \mathcal{C}} \operatorname{Tr}(D^\top D A_t - D^\top B_t)$
Each iteration accesses $x_t$ in full → $O(p)$ algorithm (complexity depends on $p$)
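A minimal numpy sketch of this $O(p)$ online loop, under the same illustrative ridge assumption for $\Omega$ so the code step has a closed form; one block-coordinate-descent pass per sample:

```python
import numpy as np

def online_dict_learning(X, k, lam=0.1, n_epochs=1, seed=0):
    p, n = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((p, k))
    D /= np.linalg.norm(D, axis=0)            # columns in the unit ball C
    A = np.zeros((k, k))                      # A_t = 1/t sum_i alpha_i alpha_i^T
    B = np.zeros((p, k))                      # B_t = 1/t sum_i x_i alpha_i^T
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            t += 1
            x = X[:, i]
            # 1) code computation -- O(p)
            alpha = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)
            # 2) surrogate update -- O(p)
            A += (np.outer(alpha, alpha) - A) / t
            B += (np.outer(x, alpha) - B) / t
            # 3) surrogate minimization: one BCD pass over atoms -- O(p)
            for j in range(k):
                if A[j, j] > 1e-12:
                    D[:, j] += (B[:, j] - D @ A[:, j]) / A[j, j]
                    D[:, j] /= max(1.0, np.linalg.norm(D[:, j]))
    return D
```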
11. Introducing subsampling
Iteration cost in $O(p)$: can we reduce it?
$x_t \to M_t x_t$: replace the dimension $p$ by $s = \operatorname{rk} M_t$, where $M_t$ is a random sampling mask
Use only $M_t x_t$ in the algorithm's computations: complexity in $O(s)$
Our contribution: adapt the 3 parts of the algorithm to obtain $O(s)$ complexity
1. Code computation
2. Surrogate update
3. Surrogate minimization
[Szabó et al., 2011]: dictionary learning with missing values, but in $O(p)$
12. 1. Code computation
Linear regression with random sampling:
$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda \frac{\operatorname{rk} M_t}{p} \Omega(\alpha)$
approximate solution of
$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$
Valid in high dimension, with incoherent features:
$D^\top M_t D \approx \frac{s}{p} D^\top D \qquad D^\top M_t x_t \approx \frac{s}{p} D^\top x_t$
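A sketch of the subsampled code step under the same ridge assumption; `obs`, the array of $s$ observed row indices, is an assumed representation of the mask $M_t$ (storing $M_t$ as a dense matrix would defeat the purpose):

```python
import numpy as np

def subsampled_code(x, D, obs, lam):
    """Code for sample x using only the observed feature rows `obs` (|obs| = s)."""
    p, k = D.shape
    s = len(obs)
    D_s, x_s = D[obs], x[obs]          # M_t D and M_t x_t, touched in O(s k)
    # argmin_alpha ||M_t (x - D alpha)||^2 + lam * (s / p) * ||alpha||^2
    return np.linalg.solve(D_s.T @ D_s + lam * (s / p) * np.eye(k),
                           D_s.T @ x_s)
```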
13. 2. Surrogate update
Original algorithm: $A_t$ and $B_t$ used in the dictionary update
$A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$, same as in the online algorithm
$B_t = (1 - \frac{1}{t}) B_{t-1} + \frac{1}{t} x_t \alpha_t^\top = \frac{1}{t} \sum_{i=1}^{t} x_i \alpha_i^\top$: forbidden, costs $O(p)$
Partial update of $B_t$ at each iteration:
$B_t = \left(\sum_{i=1}^{t} M_i\right)^{-1} \sum_{i=1}^{t} M_i x_i \alpha_i^\top$
Only $M_t B_t$ is updated
Behaves like $\mathbb{E}_x[x \alpha^\top]$ for large $t$
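A sketch of how this partial update can run in $O(s)$: one counter per feature row turns the expression above into a row-wise running mean (the array layout is an assumption of this sketch):

```python
import numpy as np

def partial_B_update(B, counts, x, alpha, obs):
    """Update only the rows of B observed at this iteration.

    B: (p, k) running estimate; counts: (p,) times each feature row was seen.
    """
    counts[obs] += 1
    # running mean of x_i alpha_i^T restricted to the observed rows
    B[obs] += (np.outer(x[obs], alpha) - B[obs]) / counts[obs, None]
    return B, counts
```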
14. 3. Surrogate minimization
Original algorithm: block coordinate descent with projection on $\mathcal{C}$:
$\min_{D \in \mathcal{C}} g_t(D): \quad D^j \leftarrow p^{\perp}_{\mathcal{C}^j}\!\left(D^j - \frac{1}{(A_t)_{j,j}} (D (A_t)^j - (B_t)^j)\right)$
Updating the full $D$ at iteration $t$ is forbidden
Cautious update: leave the dictionary unchanged for unseen features ($I - M_t$):
$\min_{D \in \mathcal{C},\ (I - M_t) D = (I - M_t) D_{t-1}} g_t(D)$
$O(s)$ update in block coordinate descent:
$D^j \leftarrow p^{\perp}_{\mathcal{C}_r^j}\!\left(D^j - \frac{1}{(A_t)_{j,j}} M_t (D (A_t)^j - (B_t)^j)\right)$
$\ell_1$ ball: $\mathcal{C}_r^j = \{D \in \mathcal{C},\ \|M_t D^j\|_1 \leq \|M_t D_{t-1}^j\|_1\}$
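A sketch of the cautious update: a block-coordinate-descent pass touching only the observed rows. For brevity it rescales with an $\ell_2$ norm in place of the exact projection onto the $\ell_1$-ball constraint $\mathcal{C}_r^j$ from the slide:

```python
import numpy as np

def cautious_dict_update(D, A, B, obs, eps=1e-12):
    """BCD pass over atoms, restricted to the feature rows in `obs` -- O(s k^2)."""
    for j in range(A.shape[0]):
        if A[j, j] > eps:
            old_norm = np.linalg.norm(D[obs, j])
            # gradient step on the observed rows only: M_t (D A_t^j - B_t^j)
            D[obs, j] -= (D[obs] @ A[:, j] - B[obs, j]) / A[j, j]
            new_norm = np.linalg.norm(D[obs, j])
            if new_norm > max(old_norm, eps):     # do not let ||M_t d_j|| grow
                D[obs, j] *= old_norm / new_norm
    return D
```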
15. Resting-state fMRI
HCP dataset: one brain image per second, 200 sessions, $n = 2 \cdot 10^6$, $p = 2 \cdot 10^5$
$X$ (voxels × time) $= D$ ($k$ spatial maps) $\times\, A$ (time)
Sparse decomposition: $k = 70$, $\mathcal{C} = B_1^k$ ($\ell_1$ balls), $\Omega = \|\cdot\|_2^2$
Validation: increase the reduction factor $r = p/s$; objective function on a test set vs CPU time
16. Resting-state fMRI
Online dictionary learning: 235 h of run time for 1 full epoch; 10 h of run time for 1/24 epoch
Proposed method: 10 h of run time for 1/2 epoch, at reduction $r = 12$
Qualitatively, usable maps are obtained 10× faster
17. Resting-state fMRI
.1 h 1 h 10 h 100 h
CPU
time
2.20
2.25
2.30
2.35
2.40
Objectivevalueontestset ×108
Original online algorithm
Reduction factor r 4 8 12
No reduction
Speed-up close to reduction factor p
s
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 16 / 19
18. Resting-state fMRI
[Figure: objective value on the test set (×10^8) vs number of seen records (100 to 4000, 1 epoch marked), at λ = 10^−4, comparing no reduction (original algorithm) with r = 4, 8, 12.]
Convergence speed vs number of seen records: information is acquired faster
19. Collaborative filtering
$M_t x_t$: movie ratings from user $t$
vs. coordinate descent for the MMMF loss (no hyperparameters)
[Figure: test RMSE vs CPU time (100 s to 1000 s) on Netflix (140M ratings), comparing coordinate descent with the proposed method using full and partial projection.]
Dataset      Test RMSE (CD)   Test RMSE (MODL)   Speed-up
ML 1M        0.872            0.866              ×0.75
ML 10M       0.802            0.799              ×3.7
NF (140M)    0.938            0.934              ×6.8
Outperforms coordinate descent beyond 10M ratings
Same prediction performance
Speed-up of 6.8× on Netflix
20. Conclusion
Take-home message: loading stochastic subsets of sample streams can drastically accelerate online matrix factorization
Reduces CPU (+ I/O) load at each iteration, cf. gradient descent vs. SGD
An order of magnitude speed-up on two different problems
Python package: http://github.com/arthurmensch/modl
Heuristic at contribution time; a follow-up algorithm has convergence guarantees
Questions? (Poster #41 this afternoon)
21. Bibliography I
[Halko et al., 2009] Halko, N., Martinsson, P.-G., and Tropp, J. A. (2009). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061 [math].
[Johnson and Lindenstrauss, 1984] Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206.
[Mairal et al., 2010] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11:19–60.
[Olshausen and Field, 1997] Olshausen, B. A. and Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325.
22. Bibliography II
[Pilanci and Wainwright, 2014] Pilanci, M. and Wainwright, M. J. (2014). Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. arXiv:1411.0347 [cs, math, stat].
[Szabó et al., 2011] Szabó, Z., Póczos, B., and Lőrincz, A. (2011). Online group-structured dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2865–2872. IEEE.
[Yu et al., 2012] Yu, H.-F., Hsieh, C.-J., and Dhillon, I. (2012). Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In Proceedings of the International Conference on Data Mining, pages 765–774. IEEE.
24. Collaborative filtering
Streaming incomplete data: $M_t$ is imposed by user $t$
Data stream: $M_t x_t$, the movies rated by user $t$
Proposed by [Szabó et al., 2011], with $O(p)$ complexity
Validation: test RMSE (rating prediction) vs CPU time
Baseline: coordinate descent solver [Yu et al., 2012] solving the related loss
$\sum_{t=1}^{n} \left( \|M_t(x_t - D\alpha_t)\|_2^2 + \lambda \|\alpha_t\|_2^2 \right) + \lambda \|D\|_2^2$
Fastest solver available apart from SGD, with no hyperparameters
Our method is not sensitive to hyperparameters
25. Algorithm
Our algorithm:
1. Code computation:
$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|M_t(x_t - D_{t-1}\alpha)\|_2^2 + \lambda \frac{\operatorname{rk} M_t}{p} \Omega(\alpha)$
2. Surrogate aggregation:
$A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$
$B_t = B_{t-1} + \left(\sum_{i=1}^{t} M_i\right)^{-1} (M_t x_t \alpha_t^\top - M_t B_{t-1})$
3. Surrogate minimization:
$M_t D^j \leftarrow p^{\perp}_{\mathcal{C}_r^j}\!\left(M_t D^j - \frac{1}{(A_t)_{j,j}} M_t (D (A_t)^j - (B_t)^j)\right)$
Original online MF:
1. Code computation:
$\alpha_t = \operatorname{argmin}_{\alpha \in \mathbb{R}^k} \|x_t - D_{t-1}\alpha\|_2^2 + \lambda \Omega(\alpha)$
2. Surrogate aggregation:
$A_t = \frac{1}{t} \sum_{i=1}^{t} \alpha_i \alpha_i^\top$
$B_t = B_{t-1} + \frac{1}{t} (x_t \alpha_t^\top - B_{t-1})$
3. Surrogate minimization:
$D^j \leftarrow p^{\perp}_{\mathcal{C}^j}\!\left(D^j - \frac{1}{(A_t)_{j,j}} (D (A_t)^j - (B_t)^j)\right)$