Singular Value Decomposition (SVD) is a useful matrix factorisation technique and, as such, is well suited to handling large datasets. The powerful, optimised SVD functionality built into Mathematica 10 enables fast, efficient processing of big data in which the number of records runs into millions.
Singular value decomposition - application in data analytics
1. Singular Value Decomposition with Mathematica 10
Singular value decomposition (SVD) is a useful matrix factorisation technique, ideally suited to large datasets. Like the PCA discussed previously, SVD can be applied in multivariate analysis to obtain a reduced representation of the data, with just a few variables that capture the fundamental patterns in noisy data. Symbolically, for an m x n matrix $A$ with m ≥ n, the SVD breaks the matrix into 3 components, $A = U \Sigma V^{\mathsf{T}}$, where $U$ is the m x n matrix of left singular vectors, $\Sigma$ is the diagonal n x n matrix of singular values and $V$ is the n x n matrix of right singular vectors. The singular values are sorted from high to low (with the columns of $U$ and $V$ ordered to match), so the largest singular value occupies the upper left corner of $\Sigma$ and the smallest sits in the bottom right corner. Mathematica 10 provides rich functionality for SVD processing through well-defined functions.
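The factorisation can be checked numerically. The article works in Mathematica; the sketch below is a NumPy analogue (NumPy is a stand-in here, not the author's tool) that uses a small random matrix as hypothetical data and assumes m ≥ n with the thin SVD.

```python
import numpy as np

# Hypothetical data: a small random 8 x 5 matrix standing in for A (m >= n).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))

# Thin SVD: U is m x n, s holds the n singular values, Vt is V transposed (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)  # n x n diagonal matrix of singular values

print(U.shape, Sigma.shape, Vt.shape)      # (8, 5) (5, 5) (5, 5)
# The three factors rebuild A up to floating-point error.
print(np.allclose(A, U @ Sigma @ Vt))      # True
# Singular values are returned sorted from largest to smallest.
print(bool(np.all(np.diff(s) <= 0)))       # True
```

One convention difference is worth noting: np.linalg.svd returns $V^{\mathsf{T}}$ directly, whereas Mathematica's SingularValueDecomposition[m] returns {u, w, v} with m equal to u.w.ConjugateTranspose[v], so the right singular vectors are the columns of v in Mathematica and the rows of Vt in NumPy.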
Let’s assume a fund invests in a large set of financial instruments (25,000) for which daily prices have been collected over one year (1Y, i.e. 250 trading days). This is quite a large dataset, with 6.25 million records (250 x 25,000). The objective is to analyse the behaviour of this large universe. A sample of the first 25 securities is shown below:
We apply the SVD method to detect the major trends in this portfolio. Because the dataset is large, our objective is to identify the elements driving the group, reducing the 250 x 25,000 matrix to a smaller, manageable set.
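A quick count shows what a smaller, manageable set can mean here. Keeping only k factors means storing k columns of U, k singular values and k right singular vectors instead of the full matrix. The helper name storage_counts is hypothetical, and k = 10 anticipates the choice made in the next step.

```python
def storage_counts(m: int, n: int, k: int) -> tuple[int, int]:
    """Hypothetical helper: numbers stored for the full m x n matrix
    versus a rank-k SVD (k columns of U, k singular values, k rows of V^T)."""
    full = m * n
    truncated = k * (m + n + 1)
    return full, truncated

m, n, k = 250, 25_000, 10
full, truncated = storage_counts(m, n, k)
print(full)                         # 6250000
print(truncated)                    # 252510
print(round(truncated / full, 4))   # 0.0404
# The rank, and hence the number of non-zero singular values, is at most min(m, n).
print(min(m, n))                    # 250
```

So a 10-factor representation needs roughly 4% of the storage of the raw matrix, and a 250 x 25,000 matrix can never have more than 250 non-zero singular values.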
We use two Mathematica functions, (i) SingularValueDecomposition and (ii) SingularValueList, which provide the necessary tools for our analysis. To keep the work efficient, we limit the number of factors to 10, i.e. we focus only on the 10 largest singular values as representative drivers of the entire universe:
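Both Mathematica functions accept a second argument k: SingularValueList[m, k] gives the k largest singular values, and SingularValueDecomposition[m, k] gives the decomposition associated with them. A NumPy analogue is sketched below under stated assumptions: the data are synthetic and scaled down (2,000 hypothetical securities driven by three hidden factors plus noise), and the top 10 components are taken by slicing a thin SVD.

```python
import numpy as np

# Hypothetical synthetic data: 250 days x 2,000 securities driven by
# 3 hidden factors plus noise (a scaled-down stand-in for 250 x 25,000).
rng = np.random.default_rng(42)
days, n_sec, n_hidden = 250, 2_000, 3
hidden = rng.normal(size=(days, n_hidden))
exposures = rng.normal(size=(n_hidden, n_sec))
A = hidden @ exposures + 0.5 * rng.normal(size=(days, n_sec))

k = 10
# Thin SVD, then keep only the k largest singular values and their vectors
# (the role played by the second argument k in the Mathematica functions).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
print(U_k.shape, s_k.shape, Vt_k.shape)   # (250, 10) (10,) (10, 2000)

# Rank-k approximation; its squared Frobenius error equals the sum of the
# discarded squared singular values (Eckart-Young).
A_k = U_k @ np.diag(s_k) @ Vt_k
err2 = np.linalg.norm(A - A_k, "fro") ** 2
print(np.isclose(err2, np.sum(s[k:] ** 2)))   # True
```

For the full 250 x 25,000 matrix the thin SVD yields only 250 singular values, so this direct approach should remain practical; truncated solvers matter more when both dimensions are large.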
This is our result: the 10 largest SVs. In terms of their ‘weight’ in the overall portfolio, we can see that the 1st SV alone accounts for roughly 50% of the overall explanation of the variation in the data.
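The article does not state how the ‘weight’ of a singular value is computed. Two common conventions are sketched below as assumptions: each $\sigma_i$ divided by $\sum_j \sigma_j$, or $\sigma_i^2$ divided by $\sum_j \sigma_j^2$ (the ‘energy’ share, which matches PCA's explained variance when the columns are centred). The helper sv_weights and the singular values are hypothetical and purely illustrative.

```python
import numpy as np

def sv_weights(s, squared=False):
    """Share of each singular value in the total (hypothetical helper).

    squared=False -> sigma_i / sum(sigma)
    squared=True  -> sigma_i**2 / sum(sigma**2), the 'energy' share
    """
    s = np.asarray(s, dtype=float)
    vals = s ** 2 if squared else s
    return vals / vals.sum()

# Illustrative singular values only, not the article's results.
s = [8.0, 4.0, 2.0, 1.0, 1.0]
print(sv_weights(s)[0])                 # 0.5
print(sv_weights(s, squared=True)[0])   # first value's share under the energy convention
```

The two conventions can rate the first factor quite differently (here 50% versus about 74%), so it is worth stating which one a reported percentage uses.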
The power of SVD lies in detecting driving factors quickly and efficiently. This is the ratio of successive singular values:
With the chart:
And their cumulative weights:
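Both quantities take only a few lines to compute. This minimal NumPy sketch assumes the ratio is taken as $\sigma_i / \sigma_{i+1}$ and the cumulative weight uses the $\sigma_i / \sum_j \sigma_j$ convention (the article does not spell out either choice), with hypothetical singular values.

```python
import numpy as np

# Illustrative singular values only, not the article's results.
s = np.array([8.0, 4.0, 2.0, 1.0, 1.0])

# Ratio of each singular value to the next one, sigma_i / sigma_(i+1).
# A large ratio marks a gap after which the remaining factors matter less.
ratios = s[:-1] / s[1:]

# Cumulative weights under the sigma_i / sum(sigma) convention.
cumulative = np.cumsum(s) / s.sum()

print(ratios.tolist())      # [2.0, 2.0, 2.0, 1.0]
print(cumulative.tolist())  # [0.5, 0.75, 0.875, 0.9375, 1.0]
```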
2. So what does SVD analysis tell us?
Instead of looking at the large universe of data, we can construct a reduced set of representative ‘drivers’ that explain what causes the values to change over time. In our case, the 10 largest SVs explain approx. 70% of the changes in the 25,000 data series. This represents a massive reduction in dimensionality and a large gain in efficiency. If a higher explanatory level is required, we can increase the factor set to a higher number, say 15, to explain a larger share of the changes.
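Choosing how many factors to keep for a target explanatory level can be automated. The sketch below is a hypothetical helper, factors_needed, that returns the smallest k whose cumulative weight reaches the target; the weighting convention and the singular values are assumptions for illustration.

```python
import numpy as np

def factors_needed(s, target, squared=False):
    """Smallest number of factors whose cumulative weight reaches `target`.

    Hypothetical helper; target is a fraction such as 0.70, and the weight
    convention (sigma or sigma**2) is an assumption.
    """
    s = np.asarray(s, dtype=float)
    vals = s ** 2 if squared else s
    cumulative = np.cumsum(vals) / vals.sum()
    # searchsorted returns the first index whose cumulative weight is >= target.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(s))  # guard against rounding when target is 1.0

# Illustrative singular values only, not the article's results.
s = [8.0, 4.0, 2.0, 1.0, 1.0]
print(factors_needed(s, 0.70))   # 2
print(factors_needed(s, 0.90))   # 4
```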
The meaning of the factors is similar to that in PCA: they refer to various deformation modes of the time series. In statistical terms, they can be linked to the moments of the multivariate distribution.
It may also be of interest to look at the U and V matrices, which provide additional useful information. In this respect, the rows of matrix V are particularly meaningful, as they ‘decompose’ and enrich the information on how each singular value affects the column components.
For example, row[1] of the V matrix explains how the mean value of the up-down movement propagates across each factor:
In the same way, we can examine other important SVs in decreasing order of importance:
Second factor:
Third factor:
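The first three factor vectors can also be extracted programmatically. This NumPy sketch uses a hypothetical random days x securities matrix; in NumPy's convention the right singular vectors are the rows of Vt, which correspond to the columns of Mathematica's v.

```python
import numpy as np

# Hypothetical days x securities matrix, kept small for speed.
rng = np.random.default_rng(7)
A = rng.normal(size=(250, 300))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(3):  # first, second and third factor
    v_i = Vt[i]     # right singular vector: one weight per column (security)
    u_i = U[:, i]   # left singular vector: one value per row (day)
    # Factor i contributes the rank-1 layer s_i * u_i v_i^T to A.
    layer = s[i] * np.outer(u_i, v_i)
    # Since u_i and v_i have unit length, the layer's Frobenius norm equals s_i.
    print(i + 1, v_i.shape, u_i.shape, np.isclose(np.linalg.norm(layer), s[i]))
```

Each singular vector is determined only up to sign (flipping u_i and v_i together leaves A unchanged), so factor plots produced by different software may appear mirrored.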