Tensor decompositions have gained steadily increasing popularity in data mining applications. Data sources from sensor networks and Internet-of-Things applications promise a wealth of interaction data that can be naturally represented as multidimensional structures such as tensors. For example, time-varying social networks collected from wearable proximity sensors can be represented as 3-way tensors. By representing these data as tensors, we can use tensor decomposition to extract community structures together with their structural and temporal signatures.
The current standard framework for working with tensors, however, is Matlab. We will show how tensor decompositions can be carried out using Python, how to obtain latent components and interpret them, and what some applications of this technique are in academia and industry. We will see a use case where a Python implementation of tensor decomposition is applied to a dataset that describes social interactions of people, collected using the SocioPatterns platform. This platform was deployed in different settings such as conferences, schools and hospitals, in order to support mathematical modelling and simulation of airborne infectious diseases. Tensor decomposition has been used in these scenarios to solve different types of problems: it can be used for data cleaning, where time-varying graph anomalies can be identified and removed from data; it can also be used to assess the impact of latent components on the spreading of a disease, and to devise intervention strategies that are able to reduce the number of infection cases in a school or hospital. These are just a few examples that show the potential of this technique in data mining and machine learning applications.
This document discusses tensor decomposition with Python. It begins by explaining what tensor decomposition and factorization are, and how they can be used to represent multi-dimensional datasets and perform dimensionality reduction. It then discusses matrix and tensor factorization methods like NMF, topic modeling, and CP/PARAFAC decomposition. The remainder of the document provides examples of tensor decomposition using Python tools and libraries, and discusses applications to analyzing temporal network and sensor data.
Exploring temporal graph data with Python: a study on tensor decomposition of wearable sensor data (PyData NYC 2015)
1. EXPLORING TEMPORAL GRAPH DATA WITH PYTHON
A STUDY ON TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
ANDRÉ PANISSON
@apanisson
ISI Foundation, Torino, Italy & New York City
2. WHY TENSOR FACTORIZATION + PYTHON?
▸ Matrix Factorization is already used in many fields
▸ Tensor Factorization is becoming very popular for multiway data analysis
▸ TF is very useful to explore temporal graph data
▸ But still, the most used tool is Matlab
▸ There’s room for improvement in the Python libraries for TF
▸ Study: NTF of wearable sensor data
4. FACTOR ANALYSIS
Spearman, ~1900:
X ≈ WH
X_{tests × subjects} ≈ W_{tests × intelligences} H_{intelligences × subjects}
Spearman, 1927: The abilities of man.
[Figure: the tests × subjects matrix X approximated by the product of W (tests × intelligences) and H (intelligences × subjects)]
5. TOPIC MODELING / LATENT SEMANTIC ANALYSIS
Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.
[Figure from Blei (2012): topics as weighted word lists (e.g. gene, dna, genetic; life, evolve, organism; brain, neuron, nerve; data, number, computer), documents with color-coded word assignments, and per-document topic proportions]
6. TOPIC MODELING / LATENT SEMANTIC ANALYSIS
Non-negative Matrix Factorization (NMF):
(~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung)
2005 Gaussier et al. "Relation between PLSA and NMF and implications."
X ≈ WH
arg min_{W,H} ‖X − WH‖   s.t.   W, H ≥ 0
[Figure: the sparse terms × documents matrix X approximated by W (terms × topics) times H (topics × documents)]
7. NON-NEGATIVE MATRIX FACTORIZATION (NMF)
NMF gives a part-based representation
(Lee & Seung, Nature 1999)
[Figure: a face image reconstructed as a non-negative combination of NMF parts, compared with a PCA reconstruction]
NMF is equivalent to Spectral Clustering
(Ding et al., SDM 2005)
arg min_{W,H} ‖X − WH‖   s.t.   W, H ≥ 0
Multiplicative updates (element-wise • and /, with V the data matrix):
W ← W • (V Hᵀ) / (W H Hᵀ)
H ← H • (Wᵀ V) / (Wᵀ W H)
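The update rules above translate almost line-for-line into NumPy. The following is a minimal sketch on synthetic low-rank data, not sklearn's implementation; the small `eps` is an added guard to keep the divisions well-defined.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic non-negative data with an exact rank-5 structure
V = rng.random((30, 5)) @ rng.random((5, 40))

r = 5
W = rng.random((30, r)) + 0.1   # random non-negative initialization
H = rng.random((r, 40)) + 0.1
eps = 1e-12                     # guards against division by zero

for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)
```

Since the data is exactly rank 5, the relative reconstruction error drops to a small value, and both factors stay non-negative by construction of the updates.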
8. from sklearn import datasets, decomposition
import matplotlib.pyplot as plt

digits = datasets.load_digits()
A = digits.data

nmf = decomposition.NMF(n_components=10)
W = nmf.fit_transform(A)
H = nmf.components_

plt.rc("image", cmap="binary")
plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(H[i].reshape(8, 8))
    plt.xticks(())
    plt.yticks(())
plt.tight_layout()
9. BEYOND MATRICES: HIGH DIMENSIONAL DATASETS
Cichocki et al. Nonnegative Matrix and Tensor Factorizations
Environmental analysis
▸ Measurement as a function of (Location, Time, Variable)
Sensory analysis
▸ Score as a function of (Food sample, Judge, Attribute)
Process analysis
▸ Measurement as a function of (Batch, Variable, Time)
Spectroscopy
▸ Intensity as a function of (Wavelength, Retention, Sample, Time, Location, …)
…
MULTIWAY DATA ANALYSIS
17. RANK-1 TENSOR
The outer product of N vectors results in a rank-1 tensor:
T = a^{(1)} ∘ ··· ∘ a^{(N)},   T_{i,j,k} = a_i b_j c_k

a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])
T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        for k in range(c.shape[0]):
            T[i, j, k] = a[i] * b[j] * c[k]

array([[[  1.,   2.],
        [  2.,   4.],
        [  3.,   6.],
        [  4.,   8.]],

       [[  2.,   4.],
        [  4.,   8.],
        [  6.,  12.],
        [  8.,  16.]],

       [[  3.,   6.],
        [  6.,  12.],
        [  9.,  18.],
        [ 12.,  24.]]])
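The triple loop above can be collapsed into a single call; a small sketch using NumPy's `einsum`, which produces exactly the same array as the loop version.

```python
import numpy as np

a = np.array([1., 2., 3.])
b = np.array([1., 2., 3., 4.])
c = np.array([1., 2.])

# T[i, j, k] = a[i] * b[j] * c[k]: the outer product of the three vectors
T = np.einsum('i,j,k->ijk', a, b, c)
print(T.shape)  # (3, 4, 2)
```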
18. TENSOR RANK
▸ Every tensor can be written as a sum of rank-1 tensors:
T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r ≡ ⟦A, B, C⟧
▸ Tensor rank: smallest number of rank-1 tensors that can generate it by summing up
X ≈ Σ_{r=1}^{R} a_r^{(1)} ∘ a_r^{(2)} ∘ ··· ∘ a_r^{(N)} ≡ ⟦A^{(1)}, A^{(2)}, ···, A^{(N)}⟧
19. A = np.array([[1, 2, 3],
              [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]]).T
C = np.array([[1, 2],
              [3, 4]]).T

T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            for r in range(A.shape[1]):
                T[i, j, k] += A[i, r] * B[j, r] * C[k, r]

# equivalently, in a single einsum call:
T = np.einsum('ir,jr,kr->ijk', A, B, C)

array([[[  61.,   82.],
        [  74.,  100.],
        [  87.,  118.],
        [ 100.,  136.]],

       [[  77.,  104.],
        [  94.,  128.],
        [ 111.,  152.],
        [ 128.,  176.]],

       [[  93.,  126.],
        [ 114.,  156.],
        [ 135.,  186.],
        [ 156.,  216.]]])

T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r ≡ ⟦A, B, C⟧ : Kruskal tensor
20. TENSOR FACTORIZATION
▸ CANDECOMP/PARAFAC factorization (CP)
▸ extensions of SVD / PCA / NMF of matrices
NON-NEGATIVE TENSOR FACTORIZATION
▸ Decompose a non-negative tensor to a sum of R non-negative rank-1 tensors
arg min_{A,B,C} ‖T − ⟦A, B, C⟧‖   subject to   A ≥ 0, B ≥ 0, C ≥ 0
with ⟦A, B, C⟧ ≡ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r
21. TENSOR FACTORIZATION: HOW TO
Alternating Least Squares (ALS): fix all but one factor matrix, to which LS is applied
min_{A≥0} ‖T_(1) − A (C ⊙ B)ᵀ‖
min_{B≥0} ‖T_(2) − B (C ⊙ A)ᵀ‖
min_{C≥0} ‖T_(3) − C (B ⊙ A)ᵀ‖
where T_(k) is the tensor unfolded on the k-th mode:
T_(1) = Â (Ĉ ⊙ B̂)ᵀ   T_(2) = B̂ (Ĉ ⊙ Â)ᵀ   T_(3) = Ĉ (B̂ ⊙ Â)ᵀ
⊙ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., C ⊙ B = [c₁ ⊗ b₁, c₂ ⊗ b₂, …, c_R ⊗ b_R]
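The Khatri-Rao product defined above is easy to write directly. A minimal sketch, where `khatri_rao` is an illustrative helper built from `np.kron` (SciPy also ships `scipy.linalg.khatri_rao`):

```python
import numpy as np

def khatri_rao(C, B):
    # Column-wise Kronecker product: [c1 ⊗ b1, c2 ⊗ b2, ..., cR ⊗ bR]
    assert C.shape[1] == B.shape[1]
    return np.column_stack([np.kron(C[:, r], B[:, r]) for r in range(C.shape[1])])

C = np.array([[1., 2.], [3., 4.], [5., 6.]])            # 3 x 2
B = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 2.]])  # 4 x 2
KR = khatri_rao(C, B)
print(KR.shape)  # (12, 2): row counts multiply, column count is preserved
```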
22. # NTF by multiplicative updates (sketch); T.uttkrp computes the
# unfolded tensor times the Khatri-Rao product, as in scikit-tensor
F = [np.random.rand(n, r),  # random non-negative init (zeros would be
     np.random.rand(m, r),  # fixed points of the multiplicative update)
     np.random.rand(t, r)]
FF_init = np.random.rand(len(F), r, r)

def iter_solver(T, F, FF_init):
    # Update each factor
    for k in range(len(F)):
        # Compute the inner-product matrix
        FF = np.ones((r, r))
        for i in list(range(k)) + list(range(k + 1, len(F))):
            FF = FF * FF_init[i]
        # unfolded tensor times Khatri-Rao product
        XF = T.uttkrp(F, k)
        F[k] = F[k] * XF / (F[k].dot(FF))
        # F[k] = nnls(FF, XF.T).T
        FF_init[k] = F[k].T.dot(F[k])
    return F, FF_init
J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.
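For completeness, the same multiplicative-update scheme can be written in plain NumPy, with an explicit unfolding and Khatri-Rao product instead of scikit-tensor's `uttkrp`. This is a sketch under NumPy's C-ordering convention (the `unfold`, `khatri_rao` and `ntf` helpers are illustrative, not the author's exact code), followed by a toy check on a synthetic rank-3 tensor:

```python
import numpy as np

def unfold(T, mode):
    # Mode-k unfolding (C ordering): mode k becomes the rows
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(mats):
    # Column-wise Kronecker product of a list of matrices
    r = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, r)
    return out

def ntf(T, r, n_iter=1000, eps=1e-12):
    # Non-negative CP decomposition via multiplicative updates
    rng = np.random.default_rng(0)
    F = [rng.random((s, r)) + 0.1 for s in T.shape]
    for _ in range(n_iter):
        for k in range(len(F)):
            others = [F[i] for i in range(len(F)) if i != k]
            XF = unfold(T, k) @ khatri_rao(others)  # T_(k) times Khatri-Rao
            FF = np.ones((r, r))
            for M in others:
                FF *= M.T @ M                       # Hadamard of Gram matrices
            F[k] *= XF / (F[k] @ FF + eps)
    return F

# Toy check: recover a synthetic non-negative rank-3 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((5, 3)), rng.random((6, 3)), rng.random((4, 3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
F = ntf(T, r=3)
T_hat = np.einsum('ir,jr,kr->ijk', *F)
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(err)
```

The key design choice is that the row ordering produced by `khatri_rao` (first factor slowest) matches the column ordering of the C-order `unfold`, so `unfold(T, k) ≈ F[k] @ khatri_rao(others).T` holds for every mode.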
23. HOW TO INTERPRET: USER X TERM X TIME
X is a 3-way tensor in which x_{nmt} is 1 if the term m was used by user n at interval t, 0 otherwise
A (N×K) is the association of each user n to a factor k
B (M×K) is the association of each term m to a factor k
C (T×K) shows the time activity of each factor
[Figure: the tensor X (N×M×T) decomposed into factor matrices A (N×K, users), B (M×K, terms) and C (T×K, time)]
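Building the tensor described on this slide is straightforward. A toy sketch with hypothetical (user, term, interval) events, just to fix the indexing convention x_{nmt}:

```python
import numpy as np

# Hypothetical events (user n, term m, time interval t); indices are illustrative
events = [(0, 1, 0), (0, 2, 1), (1, 1, 0), (2, 0, 2)]
N, M, T = 3, 3, 3  # number of users, terms, time intervals

X = np.zeros((N, M, T))
for n, m, t in events:
    X[n, m, t] = 1.0  # term m was used by user n at interval t

print(int(X.sum()))  # 4: one non-zero entry per event
```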
33. SocioPatterns.org
7 years, 30+ deployments, 10 countries, 50,000+ persons
• Mongan Institute for Health Policy, Boston
• US Army Medical Component of the Armed Forces, Bangkok
• School of Public Health of the University of Hong Kong
• KEMRI Wellcome Trust, Kenya
• London School for Hygiene and Tropical Medicine, London
• Public Health England, London
• Saw Swee Hock School of Public Health, Singapore
40. ANOMALY DETECTION IN TEMPORAL NETWORKS
A. Sapienza et al., "Detecting anomalies in time-varying networks using tensor decomposition", ICDM Workshop on Data Mining in Networks.