SlideShare a Scribd company logo
1 of 33
Download to read offline
Parallel Non-blocking Deterministic
Algorithm for Online Topic Modeling
Murat Apishev
great-mel@yandex.ru
Oleksandr Frei
oleksandr.frei@gmail.com
HSE, MSU, MIPT
April 8, 2016
Contents
1 Introduction
Topic modeling
ARTM
BigARTM
2 Parallel implementation
Synchronous algorithms
Asynchronous algorithms
Comparison
3 Applications
The RSF project
Conclusions
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
Topic modeling
Topic modeling — an application of machine learning to statistical
text analysis.
Topic — a specific terminology of the subject area, the set of terms
(unigrams or n−grams) frequently appearing together in
documents.
Topic model uncovers latent semantic structure of a text collection:
topic t is a probability distribution p(w|t) over terms w
document d is a probability distribution p(t|d) over topics t
Applications — information retrieval for long-text queries,
classification, categorization, summarization of texts.
Murat Apishev great-mel@yandex.ru AIST 2016 3 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
Topic modeling task
Given: W — set (vocabulary) of terms (unigrams or n−grams),
D — set (collection) of text documents d ⊂ W ,
ndw — how many times term w appears in document d.
Find: model p(w|d) =
∑︀
t∈T
𝜑wt 𝜃td with parameters Φ
W×T
и Θ
T×D
:
𝜑wt =p(w|t) — term probabilities w in each topic t,
𝜃td =p(t|d) — topic probabilities t in each document d.
Criteria log-likelihood maximization:
∑︁
d∈D
∑︁
w∈d
ndw ln
∑︁
t∈T
𝜑wt 𝜃td → max
𝜑,𝜃
;
𝜑wt 0;
∑︀
w 𝜑wt = 1; 𝜃td 0;
∑︀
t 𝜃td = 1.
Issue: the problem of stochastic matrix factorization is ill-posed:
ΦΘ = (ΦS)(S−1Θ) = Φ′Θ′.
Murat Apishev great-mel@yandex.ru AIST 2016 4 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
PLSA and EM-algorithm
Log-likelihood maximization:
∑︁
d∈D
∑︁
w∈W
ndw ln
∑︁
t
𝜑wt 𝜃td → max
Φ,Θ
EM-algorithm: the simple iteration method for the set of equations
E-шаг:
M-шаг:
⎧
⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎩
ptdw = norm
t∈T
(︀
𝜑wt 𝜃td
)︀
𝜑wt = norm
w∈W
(︀
nwt
)︀
, nwt =
∑︀
d∈D
ndw ptdw
𝜃td = norm
t∈T
(︀
ntd
)︀
, ntd =
∑︀
w∈d
ndw ptdw
where norm
i∈I
xi = max{xi ,0}∑︀
j∈I
max{xj ,0}
Murat Apishev great-mel@yandex.ru AIST 2016 5 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
ARTM and regularized EM-algorithm
Log-likelihood maximization with additive regularization criterion R:
∑︁
d∈D
∑︁
w∈W
ndw ln
∑︁
t
𝜑wt 𝜃td + R(Φ, Θ) → max
Φ,Θ
EM-algorithm: the simple iteration method for the set of equations
E-шаг:
M-шаг:
⎧
⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎩
ptdw = norm
t∈T
(︀
𝜑wt 𝜃td
)︀
𝜑wt = norm
w∈W
(︁
nwt + 𝜑wt
𝜕R
𝜕𝜑wt
)︁
, nwt =
∑︀
d∈D
ndw ptdw
𝜃td = norm
t∈T
(︁
ntd + 𝜃td
𝜕R
𝜕𝜃td
)︁
, ntd =
∑︀
w∈d
ndw ptdw
Murat Apishev great-mel@yandex.ru AIST 2016 6 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
Examples of regularizers
Many Bayesian models can be reinterpreted as regularizers
in ARTM.
Some examples of regularizes:
1 Smoothing Φ / Θ (leads to popular LDA model)
2 Sparsing Φ / Θ
3 Decorrelation of topics in Φ
4 Semi-supervised learning
5 Topic coherence maximization
6 Topic selection
7 . . .
Murat Apishev great-mel@yandex.ru AIST 2016 7 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
Multimodal Topic Model
Multimodal Topic Model finds topical distributions for terms
p(w|t), authors p(a|t), time p(y|t), objects of images p(o|t),
linked documents p(d′|t), advertising banners p(b|t), users p(u|t),
and binds all these modalities into a single topic model.
Topics of documents
Words and keyphrases of topics
doc1:
doc2:
doc3:
doc4:
...
Text documents
Topic
Modeling
D
o
c
u
m
e
n
t
s
T
o
p
i
c
s
Metadata:
Authors
Data Time
Conference
Organization
URL
etc.
Ads Images Links
Users
Murat Apishev great-mel@yandex.ru AIST 2016 8 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
M-ARTM and multimodal regularized EM-algorithm
W m
is a vocabulary of terms of m-th modality, m ∈ M,
W = W 1
⊔ W m
as a joint vocabulary of all modalities
Multimodal log-likelihood maximization with additive regularization
criterion R:
∑︁
m∈M
𝜆m
∑︁
d∈D
∑︁
w∈W m
ndw ln
∑︁
t
𝜑wt 𝜃td + R(Φ, Θ) → max
Φ,Θ
EM-algorithm: the simple iteration method for the set of equations
E-шаг:
M-шаг:
⎧
⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎩
ptdw = norm
t∈T
(︀
𝜑wt 𝜃td
)︀
𝜑wt = norm
w∈W m
(︁
nwt + 𝜑wt
𝜕R
𝜕𝜑wt
)︁
, nwt =
∑︀
d∈D
𝜆m(w)ndw ptdw
𝜃td = norm
t∈T
(︁
ntd + 𝜃td
𝜕R
𝜕𝜃td
)︁
, ntd =
∑︀
w∈d
𝜆m(w)ndw ptdw
Murat Apishev great-mel@yandex.ru AIST 2016 9 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
BigARTM project
BigARTM features:
Fast1
parallel and online processing of Big Data;
Multimodal and regularized topic modeling;
Built-in library of regularizers and quality measures;
BigARTM community:
Open-source https://github.com/bigartm
Documentation http://bigartm.org
BigARTM license and programming environment:
Freely available for commercial usage (BSD 3-Clause license)
Cross-platform — Windows, Linux, Mac OS X (32 bit, 64 bit)
Programming APIs: command line, C++, Python
1
Vorontsov K., Frei O., Apishev M., Romov P., Dudarenko M. BigARTM:
Open Source Library for Regularized Multimodal Topic Modeling of Large
Collections Analysis of Images, Social Networks and Texts. 2015
Murat Apishev great-mel@yandex.ru AIST 2016 10 / 33
Introduction
Parallel implementation
Applications
Topic modeling
ARTM
BigARTM
BigARTM vs. Gensim vs. Vowpal Wabbit LDA
3.7M articles from Wikipedia, 100K unique words
Framework procs train inference perplexity
BigARTM 1 35 min 72 sec 4000
LdaModel 1 369 min 395 sec 4161
VW.LDA 1 73 min 120 sec 4108
BigARTM 4 9 min 20 sec 4061
LdaMulticore 4 60 min 222 sec 4111
BigARTM 8 4.5 min 14 sec 4304
LdaMulticore 8 57 min 224 sec 4455
procs = number of parallel threads
inference = time to infer 𝜃d for 100K held-out documents
perplexity P is calculated on held-out documents
P(D) = exp
(︂
−
1
n
∑︁
d∈D
∑︁
w∈d
ndw ln
∑︁
t∈T
𝜑wt 𝜃td
)︂
, n =
∑︁
d
nd .
Murat Apishev great-mel@yandex.ru AIST 2016 11 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Offline algorithm
The collection is split into batches.
Offline algorithm performs scans over the collection.
Each thread process one batch at a time, inferring nwt and 𝜃td
(using Θ regularization).
After each scan algorithm recalculates Φ matrix and apply Φ
regularizers according to the equation
𝜑wt = norm
w∈W
(︁
nwt + 𝜑wt
𝜕R
𝜕𝜑wt
)︁
.
The implementation never stores the entire Θ matrix at any
given time.
Murat Apishev great-mel@yandex.ru AIST 2016 12 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Offline algorithm: Gantt chart
0s 4s 8s 12s 16s 20s 24s 28s 32s 36s
Main
Proc-1
Proc-2
Proc-3
Proc-4
Proc-5
Proc-6
Batch processing Norm
This and further Gantt charts were created using the NYTimes dataset:
https://archive.ics.uci.edu/ml/datasets/Bag+of+Words.
Size of dataset is ≈ 300k documents, but each algorithm was run on
some subset (from 70% to 100%) to archive the ≈ 36 sec. working time.
Murat Apishev great-mel@yandex.ru AIST 2016 13 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Online algorithm
The algorithm is a generalization of Online variational Bayes
algorithm for LDA model.
Online ARTM improves the convergence rate of the
Offline ARTM by re-calculating matrix Φ after every 𝜂
batches.
Better suited for large and heterogeneous text collections.
Weighted sum of nwt from previous and current 𝜂 batches to
control the importance of new information.
Issue: all threads has no useful work to do during the update
of Φ matrix.
Murat Apishev great-mel@yandex.ru AIST 2016 14 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Online algorithm: Gantt chart
0 s. 4 s. 8 s. 12 s. 16 s. 20 s. 24 s. 28 s. 32 s. 36 s.
Main
Proc-1
Proc-2
Proc-3
Proc-4
Proc-5
Proc-6
Odd batch Even batch Norm Merge
Murat Apishev great-mel@yandex.ru AIST 2016 15 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Async: Asynchronous online algorithm
Processor threads:
ProcessBatch(Db,¡wt)Db ñwt
Merger thread:
Accumulate ñwt
Recalculate ¡wt
Queue
{Db}
Queue
{ñwt}
¡wt
Sync()
Db
Faster asynchronous implementation (it was compared with
Gensim and VW LDA)
Issue: Merger and DataLoader can become a bottleneck.
Issue: the result of such algorithm is non-deterministic.
Murat Apishev great-mel@yandex.ru AIST 2016 16 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Async: Gantt chart in normal case
0s 4s 8s 12s 16s 20s 24s 28s 32s 36s
Merger
Proc-1
Proc-2
Proc-3
Proc-4
Proc-5
Proc-6
Odd batch Even batch Norm
Merge matrix Merge increments
Murat Apishev great-mel@yandex.ru AIST 2016 17 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Async: Gantt chart in bad case
0s 4s 8s 12s 16s 20s 24s 28s 32s 36s
Merger
Proc-1
Proc-2
Proc-3
Proc-4
Proc-5
Proc-6
Odd batch Even batch Norm
Merge matrix Merge increments
Murat Apishev great-mel@yandex.ru AIST 2016 18 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
DetAsync: Deterministic asynchronous online algorithm
To avoid the indeterministic behavior lets replace the update
after first 𝜂 batches with update after given 𝜂 batches.
Remove Merger and DataLoader threads. Each Processor
thread reads batches and writes results into nwt matrix by
itself.
Processor threads get a set of batches to process, start
processing and immediately return a future object to main
thread.
The main thread can process the updates of Φ matrix while
Processor threads work, and then get the result by passing
received future object to Await function.
Murat Apishev great-mel@yandex.ru AIST 2016 19 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
DetAsync: schema
MasterModel
Processor threads:
Db = LoadBatch(b)
ProcessBatch(Db,¡wt)
Main thread:
Recalculate ¡wt
nwt¡wt
Transform({Db})
FitOffline({Db})
FitOnline({Db})
Murat Apishev great-mel@yandex.ru AIST 2016 20 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
DetAsync: Gantt chart
0s 4s 8s 12s 16s 20s 24s 28s 32s 36s
Main
Proc-1
Proc-2
Proc-3
Proc-4
Proc-5
Proc-6
Odd batch Even batch Norm Merge
Murat Apishev great-mel@yandex.ru AIST 2016 21 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Experiments
Datasets: Wikipedia (|D| = 3.7M articles, |W | = 100K words), Pubmed
(|D| = 8.2M abstracts, |W | = 141K words).
Node: Intel Xeon CPU E5-2650 v2 system with 2 processors, 16 physical
cores in total (32 with hyper-threading).
Metric: perplexity P value achieved in the allotted time.
Time: each algorithm was time-boxed to run for a 30 minutes.
Peak memory usage (Gb):
|T| Offline Online DetAsync Async (v0.6)
Pubmed 1000 5.17 4.68 8.18 13.4
Pubmed 100 1.86 1.62 2.17 3.71
Wiki 1000 1.74 2.44 3.93 7.9
Wiki 100 0.54 0.53 0.83 1.28
Murat Apishev great-mel@yandex.ru AIST 2016 22 / 33
Introduction
Parallel implementation
Applications
Synchronous algorithms
Asynchronous algorithms
Comparison
Reached perplexity value
0 5 10 15 20 25 30
2,000
2,200
2,400
Time (min)
Perplexity
Offline
Online
Async
DetAsync
10 15 20 25 30 35
3,800
4,000
4,200
4,400
4,600
4,800
5,000
Time (min)
Perplexity
Offline
Online
Async
DetAsync
Wikipedia (left), Pubmed (right).
DetAsync achives best perplexity in given time-box.
Murat Apishev great-mel@yandex.ru AIST 2016 23 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Mining ethnic-related content from blogosphere
Development of concept and methodology for multi-level
monitoring of the state of inter-ethnic relations with the data from
social media.
The objectives of Topic Modeling in this project:
1 Identify ethnic topics in social media big data
2 Identify event and permanent ethnic topics
3 Identify spatio-temporal patterns of the ethnic discourse
4 Estimate the sentiment of the ethnic discourse
5 Develop the monitoring system of inter-ethnic discourse
The Russian Science Foundation grant 15-18-00091 (2015–2017)
(Higher School of Economics, St. Petersburg School of Social Sciences and
Humanities, Internet Studies Laboratory LINIS)
Murat Apishev great-mel@yandex.ru AIST 2016 24 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Example ethnonyms for semi-supervised topic modeling
османский русич
восточноевропейский сингапурец
эвенк перуанский
швейцарская словенский
аланский вепсский
саамский ниггер
латыш адыги
литовец сомалиец
цыганка абхаз
ханты-мансийский темнокожий
карачаевский нигериец
кубинка лягушатник
гагаузский камбоджиец
Murat Apishev great-mel@yandex.ru AIST 2016 25 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Regularization for finding ethnic topics
smoothing ethnonyms in ethnic topics
sparsing ethnonyms in background topics
Murat Apishev great-mel@yandex.ru AIST 2016 26 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Regularization for finding ethnic topics
smoothing ethnonyms in ethnic topics
sparsing ethnonyms in background topics
smoothing non-ethnonyms for background topics
Murat Apishev great-mel@yandex.ru AIST 2016 27 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Regularization for finding ethnic topics
smoothing ethnonyms in ethnic topics
sparsing ethnonyms in background topics
smoothing non-ethnonyms in background topics
decorrelating ethnic topics
Murat Apishev great-mel@yandex.ru AIST 2016 28 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Regularization for finding ethnic topics
smoothing ethnonyms in ethnic topics
sparsing ethnonyms in background topics
smoothing non-ethnonyms in background topics
decorrelating ethnic topics
adding ethnonyms modality and decorrelating their topics
Murat Apishev great-mel@yandex.ru AIST 2016 29 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Experiment
LiveJournal collection: 1.58M of documents
860K of words in the raw vocabulary after lemmatization
90K of words after filtering out
short words with length 2,
rare words with nw < 20 including:
non-Russian words
250 ethnonyms
Murat Apishev great-mel@yandex.ru AIST 2016 30 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Semi-supervised ARTM for ethnic topic modeling
The number of ethnic topics found by the model:
model ethnic |S| background |B| ++ +− −+ coh20
2
tfidf20
PLSA 400 12 15 17 -1447 -1012
LDA 400 12 15 17 -1540 -1121
ARTM-4 250 150 21 27 20 -1651 -1296
ARTM-5 250 150 38 42 30 -1342 -908
ARTM-4:
ethnic topics: sparsing and decorrelating, ethnonyms smoothing
background topics: smoothing, ethnonyms sparsing
ARTM-5:
ARTM-4 + ethnonyms as additional modality
2
Coherence and TF-IDF coherence are metrics that match the human
judgment of topic quality. The topic is better if it has higher coherence value.
Murat Apishev great-mel@yandex.ru AIST 2016 31 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Ethnic topics examples
(русские): русский, князь, россия, татарин, великий, царить, царь, иван,
император, империя, грозить, государь, век, московская, екатерина, москва,
(русские): акция, организация, митинг, движение, активный, мероприятие,
совет, русский, участник, москва, оппозиция, россия, пикет, протест, проведение,
националист, поддержка, общественный, проводить, участие,
(славяне, византийцы): славянский, святослав, жрец, древние, письменность,
рюрик, летопись, византия, мефодий, хазарский, русский, азбука,
(сирийцы): сирийский, асад, боевик, район, террорист, уничтожать, группировка,
дамаск, оружие, алесио, оппозиция, операция, селение, сша, нусра, турция,
(турки): турция, турецкий, курдский, эрдоган, стамбул, страна, кавказ, горин,
полиция, премьер-министр, регион, курдистан, ататюрк, партия,
(иранцы): иран, иранский, сша, россия, ядерный, президент, тегеран, сирия, оон,
израиль, переговоры, обама, санкция, исламский,
(палестинцы): террорист, израиль, терять, палестинский, палестинец,
террористический, палестина, взрыв, территория, страна, государство,
безопасность, арабский, организация, иерусалим, военный, полиция, газ,
(ливанцы): ливанский, боевик, район, ливан, армия, террорист, али, военный,
хизбалла, раненый, уничтожать, сирия, подразделение, квартал, армейский,
(ливийцы): ливан, демократия, страна, ливийский, каддафи, государство,
алжир, война, правительство, сша, арабский, али, муаммар, сирия,
(евреи): израиль, израильский, страна, израил, война, нетаньяху, тель-авив,
время, сша, сирия, египет, случай, самолет, еврейский, военный, ближний,
Murat Apishev great-mel@yandex.ru AIST 2016 32 / 33
Introduction
Parallel implementation
Applications
The RSF project
Conclusions
Conclusions
BigARTM is an open-source library supporting multimodal
ARTM theory.
Fast implementation of the underlying online EM-algorithm
was even more improved. Memory usage was reduced.
Combination of 8 regularizers in the task of ethnic topics
extraction showed the supirity of ARTM approach.
BigARTM is using to process more than 20 collections in
several different projects.
Join our comunity!
Contacts: bigartm.org, great-mel@yandex.ru
Murat Apishev great-mel@yandex.ru AIST 2016 33 / 33

More Related Content

What's hot

VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative ModelsKenta Oono
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientFabian Pedregosa
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learningmooopan
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain홍배 김
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systemsrecsysfr
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Databricks
 
Differential privacy without sensitivity [NIPS2016読み会資料]
Differential privacy without sensitivity [NIPS2016読み会資料]Differential privacy without sensitivity [NIPS2016読み会資料]
Differential privacy without sensitivity [NIPS2016読み会資料]Kentaro Minami
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Mostafa G. M. Mostafa
 
Matrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionMatrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionActiveEon
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationSunghoon Joo
 
Improving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive FlowImproving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive FlowTatsuya Shirakawa
 
Introduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmIntroduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmKatsuki Ohto
 
Lecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural NetworksLecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural NetworksSang Jun Lee
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)James McMurray
 
Bidirectional graph search techniques for finding shortest path in image base...
Bidirectional graph search techniques for finding shortest path in image base...Bidirectional graph search techniques for finding shortest path in image base...
Bidirectional graph search techniques for finding shortest path in image base...Navin Kumar
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorizationrecsysfr
 

What's hot (20)

VAE-type Deep Generative Models
VAE-type Deep Generative ModelsVAE-type Deep Generative Models
VAE-type Deep Generative Models
 
D143136
D143136D143136
D143136
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systems
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
 
Differential privacy without sensitivity [NIPS2016読み会資料]
Differential privacy without sensitivity [NIPS2016読み会資料]Differential privacy without sensitivity [NIPS2016読み会資料]
Differential privacy without sensitivity [NIPS2016読み会資料]
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)
 
Matrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionMatrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer Vision
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
 
Improving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive FlowImproving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive Flow
 
Introduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmIntroduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithm
 
Lecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural NetworksLecture 6: Convolutional Neural Networks
Lecture 6: Convolutional Neural Networks
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
CSC446: Pattern Recognition (LN8)
CSC446: Pattern Recognition (LN8)CSC446: Pattern Recognition (LN8)
CSC446: Pattern Recognition (LN8)
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)
 
Bidirectional graph search techniques for finding shortest path in image base...
Bidirectional graph search techniques for finding shortest path in image base...Bidirectional graph search techniques for finding shortest path in image base...
Bidirectional graph search techniques for finding shortest path in image base...
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 

Viewers also liked

Viewers also liked (6)

Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Topic model
Topic modelTopic model
Topic model
 
Topic Models, LDA and all that
Topic Models, LDA and all thatTopic Models, LDA and all that
Topic Models, LDA and all that
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's Tutorial
 
20151221 public
20151221 public20151221 public
20151221 public
 
Topic Models
Topic ModelsTopic Models
Topic Models
 

Similar to Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling

Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringArthur Mensch
 
cis97003
cis97003cis97003
cis97003perfj
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfPolytechnique Montréal
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...romovpa
 
Research Away Day Jun 2009
Research Away Day Jun 2009Research Away Day Jun 2009
Research Away Day Jun 2009German Terrazas
 
Second order traffic flow models on networks
Second order traffic flow models on networksSecond order traffic flow models on networks
Second order traffic flow models on networksGuillaume Costeseque
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Johann Petrak
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues listsJames Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
StacksqueueslistsFraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsTony Nguyen
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsHarry Potter
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues listsYoung Alista
 

Similar to Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling (20)

Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Gk3611601162
Gk3611601162Gk3611601162
Gk3611601162
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filtering
 
autoTVM
autoTVMautoTVM
autoTVM
 
cis97003
cis97003cis97003
cis97003
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
Chapter two
Chapter twoChapter two
Chapter two
 
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...
Non-Bayesian Additive Regularization for Multimodal Topic Modeling of Large C...
 
Research Away Day Jun 2009
Research Away Day Jun 2009Research Away Day Jun 2009
Research Away Day Jun 2009
 
Second order traffic flow models on networks
Second order traffic flow models on networksSecond order traffic flow models on networks
Second order traffic flow models on networks
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 

More from AIST

Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  ImagesAlexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray ImagesAIST
 
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныАлена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныAIST
 
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...AIST
 
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискПавел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискAIST
 
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...AIST
 
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...AIST
 
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...AIST
 
Иосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAИосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAAIST
 
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeNikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeAIST
 
George Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesGeorge Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesAIST
 
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationElena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationAIST
 
Marina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsMarina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsAIST
 
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceEdward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceAIST
 
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...AIST
 
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...AIST
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumAIST
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...AIST
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...AIST
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingAIST
 
Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...
Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...
Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...AIST
 

More from AIST (20)

Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  ImagesAlexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray  Images
Alexey Mikhaylichenko - Automatic Detection of Bone Contours in X-Ray Images
 
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоныАлена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
Алена Ильина и Иван Бибилов, GoTo - GoTo школы, конкурсы и хакатоны
 
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
Станислав Кралин, Сайтсофт - Связанные открытые данные федеральных органов ис...
 
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поискПавел Браславский,Velpas - Velpas: мобильный визуальный поиск
Павел Браславский,Velpas - Velpas: мобильный визуальный поиск
 
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
Евгений Цымбалов, Webgames - Методы машинного обучения для задач игровой анал...
 
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
Александр Москвичев, EveResearch - Алгоритмы анализа данных в маркетинговых и...
 
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
Петр Ермаков, HeadHunter - Модерация резюме: от людей к роботам. Машинное обу...
 
Иосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBAИосиф Иткин, Exactpro - TBA
Иосиф Иткин, Exactpro - TBA
 
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge ExchangeNikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
Nikolay Karpov - Evolvable Semantic Platform for Facilitating Knowledge Exchange
 
George Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product CategoriesGeorge Moiseev - Classification of E-commerce Websites by Product Categories
George Moiseev - Classification of E-commerce Websites by Product Categories
 
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech DisambiguationElena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
Elena Bruches - The Hybrid Approach to Part-of-Speech Disambiguation
 
Marina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chantsMarina Danshina - The methodology of automated decryption of znamenny chants
Marina Danshina - The methodology of automated decryption of znamenny chants
 
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First GlanceEdward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
Edward Klyshinsky - The Corpus of Syntactic Co-occurences: the First Glance
 
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
Galina Lavrentyeva - Anti-spoofing Methods for Automatic Speaker Verification...
 
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
Kaytoue Mehdi - Finding duplicate labels in behavioral data: an application f...
 
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamediumValeri Labunets - The bichromatic excitable Schrodinger metamedium
Valeri Labunets - The bichromatic excitable Schrodinger metamedium
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
 
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
Alexander Karkishchenko - Threefold Symmetry Detection in Hexagonal Images Ba...
 
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation DenoisingArtyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
Artyom Makovetskii - An Efficient Algorithm for Total Variation Denoising
 
Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...
Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...
Olesia Kushnir - Reflection Symmetry of Shapes Based on Skeleton Primitive Ch...
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 

Recently uploaded (20)

E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling

  • 1. Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling Murat Apishev great-mel@yandex.ru Oleksandr Frei oleksandr.frei@gmail.com HSE, MSU, MIPT April 8, 2016
  • 2. Contents 1 Introduction Topic modeling ARTM BigARTM 2 Parallel implementation Synchronous algorithms Asynchronous algorithms Comparison 3 Applications The RSF project Conclusions
  • 3. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM Topic modeling Topic modeling — an application of machine learning to statistical text analysis. Topic — a specific terminology of the subject area, the set of terms (unigrams or n−grams) frequently appearing together in documents. Topic model uncovers latent semantic structure of a text collection: topic t is a probability distribution p(w|t) over terms w document d is a probability distribution p(t|d) over topics t Applications — information retrieval for long-text queries, classification, categorization, summarization of texts. Murat Apishev great-mel@yandex.ru AIST 2016 3 / 33
  • 4. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM Topic modeling task Given: W — set (vocabulary) of terms (unigrams or n−grams), D — set (collection) of text documents d ⊂ W , ndw — how many times term w appears in document d. Find: model p(w|d) = ∑︀ t∈T 𝜑wt 𝜃td with parameters Φ W×T и Θ T×D : 𝜑wt =p(w|t) — term probabilities w in each topic t, 𝜃td =p(t|d) — topic probabilities t in each document d. Criteria log-likelihood maximization: ∑︁ d∈D ∑︁ w∈d ndw ln ∑︁ t∈T 𝜑wt 𝜃td → max 𝜑,𝜃 ; 𝜑wt 0; ∑︀ w 𝜑wt = 1; 𝜃td 0; ∑︀ t 𝜃td = 1. Issue: the problem of stochastic matrix factorization is ill-posed: ΦΘ = (ΦS)(S−1Θ) = Φ′Θ′. Murat Apishev great-mel@yandex.ru AIST 2016 4 / 33
  • 5. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM PLSA and EM-algorithm Log-likelihood maximization: ∑︁ d∈D ∑︁ w∈W ndw ln ∑︁ t 𝜑wt 𝜃td → max Φ,Θ EM-algorithm: the simple iteration method for the set of equations E-шаг: M-шаг: ⎧ ⎪⎪⎪⎪⎨ ⎪⎪⎪⎪⎩ ptdw = norm t∈T (︀ 𝜑wt 𝜃td )︀ 𝜑wt = norm w∈W (︀ nwt )︀ , nwt = ∑︀ d∈D ndw ptdw 𝜃td = norm t∈T (︀ ntd )︀ , ntd = ∑︀ w∈d ndw ptdw where norm i∈I xi = max{xi ,0}∑︀ j∈I max{xj ,0} Murat Apishev great-mel@yandex.ru AIST 2016 5 / 33
  • 6. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM ARTM and regularized EM-algorithm Log-likelihood maximization with additive regularization criterion R: ∑︁ d∈D ∑︁ w∈W ndw ln ∑︁ t 𝜑wt 𝜃td + R(Φ, Θ) → max Φ,Θ EM-algorithm: the simple iteration method for the set of equations E-шаг: M-шаг: ⎧ ⎪⎪⎪⎪⎪⎨ ⎪⎪⎪⎪⎪⎩ ptdw = norm t∈T (︀ 𝜑wt 𝜃td )︀ 𝜑wt = norm w∈W (︁ nwt + 𝜑wt 𝜕R 𝜕𝜑wt )︁ , nwt = ∑︀ d∈D ndw ptdw 𝜃td = norm t∈T (︁ ntd + 𝜃td 𝜕R 𝜕𝜃td )︁ , ntd = ∑︀ w∈d ndw ptdw Murat Apishev great-mel@yandex.ru AIST 2016 6 / 33
  • 7. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM Examples of regularizers Many Bayesian models can be reinterpreted as regularizers in ARTM. Some examples of regularizes: 1 Smoothing Φ / Θ (leads to popular LDA model) 2 Sparsing Φ / Θ 3 Decorrelation of topics in Φ 4 Semi-supervised learning 5 Topic coherence maximization 6 Topic selection 7 . . . Murat Apishev great-mel@yandex.ru AIST 2016 7 / 33
  • 8. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM Multimodal Topic Model Multimodal Topic Model finds topical distributions for terms p(w|t), authors p(a|t), time p(y|t), objects of images p(o|t), linked documents p(d′|t), advertising banners p(b|t), users p(u|t), and binds all these modalities into a single topic model. Topics of documents Words and keyphrases of topics doc1: doc2: doc3: doc4: ... Text documents Topic Modeling D o c u m e n t s T o p i c s Metadata: Authors Data Time Conference Organization URL etc. Ads Images Links Users Murat Apishev great-mel@yandex.ru AIST 2016 8 / 33
  • 9. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM M-ARTM and multimodal regularized EM-algorithm W m is a vocabulary of terms of m-th modality, m ∈ M, W = W 1 ⊔ W m as a joint vocabulary of all modalities Multimodal log-likelihood maximization with additive regularization criterion R: ∑︁ m∈M 𝜆m ∑︁ d∈D ∑︁ w∈W m ndw ln ∑︁ t 𝜑wt 𝜃td + R(Φ, Θ) → max Φ,Θ EM-algorithm: the simple iteration method for the set of equations E-шаг: M-шаг: ⎧ ⎪⎪⎪⎪⎪⎨ ⎪⎪⎪⎪⎪⎩ ptdw = norm t∈T (︀ 𝜑wt 𝜃td )︀ 𝜑wt = norm w∈W m (︁ nwt + 𝜑wt 𝜕R 𝜕𝜑wt )︁ , nwt = ∑︀ d∈D 𝜆m(w)ndw ptdw 𝜃td = norm t∈T (︁ ntd + 𝜃td 𝜕R 𝜕𝜃td )︁ , ntd = ∑︀ w∈d 𝜆m(w)ndw ptdw Murat Apishev great-mel@yandex.ru AIST 2016 9 / 33
  • 10. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM BigARTM project BigARTM features: Fast1 parallel and online processing of Big Data; Multimodal and regularized topic modeling; Built-in library of regularizers and quality measures; BigARTM community: Open-source https://github.com/bigartm Documentation http://bigartm.org BigARTM license and programming environment: Freely available for commercial usage (BSD 3-Clause license) Cross-platform — Windows, Linux, Mac OS X (32 bit, 64 bit) Programming APIs: command line, C++, Python 1 Vorontsov K., Frei O., Apishev M., Romov P., Dudarenko M. BigARTM: Open Source Library for Regularized Multimodal Topic Modeling of Large Collections Analysis of Images, Social Networks and Texts. 2015 Murat Apishev great-mel@yandex.ru AIST 2016 10 / 33
  • 11. Introduction Parallel implementation Applications Topic modeling ARTM BigARTM BigARTM vs. Gensim vs. Vowpal Wabbit LDA 3.7M articles from Wikipedia, 100K unique words Framework procs train inference perplexity BigARTM 1 35 min 72 sec 4000 LdaModel 1 369 min 395 sec 4161 VW.LDA 1 73 min 120 sec 4108 BigARTM 4 9 min 20 sec 4061 LdaMulticore 4 60 min 222 sec 4111 BigARTM 8 4.5 min 14 sec 4304 LdaMulticore 8 57 min 224 sec 4455 procs = number of parallel threads inference = time to infer 𝜃d for 100K held-out documents perplexity P is calculated on held-out documents P(D) = exp (︂ − 1 n ∑︁ d∈D ∑︁ w∈d ndw ln ∑︁ t∈T 𝜑wt 𝜃td )︂ , n = ∑︁ d nd . Murat Apishev great-mel@yandex.ru AIST 2016 11 / 33
  • 12. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Offline algorithm The collection is split into batches. Offline algorithm performs scans over the collection. Each thread process one batch at a time, inferring nwt and 𝜃td (using Θ regularization). After each scan algorithm recalculates Φ matrix and apply Φ regularizers according to the equation 𝜑wt = norm w∈W (︁ nwt + 𝜑wt 𝜕R 𝜕𝜑wt )︁ . The implementation never stores the entire Θ matrix at any given time. Murat Apishev great-mel@yandex.ru AIST 2016 12 / 33
  • 13. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Offline algorithm: Gantt chart 0s 4s 8s 12s 16s 20s 24s 28s 32s 36s Main Proc-1 Proc-2 Proc-3 Proc-4 Proc-5 Proc-6 Batch processing Norm This and further Gantt charts were created using the NYTimes dataset: https://archive.ics.uci.edu/ml/datasets/Bag+of+Words. Size of dataset is ≈ 300k documents, but each algorithm was run on some subset (from 70% to 100%) to archive the ≈ 36 sec. working time. Murat Apishev great-mel@yandex.ru AIST 2016 13 / 33
  • 14. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Online algorithm The algorithm is a generalization of Online variational Bayes algorithm for LDA model. Online ARTM improves the convergence rate of the Offline ARTM by re-calculating matrix Φ after every 𝜂 batches. Better suited for large and heterogeneous text collections. Weighted sum of nwt from previous and current 𝜂 batches to control the importance of new information. Issue: all threads has no useful work to do during the update of Φ matrix. Murat Apishev great-mel@yandex.ru AIST 2016 14 / 33
  • 15. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Online algorithm: Gantt chart 0 s. 4 s. 8 s. 12 s. 16 s. 20 s. 24 s. 28 s. 32 s. 36 s. Main Proc-1 Proc-2 Proc-3 Proc-4 Proc-5 Proc-6 Odd batch Even batch Norm Merge Murat Apishev great-mel@yandex.ru AIST 2016 15 / 33
  • 16. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Async: Asynchronous online algorithm Processor threads: ProcessBatch(Db,¡wt)Db ñwt Merger thread: Accumulate ñwt Recalculate ¡wt Queue {Db} Queue {ñwt} ¡wt Sync() Db Faster asynchronous implementation (it was compared with Gensim and VW LDA) Issue: Merger and DataLoader can become a bottleneck. Issue: the result of such algorithm is non-deterministic. Murat Apishev great-mel@yandex.ru AIST 2016 16 / 33
  • 17. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Async: Gantt chart in normal case 0s 4s 8s 12s 16s 20s 24s 28s 32s 36s Merger Proc-1 Proc-2 Proc-3 Proc-4 Proc-5 Proc-6 Odd batch Even batch Norm Merge matrix Merge increments Murat Apishev great-mel@yandex.ru AIST 2016 17 / 33
  • 18. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Async: Gantt chart in bad case 0s 4s 8s 12s 16s 20s 24s 28s 32s 36s Merger Proc-1 Proc-2 Proc-3 Proc-4 Proc-5 Proc-6 Odd batch Even batch Norm Merge matrix Merge increments Murat Apishev great-mel@yandex.ru AIST 2016 18 / 33
  • 19. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison DetAsync: Deterministic asynchronous online algorithm To avoid the indeterministic behavior lets replace the update after first 𝜂 batches with update after given 𝜂 batches. Remove Merger and DataLoader threads. Each Processor thread reads batches and writes results into nwt matrix by itself. Processor threads get a set of batches to process, start processing and immediately return a future object to main thread. The main thread can process the updates of Φ matrix while Processor threads work, and then get the result by passing received future object to Await function. Murat Apishev great-mel@yandex.ru AIST 2016 19 / 33
  • 20. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison DetAsync: schema MasterModel Processor threads: Db = LoadBatch(b) ProcessBatch(Db,¡wt) Main thread: Recalculate ¡wt nwt¡wt Transform({Db}) FitOffline({Db}) FitOnline({Db}) Murat Apishev great-mel@yandex.ru AIST 2016 20 / 33
  • 21. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison DetAsync: Gantt chart 0s 4s 8s 12s 16s 20s 24s 28s 32s 36s Main Proc-1 Proc-2 Proc-3 Proc-4 Proc-5 Proc-6 Odd batch Even batch Norm Merge Murat Apishev great-mel@yandex.ru AIST 2016 21 / 33
  • 22. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Experiments Datasets: Wikipedia (|D| = 3.7M articles, |W | = 100K words), Pubmed (|D| = 8.2M abstracts, |W | = 141K words). Node: Intel Xeon CPU E5-2650 v2 system with 2 processors, 16 physical cores in total (32 with hyper-threading). Metric: perplexity P value achieved in the allotted time. Time: each algorithm was time-boxed to run for a 30 minutes. Peak memory usage (Gb): |T| Offline Online DetAsync Async (v0.6) Pubmed 1000 5.17 4.68 8.18 13.4 Pubmed 100 1.86 1.62 2.17 3.71 Wiki 1000 1.74 2.44 3.93 7.9 Wiki 100 0.54 0.53 0.83 1.28 Murat Apishev great-mel@yandex.ru AIST 2016 22 / 33
  • 23. Introduction Parallel implementation Applications Synchronous algorithms Asynchronous algorithms Comparison Reached perplexity value 0 5 10 15 20 25 30 2,000 2,200 2,400 Time (min) Perplexity Offline Online Async DetAsync 10 15 20 25 30 35 3,800 4,000 4,200 4,400 4,600 4,800 5,000 Time (min) Perplexity Offline Online Async DetAsync Wikipedia (left), Pubmed (right). DetAsync achives best perplexity in given time-box. Murat Apishev great-mel@yandex.ru AIST 2016 23 / 33
  • 24. Introduction Parallel implementation Applications The RSF project Conclusions Mining ethnic-related content from blogosphere Development of concept and methodology for multi-level monitoring of the state of inter-ethnic relations with the data from social media. The objectives of Topic Modeling in this project: 1 Identify ethnic topics in social media big data 2 Identify event and permanent ethnic topics 3 Identify spatio-temporal patterns of the ethnic discourse 4 Estimate the sentiment of the ethnic discourse 5 Develop the monitoring system of inter-ethnic discourse The Russian Science Foundation grant 15-18-00091 (2015–2017) (Higher School of Economics, St. Petersburg School of Social Sciences and Humanities, Internet Studies Laboratory LINIS) Murat Apishev great-mel@yandex.ru AIST 2016 24 / 33
  • 25. Introduction Parallel implementation Applications The RSF project Conclusions Example ethnonyms for semi-supervised topic modeling османский русич восточноевропейский сингапурец эвенк перуанский швейцарская словенский аланский вепсский саамский ниггер латыш адыги литовец сомалиец цыганка абхаз ханты-мансийский темнокожий карачаевский нигериец кубинка лягушатник гагаузский камбоджиец Murat Apishev great-mel@yandex.ru AIST 2016 25 / 33
  • 26. Introduction Parallel implementation Applications The RSF project Conclusions Regularization for finding ethnic topics smoothing ethnonyms in ethnic topics sparsing ethnonyms in background topics Murat Apishev great-mel@yandex.ru AIST 2016 26 / 33
  • 27. Introduction Parallel implementation Applications The RSF project Conclusions Regularization for finding ethnic topics smoothing ethnonyms in ethnic topics sparsing ethnonyms in background topics smoothing non-ethnonyms for background topics Murat Apishev great-mel@yandex.ru AIST 2016 27 / 33
  • 28. Introduction Parallel implementation Applications The RSF project Conclusions Regularization for finding ethnic topics smoothing ethnonyms in ethnic topics sparsing ethnonyms in background topics smoothing non-ethnonyms in background topics decorrelating ethnic topics Murat Apishev great-mel@yandex.ru AIST 2016 28 / 33
  • 29. Introduction Parallel implementation Applications The RSF project Conclusions Regularization for finding ethnic topics smoothing ethnonyms in ethnic topics sparsing ethnonyms in background topics smoothing non-ethnonyms in background topics decorrelating ethnic topics adding ethnonyms modality and decorrelating their topics Murat Apishev great-mel@yandex.ru AIST 2016 29 / 33
  • 30. Introduction Parallel implementation Applications The RSF project Conclusions Experiment LiveJournal collection: 1.58M of documents 860K of words in the raw vocabulary after lemmatization 90K of words after filtering out short words with length 2, rare words with nw < 20 including: non-Russian words 250 ethnonyms Murat Apishev great-mel@yandex.ru AIST 2016 30 / 33
  • 31. Introduction Parallel implementation Applications The RSF project Conclusions Semi-supervised ARTM for ethnic topic modeling The number of ethnic topics found by the model: model ethnic |S| background |B| ++ +− −+ coh20 2 tfidf20 PLSA 400 12 15 17 -1447 -1012 LDA 400 12 15 17 -1540 -1121 ARTM-4 250 150 21 27 20 -1651 -1296 ARTM-5 250 150 38 42 30 -1342 -908 ARTM-4: ethnic topics: sparsing and decorrelating, ethnonyms smoothing background topics: smoothing, ethnonyms sparsing ARTM-5: ARTM-4 + ethnonyms as additional modality 2 Coherence and TF-IDF coherence are metrics that match the human judgment of topic quality. The topic is better if it has higher coherence value. Murat Apishev great-mel@yandex.ru AIST 2016 31 / 33
  • 32. Introduction Parallel implementation Applications The RSF project Conclusions Ethnic topics examples (русские): русский, князь, россия, татарин, великий, царить, царь, иван, император, империя, грозить, государь, век, московская, екатерина, москва, (русские): акция, организация, митинг, движение, активный, мероприятие, совет, русский, участник, москва, оппозиция, россия, пикет, протест, проведение, националист, поддержка, общественный, проводить, участие, (славяне, византийцы): славянский, святослав, жрец, древние, письменность, рюрик, летопись, византия, мефодий, хазарский, русский, азбука, (сирийцы): сирийский, асад, боевик, район, террорист, уничтожать, группировка, дамаск, оружие, алесио, оппозиция, операция, селение, сша, нусра, турция, (турки): турция, турецкий, курдский, эрдоган, стамбул, страна, кавказ, горин, полиция, премьер-министр, регион, курдистан, ататюрк, партия, (иранцы): иран, иранский, сша, россия, ядерный, президент, тегеран, сирия, оон, израиль, переговоры, обама, санкция, исламский, (палестинцы): террорист, израиль, терять, палестинский, палестинец, террористический, палестина, взрыв, территория, страна, государство, безопасность, арабский, организация, иерусалим, военный, полиция, газ, (ливанцы): ливанский, боевик, район, ливан, армия, террорист, али, военный, хизбалла, раненый, уничтожать, сирия, подразделение, квартал, армейский, (ливийцы): ливан, демократия, страна, ливийский, каддафи, государство, алжир, война, правительство, сша, арабский, али, муаммар, сирия, (евреи): израиль, израильский, страна, израил, война, нетаньяху, тель-авив, время, сша, сирия, египет, случай, самолет, еврейский, военный, ближний, Murat Apishev great-mel@yandex.ru AIST 2016 32 / 33
  • 33. Introduction Parallel implementation Applications The RSF project Conclusions Conclusions BigARTM is an open-source library supporting multimodal ARTM theory. Fast implementation of the underlying online EM-algorithm was even more improved. Memory usage was reduced. Combination of 8 regularizers in the task of ethnic topics extraction showed the supirity of ARTM approach. BigARTM is using to process more than 20 collections in several different projects. Join our comunity! Contacts: bigartm.org, great-mel@yandex.ru Murat Apishev great-mel@yandex.ru AIST 2016 33 / 33