Slides from my talk at the 2015 Hadoop Summit.
Topic modeling – the task of discovering hidden thematic structure in a set of documents – is an important problem in modern machine learning. Despite great progress in recent years, scaling it to large numbers of topics and massive corpora remains a challenge. We share our experience in building a high-performance distributed system for topic modeling using the Apache REEF framework. We start with an introduction to topic modeling via Latent Dirichlet Allocation, then describe our new LDA inference algorithm based on tensor factorization. Finally, we demonstrate how the Apache REEF framework helped us implement the system in a few lines of clean, readable code.
2. Furong Huang
UC Irvine
Anima Anandkumar
UC Irvine
Nikos Karampatziakis
Microsoft CISL
Paul Mineiro + 𝜀
Microsoft CISL
Sergiy Matusevych
Microsoft CISL
Shravan Narayanamurthy
Microsoft CISL
Markus Weimer
Microsoft CISL
Apache REEF Contributors
Worldwide
3.
4.
5. /pos/cv107_24319.txt
is evil dead ii a bad movie ?
it's full of terrible acting ,
pointless violence , and plot
holes yet it remains a cult
classic nearly fifteen years
after its release ...
/pos/cv108_15571.txt
it's rather strange too have
two computer animated talking
ant movies come out in a single
year , but that is what disney
and pixar animation ; s latest
film represents ...
http://www.cs.cornell.edu/People/pabo/movie-review-data
22. Easy to reason about
Centralized control flow
• Evaluator allocation and configuration
• Task configuration and submission
Centralized error handling
• Task exceptions thrown to the Driver
• Evaluator failures reported to the Driver
Scalable
Event-based programming
• Driver sends requests as events to REEF
• REEF sends events to the Driver
Mostly stateless design
• REEF maintains minimal state
• Majority of state keeping (e.g. work queues) is maintained by the Driver
23. // Submit task to the newly created context
public class ContextActiveHandler implements EventHandler<ActiveContext> {
@Override
public void onNext(final ActiveContext context) {
taskGroups.submitNext(context);
}
}
// Submit next task to current context
public class TaskCompletedHandler implements EventHandler<CompletedTask> {
@Override
public void onNext(final CompletedTask task) {
final ActiveContext context = task.getActiveContext();
taskGroups.submitNext(context);
}
}
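The `taskGroups` helper used by both handlers is not shown on the slides. A minimal sketch of its queue logic in plain Java (the class name, the use of `String` task ids, and the `Object` context parameter are assumptions for illustration; the real code would build a REEF Task `Configuration` and call `context.submitTask(...)`):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: a queue of pending tasks, drained one task
// per available Evaluator context as the Driver receives events.
final class TaskGroups {

  private final Queue<String> pending = new ArrayDeque<>();

  TaskGroups(final Iterable<String> taskIds) {
    for (final String id : taskIds) {
      this.pending.add(id);
    }
  }

  /**
   * Submit the next pending task to the given context.
   * Here we simply dequeue and return the task id;
   * returns null when there is no more work.
   */
  String submitNext(final Object context) {
    return this.pending.poll();
  }
}
```

This keeps all work-queue state in the Driver, matching the "mostly stateless" REEF design described on the previous slide.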
24.
25. @Inject
public WhitenTask(
final @Parameter(TaskConfigurationOptions.Identifier.class) String taskId,
final @Parameter(Launch.DimD.class) int dimD,
final @Parameter(Launch.DimK.class) int dimK,
final GroupCommClient groupCommClient,
final InputData data,
final TaskEnvironment env) {
// ...
}
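The `DimD` and `DimK` parameters injected above are Tang named parameters. The actual `Launch` class is not shown on the slides; a sketch of how such configuration declarations might look (doc strings and short names are assumptions):

```java
import org.apache.reef.tang.annotations.Name;
import org.apache.reef.tang.annotations.NamedParameter;

// Hypothetical sketch of the named-parameter declarations
// referenced by @Parameter(Launch.DimD.class) etc.
final class Launch {

  @NamedParameter(doc = "Dimension of the input data", short_name = "dimD")
  public static final class DimD implements Name<Integer> {
  }

  @NamedParameter(doc = "Number of topics", short_name = "dimK")
  public static final class DimK implements Name<Integer> {
  }
}
```

Because each parameter is a distinct class implementing `Name<Integer>`, Tang rejects mismatched bindings at configuration time – this is what "use the Java type system to validate the configuration" refers to.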
“Use the Java type system to validate the configuration”
26.
27.
28.
29. // We can send and receive any Java serializable data, e.g. JBLAS matrices
private final Broadcast.Sender<DoubleMatrix> modelSender;
private final Reduce.Receiver<DoubleMatrix[]> resultReceiver;
// Broadcast the model, collect the results, repeat.
do {
this.modelSender.send(sliceA);
// ...
final DoubleMatrix[] result = this.resultReceiver.reduce();
} while (notConverged(sliceA, prevSliceA));
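The `notConverged` check used in the loop above is not shown on the slides. A minimal sketch of one plausible implementation, using plain double arrays in place of JBLAS `DoubleMatrix` (the tolerance value and the max-absolute-change criterion are assumptions):

```java
// Hypothetical sketch of the convergence test from the Driver loop:
// stop once the largest element-wise change falls below a tolerance.
final class Convergence {

  private static final double TOLERANCE = 1e-6;  // assumed threshold

  /** True while the model is still changing by more than the tolerance. */
  static boolean notConverged(final double[] current, final double[] previous) {
    double maxDelta = 0.0;
    for (int i = 0; i < current.length; i++) {
      maxDelta = Math.max(maxDelta, Math.abs(current[i] - previous[i]));
    }
    return maxDelta > TOLERANCE;
  }
}
```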
39. • Find whitening matrix W s.t. W^T M2 W = I (W makes M2 orthogonal)
• Use SVD M2 = U Σ U^T to find W = U Σ^(-1/2) s.t. W^T M2 W = I
• Whiten M3: T = M3(W, W, W)
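The whitening step on this slide can be written out as a short derivation (standard moment whitening, consistent with the M2/M3 notation used earlier in the deck; the rank-k truncation is an assumption):

```latex
M_2 = U \Sigma U^\top \quad \text{(rank-}k\text{ eigendecomposition)} \\
W = U \Sigma^{-1/2} \\
W^\top M_2 W = \Sigma^{-1/2} U^\top \left( U \Sigma U^\top \right) U \Sigma^{-1/2} = I \\
T = M_3(W, W, W) \quad \text{(whitened third moment, a } k \times k \times k \text{ tensor)}
```

After whitening, the tensor T has an orthogonal decomposition, which is what makes the subsequent factorization tractable.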
Editor's Notes
We are hiring!
What is the problem we are solving, why it’s important, and what are state-of-the-art solutions.
New approach and our algorithm
etc
In general, given data (e.g. a corpus of text, a social graph, user pageview/click logs), reveal the latent parameters that influence the distribution – communities, user preferences, text topics. We’ll talk about text because it’s easy to demo and reason about, even on a small dataset
Top 10 topics. Each document has a mixture of topics; some topics are common, e.g. film/movie/time. Words appear in many topics, e.g. action/crime/cop and action/Jackie Chan. Topics are sparse
Start 3:20
It’s all bag of words to me
Nikolai Ge, Portrait of Leo Tolstoy, 1884 Tretyakov gallery, Moscow
Writing what I believe
Start 10
Introduced by Karl Pearson in 1894; everything new is well-forgotten old. So M1 is a vector, M2 a matrix; M2 is not enough for topics (there is spectral clustering – will talk about it later if asked). Need to capture triplets – a cube of data…
It was shown that with these shifted terms M1..M3 are sufficient to reveal not only clusters, but mixtures of latent parameters. In fact, if you squint right, M2 is a covariance matrix, and a0 is a Dirichlet hyperprior. Similarly, M3 is (shifted) skewness. I will give more details later. So this is the information that we collect… How to get the topics?
8:25
We can factorize the tensor into a cross product of eigenvectors that reveal the topics. i.e. each vector beta_i contains probabilities of words in topic i.
it’s linear
Need resource manager, e.g. YARN, and distributed FS.
Master node checks for convergence
Markus gave a talk at Hadoop Summit 2014 – see on YouTube
Much nicer in C#
REEF itself has very little state; all state is in the driver
Form a communication tree – nodes pass data along. On the reduce stage we also specify the aggregation operator
Future work: community detection, larger datasets (pubmed), compare with LightLDA; in general: need better support for tensors (libraries, CUDA, parameter server)
End: 20 min sharp
Total ~24 min with questions
The model (LDA) is independent of the inference algorithm (variational Bayes, MCMC, tensors)