Cloud and Information Services Lab
Furong Huang
UC Irvine
Anima Anandkumar
UC Irvine
Nikos Karampatziakis
Microsoft CISL
Paul Mineiro + 𝜀
Microsoft CISL
Sergiy Matusevych
Microsoft CISL
Shravan Narayanamurthy
Microsoft CISL
Markus Weimer
Microsoft CISL
Apache REEF Contributors
Worldwide
/pos/cv107_24319.txt
is evil dead ii a bad movie ?
it's full of terrible acting ,
pointless violence , and plot
holes yet it remains a cult
classic nearly fifteen years
after its release ...
/pos/cv108_15571.txt
it's rather strange too have
two computer animated talking
ant movies come out in a single
year , but that is what disney
and pixar animation ; s latest
film represents ...
http://www.cs.cornell.edu/People/pabo/movie-review-data
LDAvis library for R https://github.com/cpsievert/LDAvis
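$$M_1 \overset{\mathrm{def}}{=} \mathbb{E}[x_1]$$

$$M_2 \overset{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2] - \frac{\alpha_0}{\alpha_0 + 1}\, M_1 \otimes M_1$$

$$M_3 \overset{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3] - [\ldots \text{more shift terms}]$$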
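The moments above can be estimated directly from word counts. What follows is a minimal sketch, not the project's code: it assumes each document arrives as a count vector `c` over a vocabulary of size `d` and contains at least two words, it computes only the raw expectations (no α0 shift terms), and the class and method names are hypothetical. For a document with `n` words, averaging x1⊗x2 over ordered pairs of distinct word positions gives (c cᵀ − diag(c)) / (n(n − 1)).

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the TensorFactorization code base.
final class EmpiricalMoments {

    // M1: word-frequency vector averaged over documents.
    static double[] m1(List<int[]> docs, int d) {
        double[] m = new double[d];
        for (int[] c : docs) {
            int n = 0;
            for (int v : c) n += v;
            for (int i = 0; i < d; i++) m[i] += (double) c[i] / n / docs.size();
        }
        return m;
    }

    // Raw M2 (no shift term): average of (c c^T - diag(c)) / (n (n - 1)).
    // Assumes every document has at least two words.
    static double[][] m2(List<int[]> docs, int d) {
        double[][] m = new double[d][d];
        for (int[] c : docs) {
            int n = 0;
            for (int v : c) n += v;
            double pairs = (double) n * (n - 1);
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    // Co-occurrence count over ordered pairs of distinct positions.
                    double cooc = (double) c[i] * c[j] - (i == j ? c[i] : 0);
                    m[i][j] += cooc / pairs / docs.size();
                }
            }
        }
        return m;
    }
}
```

Both results are probability distributions (entries sum to 1), which makes a quick sanity check on real data.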
$$M_2 = \sum_{i=1}^{k} \alpha_i \cdot \beta_i \otimes \beta_i$$

$$M_3 = \sum_{i=1}^{k} \alpha_i \cdot \beta_i \otimes \beta_i \otimes \beta_i$$
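$$M_3 \overset{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3]$$

$$M_3 = \lambda_1\, a_1 \otimes b_1 \otimes c_1 + \lambda_2\, a_2 \otimes b_2 \otimes c_2 + \cdots = \sum_{i} \lambda_i \cdot a_i \otimes b_i \otimes c_i$$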
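$$\lambda, A \leftarrow \operatorname*{argmin}_{\lambda \in \mathbb{R}^{k},\; A \in \mathbb{R}^{k \times k}} \left\lVert A \cdot \operatorname{Diag}(\lambda) \cdot (C \odot B)^{\top} - M_3 \right\rVert^{2}$$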
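The objective above works on the mode-1 unfolding of M3: for a tensor Σ λi · ai⊗bi⊗ci, the unfolded matrix equals A · Diag(λ) · (C⊙B)ᵀ, where ⊙ is the column-wise Kronecker (Khatri-Rao) product and the columns of A, B, C hold the vectors ai, bi, ci. The sketch below checks that identity numerically with plain Java arrays. The class name and the column ordering (column index `l * J + j`, second mode varying fastest) are assumptions made for illustration; the slides use JBLAS matrices.

```java
// Hypothetical illustration of the unfolding identity behind the ALS objective.
final class KhatriRaoDemo {

    // Khatri-Rao product: row (l * J + j), column r holds c[l][r] * b[j][r].
    static double[][] khatriRao(double[][] c, double[][] b) {
        int L = c.length, J = b.length, k = b[0].length;
        double[][] out = new double[L * J][k];
        for (int l = 0; l < L; l++)
            for (int j = 0; j < J; j++)
                for (int r = 0; r < k; r++)
                    out[l * J + j][r] = c[l][r] * b[j][r];
        return out;
    }

    // Mode-1 unfolding of sum_r lambda[r] * a_r (x) b_r (x) c_r, built entry by entry.
    static double[][] unfoldedTensor(double[] lambda, double[][] a, double[][] b, double[][] c) {
        int I = a.length, J = b.length, L = c.length;
        double[][] t = new double[I][J * L];
        for (int i = 0; i < I; i++)
            for (int j = 0; j < J; j++)
                for (int l = 0; l < L; l++)
                    for (int r = 0; r < lambda.length; r++)
                        t[i][l * J + j] += lambda[r] * a[i][r] * b[j][r] * c[l][r];
        return t;
    }

    // A * Diag(lambda) * (C khatri-rao B)^T.
    static double[][] factoredForm(double[] lambda, double[][] a, double[][] b, double[][] c) {
        double[][] kr = khatriRao(c, b);
        double[][] out = new double[a.length][kr.length];
        for (int i = 0; i < a.length; i++)
            for (int col = 0; col < kr.length; col++)
                for (int r = 0; r < lambda.length; r++)
                    out[i][col] += a[i][r] * lambda[r] * kr[col][r];
        return out;
    }
}
```

Because the residual is linear in A · Diag(λ) once B and C are fixed, each update reduces to an ordinary least-squares problem.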
http://reef.incubator.apache.org
• Applications: Datalab, Machine Learning, BI, Power*
• Programming Models (Domain Specific Languages): SQL / HIVE / LINQ, Cloud Numerics, Pregel, GraphLab
• Operator Layer (Future Work): REEF Operator API and Library
• REEF, The Application Server for Big Data: Communications, Storage, Fault Management, Interoperability
• Resource Manager (Focus: YARN): YARN ... Mesos ... Azure Tasks, Drawbridge
• Storage (Focus: HDFS): HDFS ... Azure Block Storage ... Office 365
REEF Logical Abstraction
Container
Easy to reason about
Centralized control flow
• Evaluator allocation and configuration
• Task configuration and submission
Centralized error handling
• Task exceptions thrown to the Driver
• Evaluator failures reported to the Driver
Scalable
Event-based programming
• Driver sends requests as events to REEF
• REEF sends events to the Driver
Mostly stateless design
• REEF maintains minimal state
• Majority of state keeping (e.g. work queues) is maintained by the Driver
// Submit task to the newly created context
public class ContextActiveHandler implements EventHandler<ActiveContext> {
  @Override
  public void onNext(final ActiveContext context) {
    taskGroups.submitNext(context);
  }
}

// Submit next task to current context
public class TaskCompletedHandler implements EventHandler<CompletedTask> {
  @Override
  public void onNext(final CompletedTask task) {
    final ActiveContext context = task.getActiveContext();
    taskGroups.submitNext(context);
  }
}
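The pattern behind these handlers is that the Driver owns the work queue, and each event hands the next task to whichever context has just become free. A self-contained sketch of that pattern follows. `Handler`, `Context`, and `TaskQueueDriver` are hypothetical stand-ins rather than REEF classes, and `submitNext` only records the assignment instead of launching a task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not REEF types.
interface Handler<T> {
    void onNext(T event);
}

final class Context {
    final String id;
    Context(String id) { this.id = id; }
}

final class TaskQueueDriver {
    // All scheduling state lives in the driver: a queue of pending task names.
    private final Deque<String> pending = new ArrayDeque<>();
    // Record of "task@context" assignments, in submission order.
    final List<String> log = new ArrayList<>();

    TaskQueueDriver(List<String> tasks) { pending.addAll(tasks); }

    // Hand the next pending task to the given context, if any remain.
    void submitNext(Context context) {
        String task = pending.poll();
        if (task != null) {
            log.add(task + "@" + context.id);
        }
    }

    // Fired when a new context becomes active.
    final Handler<Context> contextActive = this::submitNext;

    // Fired when a task completes; the event carries the context it ran on.
    final Handler<Context> taskCompleted = this::submitNext;
}
```

Since both handlers funnel into the same method, adding a retry-on-failure handler would be a matter of pushing the failed task back onto `pending` before calling `submitNext`.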
@Inject
public WhitenTask(
    final @Parameter(TaskConfigurationOptions.Identifier.class) String taskId,
    final @Parameter(Launch.DimD.class) int dimD,
    final @Parameter(Launch.DimK.class) int dimK,
    final GroupCommClient groupCommClient,
    final InputData data,
    final TaskEnvironment env) {
  // ...
}

Use the Java “type system” to validate the configuration
// We can send and receive any Java serializable data, e.g. JBLAS matrices
private final Broadcast.Sender<DoubleMatrix> modelSender;
private final Broadcast.Receiver<DoubleMatrix[]> resultReceiver;

// Broadcast the model, collect the results, repeat.
do {
  this.modelSender.send(sliceA);
  // ...
  final DoubleMatrix[] result = this.resultReceiver.reduce();
} while (notConverged(sliceA, prevSliceA));
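The reduce half of the loop above can be pictured as a communication tree: each node combines its own partial result with those of its children and passes the total upward, using an aggregation operator supplied by the caller. The sketch below simulates this in a single process over an array laid out as an implicit binary tree. It is an illustration only: `TreeReduce` is a hypothetical name, and in the real job the values would be `DoubleMatrix` partial results exchanged between machines.

```java
import java.util.function.BinaryOperator;

// Hypothetical in-process simulation of a tree reduction.
final class TreeReduce {

    // Reduce the subtree rooted at `node`; children of node n are 2n+1 and 2n+2.
    static <T> T reduce(T[] values, int node, BinaryOperator<T> op) {
        T acc = values[node];
        int left = 2 * node + 1;
        int right = 2 * node + 2;
        if (left < values.length) acc = op.apply(acc, reduce(values, left, op));
        if (right < values.length) acc = op.apply(acc, reduce(values, right, op));
        return acc;
    }
}
```

For example, `TreeReduce.reduce(new Integer[]{1, 2, 3, 4, 5}, 0, Integer::sum)` combines all five values at the root. The operator should be associative and commutative, since the order of combination depends on the tree shape.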
https://github.com/Microsoft-CISL/TensorFactorization
http://reef.incubator.apache.org
motus@apache.org
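$$M_2 = \sum_{i} \lambda_i \cdot u_i \otimes v_i = \lambda_1 \cdot u_1 \otimes v_1 + \lambda_2 \cdot u_2 \otimes v_2 + \cdots$$

$$M_3 \overset{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3]$$

$$M_3 = \lambda_1\, u_1 \otimes v_1 \otimes w_1 + \lambda_2\, u_2 \otimes v_2 \otimes w_2 + \cdots = \sum_{i} \lambda_i \cdot u_i \otimes v_i \otimes w_i$$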
𝐼
𝑎1
𝑎1
• Find whitening matrix s.t. orthogonal
• Use to find s.t.
• Whiten :
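For reference, the standard whitening step from the tensor-decomposition literature can be written out as follows. This is a sketch under assumptions: the symbols $W$, $v_i$, and $T$ are chosen here for illustration and are not taken from the slide. It uses the symmetric forms $M_2 = \sum_i \alpha_i\, \beta_i \otimes \beta_i$ and $M_3 = \sum_i \alpha_i\, \beta_i \otimes \beta_i \otimes \beta_i$, with linearly independent $\beta_i$ and $\alpha_i > 0$.

$$W^{\top} M_2 W = I, \qquad \text{e.g. } W = U D^{-1/2} \text{ from the rank-}k\text{ eigendecomposition } M_2 = U D U^{\top}$$

$$v_i = \sqrt{\alpha_i}\; W^{\top} \beta_i \quad\Rightarrow\quad \sum_{i=1}^{k} v_i v_i^{\top} = W^{\top} M_2 W = I, \text{ so the } v_i \text{ are orthonormal}$$

$$T = M_3(W, W, W) = \sum_{i=1}^{k} \alpha_i \,(W^{\top}\beta_i)^{\otimes 3} = \sum_{i=1}^{k} \alpha_i^{-1/2}\; v_i^{\otimes 3}$$

The whitened tensor $T$ is only $k \times k \times k$ and has an orthogonal decomposition, which is what makes the factorization step tractable.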


Topic Modeling via Tensor Factorization - Use Case for Apache REEF


Editor's Notes

  1. We are hiring!
  2. What problem we are solving, why it is important, and what the state-of-the-art solutions are. Then the new approach, our algorithm, etc.
  3. In general, given data (e.g. a corpus of text, a social graph, user pageview/click logs), reveal the latent parameters that influence the distribution: communities, user preferences, text topics. We’ll talk about text because it’s easy to demo and reason about even on a small dataset.
  4. Top 10 topics. Each document has a mixture of topics; some topics are common, e.g. film/movie/time. Words appear in many topics, e.g. action/crime/cop and action/Jackie Chan. Topics are sparse.
  5. Start 3:20
  6. “It’s all bag of words to me.” Nikolai Ge, Portrait of Leo Tolstoy, 1884, Tretyakov Gallery, Moscow. Writing what I believe.
  7. Start 10
  8. The method of moments was introduced by Karl Pearson in 1894; everything new is well-forgotten old. M1 is a vector and M2 is a matrix; M2 is not enough for topics (there is spectral clustering; will talk about it later if asked). We need to capture triplets: a cube of data…
  9. It was shown that with these shift terms, M1..M3 are sufficient to reveal not only clusters but also mixtures of latent parameters. In fact, if you squint right, M2 is a covariance matrix, and α0 is a Dirichlet hyperparameter. Similarly, M3 is skewness (shifted). I will give more details later. So this is the information that we collect. How do we get the topics?
  10. 8:25. We can factorize the tensor into a sum of outer products of eigenvectors that reveal the topics, i.e. each vector β_i contains the probabilities of words in topic i.
  11. We can factorize the tensor into a sum of outer products of eigenvectors that reveal the topics, i.e. each vector β_i contains the probabilities of words in topic i.
  12. It’s linear. Needs a resource manager, e.g. YARN, and a distributed FS. The master node checks for convergence.
  13. Markus gave a talk at Hadoop Summit 2014; see it on YouTube.
  14. Much nicer in C#. REEF itself has very little state; all state is in the Driver.
  15. 18:00
  16. Centralized error handling: mention Erlang/OTP supervisor architecture
  17. Much nicer in C#. REEF itself has very little state; all state is in the Driver.
  18. Centralized error handling: mention Erlang/OTP supervisor architecture
  19. Java “type system”: annotate the constructor with @Inject and mark leaf parameters with @Parameter; all other parameters must be classes with an @Inject constructor.
  20. Centralized error handling: mention Erlang/OTP supervisor architecture
  21. Centralized error handling: mention Erlang/OTP supervisor architecture
  22. Centralized error handling: mention Erlang/OTP supervisor architecture
  23. Form a communication tree: nodes pass data along. At the reduce stage we also specify the aggregation operator.
  24. Future work: community detection, larger datasets (PubMed), compare with LightLDA; in general: need better support for tensors (libraries, CUDA, parameter server)
  25. Future work: community detection, larger datasets (PubMed), compare with LightLDA; in general: need better support for tensors (libraries, CUDA, parameter server)
  26. Future work: community detection, larger datasets (PubMed), compare with LightLDA; in general: need better support for tensors (libraries, CUDA, parameter server). End: 20 min sharp. Total ~24 min with questions.
  27. Model (LDA) is independent from inference algorithms (variational Bayes, MCMC, tensors)