Cloud and Information Services Lab
Furong Huang
UC Irvine
Anima Anandkumar
UC Irvine
Nikos Karampatziakis
Microsoft CISL
Paul Mineiro + 𝜀
Microsoft CISL
Sergiy Matusevych
Microsoft CISL
Shravan Narayanamurthy
Microsoft CISL
Markus Weimer
Microsoft CISL
Apache REEF Contributors
Worldwide
/pos/cv107_24319.txt
is evil dead ii a bad movie ?
it's full of terrible acting ,
pointless violence , and plot
holes yet it remains a cult
classic nearly fifteen years
after its release ...
/pos/cv108_15571.txt
it's rather strange too have
two computer animated talking
ant movies come out in a single
year , but that is what disney
and pixar animation ; s latest
film represents ...
http://www.cs.cornell.edu/People/pabo/movie-review-data
LDAvis library for R https://github.com/cpsievert/LDAvis
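$$M_1 \stackrel{\text{def}}{=} \mathbb{E}[x_1]$$
$$M_2 \stackrel{\text{def}}{=} \mathbb{E}[x_1 \otimes x_2] - \frac{\alpha_0}{\alpha_0 + 1}\, M_1 \otimes M_1$$
$$M_3 \stackrel{\text{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3] - [\dots\ \text{more shift terms}]$$
$$M_2 = \sum_{i=1}^{k} \alpha_i \cdot \beta_i \otimes \beta_i$$
$$M_3 = \sum_{i=1}^{k} \alpha_i \cdot \beta_i \otimes \beta_i \otimes \beta_i$$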
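The sketch below is a rough illustration of the first two moments. It estimates M1 and the shifted M2 from per-document word counts.

It makes the following assumptions, as in the usual LDA method of moments:

* x1 and x2 are one-hot encodings of two distinct word positions in the same document.
* Every document has at least two words.
* α0 is the known sum of the Dirichlet parameters.

The class and method names are hypothetical. They are not taken from the TensorFactorization code.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the TensorFactorization repository.
// counts[d][w] = occurrences of word w in document d; every document needs at least 2 words.
final class LdaMoments {

    // M1 = E[x1]: average of each document's normalized word-count vector.
    static double[] firstMoment(int[][] counts, int vocab) {
        double[] m1 = new double[vocab];
        for (int[] doc : counts) {
            int len = Arrays.stream(doc).sum();
            for (int w = 0; w < vocab; w++) {
                m1[w] += (double) doc[w] / len;
            }
        }
        for (int w = 0; w < vocab; w++) {
            m1[w] /= counts.length;
        }
        return m1;
    }

    // E[x1 ⊗ x2] over ordered pairs of distinct positions in one document:
    // per document (c c^T - diag(c)) / (L (L - 1)), then averaged over documents.
    static double[][] pairMoment(int[][] counts, int vocab) {
        double[][] e2 = new double[vocab][vocab];
        for (int[] doc : counts) {
            int len = Arrays.stream(doc).sum();
            double pairs = (double) len * (len - 1);
            for (int i = 0; i < vocab; i++) {
                for (int j = 0; j < vocab; j++) {
                    double same = (i == j) ? doc[i] : 0.0;
                    e2[i][j] += ((double) doc[i] * doc[j] - same) / pairs;
                }
            }
        }
        for (double[] row : e2) {
            for (int j = 0; j < vocab; j++) {
                row[j] /= counts.length;
            }
        }
        return e2;
    }

    // M2 = E[x1 ⊗ x2] - alpha0 / (alpha0 + 1) * M1 ⊗ M1
    static double[][] secondMoment(int[][] counts, int vocab, double alpha0) {
        double[] m1 = firstMoment(counts, vocab);
        double[][] m2 = pairMoment(counts, vocab);
        double shift = alpha0 / (alpha0 + 1.0);
        for (int i = 0; i < vocab; i++) {
            for (int j = 0; j < vocab; j++) {
                m2[i][j] -= shift * m1[i] * m1[j];
            }
        }
        return m2;
    }
}
```

Dividing by L(L − 1) counts ordered pairs of distinct positions. That is why the diagonal subtracts the raw count c before normalizing: a word is never paired with itself.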
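$$M_3 \stackrel{\text{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3]$$
$$M_3 = \lambda_1\, a_1 \otimes b_1 \otimes c_1 + \lambda_2\, a_2 \otimes b_2 \otimes c_2 + \cdots = \sum_i \lambda_i \cdot a_i \otimes b_i \otimes c_i$$
$$\lambda, A \leftarrow \operatorname*{argmin}_{\lambda \in \mathbb{R}^k,\; A \in \mathbb{R}^{k \times k}} \left\| A \cdot \operatorname{Diag}(\lambda) \cdot (C \odot B)^\top - M_3 \right\|^2$$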
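A small sketch can make the least-squares objective concrete. It rests on two assumptions:

* Each factor matrix stores one row per coordinate and one column per component.
* The unfolding convention places tensor entry (i, j, k), with J indices in mode 2, at column k·J + j.

Under that convention, A · Diag(λ) · (C ⊙ B)ᵀ equals the unfolded sum of rank-one terms λ_i a_i ⊗ b_i ⊗ c_i. The helper names are illustrative. The actual ALS update in the referenced repository may be organized differently.

```java
// Illustrative CP-model helpers; not the TensorFactorization repository's code.
// Factor matrices are rows x rank: column r holds component a_r, b_r, or c_r.
final class CpModel {

    // Khatri-Rao product C ⊙ B: column r is kron(c_r, b_r), so row index is k * J + j.
    static double[][] khatriRao(double[][] c, double[][] b) {
        int kDim = c.length, jDim = b.length, rank = b[0].length;
        double[][] out = new double[kDim * jDim][rank];
        for (int k = 0; k < kDim; k++)
            for (int j = 0; j < jDim; j++)
                for (int r = 0; r < rank; r++)
                    out[k * jDim + j][r] = c[k][r] * b[j][r];
        return out;
    }

    // A · Diag(lambda) · (C ⊙ B)^T: the CP model written as a mode-1 unfolding.
    static double[][] unfoldedModel(double[][] a, double[] lambda, double[][] b, double[][] c) {
        double[][] kr = khatriRao(c, b);
        int iDim = a.length, cols = kr.length, rank = lambda.length;
        double[][] out = new double[iDim][cols];
        for (int i = 0; i < iDim; i++)
            for (int col = 0; col < cols; col++)
                for (int r = 0; r < rank; r++)
                    out[i][col] += a[i][r] * lambda[r] * kr[col][r];
        return out;
    }

    // Direct sum of rank-one terms lambda_r a_r ⊗ b_r ⊗ c_r, unfolded as X[i][k * J + j].
    static double[][] unfoldedSumOfRankOne(double[][] a, double[] lambda, double[][] b, double[][] c) {
        int iDim = a.length, jDim = b.length, kDim = c.length;
        double[][] out = new double[iDim][jDim * kDim];
        for (int r = 0; r < lambda.length; r++)
            for (int i = 0; i < iDim; i++)
                for (int j = 0; j < jDim; j++)
                    for (int k = 0; k < kDim; k++)
                        out[i][k * jDim + j] += lambda[r] * a[i][r] * b[j][r] * c[k][r];
        return out;
    }

    // Squared Frobenius norm of (model - target): the quantity ALS minimizes.
    static double objective(double[][] model, double[][] target) {
        double s = 0;
        for (int i = 0; i < model.length; i++)
            for (int j = 0; j < model[i].length; j++) {
                double d = model[i][j] - target[i][j];
                s += d * d;
            }
        return s;
    }
}
```

Comparing `unfoldedModel` with `unfoldedSumOfRankOne` on small factors is a cheap way to catch an index-order mismatch before running ALS.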
http://reef.incubator.apache.org
[Figure: Big Data software stack]
Applications: Datalab, Machine Learning, BI, Power*
Programming Models (Domain Specific Languages): SQL / HIVE / LINQ, Cloud Numerics, Pregel, GraphLab
Operator Layer (Future Work): REEF Operator API and Library
REEF, the Application Server for Big Data: Communications, Storage, Fault Management, Interoperability
Resource Manager (Focus: YARN): YARN ... Mesos ... Azure Tasks, Drawbridge
Storage (Focus: HDFS): HDFS ... Azure Block Storage ... Office 365
[Figure: REEF Logical Abstraction (Container)]
Easy to reason about
Centralized control flow
• Evaluator allocation and configuration
• Task configuration and submission
Centralized error handling
• Task exceptions thrown to the Driver
• Evaluator failures reported to the Driver
Scalable
Event-based programming
• Driver sends requests as events to REEF
• REEF sends events to the Driver
Mostly stateless design
• REEF maintains minimal state
• Most state (e.g. work queues) is kept by the Driver
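import org.apache.reef.driver.context.ActiveContext;
import org.apache.reef.driver.task.CompletedTask;
import org.apache.reef.wake.EventHandler;

// Assumed to be inner classes of the Driver, where taskGroups is a field tracking pending work.

// Submit a task to the newly created context
public class ContextActiveHandler implements EventHandler<ActiveContext> {
  @Override
  public void onNext(final ActiveContext context) {
    taskGroups.submitNext(context);
  }
}

// Submit the next task to the context that just finished one
public class TaskCompletedHandler implements EventHandler<CompletedTask> {
  @Override
  public void onNext(final CompletedTask task) {
    final ActiveContext context = task.getActiveContext();
    taskGroups.submitNext(context);
  }
}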
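Here is a plain-Java toy that makes the event flow concrete without a REEF installation. It mimics the two handlers above.

`ToyRuntime`, `ContextActive`, `TaskDone` and `ToyDriver` are hypothetical stand-ins, not REEF classes. The roles split like this:

* The runtime only routes events.
* The Driver owns the work queue.

This mirrors the mostly stateless design described earlier. The sketch also assumes that every task finishes instantly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java toy, NOT the REEF API: these event types and the runtime are hypothetical.
final class ContextActive {
    final String contextId;
    ContextActive(String contextId) { this.contextId = contextId; }
}

final class TaskDone {
    final String contextId;
    TaskDone(String contextId) { this.contextId = contextId; }
}

// The "runtime" only routes events to registered handlers; it keeps no work state.
final class ToyRuntime {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();
    private final Deque<Object> events = new ArrayDeque<>();

    <T> void on(Class<T> type, Consumer<? super T> handler) {
        handlers.put(type, event -> handler.accept(type.cast(event)));
    }

    void post(Object event) { events.add(event); }

    // Deliver events one at a time until the queue drains.
    void run() {
        while (!events.isEmpty()) {
            Object event = events.poll();
            Consumer<Object> handler = handlers.get(event.getClass());
            if (handler != null) handler.accept(event);
        }
    }
}

// The Driver owns the work queue and decides what each context runs next.
final class ToyDriver {
    final Deque<String> workQueue = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();
    private final ToyRuntime runtime;

    ToyDriver(ToyRuntime runtime, String... tasks) {
        this.runtime = runtime;
        for (String t : tasks) workQueue.add(t);
        runtime.on(ContextActive.class, e -> submitNext(e.contextId)); // like ContextActiveHandler
        runtime.on(TaskDone.class, e -> submitNext(e.contextId));      // like TaskCompletedHandler
    }

    // Counterpart of taskGroups.submitNext(context): run the next task or release the context.
    void submitNext(String contextId) {
        String task = workQueue.poll();
        if (task == null) {
            log.add(contextId + ":close");
            return;
        }
        log.add(contextId + ":" + task);
        runtime.post(new TaskDone(contextId)); // toy assumption: the task completes immediately
    }
}
```

With two contexts and three tasks, the contexts pick up work alternately until the queue is empty, and then each context is closed.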
@Inject
public WhitenTask(
    final @Parameter(TaskConfigurationOptions.Identifier.class) String taskId,
    final @Parameter(Launch.DimD.class) int dimD,
    final @Parameter(Launch.DimK.class) int dimK,
    final GroupCommClient groupCommClient,
    final InputData data,
    final TaskEnvironment env) {
  // ...
}
Use the Java type system to validate the configuration
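A hypothetical mini-API can illustrate letting the compiler check configuration values. In it, each parameter is a class whose type argument fixes the value type.

`Param`, `TypedConfig` and the parameter classes below are invented for this sketch. They are not Tang's actual interfaces. Missing bindings are only detected at lookup time here, whereas a real injector may report them earlier.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-API, not Tang: each parameter is a class, and its type argument
// fixes the type of value it may be bound to.
interface Param<T> {}

final class DimD implements Param<Integer> {}
final class DimK implements Param<Integer> {}
final class InputPath implements Param<String> {}

final class TypedConfig {
    private final Map<Class<?>, Object> values = new HashMap<>();

    // bind(DimK.class, "ten") does not compile: DimK only accepts Integer values.
    <T> TypedConfig bind(Class<? extends Param<T>> name, T value) {
        values.put(name, value);
        return this;
    }

    <T> T get(Class<? extends Param<T>> name) {
        Object value = values.get(name);
        if (value == null) {
            throw new IllegalStateException("No value bound for " + name.getSimpleName());
        }
        @SuppressWarnings("unchecked") // safe: bind() only stores a T under a Param<T> key
        T typed = (T) value;
        return typed;
    }
}
```

Because the value type travels with the parameter class, a misspelled or mistyped binding fails at compile time rather than inside a running task.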
// We can send and receive any Java-serializable data, e.g. JBLAS matrices
private final Broadcast.Sender<DoubleMatrix> modelSender;
private final Reduce.Receiver<DoubleMatrix[]> resultReceiver;

// Broadcast the model, collect the results, repeat.
do {
  this.modelSender.send(sliceA);
  // ...
  final DoubleMatrix[] result = this.resultReceiver.reduce();
} while (notConverged(sliceA, prevSliceA));
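The broadcast/reduce loop has the same shape as any iterative method whose per-iteration work splits across data partitions. The sketch below simulates it in a single process, using power iteration on a matrix stored as a sum of per-worker parts.

It is a stand-in for illustration, not the ALS step the loop above actually performs. It involves no REEF or networking code, and all names are hypothetical.

```java
// Single-process simulation of a broadcast -> local work -> reduce loop (no networking).
// Illustrated with power iteration; the matrix is the sum of hypothetical per-worker parts.
final class BroadcastReduceSim {

    static double[] run(double[][][] workerParts, double[] init, double tol, int maxIter) {
        double[] model = normalize(init.clone());
        for (int iter = 0; iter < maxIter; iter++) {
            double[] reduced = new double[model.length];
            for (double[][] part : workerParts) {        // broadcast: each worker sees the model
                double[] partial = multiply(part, model); // local work on the worker's slice
                for (int i = 0; i < reduced.length; i++) {
                    reduced[i] += partial[i];             // reduce: element-wise sum
                }
            }
            double[] next = normalize(reduced);
            boolean converged = distance(next, model) < tol;
            model = next;
            if (converged) break;
        }
        return model;
    }

    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++) out[i] += m[i][j] * v[j];
        return out;
    }

    // Scales v in place to unit Euclidean length and returns it.
    static double[] normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
        return v;
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```

In a distributed version, each `multiply(part, model)` call would run on a separate Evaluator, and the element-wise sum would be the reduce function.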
https://github.com/Microsoft-CISL/TensorFactorization
http://reef.incubator.apache.org
motus@apache.org
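$$M_2 = \sum_i \lambda_i \cdot u_i \otimes v_i = \lambda_1\, u_1 \otimes v_1 + \lambda_2\, u_2 \otimes v_2 + \cdots$$
$$M_3 \stackrel{\text{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3]$$
$$M_3 = \lambda_1\, u_1 \otimes v_1 \otimes w_1 + \lambda_2\, u_2 \otimes v_2 \otimes w_2 + \cdots = \sum_i \lambda_i \cdot u_i \otimes v_i \otimes w_i$$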
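• Find a whitening matrix $W$ such that the whitened components $W^\top a_i$ are orthogonal
• Use $M_2$ to find $W$ such that $W^\top M_2 W = I$
• Whiten $M_3$: $T = M_3(W, W, W)$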
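A minimal whitening sketch follows. It assumes a symmetric positive semidefinite M2 (the symmetric case u_i = v_i) with at least k clearly positive eigenvalues.

The steps are:

1. Take the top-k eigenpairs, M2 ≈ U_k Λ_k U_kᵀ.
2. Set W = U_k Λ_k^(-1/2), so that Wᵀ M2 W = I_k.

The eigendecomposition uses the classical cyclic Jacobi method to keep the example dependency-free. A production version would more likely call a linear-algebra library such as JBLAS. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Whitening sketch: assumes m2 is symmetric PSD with at least k clearly positive eigenvalues.
// Builds W = U_k * Lambda_k^(-1/2) from the top-k eigenpairs, so that W^T M2 W = I_k.
final class Whitening {

    static double[][] whiteningMatrix(double[][] m2, int k) {
        int n = m2.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = m2[i].clone();
        double[][] v = new double[n][n];          // accumulates eigenvectors as columns
        for (int i = 0; i < n; i++) v[i][i] = 1.0;

        // Cyclic Jacobi sweeps: each rotation A <- P^T A P zeroes the entry a[p][q].
        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off < 1e-24) break;
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
                    double t = (theta == 0.0) ? 1.0
                            : Math.signum(theta) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
                    double c = 1 / Math.sqrt(t * t + 1);
                    double s = t * c;
                    rotateColumns(a, p, q, c, s); // A P
                    rotateRows(a, p, q, c, s);    // P^T (A P)
                    rotateColumns(v, p, q, c, s); // V P
                }
            }
        }

        // Sort eigenvalues (the diagonal of a) in decreasing order and keep the top k.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Double.compare(a[y][y], a[x][x]));
        double[][] w = new double[n][k];
        for (int j = 0; j < k; j++) {
            int idx = order[j];
            double scale = 1 / Math.sqrt(a[idx][idx]);
            for (int i = 0; i < n; i++) w[i][j] = v[i][idx] * scale;
        }
        return w;
    }

    private static void rotateColumns(double[][] m, int p, int q, double c, double s) {
        for (double[] row : m) {
            double mp = row[p], mq = row[q];
            row[p] = c * mp - s * mq;
            row[q] = s * mp + c * mq;
        }
    }

    private static void rotateRows(double[][] m, int p, int q, double c, double s) {
        for (int j = 0; j < m[p].length; j++) {
            double mp = m[p][j], mq = m[q][j];
            m[p][j] = c * mp - s * mq;
            m[q][j] = s * mp + c * mq;
        }
    }

    // W^T M W: should be (numerically) the k x k identity after whitening.
    static double[][] congruence(double[][] w, double[][] m) {
        int n = w.length, k = w[0].length;
        double[][] out = new double[k][k];
        for (int r = 0; r < k; r++)
            for (int t = 0; t < k; t++)
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++) out[r][t] += w[i][r] * m[i][j] * w[j][t];
        return out;
    }
}
```

Applying W to all three modes of M3 then gives the whitened tensor T = M3(W, W, W). In the exact symmetric case, the components of T are orthogonal.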

 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 

Recently uploaded (20)

Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Topic Modeling via Tensor Factorization: Use Case for Apache REEF Framework

  • 1. Cloud and Information Services Lab
  • 2. Furong Huang UC Irvine Anima Anandkumar UC Irvine Nikos Karampatziakis Microsoft CISL Paul Mineiro + 𝜀 Microsoft CISL Sergiy Matusevych Microsoft CISL Shravan Narayanamurthy Microsoft CISL Markus Weimer Microsoft CISL Apache REEF Contributors Worldwide
  • 3.
  • 4.
  • 5. /pos/cv107_24319.txt is evil dead ii a bad movie ? it's full of terrible acting , pointless violence , and plot holes yet it remains a cult classic nearly fifteen years after its release ... /pos/cv108_15571.txt it's rather strange too have two computer animated talking ant movies come out in a single year , but that is what disney and pixar animation ; s latest film represents ... http://www.cs.cornell.edu/People/pabo/movie-review-data
  • 6. LDAvis library for R https://github.com/cpsievert/LDAvis
  • 7.
  • 8.
  • 9.
  • 10. $M_1 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1]$, $M_2 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2]$
  • 11. $M_3 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3]$
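  • 12. $M_1 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1]$, $M_2 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2] - \frac{\alpha_0}{\alpha_0 + 1} M_1 \otimes M_1$, $M_3 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3] - [\ldots \text{more shift terms}]$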
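Slide 12's moments can be estimated from raw word counts. The Java sketch below is an illustration under stated assumptions, not the authors' code: each document is an array of word ids in [0, d) with at least two words, and `Moments`, `firstMoment` and `shiftedSecondMoment` are hypothetical names. Per document, E[x1 ⊗ x2] is estimated over all ordered pairs of distinct word positions as (c cᵀ − Diag(c)) / (n(n − 1)) for count vector c and length n; the estimates are averaged over documents and then shifted by α0/(α0 + 1) M1 ⊗ M1. M3 and its further shift terms are left out.

```java
/**
 * Sketch (hypothetical names): empirical moments for the LDA method of moments.
 * Documents are arrays of word ids in [0, d); every document needs at least 2 words.
 */
final class Moments {

    /** M1 = E[x1]: average over documents of the normalized word-count vector. */
    static double[] firstMoment(int[][] docs, int d) {
        double[] m1 = new double[d];
        for (int[] doc : docs) {
            for (int w : doc) {
                m1[w] += 1.0 / doc.length;
            }
        }
        for (int i = 0; i < d; i++) {
            m1[i] /= docs.length;
        }
        return m1;
    }

    /**
     * M2 = E[outer(x1, x2)] - alpha0 / (alpha0 + 1) * outer(M1, M1).
     * Per document with counts c and length n, E[outer(x1, x2)] is estimated as
     * (c c^T - diag(c)) / (n (n - 1)), i.e. over ordered pairs of distinct positions.
     */
    static double[][] shiftedSecondMoment(int[][] docs, int d, double alpha0) {
        double[][] m2 = new double[d][d];
        for (int[] doc : docs) {
            int n = doc.length;
            double[] c = new double[d];
            for (int w : doc) {
                c[w] += 1.0;
            }
            // Averages the per-document estimate over all documents.
            double scale = 1.0 / ((double) n * (n - 1) * docs.length);
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    double pairs = c[i] * c[j] - (i == j ? c[i] : 0.0);
                    m2[i][j] += pairs * scale;
                }
            }
        }
        // Subtract the Dirichlet shift term.
        double[] m1 = firstMoment(docs, d);
        double shift = alpha0 / (alpha0 + 1.0);
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                m2[i][j] -= shift * m1[i] * m1[j];
            }
        }
        return m2;
    }
}
```

With `alpha0 = 0` the method returns the unshifted pair moment. The dense d × d array is only for illustration; with a realistic vocabulary one would keep the counts sparse.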
  • 13. $M_2 = \sum_{i=1}^{k} \alpha_i \, \beta_i \otimes \beta_i$, $M_3 = \sum_{i=1}^{k} \alpha_i \, \beta_i \otimes \beta_i \otimes \beta_i$
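Slide 13 states that the population moments are weighted sums of outer products of the topic vectors. The sketch below simply builds M2 and a dense M3 from given weights α and topic-word vectors β; the class name `TopicMoments` and the inputs are hypothetical. Generating moments from known synthetic topics like this is a convenient way to check a decomposition routine, since the expected answer is known.

```java
/** Sketch (hypothetical names): moments implied by topic weights alpha and topics beta. */
final class TopicMoments {

    /** M2 = sum_i alpha_i * outer(beta_i, beta_i); beta[i] is topic i's word distribution. */
    static double[][] m2(double[] alpha, double[][] beta) {
        int d = beta[0].length;
        double[][] m = new double[d][d];
        for (int t = 0; t < alpha.length; t++)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    m[a][b] += alpha[t] * beta[t][a] * beta[t][b];
        return m;
    }

    /** M3 = sum_i alpha_i * outer(beta_i, beta_i, beta_i), stored densely as d x d x d. */
    static double[][][] m3(double[] alpha, double[][] beta) {
        int d = beta[0].length;
        double[][][] m = new double[d][d][d];
        for (int t = 0; t < alpha.length; t++)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    for (int c = 0; c < d; c++)
                        m[a][b][c] += alpha[t] * beta[t][a] * beta[t][b] * beta[t][c];
        return m;
    }
}
```

A dense d × d × d array is only practical for tiny vocabularies, which is one motivation for whitening the problem down to k × k × k (slide 39).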
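  • 14. $M_3 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3] = \lambda_1 \, a_1 \otimes b_1 \otimes c_1 + \lambda_2 \, a_2 \otimes b_2 \otimes c_2 + \cdots = \sum_i \lambda_i \, a_i \otimes b_i \otimes c_i$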
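  • 15. $\lambda, A \leftarrow \arg\min_{\lambda \in \mathbb{R}^k,\; A \in \mathbb{R}^{k \times k}} \left\| A \cdot \operatorname{Diag}(\lambda) \cdot (C \odot B)^\top - M_3 \right\|^2$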
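Slide 15 is one alternating-least-squares (ALS) step. Since A is k × k, the M3 on that slide is read here as the mode-1 unfolding of a k × k × k tensor T. With B and C fixed, the standard least-squares solution is A·Diag(λ) = T₍₁₎ (C ⊙ B) ((CᵀC) ∗ (BᵀB))⁻¹, where ⊙ is the Khatri-Rao product and ∗ the elementwise product; λ is then the vector of column norms. The Java sketch below is a minimal illustration, not the talk's implementation: `AlsStep` and its methods are hypothetical names, it never forms the k × k² Khatri-Rao product explicitly, and its plain Gaussian-elimination solver assumes the Gram matrix is nonsingular.

```java
/** Sketch (hypothetical names): one ALS update of factor A for a k x k x k tensor. */
final class AlsStep {

    /** Returns lambda and overwrites the rows of a with column-normalized factors. */
    static double[] updateA(double[][][] t, double[][] b, double[][] c, double[][] a) {
        int k = t.length;
        // y = T_(1) (C khatri-rao B), computed entrywise without forming the product.
        double[][] y = new double[k][k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                for (int l = 0; l < k; l++)
                    for (int r = 0; r < k; r++)
                        y[i][r] += t[i][j][l] * b[j][r] * c[l][r];
        // Gram matrix of the Khatri-Rao product: (C^T C) * (B^T B), elementwise.
        double[][] g = new double[k][k];
        for (int r = 0; r < k; r++)
            for (int s = 0; s < k; s++) {
                double bb = 0, cc = 0;
                for (int j = 0; j < k; j++) {
                    bb += b[j][r] * b[j][s];
                    cc += c[j][r] * c[j][s];
                }
                g[r][s] = bb * cc;
            }
        // Each row of A*Diag(lambda) solves row * g = y[i]; g is symmetric.
        for (int i = 0; i < k; i++) a[i] = solve(g, y[i]);
        // Pull the column norms out into lambda.
        double[] lambda = new double[k];
        for (int r = 0; r < k; r++) {
            double norm = 0;
            for (int i = 0; i < k; i++) norm += a[i][r] * a[i][r];
            lambda[r] = Math.sqrt(norm);
            for (int i = 0; i < k; i++) a[i][r] /= lambda[r];
        }
        return lambda;
    }

    /** Gaussian elimination with partial pivoting; solves m x = rhs without modifying m. */
    static double[] solve(double[][] m, double[] rhs) {
        int n = rhs.length;
        double[][] aug = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            System.arraycopy(m[i], 0, aug[i], 0, n);
            aug[i][n] = rhs[i];
        }
        for (int p = 0; p < n; p++) {
            int best = p;
            for (int i = p + 1; i < n; i++)
                if (Math.abs(aug[i][p]) > Math.abs(aug[best][p])) best = i;
            double[] tmp = aug[p]; aug[p] = aug[best]; aug[best] = tmp;
            for (int i = p + 1; i < n; i++) {
                double f = aug[i][p] / aug[p][p];
                for (int j = p; j <= n; j++) aug[i][j] -= f * aug[p][j];
            }
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = aug[i][n];
            for (int j = i + 1; j < n; j++) s -= aug[i][j] * x[j];
            x[i] = s / aug[i][i];
        }
        return x;
    }
}
```

A full ALS sweep repeats the same update for B and C with the roles of the modes permuted, and the sweeps continue until the residual stops improving.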
  • 16.
  • 17.
  • 19.
      • Applications: Datalab, Machine Learning, BI, Power*
      • Programming Models (Domain Specific Languages): SQL / HIVE / LINQ, Cloud Numerics, Pregel, GraphLab
      • Operator Layer (Future Work): REEF Operator API and Library
      • REEF, the Application Server for Big Data: Communications, Storage, Fault Management, Interoperability (REEF Logical Abstraction)
      • Resource Manager (Focus: YARN): YARN ... Mesos ... Azure Tasks, Drawbridge
      • Storage (Focus: HDFS): HDFS ... Azure Block Storage ... Office 365
  • 22. Easy to reason about
      • Centralized control flow: Evaluator allocation and configuration; Task configuration and submission
      • Centralized error handling: Task exceptions thrown to the Driver; Evaluator failures reported to the Driver
    Scalable
      • Event-based programming: the Driver sends requests as events to REEF; REEF sends events to the Driver
      • Mostly stateless design: REEF maintains minimal state; the majority of state keeping (e.g. work queues) is maintained by the Driver
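  • 23.
    import org.apache.reef.driver.context.ActiveContext;
    import org.apache.reef.driver.task.CompletedTask;
    import org.apache.reef.wake.EventHandler;

    // Both handlers are assumed to be inner classes of the Driver, which owns taskGroups.

    // Submit a task to the newly created context
    public class ContextActiveHandler implements EventHandler<ActiveContext> {
      @Override
      public void onNext(final ActiveContext context) {
        taskGroups.submitNext(context);
      }
    }

    // Submit the next task to the context the completed task ran in
    public class TaskCompletedHandler implements EventHandler<CompletedTask> {
      @Override
      public void onNext(final CompletedTask task) {
        final ActiveContext context = task.getActiveContext();
        taskGroups.submitNext(context);
      }
    }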
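The handlers on slide 23 delegate all bookkeeping to `taskGroups`, which lives in the Driver, in line with slide 22's "REEF maintains minimal state". The slides do not show that class, so the following is a hypothetical sketch of such a Driver-side work queue. To stay self-contained, `WorkerContext` is an assumed two-method stand-in for REEF's `ActiveContext` (in real REEF code one would submit a task `Configuration` rather than a task id string). The methods are synchronized as a precaution in case handlers are invoked from several threads.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Queue;

/** Hypothetical minimal view of a REEF ActiveContext: only what the queue needs. */
interface WorkerContext {
    void submitTask(String taskId); // stands in for submitting a task Configuration
    void close();                   // releases the context once no work is left
}

/** Hypothetical Driver-side work queue; all scheduling state lives here, not in REEF. */
final class TaskGroups {
    private final Queue<String> pending = new ArrayDeque<>();

    TaskGroups(Collection<String> taskIds) {
        pending.addAll(taskIds);
    }

    /** Called from both the ActiveContext and the CompletedTask handlers. */
    synchronized void submitNext(WorkerContext context) {
        String next = pending.poll();
        if (next != null) {
            context.submitTask(next);
        } else {
            context.close();
        }
    }

    synchronized int remaining() {
        return pending.size();
    }
}
```

With two queued ids, three calls to `submitNext` on the same context submit both tasks and then close the context, which mirrors the completed-task handler reusing one context until the queue is empty.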
  • 24.
  • 25.
    @Inject
    public WhitenTask(
        final @Parameter(TaskConfigurationOptions.Identifier.class) String taskId,
        final @Parameter(Launch.DimD.class) int dimD,
        final @Parameter(Launch.DimK.class) int dimK,
        final GroupCommClient groupCommClient,
        final InputData data,
        final TaskEnvironment env) {
      // ...
    }
    Use the Java “type system” to validate the configuration.
  • 26.
  • 27.
  • 28.
  • 29.
    // We can send and receive any Java serializable data, e.g. JBLAS matrices
    private final Broadcast.Sender<DoubleMatrix> modelSender;
    private final Reduce.Receiver<DoubleMatrix[]> resultReceiver;

    // Broadcast the model, collect the results, repeat.
    do {
      this.modelSender.send(sliceA);
      // ...
      // Collect the partial results and combine them with the configured reduce function
      final DoubleMatrix[] result = this.resultReceiver.reduce();
    } while (notConverged(sliceA, prevSliceA));
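The loop on slide 29 is the master side of a broadcast/reduce pattern. Below is a single-process sketch of the same control flow with no networking and no REEF classes: the "broadcast" is a loop handing the current vector to each worker's data shard, and the "reduce" sums the partial results (the aggregation operator). As a concrete, assumed workload it runs power iteration for the top eigenvector of XᵀX, with the rows of X split across workers; `BroadcastReduceSketch` and its methods are hypothetical names, and this is not the whitening code from the talk.

```java
import java.util.List;

/** Sketch (hypothetical names): in-process stand-in for a broadcast/reduce loop. */
final class BroadcastReduceSketch {

    /** Worker side: given the broadcast vector v, return X_w^T (X_w v) for its shard. */
    static double[] partial(double[][] shard, double[] v) {
        double[] out = new double[v.length];
        for (double[] row : shard) {
            double dot = 0;
            for (int j = 0; j < v.length; j++) dot += row[j] * v[j];
            for (int j = 0; j < v.length; j++) out[j] += row[j] * dot;
        }
        return out;
    }

    /** Master side: broadcast v, reduce the partials with +, normalize, repeat. */
    static double[] topEigenvector(List<double[][]> shards, double[] v0, double tol, int maxIter) {
        double[] v = normalize(v0.clone());
        for (int it = 0; it < maxIter; it++) {
            double[] sum = new double[v.length];
            for (double[][] shard : shards) {            // "broadcast" v to each worker
                double[] p = partial(shard, v);
                for (int j = 0; j < sum.length; j++) {
                    sum[j] += p[j];                      // "reduce" with addition
                }
            }
            double[] next = normalize(sum);
            double diff = 0;
            for (int j = 0; j < v.length; j++) diff = Math.max(diff, Math.abs(next[j] - v[j]));
            v = next;
            if (diff < tol) break;                       // master checks for convergence
        }
        return v;
    }

    /** Scales x to unit Euclidean norm in place and returns it. */
    static double[] normalize(double[] x) {
        double n = 0;
        for (double xi : x) n += xi * xi;
        n = Math.sqrt(n);
        for (int j = 0; j < x.length; j++) x[j] /= n;
        return x;
    }
}
```

In a distributed setting the inner `for` over shards is what the communication tree replaces; only the vector and the partial sums travel, never the shards themselves.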
  • 30.
  • 33.
  • 34.
  • 35. $M_2 = \sum_i \lambda_i \, u_i \otimes v_i = \lambda_1 \, u_1 \otimes v_1 + \lambda_2 \, u_2 \otimes v_2 + \cdots$
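Slide 35 shows the matrix case. A short derivation, standard rather than taken from the slides, of why the second moment alone cannot pin down the topics: write M2 from slide 13 in matrix form, with the topic vectors β_i as the columns of B.

```latex
\begin{aligned}
M_2 &= \sum_{i=1}^{k} \alpha_i \, \beta_i \beta_i^\top = B \operatorname{Diag}(\alpha) B^\top = F F^\top,
\qquad F = B \operatorname{Diag}(\alpha)^{1/2} \\
F F^\top &= (F R)(F R)^\top \quad \text{for every orthogonal } R \in \mathbb{R}^{k \times k}
\end{aligned}
```

So any rotation F R explains M2 equally well, and its columns are in general not the topics. The third-order decomposition does not have this rotational freedom: under mild conditions (for example, Kruskal's condition for CP decompositions) it is unique up to permutation and scaling of the components.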
  • 36.
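  • 37. $M_3 \stackrel{\mathrm{def}}{=} \mathbb{E}[x_1 \otimes x_2 \otimes x_3] = \lambda_1 \, u_1 \otimes v_1 \otimes w_1 + \lambda_2 \, u_2 \otimes v_2 \otimes w_2 + \cdots = \sum_i \lambda_i \, u_i \otimes v_i \otimes w_i$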
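When the tensor is orthogonally decomposable (u_i = v_i = w_i orthonormal, which is what whitening aims for), a common alternative to the ALS step on slide 15 is the tensor power method with deflation from the tensor-methods literature: iterate v ← T(I, v, v) / ‖T(I, v, v)‖, read off λ = T(v, v, v), subtract λ v ⊗ v ⊗ v, and repeat. The Java sketch below assumes a symmetric k × k × k array and a caller-supplied start vector; `TensorPower` is a hypothetical name, and the robust variants in the literature add random restarts (keeping the best result), which are omitted here.

```java
/** Sketch (hypothetical names): tensor power method with deflation for symmetric tensors. */
final class TensorPower {

    /** T(I, v, v): contract a k x k x k tensor with v along modes 2 and 3. */
    static double[] apply(double[][][] t, double[] v) {
        int k = v.length;
        double[] out = new double[k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                for (int l = 0; l < k; l++)
                    out[i] += t[i][j][l] * v[j] * v[l];
        return out;
    }

    /**
     * Runs v <- T(I, v, v) / ||T(I, v, v)|| for a fixed number of iterations,
     * writing the eigenvector into v and returning lambda = T(v, v, v).
     */
    static double extract(double[][][] t, double[] v, int iters) {
        for (int it = 0; it < iters; it++) {
            double[] next = apply(t, v);
            double norm = 0;
            for (double x : next) norm += x * x;
            norm = Math.sqrt(norm);
            for (int i = 0; i < v.length; i++) v[i] = next[i] / norm;
        }
        double[] tv = apply(t, v);
        double lambda = 0;
        for (int i = 0; i < v.length; i++) lambda += tv[i] * v[i];
        return lambda;
    }

    /** Deflation in place: T <- T - lambda * outer(v, v, v). */
    static void deflate(double[][][] t, double lambda, double[] v) {
        int k = v.length;
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                for (int l = 0; l < k; l++)
                    t[i][j][l] -= lambda * v[i] * v[j] * v[l];
    }
}
```

Calling `extract` then `deflate` k times yields all k components; the sketch does not guard against a zero contraction, which would occur once the tensor has been fully deflated.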
  • 39. • Find whitening matrix s.t. orthogonal • Use to find s.t. • Whiten :
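The symbols on slide 39 did not survive extraction. The standard whitening step, reconstructed here as an assumption in slide 13's notation rather than read off the slide, uses the eigendecomposition of M2 (assumed to have rank k, i.e. linearly independent topics and positive weights):

```latex
\begin{aligned}
&M_2 = U \operatorname{Diag}(\sigma) U^\top, \qquad W = U \operatorname{Diag}(\sigma)^{-1/2}, \qquad W^\top M_2 W = I_k \\
&\mu_i = \sqrt{\alpha_i}\, W^\top \beta_i \ \text{ are orthonormal, since } W^\top M_2 W = \sum_{i=1}^{k} \mu_i \mu_i^\top = I_k \\
&T = M_3(W, W, W) = \sum_{i=1}^{k} \alpha_i \, (W^\top \beta_i)^{\otimes 3} = \sum_{i=1}^{k} \frac{1}{\sqrt{\alpha_i}} \, \mu_i^{\otimes 3} \\
&\beta_i = \frac{1}{\sqrt{\alpha_i}} \, U \operatorname{Diag}(\sigma)^{1/2} \mu_i
\end{aligned}
```

The whitened k × k × k tensor T is orthogonally decomposable with eigenvalues λ_i = 1/√α_i, so its eigenvalues give the topic weights, and un-whitening its eigenvectors gives the topics.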

Editor's Notes

  1. We are hiring!
  2. The problem we are solving, why it is important, and the state-of-the-art solutions; then our new approach, algorithm, etc.
  3. In general, given data (e.g. a text corpus, a social graph, user pageview/click logs), reveal the latent parameters that influence its distribution: communities, user preferences, text topics. We’ll talk about text because it’s easy to demo and reason about, even on a small dataset.
  4. Top 10 topics. Each document has a mixture of topics; some topics are common, e.g. film/movie/time. Words appear in many topics, e.g. action/crime/cop and action/Jackie Chan. Topics are sparse.
  5. Start 3:20
  6. It’s all bag of words to me. (Nikolai Ge, Portrait of Leo Tolstoy, 1884, Tretyakov Gallery, Moscow.) Writing what I believe.
  7. Start 4:55
  8. Introduced by Karl Pearson in 1894; everything new is the well-forgotten old. So M1 is a vector and M2 is a matrix. M2 is not enough for topics (there is spectral clustering; will talk about it later if asked). We need to capture triplets: a cube of data…
  9. It has been shown that, with these shift terms, M1 through M3 are sufficient to reveal not only clusters but also mixtures of latent parameters. In fact, if you squint, M2 is a covariance matrix, and α0 is the concentration parameter of the Dirichlet prior (the sum of the α_i). Similarly, M3 is a (shifted) skewness. I will give more details later. So this is the information we collect. How do we get the topics?
  10. 8:25. We can factorize the tensor into a sum of outer products of eigenvectors that reveal the topics, i.e. each vector β_i contains the word probabilities of topic i.
  12. It’s linear. We need a resource manager, e.g. YARN, and a distributed file system. The master node checks for convergence.
  13. Markus gave a talk at Hadoop Summit 2014; see it on YouTube.
  14. Much nicer in C#. REEF itself has very little state; all state is in the Driver.
  15. Centralized error handling: mention Erlang/OTP supervisor architecture
  18. Java “type system”: annotate the constructor with @Inject and mark leaf parameters with @Parameter; all other parameters must be classes whose constructors are themselves annotated with @Inject.
  22. Nodes form a communication tree and pass data along. For the reduce stage, we also specify the aggregation operator.
  23. Future work: community detection, larger datasets (PubMed), comparison with LightLDA; in general, we need better support for tensors (libraries, CUDA, parameter server).
  25. End: 20 min sharp; total ~24 min with questions.
  26. The model (LDA) is independent of the inference algorithm (variational Bayes, MCMC, tensors).