DECISION MAKING WITH MLLIB, SPARK AND
SPARK STREAMING
GIRISH S KATHALAGIRI
SAMSUNG SDS RESEARCH AMERICA
See all the presentations from the In-Memory Computing
Summit at http://imcsummit.org
AGENDA
 Introduction
 Decision Making System: Intro and Algorithms
 Decision Making System: Architecture and components
INTRODUCTION
SAMSUNG SDS
SAMSUNG SDS IS THE ENTERPRISE SOLUTIONS ARM OF THE SAMSUNG
GROUP, WITH A MAJOR FOOTPRINT IN ASIA AND EMERGING PRESENCE IN
THE US
REVENUE BY YEAR (USD billions) [3]
2010: 3.9
2011: 4.1
2012: 5.7
2013: 6.7
2014: 7.2
REVENUE (2014)
$7.2B
GLOBAL PRESENCE
47+ offices [1] in 30 countries
EMPLOYEES
21,796
MARKET POSITION [2]
No. 1 Korean IT services provider
No. 2 largest IT service provider in the Asia-Pacific region (excluding Japan)
Source:
[1] Includes IT outsourcing and logistics offices, as of December 31, 2014
[2] Market Share, Gartner, 2014
[3] Expressed in U.S. dollars at exchange rate in effect on December 31 of respective year
SAMSUNG SDS RESEARCH AMERICA
SDS Research America
Focus: Decision Making
[Figure: layers labeled Recommendation, Decision, Insights, Model, Feature, Data]
TEAM
DECISION MAKING SYSTEM: INTRO AND ALGORITHM
EXAMPLES OF DECISION MAKING IN ONLINE WORLD
 Ad Selection
 News Article Recommendations
 Website Optimization
 Auction and real-time bidding.
 Recommendation Systems.
TERMINOLOGY
• Action/Arm: set of options that are available for a problem.
• Reward: clicks, profit, revenue.
• Agent: software system that makes the decisions.
• Environment: factors external to the system with which the agent is interacting.
• Context: side information that is available.
Learning from interaction
EXPLORATION VS EXPLOITATION TRADE OFF
Decision-making involves a fundamental choice
Exploitation :
Make the best decision with existing
information that was collected.
Exploration :
Gather more information to see if there are
better decisions that can be made.
EXPLORATION VS EXPLOITATION EXAMPLES
 Online Advertising :
 Exploitation : Show most successful ad
 Exploration: Show a different ad
 Restaurant Selection:
 Exploitation : favorite restaurant
 Exploration : Trying a new one
 Cuisine selection:
 Exploitation : favorite dish
 Exploration : Try a new one
 Game :
 Exploitation : Play the best move (your belief)
 Exploration : Try a new move
EXPLORATION VS EXPLOITATION TRADE OFF
Area      | Exploration            | Exploitation
Economics | Risk-Taking            | Risk-Avoiding
Finance   | Investing              | Saving
Marketing | Diversification        | Concentration
Medicine  | Experimental treatment | Safety and efficacy
CUMULATIVE REWARD
Objective : Maximizing the Expected Cumulative Reward
REGRET
Objective : Minimize the Regret , over time horizon T
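The transcript states both objectives without their formulas. One common way to write them for the stochastic setting, given here as an assumption rather than as the speaker's own notation, uses \mu_a for the unknown mean reward of arm a, a_t for the arm chosen at step t, and r_t(a_t) for the observed reward:

```latex
\[
\text{Expected cumulative reward:}\quad
\mathbb{E}\left[\sum_{t=1}^{T} r_t(a_t)\right]
\]
\[
\text{Regret over horizon } T:\quad
R_T = T\,\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\qquad \mu^{*} = \max_{a} \mu_a
\]
```

Under this definition, maximizing the expected cumulative reward and minimizing the regret describe the same objective, since the regret is the gap between always playing the best arm and the reward actually collected.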
CHARACTERISTICS OF LEARNING WITH INTERACTION
 Agent Interacts with the environment to gather more data
 Agent performance is based on Agent’s decision
 Data available to Agent to learn is based on its decision
MULTI ARMED BANDIT
[Robbins ‘52]
MULTI-ARMED BANDIT
Set of K arms ( actions, choices , options )
At each time step t = 1 .. N
Agent selects an arm
Receives a reward from the
environment
Agent updates the belief about the
arms (estimates the value).
How does the Agent select an arm at any point in time?
MULTI-ARMED BANDIT : EPSILON - GREEDY
Greedy (Exploit) : Highest estimated reward
Epsilon (Explore ) : Random choice
Dealing with Epsilon:
 Constant epsilon value (Epsilon Greedy
Strategy)
 Epsilon-Decreasing Strategy
 Epsilon-First Strategy
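The three ways of handling epsilon listed above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function names, the 1/t decay schedule, and the `explore_steps` parameter are assumptions introduced for the example.

```python
import random


def epsilon_greedy_select(estimates, epsilon, rng=random):
    """Pick an arm index given the current estimated mean reward per arm."""
    if rng.random() < epsilon:
        # Explore: uniform random choice over all arms.
        return rng.randrange(len(estimates))
    # Exploit: first arm with the highest estimated reward.
    return estimates.index(max(estimates))


def epsilon_decreasing(t, epsilon0=1.0):
    """Assumed schedule for the epsilon-decreasing strategy: epsilon0 / t."""
    return min(1.0, epsilon0 / max(t, 1))


def epsilon_first(t, explore_steps):
    """Epsilon-first: explore only during the first explore_steps steps."""
    return 1.0 if t <= explore_steps else 0.0
```

With a constant epsilon the agent passes the same value at every step; the other two strategies simply compute epsilon from the step counter before calling `epsilon_greedy_select`.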
MULTI-ARMED BANDIT : SOFTMAX
 Epsilon-Greedy is relatively insensitive towards
relative performance levels
 Arms 0.99 vs. 0.01 and 0.52 vs. 0.48
 Softmax Strategy (Structured Exploration)
 Chooses the arm proportional to the estimated
value of arms
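A minimal sketch of softmax arm selection, assuming the usual Boltzmann form in which the selection probability is proportional to exp(estimate / temperature). The `temperature` parameter and the function names are assumptions for illustration; the slide does not specify them.

```python
import math
import random


def softmax_probabilities(estimates, temperature=0.1):
    """Turn estimated arm values into selection probabilities."""
    top = max(estimates)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in estimates]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(estimates, temperature=0.1, rng=random):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probabilities(estimates, temperature)
    return rng.choices(range(len(estimates)), weights=probs, k=1)[0]
```

Unlike epsilon-greedy, the amount of exploration here depends on how far apart the arms are: with estimates 0.99 vs. 0.01 the better arm is chosen almost always, while with 0.52 vs. 0.48 both arms are still played often.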
What if the first few explorations were not rewarding?
MULTI-ARMED BANDIT : UPPER CONFIDENCE BOUND (UCB)
1. Take action that has best estimated mean
reward plus confidence
2. Environment generates reward
3. Agent Updates its expected mean reward and
confidence interval.
Optimism in the face of uncertainty
[Auer ’02]
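The three steps above can be sketched with the UCB1 rule, which uses sqrt(2 ln t / n) as the confidence term. Choosing UCB1 specifically is an assumption, since the slide only says "estimated mean reward plus confidence"; the function names are illustrative.

```python
import math


def ucb1_select(counts, means, t):
    """counts[i]: pulls of arm i, means[i]: its average reward, t: total pulls so far."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before trusting the bounds
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return scores.index(max(scores))


def update_mean(counts, means, arm, reward):
    """Incrementally update the mean reward of the arm that was played."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```

An arm that has been tried only a few times gets a wide confidence term, so it can be selected even when its current mean is lower. This is the "optimism in the face of uncertainty" noted on the slide.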
MULTI-ARMED BANDIT : THOMPSON SAMPLING
1. For each arm, sample parameter from Beta
distribution.
2. Choose the arm that has maximum reward for
the chosen parameter.
3. Environment generates reward
4. Agent Updates the distribution for the arm.
[Thompson 1933]
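A minimal sketch of the four steps for binary rewards (for example click / no click), assuming a uniform Beta(1, 1) prior per arm. The prior and the function names are assumptions made for the example.

```python
import random


def thompson_select(successes, failures, rng=random):
    """Sample one value per arm from its Beta posterior and play the largest."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return samples.index(max(samples))


def thompson_update(successes, failures, arm, reward):
    """Update the Beta parameters of the played arm with a 0/1 reward."""
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

A usage loop calls `thompson_select`, shows the chosen arm, observes the reward, and passes it to `thompson_update`. As evidence accumulates, the posterior of a poor arm concentrates on low values and it is sampled less often.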
STREAM PROCESSING OF MULTI-ARMED BANDIT
[Figure: timeline of micro-batches. Data (t-1), Data (t) and Data (t+1) each trigger an "Update stats for arms" step, producing Arm stats (t-1), Arm stats (t) and Arm stats (t+1).]
Epsilon Greedy : estimate mean rewards for each arm
Softmax : estimate mean rewards for each arm , calculate softmax
Upper Confidence bound : estimate mean and confidence interval
Thompson Sampling : Update the parameters of beta dist.
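The per-batch update can be sketched in plain Python as a fold of each micro-batch into running per-arm statistics. In a Spark Streaming job this kind of keyed running state is commonly held by a stateful operation such as `updateStateByKey`; the code below does not use Spark, and the data layout (a dict of `(count, reward_sum)` pairs) is an assumption for illustration.

```python
def update_arm_stats(stats, micro_batch):
    """Fold one micro-batch of (arm, reward) events into the running statistics.

    stats maps arm -> (count, reward_sum); a new dict is returned.
    """
    new_stats = dict(stats)
    for arm, reward in micro_batch:
        count, total = new_stats.get(arm, (0, 0.0))
        new_stats[arm] = (count + 1, total + reward)
    return new_stats


def mean_rewards(stats):
    """Estimated mean reward per arm, as needed by epsilon-greedy or softmax."""
    return {arm: total / count for arm, (count, total) in stats.items() if count > 0}
```

The count and sum are enough for epsilon-greedy and softmax; UCB additionally uses the counts for its confidence term, and Thompson sampling can treat successes and failures as the Beta parameters.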
CONTEXTUAL MULTI-ARMED BANDIT
 For t = 1, . . . , T:
1. The Environment issues a request with some context x_t ∈ X
2. The Agent chooses an action a_t ∈ {1, . . . , K} for the context
3. The Environment reacts with reward r_t(a_t)
4. The Agent updates the model
Goal : Best action for the context.
[Auer-CesaBianchi-Freund-Schapire ’02]
OPTIMIZATION
Initialize Model Parameter
Repeat {
Using data, update the model parameters
} until convergence
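One concrete instance of this loop is gradient descent on a one-parameter least-squares model y ≈ w·x. The model, learning rate, and stopping rule below are assumptions chosen to keep the sketch small; they are not taken from the talk.

```python
def fit_slope(xs, ys, learning_rate=0.01, tolerance=1e-9, max_iters=10_000):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # initialize model parameter
    for _ in range(max_iters):
        # Using data, update the model parameter.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        new_w = w - learning_rate * grad
        if abs(new_w - w) < tolerance:  # until convergence
            return new_w
        w = new_w
    return w
```

Calling `fit_slope([1, 2, 3, 4], [2, 4, 6, 8])` converges to a slope close to 2.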
ONLINE AND BATCH LEARNING
Online Learning (Stream Processing): initialize parameters, then make a quick update on the parameters for each mini-batch, Data (t-1), Data (t), Data (t+1), each starting from the parameters of the previous mini-batch.
Batch Learning: initialize parameters, then learn the model parameters from all the training data.
Faster learning, approximation (online) vs. long-term trends, accurate learning (batch)
TIMESCALES FOR LEARNING
Algorithms for Contextual Multi-armed Bandit
LinUCB [ Li et al 2010]
Thompson Sampling with Logistic Regression [Chapelle and Li 2011]
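A compact sketch of the disjoint LinUCB idea from Li et al. 2010: each arm keeps a ridge-regression model of reward given context and is scored by its predicted reward plus a confidence width. Maintaining the inverse matrix with a Sherman-Morrison update and using plain Python lists are choices made for this example, and the class and function names are hypothetical.

```python
import math


class LinUCBArm:
    """Per-arm state: inverse of A = I + sum(x x^T), and b = sum(reward * x)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed tuning parameter)
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def score(self, x):
        theta = self._a_inv_times(self.b)  # ridge-regression weights
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse after adding x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x):
    """Return the index of the arm with the highest upper confidence score."""
    scores = [arm.score(x) for arm in arms]
    return scores.index(max(scores))
```

In the request/reward loop described earlier, `choose_arm` answers the request for context x, and `update` is called on the chosen arm once its reward arrives.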
DECISION MAKING SYSTEM: ARCHITECTURE AND COMPONENTS
SOFTWARE STACK
 Real time decision making
 Scalable System
 Batch and Online Learning
Analytics Framework
KAFKA : DISTRIBUTED MESSAGING SYSTEM
 Distributed by design (Fault tolerant).
 Fast and Scalable.
 High throughput for both publishing and
subscribing.
 Multi-subscribers.
 Persist messages on disk : batched
consumption as well as real time applications.
http://kafka.apache.org/
SPARK AND SPARK STREAMING
 High volume data processing for feature
extraction as a means of modeling business
environment state;
 Model training on historical events
 Stream processing for Online updates
 Machine Learning Library
http://spark.apache.org/
MLLIB : MACHINE LEARNING LIBRARY
 Spark Integration
 Distributed Machine Learning
Algorithms
 Algorithmic Optimization
 High-level and Developer APIs
 Community
Basic Statistics: summary statistics, correlations, stratified sampling, hypothesis testing, random data generator
Classification and Regression: linear models (SVM, logistic regression), Naïve Bayes, tree-based models (GBT, RF, DT)
Collaborative Filtering: Alternating Least Squares (ALS)
Optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
Dimensionality Reduction: singular value decomposition (SVD), principal component analysis (PCA)
Clustering: k-means, Gaussian mixture, power iteration clustering, latent Dirichlet allocation, streaming k-means
http://www.jmlr.org/papers/volume17/15-237/15-237.pdf
MODEL STORAGE
 HBase
 Models stored in PMML format.
 Import and Export from external system
 Model metrics and statistics are stored.
 Configuration information of the system.
http://dmg.org/pmml/pmml_examples/index.html
LAMBDA ARCHITECTURE
SERVING LAYER
 PLAY Framework
 Interfacing with external system
 Low Latency
 Mechanism for Multiple Models.
 Processes Request and Reward messages.
 Retrieves Model from Model store and caches.
 Logs the messages to Kafka topic.
SPEED LAYER
 Spark streaming application
 Receives messages from Kafka in micro
batches for processing.
 Retrieves the latest model from the Model Store, updates it, and stores it back.
 Notifies the Model update to serving layer.
HISTORY LOGGER
 Spark Streaming application
 Kafka consumer.
 Archives messages logged by serving layer
 HDFS long term storage.
 Archived data used by batch layer.
BATCH LAYER
 Spark application
 Reads the historical archived data.
 Configured sliding window.
 Generates training data
 New Model from scratch.
 Stores it into Model Storage
MANAGEMENT SERVICES
 Suite of applications
 Configuration of the system
 Monitoring the processes
 Administrative UI
 Authorization and Role based access control.
 Scheduling of workflows
LAMBDA ARCHITECTURE
RECAP
 Decision-making algorithms that have exploration vs. exploitation trade-offs
 Multi-armed bandit and Contextual Multi-armed bandit algorithms.
 Lambda architecture
QUESTIONS ?
REFERENCES
1. A contextual-bandit approach to personalized news article recommendation; Lihong Li, Wei Chu, John Langford, Robert E. Schapire
2. Generalized Thompson Sampling for Contextual Bandits; Lihong Li
3. Big Data: Principles and best practices of scalable realtime data systems; Nathan Marz & Warren J.
4. Predictive Model Markup Language; Data Mining Group
5. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits; Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
6. Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms; Lihong Li, Wei Chu, John Langford, Xuanhui Wang
7. Reinforcement Learning: An Introduction; Richard S. Sutton, Andrew G. Barto

More Related Content

What's hot

Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2
Dave Gardner
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
lefredbe
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
Antonios Katsarakis
 
Patterns of parallel programming
Patterns of parallel programmingPatterns of parallel programming
Patterns of parallel programming
Alex Tumanoff
 
Knowledge share about scalable application architecture
Knowledge share about scalable application architectureKnowledge share about scalable application architecture
Knowledge share about scalable application architecture
AHM Pervej Kabir
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
Amazon Web Services
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Spark Summit
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud
Amazon Web Services
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
Databricks
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Databricks
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
ijccsa
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
Instaclustr
 
Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...
Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...
Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...
Amr Awadallah
 
Re invent 2018 meetup presentation
Re invent 2018 meetup presentationRe invent 2018 meetup presentation
Re invent 2018 meetup presentation
Eliran Yamin
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
Amazon Web Services
 
Lock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data GuaranteedLock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data Guaranteed
Jervin Real
 
LOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMSLOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMStanmayshah95
 
deep learning in production cff 2017
deep learning in production cff 2017deep learning in production cff 2017
deep learning in production cff 2017
Ari Kamlani
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
Amazon Web Services
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
Ben Slater
 

What's hot (20)

Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Patterns of parallel programming
Patterns of parallel programmingPatterns of parallel programming
Patterns of parallel programming
 
Knowledge share about scalable application architecture
Knowledge share about scalable application architectureKnowledge share about scalable application architecture
Knowledge share about scalable application architecture
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...
Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...
Applications of Virtual Machine Monitors for Scalable, Reliable, and Interact...
 
Re invent 2018 meetup presentation
Re invent 2018 meetup presentationRe invent 2018 meetup presentation
Re invent 2018 meetup presentation
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Lock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data GuaranteedLock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data Guaranteed
 
LOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMSLOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMS
 
deep learning in production cff 2017
deep learning in production cff 2017deep learning in production cff 2017
deep learning in production cff 2017
 
AWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWSAWS Webcast - An Introduction to High Performance Computing on AWS
AWS Webcast - An Introduction to High Performance Computing on AWS
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 

Viewers also liked

IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
In-Memory Computing Summit
 
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
In-Memory Computing Summit
 
Julio Licinio - An important role for SAHMRI
Julio Licinio - An important role for SAHMRIJulio Licinio - An important role for SAHMRI
Julio Licinio - An important role for SAHMRI
Upstate Medical University
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
In-Memory Computing Summit
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RSimon Huang
 
Traduction_GFontanaAntonelli_English_1Mai2o14
Traduction_GFontanaAntonelli_English_1Mai2o14Traduction_GFontanaAntonelli_English_1Mai2o14
Traduction_GFontanaAntonelli_English_1Mai2o14
Jasmine Desclaux-Salachas
 
Monaco da loi nguoc dong
Monaco da loi nguoc dongMonaco da loi nguoc dong
Monaco da loi nguoc dong
bongda100
 
PMUS de Rivas Vaciamadrid, premio europeo a la movilidad sostenible
PMUS de Rivas Vaciamadrid, premio europeo a la movilidad sosteniblePMUS de Rivas Vaciamadrid, premio europeo a la movilidad sostenible
PMUS de Rivas Vaciamadrid, premio europeo a la movilidad sostenible
Ecologistas en Accion
 
A message of hope from an entrepreneur
A message of hope from an entrepreneurA message of hope from an entrepreneur
A message of hope from an entrepreneur
Angela Ihunweze
 

Viewers also liked (14)

IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
 
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
IMCSummite 2016 Breakout - Nikita Ivanov - Apache Ignite 2.0 Towards a Conver...
 
Julio Licinio - An important role for SAHMRI
Julio Licinio - An important role for SAHMRIJulio Licinio - An important role for SAHMRI
Julio Licinio - An important role for SAHMRI
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3R
 
Traduction_GFontanaAntonelli_English_1Mai2o14
Traduction_GFontanaAntonelli_English_1Mai2o14Traduction_GFontanaAntonelli_English_1Mai2o14
Traduction_GFontanaAntonelli_English_1Mai2o14
 
Monaco da loi nguoc dong
Monaco da loi nguoc dongMonaco da loi nguoc dong
Monaco da loi nguoc dong
 
PMUS de Rivas Vaciamadrid, premio europeo a la movilidad sostenible
PMUS de Rivas Vaciamadrid, premio europeo a la movilidad sosteniblePMUS de Rivas Vaciamadrid, premio europeo a la movilidad sostenible
PMUS de Rivas Vaciamadrid, premio europeo a la movilidad sostenible
 
A message of hope from an entrepreneur
A message of hope from an entrepreneurA message of hope from an entrepreneur
A message of hope from an entrepreneur
 

Similar to IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, Spark and Spark Streaming

Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Data Con LA
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)
Dinis Cruz
 
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdfMachine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
Carlos Paredes
 
Machine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWERMachine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWER
Ganesan Narayanasamy
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
Splunk
 
AWS The Enterprise Cloud 2015
AWS The Enterprise Cloud 2015AWS The Enterprise Cloud 2015
AWS The Enterprise Cloud 2015
Vadim Zendejas
 
Apache Eagle Strata Hadoop World London 2016
Apache Eagle Strata Hadoop World London 2016Apache Eagle Strata Hadoop World London 2016
Apache Eagle Strata Hadoop World London 2016
Arun Karthick Manoharan
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Greg Makowski
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Osman Ali
 
Innovaccer service capabilities with case studies
Innovaccer service capabilities with case studiesInnovaccer service capabilities with case studies
Innovaccer service capabilities with case studies
Abhinav Shashank
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
Splunk
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
Alluxio, Inc.
 
how to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept projecthow to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept project
Zenodia Charpy
 
The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...
The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...
The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...
inside-BigData.com
 
Construção de uma plataforma de observabilidade centralizada
Construção de uma plataforma de observabilidade centralizadaConstrução de uma plataforma de observabilidade centralizada
Construção de uma plataforma de observabilidade centralizada
Elasticsearch
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
Riccardo Zamana
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
IRJET Journal
 

Similar to IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, Spark and Spark Streaming (20)

Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)
 
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdfMachine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
 
Machine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWERMachine Learning AND Deep Learning for OpenPOWER

IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, Spark and Spark Streaming

  • 1. DECISION MAKING WITH MLLIB, SPARK AND SPARK STREAMING
      - Girish S Kathalagiri, Samsung SDS Research America
      - See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
  • 2. AGENDA
      - Introduction
      - Decision Making System: Intro and Algorithms
      - Decision Making System: Architecture and components
  • 4. SAMSUNG SDS
      - Samsung SDS is the enterprise solutions arm of the Samsung Group, with a major footprint in Asia and an emerging presence in the US.
      - Revenue chart ($B) [3]: 2010: 3.9; 2011: 4.1; 2012: 5.7; 2013: 6.7; 2014: 7.2
      - Revenue (2014): $7.2B
      - Global presence: 47+ offices [1] in 30 countries
      - Employees: 21,796
      - Market position [2]: No. 1 Korean IT services provider; No. 2 largest IT service provider in the Asia-Pacific region (excluding Japan)
      - Source: [1] includes IT outsourcing and logistics offices, as of December 31, 2014; [2] Market Share, Gartner, 2014; [3] expressed in U.S. dollars at exchange rate in effect on December 31 of respective year
  • 5. SAMSUNG SDS RESEARCH AMERICA
      - Focus: Decision Making
      - Diagram labels: Recommendation, Decision, Insights, Model, Feature, Data
  • 7. DECISION MAKING SYSTEM: INTRO AND ALGORITHM
  • 8. EXAMPLES OF DECISION MAKING IN ONLINE WORLD
      - Ad selection
      - News article recommendations
      - Website optimization
      - Auction and real-time bidding
      - Recommendation systems
  • 9. TERMINOLOGY (learning from interaction)
      - Action/Arm: set of options that are available for a problem
      - Reward: clicks, profit, revenue
      - Agent: software system that takes the decisions
      - Environment: factors external to the system with which the agent is interacting
      - Context: side information that is available
  • 10. EXPLORATION VS EXPLOITATION TRADE OFF
      - Decision-making involves a fundamental choice
      - Exploitation: make the best decision with the information collected so far
      - Exploration: gather more information to see if there are better decisions that can be made
  • 11. EXPLORATION VS EXPLOITATION EXAMPLES
      - Online advertising: Exploitation: show the most successful ad; Exploration: show a different ad
      - Restaurant selection: Exploitation: favorite restaurant; Exploration: try a new one
      - Cuisine selection: Exploitation: favorite dish; Exploration: try a new one
      - Game: Exploitation: play the best move (your belief); Exploration: try a new move
  • 12. EXPLORATION VS EXPLOITATION TRADE OFF
      - Area | Exploration | Exploitation
      - Economics | Risk-Taking | Risk-Avoiding
      - Finance | Investing | Saving
      - Marketing | Diversification | Concentration
      - Medicine | Experimental treatment | Safety and efficacy
  • 13. CUMULATIVE REWARD
      - Objective: maximize the expected cumulative reward
  • 14. REGRET
      - Objective: minimize the regret over time horizon T
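The formulas on slides 13 and 14 did not survive in the transcript. A standard way to write both objectives, reusing the r_t(a_t) notation from slide 23, is sketched below. The symbols \mu_k (mean reward of arm k) and \mu^* (mean reward of the best arm) are notation assumed here, not taken from the slides.

```latex
\text{Expected cumulative reward:}\quad
  \mathbb{E}\Big[\sum_{t=1}^{T} r_t(a_t)\Big]
\qquad
\text{Regret:}\quad
  R_T = T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} r_t(a_t)\Big],
  \quad \mu^* = \max_{k \in \{1,\dots,K\}} \mu_k
```

Because T\mu^* does not depend on the agent's choices, maximizing the expected cumulative reward and minimizing the regret describe the same goal.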
  • 15. CHARACTERISTICS OF LEARNING WITH INTERACTION
      - The Agent interacts with the environment to gather more data
      - The Agent's performance depends on its decisions
      - The data available for the Agent to learn from depends on its decisions
  • 17. MULTI-ARMED BANDIT
      - Set of K arms (actions, choices, options)
      - At each time step t = 1 .. N:
          - The Agent selects an arm
          - It receives a reward from the environment
          - The Agent updates its belief about the arms (estimates their value)
      - How does the Agent select the arm at any point in time?
  • 18. MULTI-ARMED BANDIT : EPSILON - GREEDY
      - Greedy (Exploit): choose the arm with the highest estimated reward
      - Epsilon (Explore): random choice
      - Dealing with Epsilon:
          - Constant epsilon value (Epsilon-Greedy Strategy)
          - Epsilon-Decreasing Strategy
          - Epsilon-First Strategy
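The constant-epsilon strategy from slide 18 can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the class name, the default `epsilon=0.1`, and the Bernoulli (click / no click) simulator are all assumptions made for the example.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy agent over K arms with incremental mean estimates."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon            # exploration probability (assumed value)
        self.counts = [0] * n_arms        # number of pulls per arm
        self.values = [0.0] * n_arms      # estimated mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(agent, true_rates, steps, seed=0):
    """Run the agent against hypothetical Bernoulli arms; return total reward."""
    env = random.Random(seed)
    total = 0
    for _ in range(steps):
        arm = agent.select_arm()
        reward = 1 if env.random() < true_rates[arm] else 0
        agent.update(arm, reward)
        total += reward
    return total
```

An epsilon-decreasing variant would shrink `self.epsilon` over time, and epsilon-first would explore for a fixed number of initial steps and exploit afterwards.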
  • 19. MULTI-ARMED BANDIT : SOFTMAX
      - Epsilon-Greedy is insensitive to how far apart the arms' estimated values are: it explores the same way whether the arms are 0.99 vs. 0.01 or 0.52 vs. 0.48
      - Softmax Strategy (Structured Exploration): chooses each arm with a probability that grows with its estimated value
      - What if the first few explorations of an arm were not rewarding?
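A common way to realize the softmax strategy on slide 19 is the Boltzmann form, where an arm's probability is proportional to exp(value / temperature). The slide does not give the exact formula, so the Boltzmann form, the function names, and the `temperature` parameter below are assumptions for this sketch.

```python
import math
import random


def softmax_probabilities(values, temperature=0.1):
    """Boltzmann probabilities over arms from their estimated values."""
    # Subtract the max before exponentiating for numerical stability.
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def select_arm(values, temperature=0.1, rng=random):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]
```

With the slide's two cases, the 0.99 vs. 0.01 pair puts almost all probability on the first arm, while the 0.52 vs. 0.48 pair stays much closer to an even split, which is the sensitivity epsilon-greedy lacks.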
  • 20. MULTI-ARMED BANDIT : UPPER CONFIDENCE BOUND (UCB)
      - 1. Take the action that has the best estimated mean reward plus confidence bound
      - 2. The Environment generates a reward
      - 3. The Agent updates its expected mean reward and confidence interval
      - Optimism in the face of uncertainty [Auer ’02]
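The three UCB steps on slide 20 can be illustrated with the UCB1 rule, which adds the bonus sqrt(2 ln n / n_k) to each arm's mean. The slide does not say which UCB variant was used, so UCB1 is an assumption here, as is the class name.

```python
import math


class UCB1:
    """UCB1 agent: pick the arm with the highest mean plus confidence bonus."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # estimated mean reward per arm

    def select_arm(self):
        # Play every arm once before the confidence bound is defined.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / n)
            for mean, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=lambda a: scores[a])

    def update(self, arm, reward):
        # Incremental mean; the confidence width shrinks as counts grow.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

An arm that has been tried only a few times keeps a wide bonus, so it is revisited even if its first rewards were poor, which addresses the question raised at the end of slide 19.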
  • 21. MULTI-ARMED BANDIT : THOMPSON SAMPLING
      - 1. For each arm, sample a parameter from its Beta distribution
      - 2. Choose the arm whose sampled parameter gives the maximum reward
      - 3. The Environment generates a reward
      - 4. The Agent updates the distribution for the chosen arm
      - [Thompson 1933]
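For binary rewards (click / no click), the four steps on slide 21 reduce to keeping a Beta(alpha, beta) posterior per arm. The sketch below assumes Bernoulli rewards and a uniform Beta(1, 1) prior; neither detail is stated on the slide, and the class name is illustrative.

```python
import random


class BetaBernoulliThompson:
    """Thompson sampling with a Beta posterior per arm (binary rewards)."""

    def __init__(self, n_arms, seed=None):
        # Beta(1, 1) is the uniform prior (an assumption for this sketch).
        self.alpha = [1.0] * n_arms     # 1 + number of observed successes
        self.beta = [1.0] * n_arms      # 1 + number of observed failures
        self.rng = random.Random(seed)

    def select_arm(self):
        # Step 1: sample a success rate for each arm from its posterior.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        # Step 2: play the arm with the highest sampled rate.
        return max(range(len(samples)), key=lambda i: samples[i])

    def update(self, arm, reward):
        # Step 4: a success raises alpha, a failure raises beta.
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0
```

Exploration here comes from the posterior's spread: arms with few observations produce widely varying samples and therefore still get chosen occasionally.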
  • 22. STREAM PROCESSING OF MULTI-ARMED BANDIT
      - Timeline (diagram): each micro-batch Data (t-1), Data (t), Data (t+1) updates the stats for the arms, giving Arm stats (t-1), Arm stats (t), Arm stats (t+1)
      - Epsilon Greedy: estimate mean rewards for each arm
      - Softmax: estimate mean rewards for each arm, calculate softmax
      - Upper Confidence Bound: estimate mean and confidence interval
      - Thompson Sampling: update the parameters of the Beta distribution
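The timeline on slide 22 amounts to folding each micro-batch into running per-arm statistics. The plain-Python sketch below shows that fold for the mean-reward case; the function name and the `(count, mean)` state layout are assumptions. In Spark Streaming a stateful operator such as `updateStateByKey` plays roughly this role, though the talk does not say which operator was used.

```python
from collections import defaultdict


def update_arm_stats(stats, batch):
    """Fold one micro-batch of (arm, reward) events into running arm stats.

    `stats` maps arm -> (count, mean). A new dict is returned, so the
    state from the previous step (t-1) is left untouched.
    """
    # Aggregate the micro-batch per arm first (a reduce-by-key step).
    batch_count = defaultdict(int)
    batch_sum = defaultdict(float)
    for arm, reward in batch:
        batch_count[arm] += 1
        batch_sum[arm] += reward

    # Merge the batch aggregates into the previous state.
    new_stats = dict(stats)
    for arm, n_new in batch_count.items():
        n_old, mean_old = stats.get(arm, (0, 0.0))
        n = n_old + n_new
        mean = (mean_old * n_old + batch_sum[arm]) / n
        new_stats[arm] = (n, mean)
    return new_stats
```

The same pattern carries the extra state the other strategies need: counts for the UCB confidence interval, or success and failure counts for the Beta parameters in Thompson sampling.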
  • 23. CONTEXTUAL MULTI-ARMED BANDIT
      - For t = 1, . . . , T:
          - 1. The Environment sends a request with some context xt ∈ X
          - 2. The Agent chooses an action at ∈ {1, . . . , K} for the context
          - 3. The Environment reacts with reward rt(at)
          - 4. The Agent updates the model
      - Goal: best action for the context
      - [Auer-CesaBianchi-Freund-Schapire ’02]
  • 24. OPTIMIZATION
      - Initialize model parameters
      - Repeat { using data, update the model parameters } until convergence
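The initialize / repeat / until-convergence loop on slide 24 is the shape of most iterative optimizers. As one concrete instance, the sketch below runs batch gradient descent on a one-feature least-squares fit; the model, the learning rate, and the stopping tolerance are illustrative assumptions, not details from the talk.

```python
def gradient_descent(xs, ys, learning_rate=0.1, tolerance=1e-9, max_iters=10_000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0                       # initialize model parameters
    n = len(xs)
    for _ in range(max_iters):            # repeat ...
        # Gradient of the mean squared error over all the data.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w       # ... update the parameters ...
        b -= learning_rate * grad_b
        if abs(grad_w) < tolerance and abs(grad_b) < tolerance:
            break                         # ... until convergence
    return w, b
```

Each pass rereads the full data set, which is why iterative jobs like this benefit from keeping the data in memory.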
  • 25. ONLINE AND BATCH LEARNING
      - Online Learning (Stream Processing): initialize parameters, then for each mini-batch Data (t-1), Data (t), Data (t+1) update the parameters from the previous mini-batch; quick updates on parameters
      - Batch Learning: initialize parameters, then learn the model parameters from all the training data
      - Faster learning, approximation vs. long-term trends, accurate learning
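The online half of slide 25 differs from batch learning only in what each update sees: the parameters left by the previous mini-batch plus the current mini-batch, never the full history. A minimal sketch for the same kind of linear model follows; the function names and learning rate are assumptions.

```python
def minibatch_update(params, batch, learning_rate=0.05):
    """One online step: start from the previous parameters, use only this batch."""
    w, b = params
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return (w - learning_rate * grad_w, b - learning_rate * grad_b)


def learn_online(stream, initial_params=(0.0, 0.0), learning_rate=0.05):
    """Carry the parameters forward across a stream of mini-batches."""
    params = initial_params
    for batch in stream:
        params = minibatch_update(params, batch, learning_rate)
    return params
```

Each step is cheap and the model is usable after every mini-batch, at the cost of a noisier estimate than a full batch fit over all the training data.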
  • 26. TIMESCALES FOR LEARNING
      - Algorithms for Contextual Multi-armed Bandit:
          - LinUCB [Li et al. 2010]
          - Thompson Sampling with Logistic Regression [Chapelle and Li 2011]
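To make the LinUCB entry on slide 26 concrete, here is a small pure-Python sketch of the disjoint-model variant, in which each arm keeps its own ridge-regression state. It maintains the inverse matrix directly with the Sherman-Morrison identity so that no linear-algebra library is needed. The class and function names and the exploration weight `alpha` are assumptions for illustration, not the system described in the talk.

```python
import math


class LinUCBArm:
    """Per-arm ridge regression state for disjoint LinUCB."""

    def __init__(self, dim):
        # A starts as the identity, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim            # running sum of reward * context

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        theta = self._A_inv_times(self.b)               # ridge estimate A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))   # predicted reward
        Ax = self._A_inv_times(x)
        width = math.sqrt(sum(xi * v for xi, v in zip(x, Ax)))  # sqrt(x' A^-1 x)
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison update of A^-1 after A <- A + x x'.
        Ax = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, Ax))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def choose_arm(arms, x, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=lambda a: scores[a])
```

This follows the loop on slide 23: `choose_arm` is step 2, and `update` on the chosen arm is step 4 once the reward arrives.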
  • 28. SOFTWARE STACK
      - Real-time decision making
      - Scalable system
      - Batch and Online Learning Analytics Framework
  • 29. KAFKA : DISTRIBUTED MESSAGING SYSTEM
      - Distributed by design (fault tolerant)
      - Fast and scalable
      - High throughput for both publishing and subscribing
      - Multi-subscribers
      - Persists messages on disk: supports batched consumption as well as real-time applications
      - http://kafka.apache.org/
  • 30. SPARK AND SPARK STREAMING
      - High-volume data processing for feature extraction as a means of modeling the business environment state
      - Model training on historical events
      - Stream processing for online updates
      - Machine learning library
      - http://spark.apache.org/
  • 31. MLLIB : MACHINE LEARNING LIBRARY
      - Spark integration; distributed machine learning algorithms; algorithmic optimization; high-level and developer APIs; community
      - Basic Statistics: summary statistics, correlations, stratified sampling, hypothesis testing, random data generator
      - Classification and Regression: linear models (SVM, logistic regression), Naïve Bayes, tree-based models (GBT, RF, DT)
      - Collaborative Filtering: alternating least squares (ALS)
      - Optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
      - Dimensionality Reduction: singular value decomposition (SVD), principal component analysis (PCA)
      - Clustering: K-means, Gaussian mixture, power iteration clustering, latent Dirichlet allocation, streaming k-means
      - http://www.jmlr.org/papers/volume17/15-237/15-237.pdf
  • 32. MODEL STORAGE
      - HBase
      - Models stored in PMML format
      - Import and export from external systems
      - Model metrics and statistics are stored
      - Configuration information of the system
      - http://dmg.org/pmml/pmml_examples/index.html
  • 34. SERVING LAYER
      - Play Framework
      - Interfaces with external systems
      - Low latency
      - Mechanism for multiple models
      - Processes Request and Reward messages
      - Retrieves the model from the Model Store and caches it
      - Logs the messages to a Kafka topic
  • 35. SPEED LAYER
      - Spark Streaming application
      - Receives messages from Kafka in micro-batches for processing
      - Fetches the latest model from the Model Store, updates it, and stores the updated model
      - Notifies the serving layer of the model update
  • 36. HISTORY LOGGER
      - Spark Streaming application
      - Kafka consumer
      - Archives messages logged by the serving layer
      - HDFS for long-term storage
      - Archived data is used by the batch layer
  • 37. BATCH LAYER
      - Spark application
      - Reads the historical archived data
      - Configured sliding window
      - Generates training data
      - Builds a new model from scratch
      - Stores it in Model Storage
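The sliding window in the batch layer (slide 37) limits training to recent archived events, which reduces the influence of older history. A plain-Python sketch of that filtering step is shown below; the event fields `timestamp`, `context`, `arm`, and `reward` are hypothetical names, since the talk does not describe the message schema.

```python
def sliding_window(events, now, window_seconds):
    """Keep only archived events whose timestamp lies in (now - window, now]."""
    cutoff = now - window_seconds
    return [e for e in events if cutoff < e["timestamp"] <= now]


def to_training_rows(events):
    """Turn windowed events into (context, arm, reward) training rows."""
    return [(e["context"], e["arm"], e["reward"]) for e in events]
```

The resulting rows would then feed a from-scratch training run whose output replaces the model in Model Storage.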
  • 38. MANAGEMENT SERVICES
      - Suite of applications
      - Configuration of the system
      - Monitoring the processes
      - Administrative UI
      - Authorization and role-based access control
      - Scheduling of workflows
  • 40. RECAP
      - Decision-making algorithms that have exploration vs. exploitation trade-offs
      - Multi-armed bandit and contextual multi-armed bandit algorithms
      - Lambda architecture
  • 42. REFERENCES
      - 1. A contextual-bandit approach to personalized news article recommendation; Lihong Li, Wei Chu, John Langford, Robert E. Schapire
      - 2. Generalized Thompson Sampling for Contextual Bandits; Lihong Li
      - 3. Big Data: Principles and best practices of scalable realtime data systems; Nathan Marz & Warren J.
      - 4. Data Mining Group. Predictive Model Markup Language.
      - 5. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits; Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
      - 6. Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms; Lihong Li, Wei Chu, John Langford, Xuanhui Wang
      - 7. Reinforcement Learning: An Introduction; Richard S. Sutton, Andrew G. Barto

Editor's Notes

  1. Focus: decision-making algorithms and solutions built on them. Some of these are covered during the presentation.
  2. Let's first look at decision making in general, and at the algorithms in this section.
  3. Learning from interaction
  4. Fields
  5. Imagine a casino setting: in the K-armed bandit problem, a gambler faces a set of slot machines with different payout distributions. At each time step the gambler has to choose an arm, which pays out some reward. Objective: maximize the sum of rewards earned over a sequence of lever pulls.
  6. A slightly more formal definition.
  7. Options that initially gave less reward end up under-explored.
  8. The Agent's aim is to collect enough information about how the context vectors and rewards relate to each other, so that it can predict the next best arm to play by looking at the feature vectors.
  9. More explanation. Meeting notes (5/22/16 20:01): iterative jobs and in-memory computing; moves to the optimal value.
  10. Challenges presented by these algorithms; Lambda Architecture.
  11. Sliding window on the data, so that we can decrease the influence of historical data. New article example.