Boosting Machine Learning with Redis Modules and Spark
Dvir Volk, Redis Labs, November 2016
2
Hello World
Redis: Open source. The leading in-memory database
Redis Labs: The open source home and commercial provider of Redis, cloud and on-premise
Dvir Volk: Senior System Architect at Redis Labs. Redis user and contributor for ~6 years
@dvirsky
dvirvolk
3
A Brief Overview of Redis
● Started in 2009 by Salvatore Sanfilippo
● Mostly a one-man show
● Most popular KV store
● Notable Users:
○ Twitter, Netflix, Uber, Groupon, Twitch
○ Many, many more...
4
A Brief Overview of Redis
▪ Key => Data Structure server
▪ In-memory, disk-backed
▪ Optional cluster mode
▪ Embedded Lua scripting
▪ Single-threaded!
▪ Key features: Fast, Flexible, Simple
5
A Lego For Your Database
Key =>
Strings/Blobs/Bitmaps: "I'm a Plain Text String!"
Hash Tables (objects!): { A: "foo", B: "bar", C: "baz" }
Linked Lists: [ A → B → C → D → E ]
Sets: { A , B , C , D , E }
Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
HyperLogLog: 00110101 11001110 10101010
6
Redis In Practice
▪ “Front End Database”
▪ Real Time Counters
▪ Ad Serving
▪ Message Queues
▪ Geo Database
▪ Time Series
▪ Cache
▪ Session State
▪ Etc
7
But Can Redis Do X?
Secondary Index?
Time Series?
Full Text Search?
Graph?
Machine Learning?
AutoComplete?
SQL?
8
So You Want a New Feature?
▪ Try a Lua script
▪ Convince @antirez
▪ Fork Redis
▪ Build Your Own Database!
9
Enter Redis Modules
▪ In development since March 2016
▪ Redis 4.0 RC out soon
▪ Several modules already exist
▪ Key paradigm shift for Redis
10
What Modules Actually Are
▪ Dynamic libraries loaded into Redis
▪ Written in C/C++
▪ Use a C ABI/API isolating Redis internals
▪ Near-zero latency access to data
New Capabilities: New Commands, New Data Types
11
Obligatory Module Example
12
LEFTPAD Example
127.0.0.1:6379> MODULE LOAD "./example.so"
OK
127.0.0.1:6379> COMMAND INFO EXAMPLE.LEFTPAD
1) 1) "example.leftpad"
...
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8
"     foo"
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 "_"
"_____foo"
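The padding behaviour shown in the session above can be sketched in a few lines of Scala. This is only an illustration of the logic, not the module's C source: the name `leftPad`, the default pad character (a space), and the choice to return over-long strings unchanged are all assumptions.

```scala
// Hypothetical re-implementation of the padding logic, for illustration only.
object LeftPad {
  /** Pads `s` on the left with `pad` until it is `width` characters long.
    * Strings that are already at least `width` long are returned unchanged
    * (an assumption; the slide does not show this case). */
  def leftPad(s: String, width: Int, pad: Char = ' '): String =
    if (s.length >= width) s
    else pad.toString * (width - s.length) + s

  def main(args: Array[String]): Unit = {
    println("[" + leftPad("foo", 8) + "]") // brackets make the spaces visible
    println(leftPad("foo", 8, '_'))
  }
}
```

A module would expose the same logic as a new command, so clients call it like any built-in.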
13
Real Module: RediSearch
▪ From-scratch search index over Redis
▪ Uses Strings for holding compressed index data
▪ Includes stemming, exact phrase match, etc.
▪ Fast fuzzy auto-complete
▪ Up to 5x faster than Elastic / Solr
> FT.SEARCH "lcd tv" FILTER price 100 +inf
> FT.SUGGET "lcd" FUZZY
14
More Modules Out There
▪ Native JSON Support
▪ Time Series
▪ Secondary Indexing
▪ Encryption
▪ Bloom Filters
▪ Online Neural Network
▪ Many Many more...
15
Spark ML + Redis modules
16
Redis + Spark So Far
▪ Current connector:
- RDD abstraction
- SparkSQL
- Streaming Source
▪ ML is not addressed specifically
▪ Used for pre-computed results
▪ We felt that we could take it further
17
Addressing The ML Pain
▪ The missing piece of ML: Serving your model
- Not standardized
- Vendor lock-in with cloud platforms
- Reliable services are hard to build
- If only we had a “database” for this!
- Well, maybe we do?
18
Why Modules for ML?
With modules we can:
▪ Define data structures for models
▪ Store training output as “hot model”
▪ Perform evaluation directly in Redis
▪ Easily integrate existing C/C++ libs
19
Spark + Modules = AWESOME
▪ Train ML model on Spark
▪ Save model to Redis and get:
- High availability
- Clustering
- Persistence
- Performance
- Client libraries
20
Spark-ML End-to-End Flow
[Flow diagram: Data Loaded to Spark; Spark Training; Model saved to Parquet file; Custom Server; Batch Evaluation; Pre-computed results; ?; ClientApp]
21
Adding Redis Into The Mix
[Flow diagram: Data Loaded to Spark; Spark Training; Any Training Platform; Redis-ML "Active Model"; ClientApp]
22
The Redis-ML Module
Redis Module:
Tree Ensembles
Linear Regression
Logistic Regression
Matrix + Vector Operations
More to come...
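Of the model types listed on this slide, linear and logistic regression are the simplest to evaluate: a stored model is just an intercept plus one coefficient per feature. The sketch below shows that evaluation step in plain Scala. The object and function names are hypothetical and say nothing about how the module stores or exposes these models.

```scala
// Illustrative only: evaluation of already-trained regression models.
object RegressionSketch {
  /** intercept + sum of coefficient(i) * feature(i). */
  def linear(intercept: Double, coefficients: Seq[Double], features: Seq[Double]): Double = {
    require(coefficients.length == features.length, "one coefficient per feature")
    intercept + coefficients.zip(features).map { case (c, x) => c * x }.sum
  }

  /** Logistic regression: the linear score passed through the sigmoid. */
  def logistic(intercept: Double, coefficients: Seq[Double], features: Seq[Double]): Double =
    1.0 / (1.0 + math.exp(-linear(intercept, coefficients, features)))

  def main(args: Array[String]): Unit = {
    println(linear(1.0, Seq(2.0, 3.0), Seq(1.0, 1.0))) // 6.0
    println(logistic(0.0, Seq(1.0), Seq(0.0)))         // 0.5
  }
}
```

Because evaluation is this cheap, keeping the coefficients next to the serving path avoids a model load on every request.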
23
Example: Random Forest
24
Forest Data Type
▪ A collection of decision trees
▪ Supports classification & regression
▪ A splitter node can be:
- Categorical (e.g. day == “Sunday”)
- Numerical (e.g. age < 43)
25
Decision Tree Example
The famous Titanic survival predictor
sex = male?
  no  → Survived
  yes → age > 9.5?
    yes → Died
    no  → sibsp > 2.5?
      yes → Died
      no  → Survived
*sibsp = siblings + spouses
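A tree like this one maps naturally onto a small recursive data type with categorical and numerical splitter nodes. The Scala sketch below encodes the Titanic tree under the commonly cited reading of its branches (females survive; males older than 9.5 die; younger males survive unless sibsp exceeds 2.5). All type and field names are hypothetical, and the fallback of 0.0 for a missing numeric feature is an assumption made to keep the example short.

```scala
// Illustrative decision-tree data type; not the Redis-ML internal representation.
object TitanicTree {
  sealed trait Node
  final case class Leaf(label: String) extends Node
  // Follows `ifTrue` when the categorical feature equals `value`.
  final case class Categorical(feature: String, value: String, ifTrue: Node, ifFalse: Node) extends Node
  // Follows `ifTrue` when the numeric feature is strictly greater than `threshold`.
  final case class Numerical(feature: String, threshold: Double, ifTrue: Node, ifFalse: Node) extends Node

  /** Walks the tree from `node` down to a leaf and returns its label. */
  def classify(node: Node, cat: Map[String, String], num: Map[String, Double]): String =
    node match {
      case Leaf(label) => label
      case Categorical(f, v, t, e) =>
        classify(if (cat.get(f).contains(v)) t else e, cat, num)
      case Numerical(f, th, t, e) =>
        // Missing numeric features default to 0.0 (assumption).
        classify(if (num.getOrElse(f, 0.0) > th) t else e, cat, num)
    }

  val tree: Node =
    Categorical("sex", "male",
      ifTrue = Numerical("age", 9.5,
        ifTrue = Leaf("Died"),
        ifFalse = Numerical("sibsp", 2.5, Leaf("Died"), Leaf("Survived"))),
      ifFalse = Leaf("Survived"))

  def main(args: Array[String]): Unit = {
    println(classify(tree, Map("sex" -> "female"), Map("age" -> 30.0)))
    println(classify(tree, Map("sex" -> "male"), Map("age" -> 5.0, "sibsp" -> 1.0)))
  }
}
```

A forest is then a collection of such trees whose individual predictions are combined, for example by majority vote for classification.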
26
Forest Data Type Example
> MODULE LOAD "./redis-ml.so"
OK
> ML.FOREST.ADD myforest 0 . CATEGORIC sex "male" .L LEAF 1 .R LEAF 0
OK
> ML.FOREST.RUN myforest sex:male
"1"
> ML.FOREST.RUN myforest sex:yes_please
"0"
27
Using Redis-ML With Spark
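scala> import org.apache.spark.ml.classification.RandomForestClassificationModel
scala> import redis.clients.jedis.Jedis
scala> import com.redislabs.client.redisml.MLClient
scala> import com.redislabs.provider.redis.ml.Forest
scala> // pipelineModel is a fitted Spark ML PipelineModel whose last stage is the random forest
scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel]
scala> val f = new Forest(rfModel.trees)
scala> f.loadToRedis("forest-test", "localhost")
scala> val jedis = new Jedis("localhost")
scala> // makeInputString is a helper (not shown on the slide) that formats one sample's features
scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN, "forest-test", makeInputString(0))
scala> jedis.getClient.getStatusCodeReply
res53: String = 1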
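The session above calls a `makeInputString` helper that the slide does not define. A minimal sketch of what such a helper might do is shown below, assuming the sample is passed as comma-separated `feature:value` pairs in the same `name:value` shape as the `sex:male` argument used earlier. The signature, the comma separator, and the use of numeric feature indices are all assumptions for illustration.

```scala
// Hypothetical helper: formats a sparse feature vector as "index:value,index:value".
object InputString {
  def makeInputString(indices: Seq[Int], values: Seq[Double]): String = {
    require(indices.length == values.length, "one value per feature index")
    indices.zip(values).map { case (i, v) => s"$i:$v" }.mkString(",")
  }

  def main(args: Array[String]): Unit =
    println(makeInputString(Seq(0, 3), Seq(1.0, 0.5))) // 0:1.0,3:0.5
}
```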
28
Benchmarking Redis-ML
Step                       Spark + Parquet   Spark + Redis ML
Model Preparation + Save   3785ms            292ms
Model Load                 2769ms            0ms (model is in memory)
Classification (AVG)       13ms              1ms
● Forest size: 15000 trees
● Data: $(SPARK_HOME)/data/mllib/sample_libsvm_data.txt
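To put the benchmark figures in relative terms, the snippet below divides each Spark + Parquet timing by the corresponding Spark + Redis ML timing. It only restates the numbers from the slide; the object and function names are hypothetical, and the model-load row is left out because its Redis ML time is reported as 0ms.

```scala
// Arithmetic on the timings reported on the slide (milliseconds).
object BenchmarkRatios {
  def speedup(baselineMs: Double, candidateMs: Double): Double = {
    require(candidateMs > 0, "ratio is undefined for a zero timing")
    baselineMs / candidateMs
  }

  val prepareAndSave: Double = speedup(3785.0, 292.0) // roughly 13x
  val classification: Double = speedup(13.0, 1.0)     // 13x

  def main(args: Array[String]): Unit = {
    println(s"Model preparation + save: ${math.round(prepareAndSave)}x")
    println(s"Classification (avg): ${math.round(classification)}x")
  }
}
```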
29
Going Forward - More Features
▪ Implement more Spark-ML model types
- SVM
- Naive Bayes Classifier
- Neural Networks
▪ Integration with Redis’ native types
▪ Data Processing (e.g. Word2Vec, TF-IDF)
▪ PMML Support
30
PS: Neural Redis
▪ Developed by Salvatore
▪ Training is done inside Redis
▪ Online continuous training process
▪ Builds Fully Connected NNs
31
More Resources
Redis-ML:
https://github.com/RedisLabsModules/redis-ml
Spark-Redis-ML:
https://github.com/RedisLabs/spark-redis-ml
Neural-Redis:
https://github.com/antirez/neural-redis
32

More Related Content

What's hot

Redis modules 101
Redis modules 101Redis modules 101
Redis modules 101
Dvir Volk
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Erik Krogen
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
Christopher Spring
 
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike SteenbergenMeet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
distributed matters
 
Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Tim Lossen
 
Redis and it's data types
Redis and it's data typesRedis and it's data types
Redis and it's data types
Aniruddha Chakrabarti
 
Introduction to redis
Introduction to redisIntroduction to redis
Introduction to redis
Tanu Siwag
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbit
Redis Labs
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Lucidworks
 
Caching solutions with Redis
Caching solutions   with RedisCaching solutions   with Redis
Caching solutions with Redis
George Platon
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redis
Zhichao Liang
 
Redis tutoring
Redis tutoringRedis tutoring
Redis tutoring
Chen-Tien Tsai
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdf
Erin O'Neill
 
Redis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale AppsRedis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale Apps
Dave Nielsen
 
Scaling php applications with redis
Scaling php applications with redisScaling php applications with redis
Scaling php applications with redis
jimbojsb
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Erik Krogen
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
Yoshinori Matsunobu
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
DataStax Academy
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
DataStax
 
Introduction to redis
Introduction to redisIntroduction to redis
Introduction to redis
NexThoughts Technologies
 

What's hot (20)

Redis modules 101
Redis modules 101Redis modules 101
Redis modules 101
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
 
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike SteenbergenMeet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen
 
Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?Key-Value-Stores -- The Key to Scaling?
Key-Value-Stores -- The Key to Scaling?
 
Redis and it's data types
Redis and it's data typesRedis and it's data types
Redis and it's data types
 
Introduction to redis
Introduction to redisIntroduction to redis
Introduction to redis
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbit
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
Caching solutions with Redis
Caching solutions   with RedisCaching solutions   with Redis
Caching solutions with Redis
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redis
 
Redis tutoring
Redis tutoringRedis tutoring
Redis tutoring
 
Redis memcached pdf
Redis memcached pdfRedis memcached pdf
Redis memcached pdf
 
Redis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale AppsRedis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale Apps
 
Scaling php applications with redis
Scaling php applications with redisScaling php applications with redis
Scaling php applications with redis
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Introduction to redis
Introduction to redisIntroduction to redis
Introduction to redis
 

Similar to Boosting Machine Learning with Redis Modules and Spark

Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit
 
Redis by-hari
Redis by-hariRedis by-hari
Redis by-hari
Hari Bachala
 
Cost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWSCost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWS
Amazon Web Services
 
What's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis LabsWhat's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis Labs
Redis Labs
 
Redis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis UsageRedis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis Usage
Timothy Spann
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
Yahoo Developer Network
 
Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014soujavajug
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Itamar Haber
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Anyscale
 
Real time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and DockerReal time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and Docker
Ajeet Singh Raina
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017
HashedIn Technologies
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Redis Labs
 
MySQL Document Store - A Document Store with all the benefts of a Transactona...
MySQL Document Store - A Document Store with all the benefts of a Transactona...MySQL Document Store - A Document Store with all the benefts of a Transactona...
MySQL Document Store - A Document Store with all the benefts of a Transactona...
Olivier DASINI
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
Swiss Data Forum Swiss Data Forum
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
Guido Schmutz
 
Redispresentation apac2012
Redispresentation apac2012Redispresentation apac2012
Redispresentation apac2012
Ankur Gupta
 
Developing a Redis Module - Hackathon Kickoff
 Developing a Redis Module - Hackathon Kickoff Developing a Redis Module - Hackathon Kickoff
Developing a Redis Module - Hackathon Kickoff
Itamar Haber
 
Hadoop & no sql new generation database systems
Hadoop & no sql   new generation database systemsHadoop & no sql   new generation database systems
Hadoop & no sql new generation database systemsramazan fırın
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
cdmaxime
 

Similar to Boosting Machine Learning with Redis Modules and Spark (20)

Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
Redis by-hari
Redis by-hariRedis by-hari
Redis by-hari
 
Cost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWSCost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWS
 
What's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis LabsWhat's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis Labs
 
Redis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis UsageRedis for Security Data : SecurityScorecard JVM Redis Usage
Redis for Security Data : SecurityScorecard JVM Redis Usage
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014Hadoop Spark - Reuniao SouJava 12/04/2014
Hadoop Spark - Reuniao SouJava 12/04/2014
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 
Real time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and DockerReal time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and Docker
 
NoSQL
NoSQLNoSQL
NoSQL
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
 
MySQL Document Store - A Document Store with all the benefts of a Transactona...
MySQL Document Store - A Document Store with all the benefts of a Transactona...MySQL Document Store - A Document Store with all the benefts of a Transactona...
MySQL Document Store - A Document Store with all the benefts of a Transactona...
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Redispresentation apac2012
Redispresentation apac2012Redispresentation apac2012
Redispresentation apac2012
 
Developing a Redis Module - Hackathon Kickoff
 Developing a Redis Module - Hackathon Kickoff Developing a Redis Module - Hackathon Kickoff
Developing a Redis Module - Hackathon Kickoff
 
Hadoop & no sql new generation database systems
Hadoop & no sql   new generation database systemsHadoop & no sql   new generation database systems
Hadoop & no sql new generation database systems
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
 

More from Dvir Volk

RediSearch
RediSearchRediSearch
RediSearch
Dvir Volk
 
Searching Billions of Documents with Redis
Searching Billions of Documents with RedisSearching Billions of Documents with Redis
Searching Billions of Documents with Redis
Dvir Volk
 
Tales Of The Black Knight - Keeping EverythingMe running
Tales Of The Black Knight - Keeping EverythingMe runningTales Of The Black Knight - Keeping EverythingMe running
Tales Of The Black Knight - Keeping EverythingMe running
Dvir Volk
 
10 reasons to be excited about go
10 reasons to be excited about go10 reasons to be excited about go
10 reasons to be excited about go
Dvir Volk
 
Kicking ass with redis
Kicking ass with redisKicking ass with redis
Kicking ass with redisDvir Volk
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
Dvir Volk
 
Introduction to Thrift
Introduction to ThriftIntroduction to Thrift
Introduction to ThriftDvir Volk
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 

More from Dvir Volk (8)

RediSearch
RediSearchRediSearch
RediSearch
 
Searching Billions of Documents with Redis
Searching Billions of Documents with RedisSearching Billions of Documents with Redis
Searching Billions of Documents with Redis
 
Tales Of The Black Knight - Keeping EverythingMe running
Tales Of The Black Knight - Keeping EverythingMe runningTales Of The Black Knight - Keeping EverythingMe running
Tales Of The Black Knight - Keeping EverythingMe running
 
10 reasons to be excited about go
10 reasons to be excited about go10 reasons to be excited about go
10 reasons to be excited about go
 
Kicking ass with redis
Kicking ass with redisKicking ass with redis
Kicking ass with redis
 
Introduction to redis - version 2
Introduction to redis - version 2Introduction to redis - version 2
Introduction to redis - version 2
 
Introduction to Thrift
Introduction to ThriftIntroduction to Thrift
Introduction to Thrift
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 

Recently uploaded

Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
XfilesPro
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
Peter Caitens
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 

Recently uploaded (20)

Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation

Boosting Machine Learning with Redis Modules and Spark

  • 1. Boosting Machine Learning with Redis Modules and Spark Dvir Volk, Redis Labs, November 2016
  • 2. Hello World ▪ Redis: Open source. The leading in-memory database ▪ Redis Labs: The open source home and commercial provider of Redis, cloud and on-premise ▪ Dvir Volk: Senior System Architect at Redis Labs. Redis user and contributor for ~6 years (@dvirsky, dvirvolk)
  • 3. A Brief Overview of Redis ● Started in 2009 by Salvatore Sanfilippo ● Mostly a one man show ● Most popular KV store ● Notable Users: ○ Twitter, Netflix, Uber, Groupon, Twitch ○ Many, many more...
  • 4. A Brief Overview of Redis ▪ Key => Data Structure server ▪ In memory disk backed ▪ Optional cluster mode ▪ Embedded Lua scripting ▪ Single Threaded! ▪ Key features: Fast, Flexible, Simple
  • 5. A Lego For Your Database: a Key maps to one of
      ▪ Strings/Blobs/Bitmaps: "I'm a Plain Text String!"
      ▪ Hash Tables (objects!): { A: “foo”, B: “bar”, C: “baz” }
      ▪ Linked Lists: [ A → B → C → D → E ]
      ▪ Sets: { A , B , C , D , E }
      ▪ Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
      ▪ Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
      ▪ HyperLogLog: 00110101 11001110 10101010
  • 6. Redis In Practice ▪ “Front End Database” ▪ Real Time Counters ▪ Ad Serving ▪ Message Queues ▪ Geo Database ▪ Time Series ▪ Cache ▪ Session State ▪ Etc
  • 7. But Can Redis Do X? ▪ Secondary Index? ▪ Time Series? ▪ Full Text Search? ▪ Graph? ▪ Machine Learning? ▪ AutoComplete? ▪ SQL?
  • 8. So You Want a New Feature? ▪ Try a Lua script ▪ Convince @antirez ▪ Fork Redis ▪ Build Your Own Database!
  • 9. Enter Redis Modules ▪ In development since March 2016 ▪ Redis 4.0 RC out soon ▪ Several modules already exist ▪ Key paradigm shift for Redis
  • 10. What Modules Actually Are ▪ Dynamic libraries loaded to redis ▪ Written in C/C++ ▪ Use a C ABI/API isolating redis internals ▪ Near Zero latency access to data ▪ New Capabilities: New Commands, New Data Types
  • 12. LEFTPAD Example
      127.0.0.1:6379> MODULE LOAD "./example.so"
      OK
      127.0.0.1:6379> COMMAND INFO EXAMPLE.LEFTPAD
      1) 1) "example.leftpad"
      ...
      127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8
           foo
      127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 "_"
      _____foo
  • 13. Real Module: RediSearch ▪ From-scratch search index over Redis ▪ Uses Strings for holding compressed index data ▪ Includes stemming, exact phrase match, etc. ▪ Fast fuzzy auto-complete ▪ Up to 5x faster than Elastic / Solr
      > FT.SEARCH "lcd tv" FILTER price 100 +inf
      > FT.SUGGET "lcd" FUZZY
  • 14. More Modules Out There ▪ Native JSON Support ▪ Time Series ▪ Secondary Indexing ▪ Encryption ▪ Bloom Filters ▪ Online Neural Network ▪ Many Many more...
  • 15. Spark ML + Redis modules
  • 16. Redis + Spark So Far ▪ Current connector: - RDD abstraction - SparkSQL - Streaming Source ▪ ML is not addressed specifically ▪ Used for pre-computed results ▪ We felt that we could take it further
  • 17. Addressing The ML Pain ▪ The missing piece of ML: Serving your model - Not standardized - Vendor lock-in with cloud platforms - Reliable services are hard to build - If only we had a “database” for this! - Well, maybe we do?
  • 18. Why Modules for ML? With modules we can: ▪ Define data structures for models ▪ Store training output as “hot model” ▪ Perform evaluation directly in Redis ▪ Easily integrate existing C/C++ libs
  • 19. Spark + Modules = AWESOME ▪ Train ML model on Spark ▪ Save model to Redis and get: - High availability - Clustering - Persistence - Performance - Client libraries
  • 20. Spark-ML End-to-End Flow [diagram; labels: Spark Training, Custom Server, Model saved to Parquet file, Data Loaded to Spark, Pre-computed results, Batch Evaluation, ?, ClientApp]
  • 21. Adding Redis Into The Mix [diagram; labels: Redis-ML “Active Model”, Any Training Platform, ClientApp, Spark Training, Data Loaded to Spark]
  • 22. The Redis-ML Module (a Redis Module) ▪ Tree Ensembles ▪ Linear Regression ▪ Logistic Regression ▪ Matrix + Vector Operations ▪ More to come...
  • 24. Forest Data Type ▪ A collection of decision trees ▪ Supports classification & regression ▪ Splitter Node can be - Categorical (e.g. day == “Sunday”) - Numerical (e.g. age < 43)
  • 25. Decision Tree Example: The famous Titanic survival predictor
      sex = male?
        no: Survived
        yes: Age > 9.5?
          yes: Died
          no: sibsp > 2.5?
            yes: Died
            no: Survived
      *sibsp = siblings + spouses
  • 26. Forest Data Type Example
      > MODULE LOAD "./redis-ml.so"
      OK
      > ML.FOREST.ADD myforest 0 . CATEGORIC sex "male" .L LEAF 1 .R LEAF 0
      OK
      > ML.FOREST.RUN myforest sex:male
      "1"
      > ML.FOREST.RUN myforest sex:yes_please
      "0"
  • 27. Using Redis-ML With Spark
      scala> import com.redislabs.client.redisml.MLClient
      scala> import com.redislabs.provider.redis.ml.Forest
      scala> import org.apache.spark.ml.classification.RandomForestClassificationModel
      scala> import redis.clients.jedis.Jedis
      scala> // The last pipeline stage is the trained random forest
      scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel]
      scala> val f = new Forest(rfModel.trees)
      scala> f.loadToRedis("forest-test", "localhost")
      scala> // Evaluate the stored forest in Redis on one input
      scala> val jedis = new Jedis("localhost")
      scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN, "forest-test", makeInputString(0))
      scala> jedis.getClient.getStatusCodeReply
      res53: String = 1
  • 28. Benchmarking Redis-ML
                                   Spark + Parquet    Spark + Redis ML
      Model Preparation + Save     3785ms             292ms
      Model Load                   2769ms             0ms (model is in memory)
      Classification (AVG)         13ms               1ms
      ● Forest size: 15000 trees
      ● Data: $(SPARK_HOME)/data/mllib/sample_libsvm_data.txt
  • 29. Going Forward - More Features ▪ Implement more Spark-ML model types - SVM - Naive Bayes Classifier - Neural Networks ▪ Integration with Redis’ native types ▪ Data Processing (e.g. Word2Vec, TF-IDF) ▪ PMML Support
  • 30. PS: Neural Redis ▪ Developed by Salvatore ▪ Training is done inside redis ▪ Online continuous training process ▪ Builds Fully Connected NNs
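
The Forest Data Type slides (24 and 26) describe a forest as a collection of decision trees whose splitter nodes are either categorical or numerical, usable for classification or regression. A minimal sketch of how such a structure could be evaluated is shown below. It is plain Scala with no Redis or Spark dependency; every name in it (`Node`, `Leaf`, `CategoricalSplit`, `NumericalSplit`, `ForestSketch`) is hypothetical and not taken from Redis-ML, and the handling of missing features and tied votes is an assumption rather than the module's documented behavior. The single tree in `main` mirrors the one added with ML.FOREST.ADD on slide 26.

```scala
// Hypothetical sketch: these types are illustrative, not the Redis-ML implementation.
sealed trait Node
// Terminal node holding the predicted class label or regression value.
final case class Leaf(value: Double) extends Node
// Categorical splitter: take the left branch when the feature equals `category`.
final case class CategoricalSplit(feature: String, category: String,
                                  left: Node, right: Node) extends Node
// Numerical splitter: take the left branch when the feature is below `threshold`.
final case class NumericalSplit(feature: String, threshold: Double,
                                left: Node, right: Node) extends Node

object ForestSketch {
  type Features = Map[String, String]

  // Walk a single tree from its root down to a leaf.
  @annotation.tailrec
  def evalTree(node: Node, x: Features): Double = node match {
    case Leaf(v) => v
    case CategoricalSplit(f, c, l, r) =>
      // Assumption: a missing feature takes the right branch.
      evalTree(if (x.get(f).contains(c)) l else r, x)
    case NumericalSplit(f, t, l, r) =>
      // Assumption: a missing or non-numeric feature takes the right branch.
      val goLeft = x.get(f)
        .flatMap(s => scala.util.Try(s.toDouble).toOption)
        .exists(_ < t)
      evalTree(if (goLeft) l else r, x)
  }

  // Classification: majority vote over all trees (ties broken arbitrarily).
  // Requires a non-empty forest.
  def classify(forest: Seq[Node], x: Features): Double =
    forest.map(evalTree(_, x)).groupBy(identity).maxBy(_._2.size)._1

  // Regression: mean of the individual tree outputs. Requires a non-empty forest.
  def regress(forest: Seq[Node], x: Features): Double = {
    val ys = forest.map(evalTree(_, x))
    ys.sum / ys.size
  }

  def main(args: Array[String]): Unit = {
    // Same shape as slide 26: sex == "male" ? LEAF 1 : LEAF 0
    val tree = CategoricalSplit("sex", "male", Leaf(1), Leaf(0))
    val forest = Seq(tree)
    println(classify(forest, Map("sex" -> "male")))       // prints 1.0
    println(classify(forest, Map("sex" -> "yes_please"))) // prints 0.0
  }
}
```

A numerical splitter such as the `age < 43` example from slide 24 would be written as `NumericalSplit("age", 43, leftNode, rightNode)` in this sketch.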