SlideShare a Scribd company logo
Accelerating Spark-ML
with Redis modules
Dvir Volk, Shay Nativ
Hello World
Open source. The leading in-memory
database
The open source home and commercial
provider of Redis - cloud and on-premise
Senior System Architect at Redis Labs.
Redis user and contributor for ~6 years
Senior Software Developer at Redis Labs
dvirvolk
shaynativ
A Brief Overview of Redis
● Started in 2009 by Salvatore Sanfilippo
● Mostly a one man show
● Most popular KV store
● Notable Users:
○ Twitter, Netflix, Uber, Groupon, Twitch
○ Many, many more...
A Brief Overview of Redis
• Key => Data Structure server
• In memory disk backed
• Optional cluster mode
• Embedded Lua scripting
• Single Threaded!
• Key features: Fast, Flexible, Simple
A Lego For Your Database
Key
"I'm a Plain Text String!"
{ A: “foo”, B: “bar”, C: “baz” }
Strings/Blobs/Bitmaps
Hash Tables (objects!)
Linked Lists
Sets
Sorted Sets
Geo Sets
HyperLogLog
{ A , B , C , D , E }
[ A → B → C → D → E ]
{ A: 0.1, B: 0.3, C: 100, D: 1337 }
{ A: (51.5, 0.12), B: (32.1, 34.7) }
00110101 11001110 10101010
Redis In Practice
▪ “Front End Database”
▪ Real Time Counters
▪ Ad Serving
▪ Message Queues
▪ Geo Database
▪ Time Series
▪ Cache
▪ Session State
▪ Etc
Redis + Spark
• Spark-Redis connector
• Redis RDD
• SparkSQL integration
• Redis as a data source
• Redis as the final output
But Can Redis Do X?
Secondary Index?
Time Series?
Full Text Search?
Graph?
Machine Learning?
AutoComplete?
SQL?
So You Want a New Feature?
• Try a Lua script
• Convince @antirez
• Fork Redis
• Build Your Own Database!
Enter Redis Modules
• In development since March 2016
• Redis 4.0 RC out soon
• Several modules already exist
• Key paradigm shift for Redis
Modules In Action
New Capabilities
What Modules Actually Are
• Dynamic libraries loaded to redis
• Written in C/C++
• Use a C ABI/API isolating redis internals
• Near Zero latency access to data
New Commands
New Data Types
Obligatory Module Example
LEFTPAD Example
127.0.0.1:6379> MODULE LOAD "./example.so"
OK
127.0.0.1:6379> COMMAND INFO EXAMPLE.LEFTPAD
1) 1) "example.leftpad"
...
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8
foo
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 "_"
_____foo
Real Module: RediSearch
• From-Scratch search index over redis
• Uses Strings for holding compressed index data
• Includes stemming, exact phrase match, etc.
• Fast Fuzzy Auto-complete
• Up to X5 faster than Elastic / Solr
> FT.SEARCH “lcd tv” FILTER price 100 +inf
> FT.SUGGET “lcd” FUZZY
Real Module: Indexing
• Support for secondary indexes for redis
• Supports indexing HASH keys with their properties
• Optional raw indexes as data types
• SQL-like syntax for querying indexes
> IDX.CREATE users_name_age TYPE HASH SCHEMA name STRING age INT32
> IDX.INTO users_name_age HMSET user1 name "alice" age 30
> IDX.FROM users_name_age WHERE "name LIKE 'ali%' AND age < 31" HGETALL $
Real Module: JSON
• Stores JSON objects into redis
• Allows retrieval of part of a document
• Allows atomic manipulation of document elements
> JSON.SET foo ‘{“name”: {“first”: “bob”, “last”:”doe”},
“age”: 32}`
> JSON.GET foo name.first
> JSON.SET foo age 33
Spark ML + Redis modules
Redis + Spark So Far
• ML is not addressed specifically
• Used for pre-computed results
• We felt that we can take it further
Addressing The ML Pain
• The missing piece of ML: Serving your model
– Not standardized
– Vendor-lock with cloud platforms
– Reliable services are hard to do
– If only we had a “database” for this!
– Well, maybe we do?
Why Modules for ML?
With modules we can:
• Define data structures for models
• Store training output as “hot model”
• Perform evaluation directly in Redis
• Easily integrate existing C/C++ libs
Spark + Modules = AWESOME
• Train ML model on Spark
• Save model to Redis and get:
– High availability
– Clustering
– Persistence
– Performance
– Client libraries
Spark-ML End-to-End Flow
Spark Training
Custom Server
Model saved to
Parquet file
Data Loaded
to Spark
Pre-computed
results
Batch Evaluation
?
ClientApp
Adding Redis Into The Mix
Redis-ML “Active Model”
Any Training Platform
ClientApp
Spark Training
Data Loaded
to Spark
Redis-ML Module
Tree Ensembles
Linear Regression
Logistic Regression
Matrix + Vector Operations
More to come...
Example: Random Forest
Forest Data Type
• A collection of decision trees
• Supports classification & regression
• Splitter Node can be
– Categorical (e.g. day == “Sunday”)
– Numerical (e.g. age < 43)
Decision Tree Example
The famous Titanic survival predictor
sex=male?yes no
Survived
Died
Age > 9.5?
sibsp > 2.5?
Died Survived
*sibsp = siblings + spouses
Forest Data Type API
Add nodes to a tree in a forest:
ML.FOREST.ADD <forestId> <treeId> <path>
[ [NUMERIC|CATEGORIC] <splitterAttr> <splitterVal> ] |
[LEAF] <predVal>
ML.FOREST.RUN <forestId> <features>
[CLASSIFICATION|REGRESSION]
Perform classification/regression of a feature vector:
*feature vector is in libSVM format k:v k:v ...
Forest Data Type Example
> MODULE LOAD "./redis-ml.so"
OK
> ML.FOREST.ADD myforest 0 . CATEGORIC sex “male” .L
LEAF 1 .R LEAF 0
OK
> ML.FOREST.RUN myforest sex:male
"1"
> ML.FOREST.RUN myforest sex:yes_please
"0"
Using Redis-ML With Spark
scala> import com.redislabs.client.redisml.MLClient
scala> import com.redislabs.provider.redis.ml.Forest
scala> val rfModel =
pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel]
scala> val f = new Forest(rfModel.trees)
scala> f.loadToRedis("forest-test", "localhost")
scala> val jedis = new Jedis("localhost")
scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN,
"forest-test", makeInputString (0))
scala> jedis.getClient.getStatusCodeReply
res53: String = 1
Benchmarking Redis-ML
- Spark + Parquet Spark + Redis ML
Model Preparation + Save 3785ms 292ms
Model Load 2769ms 0ms (model is on memory)
Classification (AVG) 13ms 1ms
● Forest size: 15000 trees
● Data: $(SPARK_HOME)/data/mllib/sample_libsvm_data.txt
Going Forward - More Features
• Implement more Spark-ML model types
– SVM
– Naive Bayes Classifier
– Neural Networks
• Integration with Redis’ native types
PS: Neural Redis
▪ Developed by Salvatore
▪ Training is done inside redis
▪ Online continuous training process
▪ Builds Fully Connected NNs
More Resources
Redis-ML:
https://github.com/RedisLabsModules/redis-ml
Spark-Redis-ML:
https://github.com/RedisLabs/spark-redis-ml
Neural-Redis:
https://github.com/antirez/neural-redis
Spark Summit EU talk by Shay Nativ and Dvir Volk

More Related Content

What's hot

NigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin
NigthClazz Spark - Machine Learning / Introduction à Spark et ZeppelinNigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin
NigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin
Zenika
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Spark Summit
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
Spark Summit
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
Databricks
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
Spark Summit
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
Spark Summit
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
Spark Summit
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
Databricks
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
Spark Summit
 
Spark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick Pentreath
Spark Summit
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
Hakka Labs
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
Miklos Christine
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
Databricks
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit
 
Spark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital KediaSpark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital Kedia
Spark Summit
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
Spark Summit
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelin
felixcss
 
Apache Spark and Online Analytics
Apache Spark and Online Analytics Apache Spark and Online Analytics
Apache Spark and Online Analytics
Databricks
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
Spark Summit
 

What's hot (20)

NigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin
NigthClazz Spark - Machine Learning / Introduction à Spark et ZeppelinNigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin
NigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
 
Spark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick Pentreath
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
 
Spark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital KediaSpark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital Kedia
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelin
 
Apache Spark and Online Analytics
Apache Spark and Online Analytics Apache Spark and Online Analytics
Apache Spark and Online Analytics
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
 

Viewers also liked

How china is ruled: Unpacking the CCP
How china is ruled: Unpacking the CCPHow china is ruled: Unpacking the CCP
How china is ruled: Unpacking the CCP
Bright Mhango
 
豆瓣数据架构实践
豆瓣数据架构实践豆瓣数据架构实践
豆瓣数据架构实践
Xupeng Yun
 
Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3
Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3
Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3
Memory Promotional Enterprise
 
1114 sasaki-metadata
1114 sasaki-metadata1114 sasaki-metadata
1114 sasaki-metadata
Felix Sasaki
 
Kristumata chedanam chattampiswamikal-englishtranslation
Kristumata chedanam chattampiswamikal-englishtranslationKristumata chedanam chattampiswamikal-englishtranslation
Kristumata chedanam chattampiswamikal-englishtranslation
Kannan Kunnathully
 
Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...
Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...
Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...
Drughe .it
 
Laying Down the Groundwork for Financial Stability for Architecture & Enginee...
Laying Down the Groundwork for Financial Stability for Architecture & Enginee...Laying Down the Groundwork for Financial Stability for Architecture & Enginee...
Laying Down the Groundwork for Financial Stability for Architecture & Enginee...
Citrin Cooperman
 
CUIDAPP - Cuida tu Ciudad - PITCH
CUIDAPP - Cuida tu Ciudad - PITCHCUIDAPP - Cuida tu Ciudad - PITCH
CUIDAPP - Cuida tu Ciudad - PITCH
Carlos J Carvajalino
 
Pensar, sentir, necesitar
Pensar, sentir, necesitarPensar, sentir, necesitar
Pensar, sentir, necesitar
Felicidad Campal
 
利用規約 The policy about commercial use
利用規約 The policy about commercial use利用規約 The policy about commercial use
利用規約 The policy about commercial use
Masahiro Onoguchi
 
What Rockstars Can Teach You About Kicking Ass With Social Media
What Rockstars Can Teach You About Kicking Ass With Social MediaWhat Rockstars Can Teach You About Kicking Ass With Social Media
What Rockstars Can Teach You About Kicking Ass With Social Media
Mack Collier
 
Pros y contras de que mi hijo tenga un perro
Pros y contras de que mi hijo tenga un perroPros y contras de que mi hijo tenga un perro
Pros y contras de que mi hijo tenga un perro
ALEJANDRO CHÁVEZ
 
Eyjafjardara svak 2017_03_23
Eyjafjardara svak 2017_03_23Eyjafjardara svak 2017_03_23
Eyjafjardara svak 2017_03_23
Erlendur Steinar Friðriksson
 
プログラミング Coq
プログラミング Coqプログラミング Coq
プログラミング Coqltf14
 
Convocatoria Guardavida Eventual
Convocatoria Guardavida EventualConvocatoria Guardavida Eventual
Convocatoria Guardavida Eventual
ComunicacionSocialTecoman2015-2018
 

Viewers also liked (15)

How china is ruled: Unpacking the CCP
How china is ruled: Unpacking the CCPHow china is ruled: Unpacking the CCP
How china is ruled: Unpacking the CCP
 
豆瓣数据架构实践
豆瓣数据架构实践豆瓣数据架构实践
豆瓣数据架构实践
 
Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3
Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3
Como Planejar sua Campanha Promocional com Brindes | Apresentação 2 de 3
 
1114 sasaki-metadata
1114 sasaki-metadata1114 sasaki-metadata
1114 sasaki-metadata
 
Kristumata chedanam chattampiswamikal-englishtranslation
Kristumata chedanam chattampiswamikal-englishtranslationKristumata chedanam chattampiswamikal-englishtranslation
Kristumata chedanam chattampiswamikal-englishtranslation
 
Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...
Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...
Convegno “ Stress, molestie lavorative e organizzative del lavoro: aspetti pr...
 
Laying Down the Groundwork for Financial Stability for Architecture & Enginee...
Laying Down the Groundwork for Financial Stability for Architecture & Enginee...Laying Down the Groundwork for Financial Stability for Architecture & Enginee...
Laying Down the Groundwork for Financial Stability for Architecture & Enginee...
 
CUIDAPP - Cuida tu Ciudad - PITCH
CUIDAPP - Cuida tu Ciudad - PITCHCUIDAPP - Cuida tu Ciudad - PITCH
CUIDAPP - Cuida tu Ciudad - PITCH
 
Pensar, sentir, necesitar
Pensar, sentir, necesitarPensar, sentir, necesitar
Pensar, sentir, necesitar
 
利用規約 The policy about commercial use
利用規約 The policy about commercial use利用規約 The policy about commercial use
利用規約 The policy about commercial use
 
What Rockstars Can Teach You About Kicking Ass With Social Media
What Rockstars Can Teach You About Kicking Ass With Social MediaWhat Rockstars Can Teach You About Kicking Ass With Social Media
What Rockstars Can Teach You About Kicking Ass With Social Media
 
Pros y contras de que mi hijo tenga un perro
Pros y contras de que mi hijo tenga un perroPros y contras de que mi hijo tenga un perro
Pros y contras de que mi hijo tenga un perro
 
Eyjafjardara svak 2017_03_23
Eyjafjardara svak 2017_03_23Eyjafjardara svak 2017_03_23
Eyjafjardara svak 2017_03_23
 
プログラミング Coq
プログラミング Coqプログラミング Coq
プログラミング Coq
 
Convocatoria Guardavida Eventual
Convocatoria Guardavida EventualConvocatoria Guardavida Eventual
Convocatoria Guardavida Eventual
 

Similar to Spark Summit EU talk by Shay Nativ and Dvir Volk

Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
Dvir Volk
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)
 
Cost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWSCost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWS
Amazon Web Services
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
 
0bbleedingedge long-140614012258-phpapp02 lynn-langit
0bbleedingedge long-140614012258-phpapp02 lynn-langit0bbleedingedge long-140614012258-phpapp02 lynn-langit
0bbleedingedge long-140614012258-phpapp02 lynn-langit
Data Con LA
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
Lynn Langit
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
Yahoo Developer Network
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
Xiao Li
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Domino Data Lab
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
ScyllaDB
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
Lucidworks
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017
HashedIn Technologies
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5
Mike King
 
AWS Webinar-Introducing Amazon ElastiCache for Redis
AWS Webinar-Introducing Amazon ElastiCache for RedisAWS Webinar-Introducing Amazon ElastiCache for Redis
AWS Webinar-Introducing Amazon ElastiCache for Redis
Amazon Web Services
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Anyscale
 

Similar to Spark Summit EU talk by Shay Nativ and Dvir Volk (20)

Boosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and SparkBoosting Machine Learning with Redis Modules and Spark
Boosting Machine Learning with Redis Modules and Spark
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Cost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWSCost Savings at High Performance with Redis Labs and AWS
Cost Savings at High Performance with Redis Labs and AWS
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
NoSQL
NoSQLNoSQL
NoSQL
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
0bbleedingedge long-140614012258-phpapp02 lynn-langit
0bbleedingedge long-140614012258-phpapp02 lynn-langit0bbleedingedge long-140614012258-phpapp02 lynn-langit
0bbleedingedge long-140614012258-phpapp02 lynn-langit
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5Nashville analytics summit aug9 no sql mike king dell v1.5
Nashville analytics summit aug9 no sql mike king dell v1.5
 
AWS Webinar-Introducing Amazon ElastiCache for Redis
AWS Webinar-Introducing Amazon ElastiCache for RedisAWS Webinar-Introducing Amazon ElastiCache for Redis
AWS Webinar-Introducing Amazon ElastiCache for Redis
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
Spark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
Spark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 

Recently uploaded (20)

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 

Spark Summit EU talk by Shay Nativ and Dvir Volk

  • 1. Accelerating Spark-ML with Redis modules Dvir Volk, Shay Nativ
  • 2. Hello World Open source. The leading in-memory database The open source home and commercial provider of Redis - cloud and on-premise Senior System Architect at Redis Labs. Redis user and contributor for ~6 years Senior Software Developer at Redis Labs dvirvolk shaynativ
  • 3. A Brief Overview of Redis ● Started in 2009 by Salvatore Sanfilippo ● Mostly a one man show ● Most popular KV store ● Notable Users: ○ Twitter, Netflix, Uber, Groupon, Twitch ○ Many, many more...
  • 4. A Brief Overview of Redis • Key => Data Structure server • In memory disk backed • Optional cluster mode • Embedded Lua scripting • Single Threaded! • Key features: Fast, Flexible, Simple
  • 5. A Lego For Your Database Key "I'm a Plain Text String!" { A: “foo”, B: “bar”, C: “baz” } Strings/Blobs/Bitmaps Hash Tables (objects!) Linked Lists Sets Sorted Sets Geo Sets HyperLogLog { A , B , C , D , E } [ A → B → C → D → E ] { A: 0.1, B: 0.3, C: 100, D: 1337 } { A: (51.5, 0.12), B: (32.1, 34.7) } 00110101 11001110 10101010
  • 6. Redis In Practice ▪ “Front End Database” ▪ Real Time Counters ▪ Ad Serving ▪ Message Queues ▪ Geo Database ▪ Time Series ▪ Cache ▪ Session State ▪ Etc
  • 7. Redis + Spark • Spark-Redis connector • Redis RDD • SparkSQL integration • Redis as a data source • Redis as the final output
  • 8. But Can Redis Do X? Secondary Index? Time Series? Full Text Search? Graph? Machine Learning? AutoComplete? SQL?
  • 9. So You Want a New Feature? • Try a Lua script • Convince @antirez • Fork Redis • Build Your Own Database!
  • 10. Enter Redis Modules • In development since March 2016 • Redis 4.0 RC out soon • Several modules already exist • Key paradigm shift for Redis
  • 12. New Capabilities What Modules Actually Are • Dynamic libraries loaded to redis • Written in C/C++ • Use a C ABI/API isolating redis internals • Near Zero latency access to data New Commands New Data Types
  • 14. LEFTPAD Example 127.0.0.1:6379> MODULE LOAD "./example.so" OK 127.0.0.1:6379> COMMAND INFO EXAMPLE.LEFTPAD 1) 1) "example.leftpad" ... 127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 foo 127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 "_" _____foo
  • 15. Real Module: RediSearch • From-Scratch search index over redis • Uses Strings for holding compressed index data • Includes stemming, exact phrase match, etc. • Fast Fuzzy Auto-complete • Up to X5 faster than Elastic / Solr > FT.SEARCH “lcd tv” FILTER price 100 +inf > FT.SUGGET “lcd” FUZZY
  • 16. Real Module: Indexing • Support for secondary indexes for redis • Supports indexing HASH keys with their properties • Optional raw indexes as data types • SQL-like syntax for querying indexes > IDX.CREATE users_name_age TYPE HASH SCHEMA name STRING age INT32 > IDX.INTO users_name_age HMSET user1 name "alice" age 30 > IDX.FROM users_name_age WHERE "name LIKE 'ali%' AND age < 31" HGETALL $
  • 17. Real Module: JSON • Stores JSON objects into redis • Allows retrieval of part of a document • Allows atomic manipulation of document elements > JSON.SET foo ‘{“name”: {“first”: “bob”, “last”:”doe”}, “age”: 32}` > JSON.GET foo name.first > JSON.SET foo age 33
  • 18. Spark ML + Redis modules
  • 19. Redis + Spark So Far • ML is not addressed specifically • Used for pre-computed results • We felt that we can take it further
  • 20. Addressing The ML Pain • The missing piece of ML: Serving your model – Not standardized – Vendor-lock with cloud platforms – Reliable services are hard to do – If only we had a “database” for this! – Well, maybe we do?
  • 21. Why Modules for ML? With modules we can: • Define data structures for models • Store training output as “hot model” • Perform evaluation directly in Redis • Easily integrate existing C/C++ libs
  • 22. Spark + Modules = AWESOME • Train ML model on Spark • Save model to Redis and get: – High availability – Clustering – Persistence – Performance – Client libraries
  • 23. Spark-ML End-to-End Flow Spark Training Custom Server Model saved to Parquet file Data Loaded to Spark Pre-computed results Batch Evaluation ? ClientApp
  • 24. Adding Redis Into The Mix Redis-ML “Active Model” Any Training Platform ClientApp Spark Training Data Loaded to Spark
  • 25. Redis-ML Module Tree Ensembles Linear Regression Logistic Regression Matrix + Vector Operations More to come...
  • 27. Forest Data Type • A collection of decision trees • Supports classification & regression • Splitter Node can be – Categorical (e.g. day == “Sunday”) – Numerical (e.g. age < 43)
  • 28. Decision Tree Example The famous Titanic survival predictor sex=male?yes no Survived Died Age > 9.5? sibsp > 2.5? Died Survived *sibsp = siblings + spouses
  • 29. Forest Data Type API Add nodes to a tree in a forest: ML.FOREST.ADD <forestId> <treeId> <path> [ [NUMERIC|CATEGORIC] <splitterAttr> <splitterVal> ] | [LEAF] <predVal> ML.FOREST.RUN <forestId> <features> [CLASSIFICATION|REGRESSION] Perform classification/regression of a feature vector: *feature vector is in libSVM format k:v k:v ...
  • 30. Forest Data Type Example > MODULE LOAD "./redis-ml.so" OK > ML.FOREST.ADD myforest 0 . CATEGORIC sex “male” .L LEAF 1 .R LEAF 0 OK > ML.FOREST.RUN myforest sex:male "1" > ML.FOREST.RUN myforest sex:yes_please "0"
  • 31. Using Redis-ML With Spark scala> import com.redislabs.client.redisml.MLClient scala> import com.redislabs.provider.redis.ml.Forest scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel] scala> val f = new Forest(rfModel.trees) scala> f.loadToRedis("forest-test", "localhost") scala> val jedis = new Jedis("localhost") scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN, "forest-test", makeInputString (0)) scala> jedis.getClient.getStatusCodeReply res53: String = 1
  • 32. Benchmarking Redis-ML - Spark + Parquet Spark + Redis ML Model Preparation + Save 3785ms 292ms Model Load 2769ms 0ms (model is on memory) Classification (AVG) 13ms 1ms ● Forest size: 15000 trees ● Data: $(SPARK_HOME)/data/mllib/sample_libsvm_data.txt
  • 33. Going Forward - More Features • Implement more Spark-ML model types – SVM – Naive Bayes Classifier – Neural Networks • Integration with Redis’ native types
  • 34. PS: Neural Redis ▪ Developed by Salvatore ▪ Training is done inside redis ▪ Online continuous training process ▪ Builds Fully Connected NNs