Redis accelerates Apache Spark execution by 45 times when used as a shared, distributed in-memory datastore for Spark in analyses such as time-series range queries. With redis-ml, the Redis module for machine learning, spark-ml models gain a real-time serving layer that runs the models directly in Redis, lets multiple applications reuse the same models, and speeds up classification and execution of these models by 13x. Join this session to learn more about the Redis Labs connector for Apache Spark, which enhances production implementations of real-time big data processing.
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manish Gupta
1. Home of Redis
Analytics at the Real-Time Speed of Business
Manish Gupta
CMO, Redis Labs
February 8, 2017
2. 2
Introduction
The open source home and commercial provider of Redis Enterprise (Redise) software and Redis-as-a-Service.
Open source. The leading in-memory database platform, supporting any high performance operational, analytical or hybrid use case.
6,800+ enterprise customers 200+ enterprise customers
60,000+ total customers
Redise Cloud Private, Redise Cloud, Redise Pack Managed, Redise Pack
SERVICE SOFTWARE
5. 5
Spark is Certainly Fast But…
[Diagram, steps numbered 1 to 6: Data Source, Read to RDD, Deserialization, Processing, Serialization, Write to RDD, Data Sink, Analytics & BI]
6. 6
When Spark Meets Redis
[Diagram, steps numbered 1 and 2: Redis as Data Source and Serving Layer; Processing in Spark SQL & Data Frame; the Spark-Redis connector reads filtered/sorted data and writes filtered/sorted data; Analytics & BI]
7. 7
Accelerating Spark Time-Series with Redis
Redis is faster by up to 100 times compared to HDFS
and over 45 times compared to Tachyon or Spark
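The slide does not show how the range queries are run. As a minimal sketch, assume the time series is kept ordered by timestamp, the way a Redis sorted set keeps members ordered by score. A range query then returns only the requested window, so the reader never scans the whole series. The class SortedSeries and its methods are hypothetical, in-process stand-ins for illustration, not the Redis or Spark-Redis API.

```python
from bisect import bisect_left, bisect_right


class SortedSeries:
    """Hypothetical in-process stand-in for a timestamp-ordered store.

    This is NOT the Redis API; it only illustrates why an ordered
    structure can answer a time-range query without a full scan.
    """

    def __init__(self):
        self._timestamps = []  # kept sorted at all times
        self._values = []      # values aligned with _timestamps

    def add(self, timestamp, value):
        # Insert at the position that keeps timestamps sorted.
        i = bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def range_by_time(self, start, end):
        # Two binary searches locate the window; only that slice is returned.
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


series = SortedSeries()
# Samples arrive out of order; the store keeps them sorted by timestamp.
for ts, price in [(1003, 10.5), (1001, 10.1), (1005, 10.9), (1002, 10.2)]:
    series.add(ts, price)

window = series.range_by_time(1002, 1004)
print(window)
```

The point of the sketch is that the consumer receives data already filtered and sorted, so it has less to read and deserialize.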
8. 8
Redis-ML
• A complex problem to solve
• Crowded with smart people
• Years of investment
• Tons of open source projects
So why?
9. 9
Machine / Deep Learning Stages
(1) Training (2) Creating a model (3) Serving the model
10. 10
Machine / Deep Learning Stages
(1) Training (2) Creating a model (3) Serving the model
Homegrown
12. 12
Accurate Model is Important
Otherwise…
The ad-serving company example:
If your model is too small, it is not accurate; if it is too large, it is difficult to fit at the app layer
13. 13
Real World Challenge
• Ads serving company
• Needs to serve 20,000 ads/sec at 50 msec data-center latency
• Runs 1K campaigns: 1K random forests
• Each forest has 15K trees
• On average each tree has 7 levels (depth)
14. 14
Why Accurate Models Are Complex to Serve?
Item | Calculation | Total
Random forest ops/sec | 20K (ads/sec) x 1K (forests) x 15K (trees) x 7 x 0.5 (levels) | 1.05 trillion
Max ops/sec on the strongest AWS instance vcore | 2.6 GHz x 0.9 (OS overhead) x 0.1 (10 lines of code per op) x 0.1 (Java overhead) | 23.4 million
# of vcores needed | 1.05 trillion / 23.4 million | 44,872
# of c4.8xlarge instances needed | 44,872 / 36 | 1,247
Total cost, reserved instances | 1,247 x $9,213 | ~$11.5M/yr
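The sizing arithmetic on this slide can be checked with a short script. This is a sketch that uses only the slide's own figures. The variable names are illustrative, and reading "7 x 0.5 (levels)" as the average number of levels visited per tree is an assumption. Dividing 1.05 trillion required ops/sec by 23.4 million ops/sec per vcore gives the 44,872 vcores shown.

```python
import math

# Workload figures from the slide.
ADS_PER_SEC = 20_000
FORESTS = 1_000
TREES_PER_FOREST = 15_000
LEVELS_PER_TREE = 7 * 0.5  # assumption: average levels visited per tree

required_ops = ADS_PER_SEC * FORESTS * TREES_PER_FOREST * LEVELS_PER_TREE

# Per-vcore capacity, using the slide's factors as given.
CLOCK_HZ = 2.6e9
OS_FACTOR = 0.9    # "OS overhead"
CODE_FACTOR = 0.1  # "10 lines of code per ops"
JAVA_FACTOR = 0.1  # "Java overhead"
ops_per_vcore = CLOCK_HZ * OS_FACTOR * CODE_FACTOR * JAVA_FACTOR

VCORES_PER_INSTANCE = 36        # c4.8xlarge, per the slide
COST_PER_INSTANCE_YEAR = 9_213  # reserved-instance price, per the slide

vcores = math.ceil(required_ops / ops_per_vcore)
instances = math.ceil(vcores / VCORES_PER_INSTANCE)
annual_cost = instances * COST_PER_INSTANCE_YEAR

print(f"required ops/sec: {required_ops:.3g}")
print(f"ops/sec per vcore: {ops_per_vcore:.3g}")
print(f"vcores: {vcores}, instances: {instances}, cost/yr: ${annual_cost:,}")
```

Rounding up at each step reproduces the slide's 44,872 vcores and 1,247 instances, and an annual cost of about $11.5M.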
15. 15
Ads Serving Use-Case w/ and w/o Redise + ML
Resource Savings
Homegrown: 1,247 x c4.8xlarge
Redise + ML: 35 x c4.8xlarge
Cut computing infrastructure by 97%
The Redis ML module provides native data types for models like tree ensembles, linear regressions, logistic regressions, matrix and vector operations, and more.
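The 97% figure follows directly from the two instance counts on the slide. The small check below also estimates the annual cost of the smaller fleet. That estimate assumes the same $9,213 per reserved c4.8xlarge instance quoted for the homegrown sizing, and it is not a figure stated in the deck.

```python
HOMEGROWN_INSTANCES = 1_247
REDIS_ML_INSTANCES = 35
COST_PER_INSTANCE_YEAR = 9_213  # assumption: same reserved price for both fleets

# Fraction of the homegrown fleet that is no longer needed.
reduction = 1 - REDIS_ML_INSTANCES / HOMEGROWN_INSTANCES

# Hypothetical annual cost of the smaller fleet at the same unit price.
redis_ml_cost = REDIS_ML_INSTANCES * COST_PER_INSTANCE_YEAR

print(f"infrastructure reduction: {reduction:.1%}")
print(f"estimated cost/yr with 35 instances: ${redis_ml_cost:,}")
```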
16. 16
Ads Serving Use-Case w/ and w/o Redise + ML
Reduced Latency
[Chart: latency in msec, with and without Redise + ML]
x2,000 faster
Unlike training, where Spark does parallel processing, serving is a serial process, so the Redis advantage increases as the number of forests increases.
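To make the "serving is a serial process" point concrete, here is a minimal sketch of classifying one sample with a forest: each tree is walked from root to leaf, one tree after another, so the work per request grows with the number of trees and forests. The Node class and both functions are hypothetical illustrations, not the redis-ml or spark-ml implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical decision-tree node; leaves carry a label, splits do not."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None


def classify_tree(node: Node, sample: Dict[str, float]) -> Tuple[int, int]:
    """Walk one tree from root to leaf; return (label, comparisons made)."""
    steps = 0
    while node.label is None:
        go_left = sample[node.feature] <= node.threshold
        node = node.left if go_left else node.right
        steps += 1
    return node.label, steps


def classify_forest(trees: List[Node], sample: Dict[str, float]) -> Tuple[int, int]:
    """Evaluate trees one after another and take a majority vote."""
    votes: Dict[int, int] = {}
    total_steps = 0
    for tree in trees:  # serial loop: cost grows with the number of trees
        label, steps = classify_tree(tree, sample)
        votes[label] = votes.get(label, 0) + 1
        total_steps += steps
    winner = max(votes, key=votes.get)
    return winner, total_steps


# Three one-level trees over made-up features.
forest = [
    Node("age", 30, left=Node(label=0), right=Node(label=1)),
    Node("clicks", 5, left=Node(label=0), right=Node(label=1)),
    Node("age", 50, left=Node(label=0), right=Node(label=1)),
]
winner, total_steps = classify_forest(forest, {"age": 40, "clicks": 9})
print(winner, total_steps)
```

With 15K trees per forest and 1K forests, the same loop runs millions of tree walks per ad, which is the load the sizing slide multiplies out.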
18. 18
Resources
Dvir Volk: Senior Architect @ Redis Labs
http://www.slideshare.net/SparkSummit/spark-summit-eu-talk-by-shay-nativ-and-dvir-volk
Getting Started with Spark and Redis:
https://redislabs.com/docs/getting-started-with-spark-and-redis/
Spark ML package: https://github.com/RedisLabs/spark-redis-ml
Redis ML module: https://github.com/RedisLabsModules/redis-ml
More resources at:
https://redislabs.com/solutions/use-cases/spark-and-redis/