Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redis and Spark, Dave Neilsen, Developer Relations, Redis Labs

Spark is in-memory, Redis is in-memory. The Spark-Redis connector gives Spark access to Redis' data structures as RDDs. Redis, with its blazing fast performance and optimized in-memory data structures, reduces Spark processing time by up to 98%. In this talk, Dave will share the top use cases for Spark-Redis, such as time-series, recommendations and real-time bid management.

Published in: Technology

  1. Home of Redis: Analytics at the Speed of Business with Redis and Spark. Leena Joshi, VP Product Marketing
  2. Who We Are: The open source home and commercial provider of Redis
     • Open source: the leading in-memory data structure store, supporting any high performance operational or analytic use case
     • Redis Cloud: available since mid-2013, 6,100+ enterprise customers
     • Redis Labs Enterprise Cluster (RLEC): available since early 2015, 100+ enterprise customers
     • 50,000+ total customers
  3. Redis is a Game Changer
     • Simplicity (through Data Structures): Strings, Lists, Sets, Sorted Sets, Hashes, Bitmaps, Bit field, HyperLogLogs, Geospatial Indexes
     • Extensibility (through Redis Modules)
     • Performance
  4. Why Use Redis in Analytics
  5. Popular Redis Use Cases: Geo Search; Data Ingestion; Social Functionality; Following, Followers, Relations; Location-based Applications; High Throughput Buffering; Job & Queue; Caching; Any Business Application; Any Web or Mobile App; High Speed Transactions; Time-Series; Business Applications; Analytics; Real-time Computations; Time-Based Analysis
  6. Example: Redis for Bid Management
     The Application Problem
     • Many users bidding on items
     • Need to instantly show who’s leading, in what order and by how much
     • May also need to display analytics like how many users are bidding in what range
     • Disk-based DBMSs are too slow for real-time, high-scale calculations
     Why Redis Rocks This
     • Sorted sets automatically keep the list of users and scores updated and in order (ZADD)
     • ZRANGE, ZREVRANGE will get your top users
     • ZRANK will get any user's rank instantaneously
     • ZCOUNT will return a count of users in a range
     • ZRANGEBYSCORE will return all the users in a range by their bids
  7. Redis Sorted Sets
     ZADD item:1 10000 id:2 21000 id:1
     ZADD item:1 34000 id:3 35000 id:4
     ZINCRBY item:1 10000 id:3
     ZREVRANGE item:1 0 0 (returns id:3)
     Resulting sorted set item:1, highest score first: id:3 44000, id:4 35000, id:1 21000, id:2 10000
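The sorted-set commands named on the bid-management slides can be traced with a small pure-Python model. This is only a sketch of the command semantics, not the Redis implementation and not a client library: a plain dict stands in for the key item:1, no server is involved, and the helper functions below are illustrative names that simply mirror the command names.

```python
# Pure-Python model of the sorted-set commands named on the slides.
# A dict {member: score} stands in for one Redis key; no server is involved.

def zadd(zset, *pairs):
    """Model ZADD key score member [score member ...]."""
    for score, member in zip(pairs[::2], pairs[1::2]):
        zset[member] = float(score)

def zincrby(zset, increment, member):
    """Model ZINCRBY: add to a member's score, starting from 0 if absent."""
    zset[member] = zset.get(member, 0.0) + increment
    return zset[member]

def zrevrange(zset, start, stop):
    """Model ZREVRANGE: members by descending score, stop index inclusive."""
    ordered = sorted(zset, key=lambda m: (zset[m], m), reverse=True)
    return ordered[start:] if stop == -1 else ordered[start:stop + 1]

def zrank(zset, member):
    """Model ZRANK: 0-based position counting from the lowest score."""
    return sorted(zset, key=lambda m: (zset[m], m)).index(member)

def zcount(zset, low, high):
    """Model ZCOUNT: number of members with low <= score <= high."""
    return sum(1 for score in zset.values() if low <= score <= high)

def zrangebyscore(zset, low, high):
    """Model ZRANGEBYSCORE: members with low <= score <= high, ascending."""
    hits = [m for m in zset if low <= zset[m] <= high]
    return sorted(hits, key=lambda m: (zset[m], m))

# Replay the bids from the slide.
item_1 = {}
zadd(item_1, 10000, "id:2", 21000, "id:1")
zadd(item_1, 34000, "id:3", 35000, "id:4")
zincrby(item_1, 10000, "id:3")                 # id:3 raises its bid to 44000

leader = zrevrange(item_1, 0, 0)               # current highest bidder
ranking = zrevrange(item_1, 0, -1)             # full leaderboard
mid_range = zrangebyscore(item_1, 20000, 40000)  # bidders between 20000 and 40000
```

The 20000 to 40000 range is an arbitrary illustration of the "how many users are bidding in what range" question; `zcount(item_1, 20000, 40000)` answers it with a count, while `mid_range` lists the bidders themselves.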
  8. Example: Redis for Recommendations
     The Application Problem
     • Users, items, likes, dislikes, similarities
     • Set comparisons of user likes and user dislikes should help create similarity scores, which can then be stored in a sorted set
     • Set comparisons of similar users' likes/dislikes with items not purchased by the current user should yield suggestions
     • High speed and low latency requirements
     Why Redis Rocks This
     • Redis Sets are unordered collections of strings; SADD adds objects to each tag
     • Set operations are executed in memory, at blazing fast speeds
     • SINTER, SINTERSTORE to intersect multiple sets
     • SUNIONSTORE to combine multiple sets
     • SISMEMBER to determine membership, SMEMBERS to retrieve all values
     • Sets and Sorted Sets combined are a great choice for recommendation engines
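The recommendation idea on this slide (compare like-sets to score similarity, then suggest items the current user has not purchased) can be sketched in plain Python. Everything specific here is an assumption made for illustration: the slide does not name a similarity measure, so Jaccard similarity is swapped in; the users, items and purchases are hypothetical; dislikes are left out to keep the sketch short; and Python sets and a sorted list stand in for Redis Sets and a Sorted Set.

```python
# Hypothetical data: users, items and purchases are made up for illustration.
likes = {
    "user:1": {"item:1", "item:3", "item:14"},
    "user:2": {"item:1", "item:3", "item:22"},
    "user:3": {"item:22"},
}
purchased = {"user:1": {"item:1", "item:3", "item:14"}}

def jaccard(a, b):
    """Shared likes divided by combined likes (intersection size over union size)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_users(user):
    """Other users with a non-zero score, most similar first (the sorted-set role)."""
    scores = {o: jaccard(likes[user], likes[o]) for o in likes if o != user}
    return sorted((o for o in scores if scores[o] > 0), key=lambda o: (-scores[o], o))

def suggestions(user, top_n=1):
    """Items liked by the most similar users that this user has not purchased."""
    candidates = set()
    for other in similar_users(user)[:top_n]:
        candidates |= likes[other]
    return sorted(candidates - purchased.get(user, set()))

recommended = suggestions("user:1")
```

In Redis terms, the intersection and union would map to SINTER and SUNIONSTORE on per-user like sets, and the scores would be written to a per-user sorted set with ZADD.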
  9. Redis Sets
     SADD item:1 tag:1 tag:22 tag:24
     SADD tag:1 item:1 item:3
     SADD tag:2 item:22 item:14 item:3
     SINTER tag:1 tag:2 (returns item:3)
     SUNIONSTORE tag:x tag:1 tag:2
     SMEMBERS tag:x (returns item:1 item:3 item:22 item:14)
     Resulting sets: item:1 = {tag:1, tag:22, tag:24}; tag:1 = {item:1, item:3}; tag:2 = {item:22, item:14, item:3}; tag:x = {item:1, item:22, item:14, item:3}
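The tag example can be checked with Python sets standing in for the Redis keys. This is a sketch of the set semantics only, with no server involved; the set contents are taken from the slide's diagram (tag 1 holds item:1 and item:3, tag 2 holds item:22, item:14 and item:3), and the helper names are illustrative functions that mirror the command names.

```python
# Python sets standing in for the Redis keys on the slide (no server involved).
tags = {
    "tag:1": {"item:1", "item:3"},
    "tag:2": {"item:22", "item:14", "item:3"},
}

def sinter(store, *keys):
    """Model SINTER: members present in every named set."""
    return set.intersection(*(store.get(k, set()) for k in keys))

def sunionstore(store, dest, *keys):
    """Model SUNIONSTORE: store the union under dest and return its size."""
    store[dest] = set().union(*(store.get(k, set()) for k in keys))
    return len(store[dest])

def sismember(store, key, member):
    """Model SISMEMBER: True if member is in the set stored at key."""
    return member in store.get(key, set())

common = sinter(tags, "tag:1", "tag:2")               # items carrying both tags
stored = sunionstore(tags, "tag:x", "tag:1", "tag:2")  # items carrying either tag
```

After the call, `tags["tag:x"]` plays the role of the SMEMBERS tag:x reply, and `sismember(tags, "tag:x", "item:14")` answers a single membership question without fetching the whole set.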
  10. Redis & Spark
  11. Spark Operation w/o Redis (Redis Labs proprietary & confidential information). Diagram with steps numbered 1 to 6: Data Source; Read to RDD; Deserialization; Processing; Serialization; Write to RDD; Data Sink; Analytics & BI
  12. Spark Operation with Redis. Diagram with steps numbered 1 and 2: Data Source; Spark SQL & Data Frame; Processing; Spark-Redis connector (read filtered/sorted data, write filtered/sorted data); Serving Layer; Analytics & BI
  13. Accelerating Spark Time-Series with Redis: Redis is up to 100 times faster than HDFS and over 45 times faster than Tachyon or Spark
  14. More Details About the Redis & Spark Integration
      • GitHub link, Spark-Redis Connector Package: https://github.com/RedisLabs/spark-redis
      • How to get started with Spark and Redis: https://redislabs.com/solutions/spark-and-redis
      • Blog: https://redislabs.com/blog/connecting-spark-and-redis
  15. Cost Effective Analytics
  16. Price/Performance of Memory Technology
  17. Redis on Flash: Flash used as a RAM extender and NOT as persistent storage
  18. How to Achieve Optimal Price/Performance: by dynamically setting the RAM/Flash ratio behind the scenes
  19. Single Server Results with Dell & Samsung NVMe
      • Throughput (ops/sec): avg 2.04M, max 2.14M
      • Latency (msec): avg 0.91, max 0.98; 100% below 1 msec
      • Disk bandwidth: avg 313 MB/sec read / 9.4 MB/sec write; max 1.71 GB/sec read / 96 MB/sec write
      • Network bandwidth: avg 1.45 Gbps (Tx) / 0.97 Gbps (Rx); max 1.6 Gbps (Tx) / 1.2 Gbps (Rx)
      Test setup
      • Redis Labs Enterprise Cluster v3.2
      • Dell Xeon CPU E5-2670 v3 @ 2.50GHz
      • 4x Samsung NVMe PM1725
      • Memtier benchmark (open source tool)
      • 100B object size
      • 80% read, 20% write
      Summary: >2M ops/sec, <1 ms latency, >1 GB disk bandwidth
  20. Customer Example: Redis on Flash
      • Genome dataset: 31TB of raw data
      • Optimized data set through encoding and using Redis Hashes
      • Resulting data runs high speed analyses with 55GB of RAM and 4.5TB of Flash
      • 97% annual savings compared to a pure RAM solution

                      Redis on RAM              Redis on Flash
      RAM size        5TB                       0.5TB
      Flash size      N/A                       4.5TB
      Servers         on AWS: 21x r3.8xlarge    on P8: 2x s822 LC
      1yr costs       $489,333                  $15,677
      P8 savings                                97%
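The 97% savings figure follows from the two one-year cost numbers on this slide. The short calculation below reproduces it; the variable names are illustrative, and the RAM share uses the table's 0.5TB RAM and 4.5TB Flash sizes.

```python
# One-year costs as quoted on the slide.
ram_only_cost = 489_333   # Redis on RAM: 21x r3.8xlarge on AWS
flash_cost = 15_677       # Redis on Flash: 2x s822 LC on P8

# Fraction of the pure-RAM cost that is saved by the Flash deployment.
savings = 1 - flash_cost / ram_only_cost
savings_pct = round(savings * 100, 1)

# Share of the Redis on Flash capacity that is held in RAM.
ram_tb, flash_tb = 0.5, 4.5
ram_share = ram_tb / (ram_tb + flash_tb)
```

The result is about 96.8%, which rounds to the 97% quoted, with a tenth of the dataset capacity held in RAM.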
  21. RLEC Flash on AWS SSDs: Customer Example
      • Next-gen community engagement platform, >200M unique users per month
      • Uses Redis as their only database for handling 400k-1M user requests/day (peak of 500k messages/sec on AWS)
      • RLEC Flash on AWS SSD instances helps reduce operational costs by up to 70%
      “I am yet to encounter limits with Redis Labs’ scalability. It allows me to handle peaks in traffic that grow 2000% without any need to scale my database infrastructure.” Ishay Green, CTO, Spot.IM
  22. Extending Redis Analytics
  23. What Can Modules Do
      • Full Text Search, Enhanced JSON, Graph Operations, Secondary Indexes, Linear Algebra, SQL Support, Image Processing, N-Dimension Queries, …
      • All modules are certified by Redis Labs for full compliance with OSS Redis, Redis Cloud and Redis Labs Enterprise Cluster (RLEC)
  24. RediSearch: The world's fastest text search engine
      • Average latency (msec), full text search / prefix search: RLEC 3.15 / 2.40; Elasticsearch 21.00 / 8.70; Solr 24.57 / 10.61
      • Ops/sec, full text search / prefix search: RLEC 20,045 / 6,831; Elasticsearch 690 / 3,686; Solr 621 / 3,133
      • Chart callouts: 85% higher, 32x higher, 7.8x faster, 4.1x faster
  25. Redis Module Hub (www.redismodules.com)
  26. Next Steps: Learn More
      • Redis with Spark: https://redislabs.com/solutions/spark-and-redis
      • Redis on Flash: https://redislabs.com/solutions/redis-for-very-large-datasets
      • Redis Modules: www.redismodules.com
  27. Home of Redis. Questions? @socialeena
