Big Data Day LA 2016/ NoSQL track - Using Redis Data Structures to Make Your App Blazing Fast, Adi Foulger, Solutions Architect, Redis Labs

Open Source Redis is not only the fastest NoSQL database but also the most popular among the new wave of databases running in containers. This talk introduces the data structures used to speed up applications and solve the everyday use cases that are driving Redis' popularity.


  1. Overclock your Data with Redis
  2. Who?
     Me: Adi Foulger, Pro Geek @ Redis Labs, the open source home and provider of enterprise Redis
     About Redis Labs:
     • 5300+ paying customers, 35K+ free customers
     • 118K Redis databases managed
     • Withstood several datacenter outages with no loss of data for customers who chose HA options
     • Two main products: Redis Cloud, Redis Labs Enterprise Cluster
  3. What?
     Redis is an open source (BSD-licensed), in-memory data structure store, used as a database, cache, message broker and much more!
  4. Redis… How does it work?
  5. REmote DIctionary Server
     • A data structure database
     • An overly simplistic definition:
       CREATE TABLE redis (
         k VARCHAR(512MB) NOT NULL,
         v VARCHAR(512MB),
         PRIMARY KEY (k)
       );
     • 8-ish data types, 180+ commands, blazing fast
     • Created by @antirez (a.k.a. Salvatore Sanfilippo)
     • v1.0 August 9th, 2009 … v3.2 May 6th, 2016
     • Source: https://github.com/antirez/redis
     • Website: http://redis.io
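The "overly simplistic" table above maps directly to the two most basic commands. A minimal redis-cli session (the key name is illustrative):

```
> SET greeting "Hello"
OK
> GET greeting
"Hello"
```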
  6. Redis: A Data Structure Database
     Data structures are used by developers like "Lego" building blocks, saving them much coding effort and time:
     Strings, Hashes, Lists, Sets, Sorted Sets, Bitmaps, HyperLogLogs, Geospatial indexes
  7. Why? Because It Is Fun!
     • Simplicity → rich functionality, great flexibility
     • Performance → easily serves 100Ks of ops/sec
     • Lightweight → ~2MB footprint
     • Production proven (name dropping)
  8. Redis Makes You Think
     • about how data is stored
     • about how data is accessed
     • about efficiency
     • about performance
     • about the network
     • …
     • Redis is a database construction kit
     • Beware of Maslow's law of the instrument (the "golden hammer"): "If all you have is a hammer, everything looks like a nail"
  9. Okay… so how do I use it?
  10. Key Points About Key Names
      • Key names are "limited" to 512MB (as are the values, btw)
      • To conserve RAM and CPU, try to avoid unnecessarily_longish_names_for_your_redis_keys; they are more expensive to store and compare (unlike an RDBMS's column names, key names are stored with each key-value pair)
      • On the other hand, don't be too stringent (e.g. 'u:<uid>:r')
      • Although not mandatory, the convention is to use colons (':') to separate the parts of a key's name
      • Your keys' names are your schema, so keep them in order
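One way to keep the colon convention consistent across a codebase is a tiny helper; `make_key` is a hypothetical name, not part of Redis or any client library:

```python
def make_key(*parts):
    """Join key-name parts with the conventional ':' separator.

    A hypothetical helper; any scheme works as long as it is applied
    consistently, since your key names are effectively your schema.
    """
    return ':'.join(str(p) for p in parts)

# Short but still descriptive beats both extremes ('u:1:r' vs.
# 'unnecessarily_longish_names_for_your_redis_keys'):
print(make_key('user', 1, 'reputation'))  # user:1:reputation
```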
  11. STRINGs
      • The most basic data type
      • Binary-safe
      • Used for storing:
        Strings (duh): APPEND, GETRANGE, SETRANGE, STRLEN
        Integers: INCR, INCRBY, DECR, DECRBY
        Floats: INCRBYFLOAT
        Bits: SETBIT, GETBIT, BITPOS, BITCOUNT, BITOP
      • http://xkcd.com/171/
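A short redis-cli sketch of the string and integer commands listed above (key names are illustrative; note that INCR on a missing key starts from 0):

```
> SET s "Hello"
OK
> APPEND s " World"
(integer) 11
> GETRANGE s 0 4
"Hello"
> INCR hits
(integer) 1
> INCRBY hits 9
(integer) 10
```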
  12. Pattern: Caching Calls to the DB
      • Motivation: quick responses, reduce load on the DBMS
      • How: keep the statement's results using the Redis STRING data type

      import hashlib

      def get_results(sql):
          key = hashlib.md5(sql.encode()).hexdigest()
          result = redis.get(key)
          if result is None:
              result = db.execute(sql)
              redis.set(key, result)  # or use redis.setex to set a TTL for the key
          return result
  13. The HASH Data Type
      • Acts as a Redis-within-Redis: contains key-value pairs
      • Has its own commands: HINCRBY, HINCRBYFLOAT, HLEN, HKEYS, HVALS…
      • Usually used for aggregation, i.e. keeping related data together for easy fetching/updating (remember that Redis is not a relational database). Example:

        Using separate keys                Using hash aggregation (key: user:1)
        user:1:id    → 1                   id    → 1
        user:1:fname → Foo                 fname → Foo
        user:1:lname → Bar                 lname → Bar
        user:1:email → foo@acme.com        email → foo@acme.com
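The aggregated form of the example can be built and read back in a couple of commands; a redis-cli sketch (field values are the ones from the table above):

```
> HMSET user:1 id 1 fname Foo lname Bar email foo@acme.com
OK
> HGET user:1 fname
"Foo"
> HGETALL user:1
1) "id"
2) "1"
3) "fname"
4) "Foo"
5) "lname"
6) "Bar"
7) "email"
8) "foo@acme.com"
```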
  14. Pattern: Avoiding Calls to the DB
      • Motivation: server-side storage and sharing of transient data that doesn't need a full-fledged RDBMS, e.g. sessions and shopping carts
      • How: depending on the case, use a STRING or a HASH to store the data in Redis

      def add_to_cart(uid, product, quantity):
          if quantity > 0:
              redis.hset('cart:' + uid, product, quantity)
          else:
              redis.hdel('cart:' + uid, product)
          redis.expire('cart:' + uid, CART_TIMEOUT)

      def get_cart_contents(uid):
          return redis.hgetall('cart:' + uid)
  15. Pattern: Counting Things
      • Motivation: statistics, real-time analytics, dashboards, throttling
      • How #1: use the *INCR commands
      • How #2: use a little bit of BIT*

      def user_log_login(uid):
          joined = redis.hget('user:' + uid, 'joined')
          d0 = datetime.strptime(joined, '%Y-%m-%d').date()
          d1 = date.today()
          delta = d1 - d0
          redis.setbit('user:' + uid + ':logins', delta.days, 1)

      def user_logins_count(uid):
          return redis.bitcount('user:' + uid + ':logins', 0, -1)
  16. De-normalization
      • Non-relational: no foreign keys, no referential integrity constraints
      • Thus, data normalization isn't practical (mostly)
      • Be prepared to have duplicated data, e.g.:
        > HSET user:1 country Mordor
        > HSET user:2 country Mordor
        …
      • Tradeoff: processing complexity ↔ data volume
  17. LISTs
      • Lists of strings sorted by insertion order
      • Usually have a head and a tail
      • Top-n, bottom-n and constant-length list operations, as well as passing items from one list to another, are extremely popular and extremely fast
  18. Pattern: Lists of Items
      • Motivation: keeping track of a sequence, e.g. last viewed profiles
      • How: use Redis' LIST data type

      def view_product(uid, product):
          redis.lpush('user:' + uid + ':viewed', product)
          redis.ltrim('user:' + uid + ':viewed', 0, 9)

      def get_last_viewed_products(uid):
          return redis.lrange('user:' + uid + ':viewed', 0, -1)
  19. Pattern: Queues
      • Motivation: producer-consumer use cases, asynchronous job management, e.g. processing photo uploads

      def enqueue(queue, item):
          redis.lpush(queue, item)

      def dequeue(queue):
          return redis.rpop(queue)  # or use brpop for a blocking pop
  20. SETs
      • Unordered collections of strings
      • ADD, REMOVE or TEST for membership in O(1)
      • Unions, intersections and differences are computed very, very fast
  21. Pattern: Searching
      • Motivation: finding keys in the database, for example all the users
      • How #1: use a LIST to store key names
      • How #2: the *SCAN commands

      def do_something_with_all_users():
          first = True
          cursor = 0
          while cursor != 0 or first:
              first = False
              cursor, data = redis.scan(cursor, match='user:*')
              do_something(data)
  22. Pattern: Indexing
      • Motivation: Redis doesn't have indices, so you need to maintain them yourself
      • How: the SET data type (a collection of unordered, unique members)

      def update_country_idx(country, uid):
          redis.sadd('country:' + country, uid)

      def get_users_in_country(country):
          return redis.smembers('country:' + country)
  23. Pattern: Relationships
      • Motivation: Redis doesn't have foreign keys, so you need to maintain them yourself

      > SADD user:1:friends 3 4 5              // Foo is social and makes friends
      > SCARD user:1:friends                   // How many friends does Foo have?
      > SINTER user:1:friends user:2:friends   // Common friends
      > SDIFF user:1:friends user:2:friends    // Exclusive friends
      > SUNION user:1:friends user:2:friends   // All the friends
  24. ZSETs (Sorted Sets)
      • Are just like SETs: members are unique (ZADD, ZCARD, ZINCRBY, …)
      • ZSET members have a score that's used for sorting: ZCOUNT, ZRANGE, ZRANGEBYSCORE
      • When the scores are identical, members are sorted alphabetically
      • Lexicographical ranges are also supported: ZLEXCOUNT, ZRANGEBYLEX
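A quick redis-cli sketch of a lexicographical range: give every member the same score, then query by member name, where '[' means inclusive and '(' means exclusive (key and members are illustrative):

```
> ZADD words 0 apple 0 banana 0 cherry
(integer) 3
> ZRANGEBYLEX words [a (c
1) "apple"
2) "banana"
```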
  25. Pattern: Sorting
      • Motivation: anything that needs to be sorted
      • How: ZSETs

      > ZADD friends_count 3 1 1 2 999 3   // members are uids, scores are friends counts
      > ZREVRANGE friends_count 0 -1
      1) "3"
      2) "1"
      3) "2"
  26. The SORT Command
      • Sorts LISTs, SETs and SORTED SETs
      • SORT's syntax is the most complex (comparatively), but SQLers should feel right at home with it:
        SORT key [BY pattern] [LIMIT offset count] [GET pattern [GET pattern ...]] [ASC|DESC] [ALPHA] [STORE destination]
      • SORT is also expensive in terms of complexity: O(N+M*log(M))
      • BTW, SORT is perhaps the only ad-hoc-like command in Redis
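A sketch of BY and GET in action (all key names are illustrative): external weight_* keys drive the order, and GET pulls in related values instead of the sorted members themselves:

```
> RPUSH uids 1 2 3
(integer) 3
> MSET weight_1 30 weight_2 10 weight_3 20 name_1 Foo name_2 Bar name_3 Baz
OK
> SORT uids BY weight_*
1) "2"
2) "3"
3) "1"
> SORT uids BY weight_* GET name_*
1) "Bar"
2) "Baz"
3) "Foo"
```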
  27. Pattern: Counting Unique Items
      • How #1: SADD the items and SCARD for the count
      • Problem: more unique items means more RAM :(
      • How #2: the HyperLogLog data structure
        > PFADD counter item1 item2 item3 …
      • An HLL is a probabilistic data structure that counts (PFCOUNT) unique items
      • Sacrifices accuracy: standard error of 0.81%
      • Gains: constant complexity and memory (12KB per counter)
      • Bonus: HLLs are mergeable with PFMERGE :)
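The three PF* commands together, as a redis-cli sketch (key and member names are illustrative; PFCOUNT is an estimate, though with tiny sets like these it matches the true count in practice):

```
> PFADD visitors alice bob carol
(integer) 1
> PFCOUNT visitors
(integer) 3
> PFADD today bob dave
(integer) 1
> PFMERGE all visitors today
OK
> PFCOUNT all
(integer) 4
```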
  28. Is Redis ACID? (mostly) Yes!
      • Redis is (mostly) single-threaded, hence every operation is:
        o Atomic
        o Isolated
      • WATCH/MULTI/EXEC allow something like transactions (no rollbacks)
      • Server-side Lua scripts ("stored procedures") also behave like transactions
      • Durability is configurable and is a tradeoff between efficiency and safety
      • Stronger consistency with replicas can be requested using the WAIT command
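WATCH/MULTI/EXEC as a redis-cli sketch (the balance key is illustrative): WATCH arms optimistic locking, MULTI queues commands, and EXEC runs them atomically; if another client modified balance after the WATCH, EXEC would instead return nil and run nothing:

```
> SET balance 100
OK
> WATCH balance
OK
> MULTI
OK
> DECRBY balance 10
QUEUED
> EXEC
1) (integer) 90
```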
  29. Wait, There's More!
      • There are additional commands that we didn't cover
      • Expiration and eviction policies
      • Publish/Subscribe
      • Data persistence and durability
      • Server-side scripting with Lua
      • Master-slave(s) replication
      • High availability
      • Clustering
  30. Modules, huh?
  31. Before
      • Redis is ubiquitous for fast data and fits lots of cases (a Swiss™ Army knife)
      • Some use cases need special care
      • Open source has its own agenda
  32. After
      • The core still fits lots of cases
      • Module extensions for special cases
      • A new community-driven ecosystem
      • "Give power to users to go faster"
  33. What are they?
      • Dynamically (server-)loaded libraries
      • Future-compatible
      • Will be (mostly) written in C
      • (Almost) as fast as the core
      • Planned for public release Q3 2016
  34. What can I do with them?
      • Process: where the data is at
      • Compose: call the core and other modules
      • Extend: new structures, commands
  35. It's got layers!
      • Operational: admin, memory, disk, replication, arguments, replies…
      • High-level: client-like access to core and modules' commands
      • Low-level: (almost) native access to the core data structures' memory
  36. They make everything faster!
  37. Make your own!
  38. Make your own!
  39. Make your own!
  40. Learn More
      • https://www.youtube.com/watch?v=EglSYFodaqw
  41. Redis in your Big Data
  42. Spark Operation w/o Redis
      (diagram: Data Source → … → Data Sink)
      1. Read to RDD
      2. Deserialization
      3. Processing
      4. Serialization
      5. Write to RDD
      6. Analytics & BI
  43. Spark Operation with Redis
      (diagram: Data Source → Spark SQL & Data Frames → Serving Layer → Analytics & BI)
      1. Read filtered/sorted data via the Spark-Redis connector
      2. Processing, then write filtered/sorted data back
  44. Accelerating Spark Time-Series with Redis
      • Redis is faster by up to 100 times compared to HDFS and over 45 times compared to Tachyon or Spark
  45. Hadoop is Turbo-Charged with Redis
      • Goal: accelerate Hadoop operation by orders of magnitude:
        ‒ Phase 1: use Redis as a caching solution for HBase (Hadoop's default database) and HDFS (the Hadoop Distributed File System)
        ‒ Phase 2: completely replace HBase with Redis
      • Milestone: demonstrated Hadoop acceleration by HBase caching with Redis
      • Real-life scenario: ~500% acceleration
  46. Thank you
