PostgreSQL and Redis - talk at pgcon 2013

  • 1. PostgreSQL + Redis Andrew Dunstan
  • 2. Topics ● What is Redis? ● The Redis Foreign Data Wrapper ● The Redis Command wrapper for Postgres ● Case study – a high performance Ad server using Postgres and Redis
  • 3. What is Redis? ● High performance in-memory key/value data store
  • 4. Redis is easy to use ● Almost no configuration ● On Fedora: sudo yum install redis; sudo systemctl enable redis.service; sudo systemctl start redis.service; redis-cli
  • 5. Redis keys ● Are just strings
  • 6. Redis data values ● Values can be scalars ● Strings ● Integers ● Values can be structured ● Lists ● Sets ● Ordered sets ● Hashes – name value pairs – cf. hstore
  • 7. Simple command set ● Nothing like SQL or table joins ● Command set is large, but most commands take only 2 or 3 parameters
  • 8. Examples - adding values ● SET mykey myvalue ● HMSET myhashkey prop1 val1 prop2 val2 ● SADD mysetkey val1 val2 val3 ● LPUSH mylist val1 val2 ● ZADD myzsetkey 1 val1 5 val2
  • 9. No creation command ● You create an object by setting or adding to it ● Almost schema-less ● Can't use a command for one object type on another
  • 10. Redis keys all live in a single global namespace ● No schemas ● No separation by object type ● Very common pattern is to use fine grained keys, like (for a web session) web:111a7c9ff5afa0a7eb598b2c719c7975 ● KEYS command can find keys by pattern: ● KEYS web:* – Dangerous
  • 11. How Redis users do “tables” ● They use a prefix: ● INCR hits:2013.05.25 ● They can find all these by doing ● KEYS hits:* ● Or they keep a set with all the keys for a given type of data ● SADD hitkeyset hits:2013.05.25 ● The application has to make use of these keys – Redis itself won't
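The two ways of finding the rows of a "table" can be sketched as follows. This is a minimal illustration, not real client code: `TinyRedis` and `record_hit` are hypothetical names, and the class is an in-memory stand-in that only mimics INCR, SADD, SMEMBERS and KEYS.

```python
from fnmatch import fnmatchcase


class TinyRedis:
    """Hypothetical in-memory stand-in for a few Redis commands (not a real client)."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        # INCR: treat a missing key as 0, then add one.
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def sadd(self, key, *members):
        # SADD: create the set on first use; return how many members were new.
        members_set = self.data.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before

    def smembers(self, key):
        return set(self.data.get(key, set()))

    def keys(self, pattern):
        # KEYS: glob match against every key in the single global namespace.
        return sorted(k for k in self.data if fnmatchcase(k, pattern))


def record_hit(r, day):
    """Count a hit for `day` and register its key in the key set."""
    key = "hits:" + day
    r.incr(key)
    r.sadd("hitkeyset", key)
    return key


r = TinyRedis()
for day in ("2013.05.25", "2013.05.25", "2013.05.26"):
    record_hit(r, day)

by_prefix = r.keys("hits:*")                  # examines every key in the store
by_key_set = sorted(r.smembers("hitkeyset"))  # reads a single set
```

Both lookups return the same keys here. The difference is cost: the prefix scan has to look at the whole namespace, while the key set only reads what the application registered.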
  • 12. Redis Client library ● “hiredis” ● Moderately simple
  • 13. Redis Foreign Data Wrapper ● Originally written by Dave Page ● Brought up to date and extended by me
  • 14. Originally ● Only supported scalar data ● No support for segmenting namespace or use of key sets
  • 15. Updates by me ● All data types supported ● Table key prefixes supported ● Table key sets supported ● Array data returned as a PostgreSQL array literal
  • 16. Hash tables ● Most important type ● Most like PostgreSQL tables ● Best to define the table as having array of text for second column ● Turn that into json, hstore or a record.
  • 17. Example ● CREATE FOREIGN TABLE web_sessions( key text, values text[]) SERVER localredis OPTIONS (tabletype 'hash', tablekeyprefix 'web:'); SELECT * FROM web_sessions;
  • 18. Use with hstore ● CREATE TYPE websession AS ( id text, browser text, username text); SELECT populate_record(null::websession, hstore(values)) FROM web_sessions;
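The hstore step works because the hash comes back as a flat array of alternating names and values. A rough Python analogue of the array-to-record conversion is sketched below. `WebSession`, `pairs_to_dict` and `populate` are hypothetical names, and the sample row is made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class WebSession:
    # Mirrors the websession composite type from the slide.
    id: Optional[str] = None
    browser: Optional[str] = None
    username: Optional[str] = None


def pairs_to_dict(values):
    """Turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return dict(zip(values[0::2], values[1::2]))


def populate(record_type, values):
    """Fill the fields named in `values`; leave others None, ignore extras."""
    pairs = pairs_to_dict(values)
    names = {f.name for f in fields(record_type)}
    return record_type(**{k: v for k, v in pairs.items() if k in names})


# Made-up row: one field the record lacks, one field the row lacks.
row = ["id", "111a7c9f", "browser", "firefox", "lastpage", "/home"]
session = populate(WebSession, row)
```

Fields missing from the array stay empty and unknown names are dropped, which matches how the slide's query treats a hash that does not line up exactly with the composite type.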
  • 19. Use with json_object ● CREATE EXTENSION json_object; SELECT json_object(values) FROM web_sessions;
  • 20. Key prefix vs Key Set ● Key sets are much faster ● Ad server could not meet performance goals until it switched to using key sets ● Recommended by Redis docs
  • 21. Using a key set to filter rows ● Sort of “where” clause ● Put the keys of the entries you want in a set somehow ● Can use command wrapper ● Define a new foreign table that uses that set as the keyset
  • 22. 9.3 notes ● In 9.3 there is json_populate_record() ● Could avoid use of hstore ● For post 9.3, would be a good idea to have a function converting an array of key value pairs to a record directly
  • 23. Brand new – Singleton Key tables ● Each object is a table, not a row ● Sets and lists come back as single field rows ● Ordered sets come back as one or two field rows – second field can be score ● Hashes come back as rows of key/value
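The row shapes described for singleton key tables can be pictured with a small sketch. This is only an illustration of the mapping, not the wrapper's code: `singleton_rows` is a hypothetical name, and Redis values are modelled with plain Python lists, sets and dicts.

```python
def singleton_rows(kind, value, with_scores=True):
    """Return the rows a single Redis object would yield, by object type."""
    if kind == "list":
        # Lists keep their order; one single-field row per element.
        return [(item,) for item in value]
    if kind == "set":
        # Sets are unordered; sorted here only to make the output stable.
        return [(item,) for item in sorted(value)]
    if kind == "zset":
        # Ordered sets (modelled as member -> score) come back in score order,
        # with the score as an optional second field.
        ordered = sorted(value.items(), key=lambda kv: (kv[1], kv[0]))
        return ordered if with_scores else [(member,) for member, _ in ordered]
    if kind == "hash":
        # Hashes come back as rows of (key, value).
        return list(value.items())
    raise ValueError("unsupported kind: " + kind)


list_rows = singleton_rows("list", ["val2", "val1"])
zset_rows = singleton_rows("zset", {"val2": 5, "val1": 1})
hash_rows = singleton_rows("hash", {"prop1": "val1", "prop2": "val2"})
```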
  • 24. Coming soon ● Writable tables ● Made possible by writable foreign table support in the upcoming 9.3 release
  • 25. Redis Command Wrapper ● Fills the gaps in functionality ● Sponsored by IVC
  • 26. Redis wrapper functionality ● Thin layer over hiredis library ● Four basic functions ● redis_connect() ● redis_disconnect() ● redis_command() ● redis_command_argv()
  • 27. redis_connect() ● First argument is “handle” ● Remaining arguments are all optional ● con_host text DEFAULT ''::text ● con_port integer DEFAULT 6379 ● con_pass text DEFAULT ''::text ● con_db integer DEFAULT 0 ● ignore_duplicate boolean DEFAULT false
  • 28. Redis wrapper connections are persistent ● Unlike FDW package, where they are made at the beginning of each table fetch ● Makes micro operations faster
  • 29. redis_command and redis_command_argv ● Thin layers over similarly named functions in client library ● redis_command has max 4 arguments after command string – for more use redis_command_argv ● Might switch from VARIADIC text[] to VARIADIC “any”
  • 30. Uses ● Push data into redis ● Redis utility statements from within Postgres
  • 31. Higher level functions ● redis_push_record ● con_num integer ● data record ● push_keys boolean ● key_set text ● key_prefix text ● key_fields text[]
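One plausible reading of these parameters is sketched below: the record becomes a Redis hash, and its key is optionally added to a key set. The slide does not spell out the behaviour, so all of this is an assumption. `build_push_commands` is a hypothetical helper that only builds command lists, and both the key layout (prefix followed by the key field values joined with ":") and the skipping of null fields are guesses.

```python
def build_push_commands(data, push_keys, key_set, key_prefix, key_fields):
    """Build the Redis commands that pushing one record might issue."""
    # Assumed key layout: prefix + key field values joined with ":".
    key = key_prefix + ":".join(str(data[name]) for name in key_fields)

    # Store the record as a hash: HMSET key field1 value1 field2 value2 ...
    hmset = ["HMSET", key]
    for name, value in data.items():
        if value is not None:  # assumption: null fields are simply not stored
            hmset += [name, str(value)]

    commands = [hmset]
    if push_keys:
        # Register the key so a key-set-based foreign table can find it.
        commands.append(["SADD", key_set, key])
    return commands


commands = build_push_commands(
    data={"id": 7, "price": "0.25", "note": None},
    push_keys=True,
    key_set="adkeys",
    key_prefix="ad:",
    key_fields=["id"],
)
```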
  • 32. Why use Redis? ● Did I mention it's FAST? ● But not safe
  • 33. Our use case ● An ad server for the web ● If Redis crashes, not a tragedy ● If it's slow, it's a tragedy
  • 34. Ad Server Project by IVC Remaining slides are mostly info from IVC
  • 35. System Goals ● Serve 10,000 ads per second per application server cpu ● Use older existing hardware ● 5 ms for Postgres database to filter from 100k+ total ads to ~ 30 that can fit a page and meet business criteria ● 5 ms to filter to 1-5 best ads per page using statistics from Redis for freshness, revenue maximization etc. ● Record ad requests, confirmations and clicks. ● 24x7 operation with automatic fail over
  • 36. Physical View (diagram) ● Network: Cisco 3750 stacked, 1G, HSRP, 802.3ad ● Xen hosts: SLES 11.2, Dell R810, 128G, Intel e6540, 24 cores ● SLES 11.2, Dell 2950, 32G, Intel e5430, 8 cores
  • 37. Redundancy View (diagram) ● Tiers: Tier 1 Client, Tier 2 Web, Tier 3 Application, Tier 4 Database ● Components shown: Cisco HSRP, Shorewall, Keepalived, NGINX, Node, Redis (multiple instances), Sentinel, Pgpool, Postgres 9.2 ● Databases: Transaction DB, Business DB and Data Warehouse DB, each with hot replication ● Skytools Londiste3
  • 38. Postgres databases ● 6 Postgres databases ● Two for business model – master and streaming hot standby (small VM) ● Two for serving ads – master and streaming hot standby (physical Dell 2950) ● Two for storing clicks and impressions – master and hot standby (physical Dell 2950) ● Fronted by redundant Pgpool load balancers with fail over and automated db fail over.
  • 39. Business DB ● 30+ tables ● Example tables: ads, advertisers, publishers, ip locations ● Small number of users that manipulate the data (< 100) ● Typical application and screens ● Joining too slow to serve ads ● Tables get materialized into 2 tables in the ad serving database
  • 40. Ad Serving Database ● Two tables ● The first has IP ranges so we know where the user is coming from. Ad serving is often by country, region etc. ● The second has ad sizes, ad types, campaigns, keywords, channels, advertisers etc. ● The Postgres inet type and its index were must-haves for table one ● Tsquery/tsvector, boxes and arrays (with their associated index types) were all must-haves for table two
  • 41. Ad serving Database ● Materialized and copied from Business database every 3 minutes ● Indexes are created and new tables are vacuum analyzed then renamed. ● Performance goals were met. ● We doubt this could be done without Postgres data types and associated indexes ● Thanks
  • 42. Recording Ad requests/confirmations and clicks ● At 10k/sec/cpu, recording ads one row at a time plus updates on confirmation is too slow ● Approach: record in Redis, update in Redis, and once every six minutes batch load from Redis to Postgres. The FDW was critical. ● Partitioning (inheritance) with constraint exclusion segregates data by day, using a nightly batch job. One big table with a month's worth of data would not work. ● Table partitioning is not cheap in the leading commercial product. ● Thanks
  • 43. Recording DB continued. ● Used heavily for reporting. ● Statistics tables (number of clicks, impressions etc.) are calculated every few minutes on today's data ● Calculated nightly for the whole-day tables ● For reporting we needed some business data, so we selectively replicate business tables into the ad recording database using Skytools. Joining tables across a DB link is too slow.
  • 44. Recording DB cont'd ● Another usage is fraud detection. ● Medium and long term frequency fraud detection is one type of fraud detection that this database is used for.
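The record-in-Redis, batch-to-Postgres approach can be sketched roughly as below. This is an illustration under stated assumptions, not IVC's code: a plain dict stands in for Redis, and the names `drain_batch`, `partition_for`, the `impressions` table base name and the record fields `ad_id` and `ts` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_for(ts, base="impressions"):
    """Name of the per-day child table a record with epoch time `ts` belongs to."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y_%m_%d")
    return base + "_" + day


def drain_batch(store, key_set):
    """Remove every record listed in `key_set` and group the rows by day partition."""
    batches = defaultdict(list)
    for key in sorted(store[key_set]):  # sorted() copies, so popping below is safe
        record = store.pop(key)
        batches[partition_for(record["ts"])].append((key, record["ad_id"], record["ts"]))
    store[key_set].clear()              # the next batch starts from an empty key set
    return dict(batches)


# A dict standing in for Redis: two recorded impressions plus their key set.
store = {
    "imp:1": {"ad_id": 42, "ts": 1369440010},  # 2013-05-25 UTC
    "imp:2": {"ad_id": 43, "ts": 1369526405},  # 2013-05-26 UTC
    "impkeys": {"imp:1", "imp:2"},
}
batches = drain_batch(store, "impkeys")
```

In the real system the rows would be read through the Redis FDW and inserted into Postgres. Grouping by day here only mirrors the daily partitioning the slide describes.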
  • 45. Redis ● In-memory database. ● Rich type support. ● Multiple copies and replication. ● Real time and short term fraud detection ● Dynamic pricing ● Statistical best Ad decision making ● Initial place to record and batch to Postgres ● Runs on a VM with 94 GB of dedicated RAM.
  • 46. Redis cont'd ● FDW and commands dramatically reduce the amount of code we had to write ● FDW has good performance characteristics. ● Key success factor: in-memory Redis DB + Postgres relational DB.
  • 47. Postgres – Redis interaction ● Pricing data is pushed to Redis from Business DB via command wrapper ● Impression and Click data is pulled from Redis into Recording DB via Redis FDW
  • 48. Current Status ● In production with 4 significant customers since March 1 ● Scaling well
  • 49. Conclusions ● Postgres' rich data types and associated indexes were absolutely essential ● Redis + Postgres with good FDW integration was the second key success factor ● Node.js concurrency was essential in getting good application throughput ● Open source allowed the system to be built for less than 2% of the cost of a competing commercial system
  • 50. Questions?