Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Why Zalando
trusts in PostgreSQL
A developer’s view on using the most advanced
open-source database
Henning Jacobs - Techn...
Who we are
Henning Jacobs <henning.jacobs@zalando.de>
with Zalando since 2010
NO DBA, NO PostgreSQL Expert
Valentine Gogic...
About Zalando
14 countries
> 1 billion € revenue 2012
> 150,000 products
3 warehouses
Europe's largest online fashion reta...
Our ZEOS Platform
5/37
Our Main Stack
Java → PostgreSQL
Tomcat
CXF-WS
API Schema
Data Schemas
JDBC
SProcWrapper
Main/Production
Postgres
Java 7, ...
Our Main Stacks
Different Use Cases — same Database
Tomcat
CXF-WS
API Schema
Data Schemas
JDBC
SProcWrapper
Tomcat
CXF-WS
...
Some Numbers
> 90 deployment units (WARs)
> 800 production Tomcat instances
> 50 different databases
> 90 database master ...
Stored Procedures
register_customer(..)
find_orders(..)
...
z_api_v13_42API Schemas
stored procedures,
custom types, ...
re...
Stored Procedures
1:1 Mapping using SProcWrapper
@SProcService
public interface CustomerSProcService {
@SProcCall
int regi...
Stored Procedures
Complex Types
@SProcCall
List<Order> findOrders(@SProcParam String email);
JAVA
CREATE FUNCTION find_ord...
Stored Procedures
Experience
Performance benefits
Easy to change live behavior
Validation close to data
Simple transaction...
Stored Procedures
Rolling out database changes
API versioning
Modularization
DB-Diffs
·
Automatic roll-out during deployme...
Stored Procedures & Constraints
Protect Your Most Valuable Asset: Your Data
14/37
Constraints
Ensuring Data Quality
Simple SQL expressions:
Combining check constraints and stored procedures:
CREATE TABLE ...
Constraints
What about MySQL?
The MySQL manual says:
The CHECK clause is parsed but ignored
by all storage engines.
Open B...
Constraints
Custom Type Validation using Triggers
value:
"john.doe@example.org"
key:
recipientEmail
type:
EMAIL_ADDRESS
ct...
Custom Type Validation using Triggers
CREATE FUNCTION validate_value_trigger() RETURNS trigger AS
$$
BEGIN
PERFORM validat...
Custom Type Validation with PL/Python
class TypeValidator(object):
@staticmethod
def load_by_id(id):
tdef = plpy.execute('...
Scaling?
20/37
Sharding
Scaling Horizontally
App
Shard 1
Cust. A-F
Shard 2
Cust. G-K Cust. Y-Z
Shard N
SProcWrapper
21/37
Sharding
SProcWrapper Support
Sharding customers by ID:
Sharding articles by SKU (uses MD5 hash):
Collecting information f...
Sharding
Auto Partitioning Collections
@SProcCall(parallel = true)
void updateStockItems(@SProcParam @ShardKey List<StockI...
Sharding
Bitmap Data Source Providers
24/37
Sharding
Experience
Scalability without sacrificing any SQL features
Start with a reasonable number of shards (8)
Some too...
Fail Safety
Replication
WAL
WAL Slave
with 1 h delay
Hot Standby
for readonly queries
Service IP
App
All databases use str...
Fail Safety
Failover
Service IP for switching/failover
Monitoring with Java and custom web frontend
Failover is manual tas...
Fail Safety
General Remarks
Good hardware
No downtimes allowed
Two data centers
Dedicated 24x7 team
Maintenance
·
G8 serve...
Monitoring
You need it...
Nagios/Icinga
PGObserver
pg_view
·
·
·
29/37
Monitoring
PGObserver
30/37
Monitoring
PGObserver
Locate hot spots
Helps tuning DB performance
Creating the right indices
often a silver bullet
·
Freq...
Monitoring
pg_view
Top-like command
line utility
DBA's daily tool
Analyze live
problems
Monitor data
migrations
·
·
·
·
32...
NoSQL
Relational is dead?
If your project is not particularly vital and/or your team is
not particularly good, use relatio...
NoSQL
Relational is dead?
The key goals of F1's design are:
3. Consistency: The system must provide ACID
transactions, [.....
Document
Column-Family
Key-Value Graph
Aggregate-Oriented
Schemaless
NoSQL – Comparison
Aggregate oriented?
Schemaless?
Sc...
YeSQL
PostgreSQL at Zalando
Fast, stable and well documented code base
Performs as well as (and outperforms some) NoSQL
da...
Thank You!
Please Visit also
tech.zalando.com
Please rate this session!
SProcWrapper – Java library for stored procedure a...
GOTO 2013: Why Zalando trusts in PostgreSQL
Upcoming SlideShare
Loading in …5
×

GOTO 2013: Why Zalando trusts in PostgreSQL

NoSQL is on the rise but sadly when people compare the usual NoSQL candidates (Redis, MongoDB, Riak, Cassandra, HBase, ..) to relational databases they often only mention MySQL. In our presentation we tried to explain the power of the world’s most advanced opensource database - PostgreSQL. In our session we showed various examples of why we at Zalando trust PostgreSQL to reliably handle all our data. We make use of it in various scenarios, from less complex CRUD applications on a single node, to highly critical and more complex scenarios. This involves customer and order data with strong constraints for high performance and availability, sharded across multiple nodes. We believe that PostgreSQL is massively underrated and that you should have very good reasons to ignore its great features.

  • Be the first to comment

GOTO 2013: Why Zalando trusts in PostgreSQL

  1. 1. Why Zalando trusts in PostgreSQL A developer’s view on using the most advanced open-source database Henning Jacobs - Technical Lead Platform/Software Zalando GmbH Valentine Gogichashvili - Technical Lead Platform/Database Zalando GmbH GOTO Berlin 2013, 2013-10-18
  2. 2. Who we are Henning Jacobs <henning.jacobs@zalando.de> with Zalando since 2010 NO DBA, NO PostgreSQL Expert Valentine Gogichashvili <valentine.gogichashvili@zalando.de> with Zalando since 2010 DBA, PostgreSQL Expert 3/37
  3. 3. About Zalando 14 countries > 1 billion € revenue 2012 > 150,000 products 3 warehouses Europe's largest online fashion retailer · · · · · 4/37
  4. 4. Our ZEOS Platform 5/37
  5. 5. Our Main Stack Java → PostgreSQL Tomcat CXF-WS API Schema Data Schemas JDBC SProcWrapper Main/Production Postgres Java 7, Tomcat 7, PostgreSQL 9.0–9.3 6/37
  6. 6. Our Main Stacks Different Use Cases — same Database Tomcat CXF-WS API Schema Data Schemas JDBC SProcWrapper Tomcat CXF-WS Data Schemas JDBC JPA CherryPy HTTP/REST Data Schemas psycopg2 SQLAlchemy Main/Production CRUD/JPA Scripting/Python Postgres Postgres Postgres Java 7, Tomcat 7, Python 2.7, PostgreSQL 9.0–9.3 7/37
  7. 7. Some Numbers > 90 deployment units (WARs) > 800 production Tomcat instances > 50 different databases > 90 database master instances > 5 TB of PostgreSQL data > 200 developers, 8 DBAs · · · · · · 8/37
  8. 8. Stored Procedures register_customer(..) find_orders(..) ... z_api_v13_42API Schemas stored procedures, custom types, ... register_customer(..) find_orders(..) ... Data Schema(s) tables, views, ... SET search_path .... SELECT register_customer(...) customer order order_address ... z_data 9/37
  9. 9. Stored Procedures 1:1 Mapping using SProcWrapper @SProcService public interface CustomerSProcService { @SProcCall int registerCustomer(@SProcParam String email, @SProcParam Gender gender); } JAVA CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS int AS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id $$ LANGUAGE 'sql' SECURITY DEFINER; SQL 10/37
  10. 10. Stored Procedures Complex Types @SProcCall List<Order> findOrders(@SProcParam String email); JAVA CREATE FUNCTION find_orders(p_email text, OUT order_id int, OUT order_date timestamp, OUT shipping_address order_address) RETURNS SETOF RECORD AS $$ SELECT o_id, o_date, ROW(oa_street, oa_city, oa_country)::order_address FROM z_data.order JOIN z_data.order_address ON oa_order_id = o_id JOIN z_data.customer ON c_id = o_customer_id WHERE c_email = p_email $$ LANGUAGE 'sql' SECURITY DEFINER; SQL 11/37
  11. 11. Stored Procedures Experience Performance benefits Easy to change live behavior Validation close to data Simple transaction scope Makes moving to new software version easy Cross language API layer CRUD ⇒ JPA · · · · · · · 12/37
  12. 12. Stored Procedures Rolling out database changes API versioning Modularization DB-Diffs · Automatic roll-out during deployment Java backend selects "right" API version - - · shared SQL gets own Maven artifacts feature/library bundles of Java+SQL - - · SQL scripts for database changes Review process Tooling support - - - 13/37
  13. 13. Stored Procedures & Constraints Protect Your Most Valuable Asset: Your Data 14/37
  14. 14. Constraints Ensuring Data Quality Simple SQL expressions: Combining check constraints and stored procedures: CREATE TABLE z_data.host ( h_hostname varchar(63) NOT NULL UNIQUE CHECK (h_hostname ~ '^[a-z][a-z0-9-]*$'), h_ip inet UNIQUE CHECK (masklen(h_ip) = 32), h_memory_bytes bigint CHECK (h_memory_bytes > 8*1024*1024) ); COMMENT ON COLUMN z_data.host.h_memory_bytes IS 'Main memory (RAM) of host in Bytes'; SQL CREATE TABLE z_data.customer ( c_email text NOT NULL UNIQUE CHECK (is_valid_email(c_email)), .. ); SQL 15/37
  15. 15. Constraints What about MySQL? The MySQL manual says: The CHECK clause is parsed but ignored by all storage engines. Open Bug ticket since 2004: http://bugs.mysql.com/bug.php?id=3464 http://dev.mysql.com/doc/refman/5.7/en/create-table.html 16/37
  16. 16. Constraints Custom Type Validation using Triggers value: "john.doe@example.org" key: recipientEmail type: EMAIL_ADDRESS ct_id : serial ct_name : text ct_type : base_type ct_regex : text ... config_type cv_id : serial cv_key : text cv_value : json cv_type_id : int config_value 17/37
  17. 17. Custom Type Validation using Triggers CREATE FUNCTION validate_value_trigger() RETURNS trigger AS $$ BEGIN PERFORM validate_value(NEW.cv_value, NEW.cv_type_id); END; $$ LANGUAGE 'plpgsql'; SQL CREATE FUNCTION validate_value(p_value json, p_type_id int) RETURNS void AS $$ import json import re # ... Python code, see next slide $$ LANGUAGE 'plpythonu'; SQL 18/37
  18. 18. Custom Type Validation with PL/Python class TypeValidator(object): @staticmethod def load_by_id(id): tdef = plpy.execute('SELECT * FROM config_type WHERE ct_id = %d' % int(id))[0] return _get_validator(tdef) def check(self, condition, msg): if not condition: raise ValueError('Value does not match the type "%s". Details: %s' % (self.type_name, msg)) class BooleanValidator(TypeValidator): def validate(self, value): self.check(type(value) is bool, 'Boolean expected') validator = TypeValidator.load_by_id(p_type_id) validator.validate(json.loads(p_value)) PYTHON 19/37
  19. 19. Scaling? 20/37
  20. 20. Sharding Scaling Horizontally App Shard 1 Cust. A-F Shard 2 Cust. G-K Cust. Y-Z Shard N SProcWrapper 21/37
  21. 21. Sharding SProcWrapper Support Sharding customers by ID: Sharding articles by SKU (uses MD5 hash): Collecting information from all shards concurrently: @SProcCall int registerCustomer(@SProcParam @ShardKey int customerId, @SProcParam String email, @SProcParam Gender gender); JAVA @SProcCall Article getArticle(@SProcParam @ShardKey Sku sku); JAVA @SProcCall(runOnAllShards = true, parallel = true) List<Order> findOrders(@SProcParam String email); JAVA 22/37
  22. 22. Sharding Auto Partitioning Collections @SProcCall(parallel = true) void updateStockItems(@SProcParam @ShardKey List<StockItem> items); JAVA 23/37
  23. 23. Sharding Bitmap Data Source Providers 24/37
  24. 24. Sharding Experience Scalability without sacrificing any SQL features Start with a reasonable number of shards (8) Some tooling required Avoid caching if you can and scale horizontally · · · · 25/37
  25. 25. Fail Safety Replication WAL WAL Slave with 1 h delay Hot Standby for readonly queries Service IP App All databases use streaming replication Every database has a (hot) standby and a delayed slave · · 26/37
  26. 26. Fail Safety Failover Service IP for switching/failover Monitoring with Java and custom web frontend Failover is manual task · · · 27/37
  27. 27. Fail Safety General Remarks Good hardware No downtimes allowed Two data centers Dedicated 24x7 team Maintenance · G8 servers from HP ECC RAM, RAID - - · Major PostgreSQL version upgrades?- · · · Concurrent index rebuild, table compactor- 28/37
  28. 28. Monitoring You need it... Nagios/Icinga PGObserver pg_view · · · 29/37
  29. 29. Monitoring PGObserver 30/37
  30. 30. Monitoring PGObserver Locate hot spots Helps tuning DB performance Creating the right indices often a silver bullet · Frequent stored procedure calls Long running stored procedures I/O patterns - - - · · 31/37
  31. 31. Monitoring pg_view Top-like command line utility DBA's daily tool Analyze live problems Monitor data migrations · · · · 32/37
  32. 32. NoSQL Relational is dead? If your project is not particularly vital and/or your team is not particularly good, use relational databases! Martin Fowler, GOTO Aarhus 2012 People vs. NoSQL, GOTO Aarhus 2012 33/37
  33. 33. NoSQL Relational is dead? The key goals of F1's design are: 3. Consistency: The system must provide ACID transactions, [..] 4. Usability: The system must provide full SQL query support [..] Features like indexes and ad hoc query are not just nice to have, but absolute requirements for our business. F1: A Distributed SQL Database That Scales 34/37
  34. 34. Document Column-Family Key-Value Graph Aggregate-Oriented Schemaless NoSQL – Comparison Aggregate oriented? Schemaless? Scaling? Auth*? Use cases? ⇒ "Polyglot Persistence" · SProcWrapper Changes? - - · ⇒ Implicit Schema HStore, JSON - - · · · Introduction to NoSQL, GOTO Aarhus 2012 35/37
  35. 35. YeSQL PostgreSQL at Zalando Fast, stable and well documented code base Performs as well as (and outperforms some) NoSQL databases while retaining deeper exibility Scaling horizontally can be easy Know your tools — whatever they are! · · · · PostgreSQL and NoSQL 36/37
  36. 36. Thank You! Please Visit also tech.zalando.com Please rate this session! SProcWrapper – Java library for stored procedure access github.com/zalando/java-sproc-wrapper PGObserver – monitoring web tool for PostgreSQL github.com/zalando/PGObserver pg_view – top-like command line activity monitor github.com/zalando/pg_view · · · 37/37

×