
Really Big Elephants: PostgreSQL DW


Published in: Technology


  1. 1. Really Big Elephants: Data Warehousing with PostgreSQL
      Josh Berkus
      MySQL User Conference 2011
  2. 2. Included/Excluded
      I will cover:
      ● advantages of Postgres for DW
      ● configuration
      ● tablespaces
      ● ETL/ELT
      ● windowing
      ● partitioning
      ● materialized views
      I won't cover:
      ● hardware selection
      ● EAV / blobs
      ● denormalization
      ● DW query tuning
      ● external DW tools
      ● backups & upgrades
  3. 3. What is a “data warehouse”?
  4. 4. synonyms etc.
      ● Business Intelligence
        ● also BI/DW
      ● Analytics database
      ● OnLine Analytical Processing (OLAP)
      ● Data Mining
      ● Decision Support
  5. 5. OLTP vs DW
      OLTP                                       DW
      ● many single-row writes                   ● few large batch imports
      ● current data                             ● years of data
      ● queries generated by user activity       ● queries generated by large reports
      ● < 1s response times                      ● queries can run for hours
      ● 0.5 to 5x RAM                            ● 5x to 2000x RAM
  6. 6. OLTP vs DW
      OLTP                                       DW
      ● 100 to 1000 users                        ● 1 to 10 users
      ● constraints                              ● no constraints
  7. 7. Why use PostgreSQL for data warehousing?
  8. 8. Complex Queries
      SELECT
        CASE WHEN ((SUM(inventory.closed_on_hand) + SUM(changes.received) + SUM(changes.adjustments)
                    + SUM(changes.transferred_in-changes.transferred_out)) <> 0)
             THEN ROUND((CAST(SUM(changes.sold_and_closed + changes.returned_and_closed) AS numeric) * 100)
                        / CAST(SUM(starting.closed_on_hand) + SUM(changes.received) + SUM(changes.adjustments)
                               + SUM(changes.transferred_in-changes.transferred_out) AS numeric), 5)
             ELSE 0 END AS "Percent_Sold",
        CASE WHEN (SUM(changes.sold_and_closed) <> 0)
             THEN ROUND(100*((SUM(changes.closed_markdown_units_sold)*1.0) / SUM(changes.sold_and_closed)), 5)
             ELSE 0 END AS "Percent_of_Units_Sold_with_Markdown",
        CASE WHEN (SUM(changes.sold_and_closed * _sku.retail_price) <> 0)
             THEN ROUND(100*(SUM(changes.closed_markdown_dollars_sold)*1.0)
                        / SUM(changes.sold_and_closed * _sku.retail_price), 5)
             ELSE 0 END AS "Markdown_Percent",
        0 AS "Percent_of_Total_Sales",
        CASE WHEN SUM((changes.sold_and_closed + changes.returned_and_closed) * _sku.retail_price) IS NULL
             THEN 0
             ELSE SUM((changes.sold_and_closed + changes.returned_and_closed) * _sku.retail_price)
             END AS "Net_Sales_at_Retail",
        0 AS "Percent_of_Ending_Inventory_at_Retail",
        SUM(inventory.closed_on_hand * _sku.retail_price) AS "Ending_Inventory_at_Retail",
        "_store"."label" AS "Store",
        "_department"."label" AS "Department",
        "_vendor"."name" AS "Vendor_Name"
      FROM inventory
        JOIN inventory as starting
          ON inventory.warehouse_id = starting.warehouse_id
          AND inventory.sku_id = starting.sku_id
        LEFT OUTER JOIN (
          SELECT warehouse_id, sku_id,
            sum(received) as received,
            sum(transferred_in) as transferred_in,
            sum(transferred_out) as transferred_out,
            sum(adjustments) as adjustments,
            sum(sold) as sold
          FROM movement
          WHERE movement.movement_date BETWEEN '2010-08-05' AND '2010-08-19'
          GROUP BY sku_id, warehouse_id
        ) as changes
          ON inventory.warehouse_id = changes.warehouse_id
          AND inventory.sku_id = changes.sku_id
        JOIN _sku ON = inventory.sku_id
        JOIN _warehouse ON = inventory.warehouse_id
        JOIN _location_hierarchy AS _store ON = _warehouse.store_id AND _store.type = 'Store'
        JOIN _product ON = _sku.product_id
        JOIN _merchandise_hierarchy AS _department
  9. 9. Complex Queries
      ● JOIN optimization
        ● 5 different JOIN types
        ● approximate planning for 20+ table joins
      ● subqueries in any clause
        ● plus nested subqueries
      ● windowing queries
      ● recursive queries
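The "recursive queries" item in the list above can be illustrated with a small sketch. The hierarchy rows below are invented for the example (the deck's own hierarchy tables are not shown in full), so every name in it is an assumption.

```sql
-- Minimal sketch of a recursive query (WITH RECURSIVE, PostgreSQL 8.4+).
-- The "hierarchy" rows are made-up sample data, not taken from the talk.
WITH RECURSIVE hierarchy (id, parent_id, label) AS (
    VALUES (1, NULL::int, 'All merchandise'),
           (2, 1,         'Apparel'),
           (3, 2,         'Shoes'),
           (4, 1,         'Housewares')
),
tree (id, path, depth) AS (
    -- anchor: the root row has no parent
    SELECT id, label, 1
    FROM hierarchy
    WHERE parent_id IS NULL
  UNION ALL
    -- recursive step: attach each child to the path of its parent
    SELECT h.id, t.path || ' > ' || h.label, t.depth + 1
    FROM hierarchy AS h
    JOIN tree AS t ON h.parent_id = t.id
)
SELECT id, depth, path
FROM tree
ORDER BY path;
```

Each row of the result carries the full path from the root to that node, which is the kind of hierarchy walk that otherwise needs a loop in application code.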
  10. 10. Big Data Features
      ● big tables: partitioning
      ● big databases: tablespaces
      ● big backups: PITR
      ● big updates: binary replication
      ● big queries: resource control
  11. 11. Extensibility
      ● add data analysis functionality from external libraries inside the database
        ● financial analysis
        ● genetic sequencing
        ● approximate queries
      ● create your own:
        ● data types
        ● functions
        ● aggregates
        ● operators
  12. 12. Community
      “I'm running a partitioning scheme using 256 tables with a maximum of 16 million rows (namely IPv4-addresses) and a current total of about 2.5 billion rows, there are no deletes though, but lots of updates.”
      “I use PostgreSQL basically as a data warehouse to store all the genetic data that our lab generates … With this configuration I figure I'll have ~3TB for my main data tables and 1TB for indexes.”
      ● lots of experience with large databases
      ● blogs, tools, online help
  13. 13. Sweet Spot
      [Chart: MySQL, PostgreSQL, and DW Database compared on a scale from 0 to 30]
  14. 14. DW Databases
      ● Vertica
      ● Greenplum
      ● Aster Data
      ● Infobright
      ● Teradata
      ● Hadoop/HBase
      ● Netezza
      ● HadoopDB
      ● LucidDB
      ● MonetDB
      ● SciDB
      ● Paraccel
  16. 16. How do I configure PostgreSQL for data warehousing?
  17. 17. General Setup
      ● Latest version of PostgreSQL
      ● System with lots of drives
        ● 6 to 48 drives – or 2 to 12 SSDs
        ● High-throughput RAID
      ● Write ahead log (WAL) on separate disk(s)
        ● 10 to 50 GB space
  18. 18. separate the DW workload onto its own server
  19. 19. Settings
      few connections
        max_connections = 10 to 40
      raise those memory limits!
        shared_buffers = 1/8 to ¼ of RAM
        work_mem = 128MB to 1GB
        maintenance_work_mem = 512MB to 1GB
        temp_buffers = 128MB to 1GB
        effective_cache_size = ¾ of RAM
        wal_buffers = 16MB
  20. 20. No autovacuum
      autovacuum = off
      vacuum_cost_delay = 0   (cost-based vacuum delay off)
      ● do your VACUUMs and ANALYZEs as part of the batch load process
        ● usually several of them
      ● also maintain tables by partitioning
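With autovacuum switched off, the maintenance has to be written into the load script itself. The following is only a sketch of what "VACUUMs and ANALYZEs as part of the batch load" might look like; the staging table `sales_import`, its columns, and the file path are assumptions made for illustration, while `sales` is the table defined later in the deck.

```sql
-- Sketch of a batch load with explicit maintenance steps.
-- sales_import and the CSV path are hypothetical placeholders.
COPY sales_import FROM '/mnt/transfers/sales/batch.csv' WITH CSV;

-- Refresh planner statistics so the transform queries that read the
-- staging table are planned against its real row counts.
ANALYZE sales_import;

INSERT INTO sales (sell_date, seller_id, item_id, sale_amount, narrative)
SELECT sell_date, seller_id, item_id, sale_amount, narrative
FROM sales_import;

-- VACUUM cannot run inside a transaction block, so it is issued as its
-- own statement once the load has committed.
VACUUM ANALYZE sales;
```

A real batch usually repeats the ANALYZE after each large intermediate step, which is what "usually several of them" refers to.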
  21. 21. What are tablespaces?
  22. 22. logical data extents
      ● lets you put some of your data on specific devices / disks
      CREATE TABLESPACE history_log
        LOCATION '/mnt/san2/history_log';
      ALTER TABLE history_log
        SET TABLESPACE history_log;
  23. 23. tablespace reasons
      ● parallelize access
        ● your largest “fact table” on one tablespace
        ● its indexes on another
          – not as useful if you have a good SAN
      ● temp tablespace for temp tables
      ● move key join tables to SSD
      ● migrate to new storage one table at a time
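The reasons above map onto a handful of statements. This is a sketch under assumptions: the tablespace names, mount points, and the index name `sales_sell_date_idx` are invented for the example, and the directories must already exist, be empty, and be owned by the operating-system user that runs PostgreSQL.

```sql
-- Hypothetical layout: fact table, its indexes, and temp files on
-- separate devices. All names and paths are placeholders.
CREATE TABLESPACE fact_space  LOCATION '/mnt/array1/pg_fact';
CREATE TABLESPACE index_space LOCATION '/mnt/array2/pg_index';
CREATE TABLESPACE temp_space  LOCATION '/mnt/ssd1/pg_temp';

-- Moving an existing object copies its files and locks it for the
-- duration, so this belongs in a maintenance window.
ALTER TABLE sales SET TABLESPACE fact_space;
ALTER INDEX sales_sell_date_idx SET TABLESPACE index_space;

-- Send temporary tables and sort spill files to the dedicated space
-- (per session here; it can also be set in postgresql.conf).
SET temp_tablespaces = 'temp_space';
```

Because each ALTER moves a single object, the same statements cover the "migrate to new storage one table at a time" case.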
  24. 24. What is ETL and how do I do it?
  25. 25. Extract, Transform, Load
      ● how you turn external raw data into normalized database data
        ● Apache logs → web analytics DB
        ● CSV POS files → financial reporting DB
        ● OLTP server → 10-year data warehouse
      ● also called ELT when the transformation is done inside the database
        ● PostgreSQL is particularly good for ELT
  26. 26. L: INSERT
      ● batch INSERTs into groups of 100s or 1000s per transaction
        ● row-at-a-time is very slow
      ● create and load import tables in one transaction
      ● add indexes and constraints after load
      ● insert several streams in parallel
        ● but not more than CPU cores
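A short sketch of the INSERT advice above follows. The table `weblog_import`, its columns, and the sample rows are assumptions for illustration; a real batch would carry hundreds or thousands of rows per statement rather than three.

```sql
-- Create and load the import table in a single transaction.
BEGIN;

CREATE TABLE weblog_import (
    hit_time  timestamptz NOT NULL,
    page      text        NOT NULL
);

-- One multi-row INSERT per batch instead of one statement (and one
-- commit) per row.
INSERT INTO weblog_import (hit_time, page) VALUES
    ('2011-06-05 19:22:41+00', '/index.html'),
    ('2011-06-05 19:22:43+00', '/about.html'),
    ('2011-06-05 19:22:47+00', '/index.html');

COMMIT;

-- Indexes (and any constraints) are added only after the data is in,
-- so they are built once rather than maintained row by row.
CREATE INDEX weblog_import_hit_time_idx ON weblog_import (hit_time);
ANALYZE weblog_import;
```

To load several streams in parallel, the same pattern can be run from separate connections, each with its own import table, keeping the number of connections at or below the number of CPU cores.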
  27. 27. L: COPY
      ● Powerful, efficient delimited file loader
        ● almost bug-free - we use it for backup
        ● 3-5X faster than inserts
        ● works with most delimited files
      ● Not fault-tolerant
        ● also have to know structure in advance
        ● try pgloader for better COPY
  28. 28. L: COPY
      COPY weblog_new FROM
        '/mnt/transfers/weblogs/weblog-20110605.csv' WITH CSV;
      COPY traffic_snapshot FROM
        'traffic_20110605192241' DELIMITER '|' NULL AS 'N';
      copy weblog_summary_june TO
        'Desktop/weblog-june2011.csv' with csv header;
  29. 29. L: in 9.1: FDW
      CREATE FOREIGN TABLE raw_hits
        ( hit_time TIMESTAMP, page TEXT )
      SERVER file_fdw
      OPTIONS (format 'csv', delimiter ';',
        filename '/var/log/hits.log');
  30. 30. L: in 9.1: FDW
      CREATE TABLE hits_2011041617 AS
      SELECT page, count(*)
      FROM raw_hits
      WHERE hit_time > '2011-04-16 16:00:00'
        AND hit_time <= '2011-04-16 17:00:00'
      GROUP BY page;
  31. 31. T: temporary tables
      CREATE TEMPORARY TABLE sales_records_june_rollup
      ON COMMIT DROP AS
      SELECT seller_id, location, sell_date,
        sum(sale_amount), array_agg(item_id)
      FROM raw_sales
      WHERE sell_date BETWEEN '2011-06-01'
        AND '2011-06-30 23:59:59.999'
      GROUP BY seller_id, location, sell_date;
  32. 32. in 9.1: unlogged tables
      ● like MyISAM without the risk
      CREATE UNLOGGED TABLE cleaned_log_import AS
      SELECT hit_time, page
      FROM raw_hits, hit_watermark
      WHERE hit_time > last_watermark
        AND is_valid(page);
  33. 33. T: stored procedures
      ● multiple languages
        ● SQL, PL/pgSQL
        ● PL/Perl, PL/Python, PL/PHP
        ● PL/R, PL/Java
        ● allows you to use external data processing libraries in the database
      ● custom aggregates, operators, more
  34. 34. CREATE OR REPLACE FUNCTION normalize_query ( queryin text )
      RETURNS TEXT LANGUAGE PLPERL STABLE STRICT AS $f$
      # this function "normalizes" queries by stripping out constants.
      # some regexes by Guillaume Smet under The PostgreSQL License.
      local $_ = $_[0];
      # first cleanup the whitespace
      s/\s+/ /g;
      s/\s,/,/g;
      s/,(\S)/, $1/g;
      s/^\s//g;
      s/\s$//g;
      # remove any double quotes and quoted text
      s/\\'//g;
      s/'[^']*'/''/g;
      s/''('')+/''/g;
      # remove TRUE and FALSE
      s/(\W)TRUE(\W)/$1BOOL$2/gi;
      s/(\W)FALSE(\W)/$1BOOL$2/gi;
      # remove any bare numbers or hex numbers
      s/([^a-zA-Z_\$-])-?([0-9]+)/${1}0/g;
      s/([^a-z_\$-])0x[0-9a-f]{1,10}/${1}0x/ig;
      # normalize any IN statements
      s/(IN\s*)\(['0x,\s]*\)/${1}(...)/ig;
      # return the normalized query
      return $_;
      $f$;
  35. 35. CREATE OR REPLACE FUNCTION f_graph2() RETURNS text AS '
      sql <- paste("SELECT id as x,hit as y FROM mytemp LIMIT 30",sep="");
      str <- c(pg.spi.exec(sql));
      mymain <- "Graph 2";
      mysub <- paste("The worst offender is: ",str[1,3]," with",str[1,2]," hits",sep="");
      myxlab <- "Top 30 IP Addresses";
      myylab <- "Number of Hits";
      pdf(''/tmp/graph2.pdf'');
      plot(str,type="b",main=mymain,sub=mysub,xlab=myxlab,ylab=myylab,lwd=3);
      mtext("Probes by intrusive IP Addresses",side=3);;
      print(''DONE'');
      ' LANGUAGE plr;
  36. 36. ELT Tips
      ● bulk insert into a new table instead of updating/deleting an existing table
      ● update all columns in one operation instead of one at a time
      ● use views and custom functions to simplify your queries
      ● inserting into your long-term tables should be the very last step – no updates after!
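The first, second, and last tips can be combined in one small sketch. It assumes that `raw_sales` has the columns used on the temporary-tables slide and that `sale_amount` is `numeric`; the staging table `sales_clean` and the particular cleanup rules are invented for illustration.

```sql
BEGIN;

-- Instead of a series of UPDATEs against raw_sales, write every
-- cleanup into a new table in a single pass.
-- sales_clean is a hypothetical staging table.
CREATE TABLE sales_clean AS
SELECT date_trunc('second', sell_date) AS sell_date,
       seller_id,
       item_id,
       round(sale_amount, 2) AS sale_amount
FROM raw_sales
WHERE sell_date   IS NOT NULL
  AND seller_id   IS NOT NULL
  AND item_id     IS NOT NULL
  AND sale_amount IS NOT NULL;

-- Loading the long-term table is the final step; nothing updates
-- these rows afterwards.
INSERT INTO sales (sell_date, seller_id, item_id, sale_amount)
SELECT sell_date, seller_id, item_id, sale_amount
FROM sales_clean;

COMMIT;
```

Writing a fresh table avoids the dead rows that repeated UPDATEs leave behind, which matters more than usual when autovacuum is turned off.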
  37. 37. What's a windowing query?
  38. 38. regular aggregate
  39. 39. windowing function
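The two slide titles above contrast a regular aggregate with a windowing function. A minimal sketch of the difference, using made-up rows in a hypothetical `demo_events` list rather than any table from the talk:

```sql
-- Regular aggregate: rows collapse to one output row per group
-- (here, one row for 'call' and one for 'meeting').
WITH demo_events (event_id, event_type, minutes) AS (
    VALUES (1, 'meeting', 30),
           (2, 'meeting', 60),
           (3, 'call',    15)
)
SELECT event_type, sum(minutes) AS total_minutes
FROM demo_events
GROUP BY event_type;

-- Windowing function: all three input rows are kept, and each one is
-- annotated with values computed over a "window" of related rows.
WITH demo_events (event_id, event_type, minutes) AS (
    VALUES (1, 'meeting', 30),
           (2, 'meeting', 60),
           (3, 'call',    15)
)
SELECT event_id,
       event_type,
       minutes,
       sum(minutes) OVER (PARTITION BY event_type) AS type_total,
       sum(minutes) OVER (ORDER BY event_id)       AS running_total
FROM demo_events;
```

The `OVER (ORDER BY ...)` form produces a running total, which is the mechanism the concurrent-events query on a later slide relies on.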
  40. 40. TABLE events (
        event_id INT,
        event_type TEXT,
        start TIMESTAMPTZ,
        duration INTERVAL,
        event_desc TEXT
      );
  41. 41. SELECT MAX(concurrent)
      FROM (
        SELECT SUM(tally)
          OVER (ORDER BY start)
          AS concurrent
        FROM (
          SELECT start, 1::INT as tally
          FROM events
          UNION ALL
          SELECT (start + duration), -1
          FROM events
        ) AS event_vert
      ) AS ec;
  42. 42. UPDATE partition_name SET drop_month = dropit
      FROM (
        SELECT round_id,
          CASE WHEN ( ( row_number() over
              (partition by team_id order by team_id, total_points) )
              <= ( drop_lowest ) ) THEN 0 ELSE 1 END as dropit
        FROM (
          SELECT team.team_id, round.round_id, month_points as total_points,
            row_number() OVER (
              partition by team.team_id, kal.positions
              order by team.team_id, kal.positions, month_points desc
            ) as ordinal,
            at_least, numdrop as drop_lowest
          FROM partition_name as rdrop
            JOIN round USING (round_id)
            JOIN team USING (team_id)
            JOIN pick ON round.round_id = pick.round_id
              and pick.pick_period @> this_period
            LEFT OUTER JOIN keep_at_least kal
              ON rdrop.pool_id = kal.pool_id
              and pick.position_id = any ( kal.positions )
          WHERE rdrop.pool_id = this_pool
            AND team.team_id = this_team
        ) as ranking
        WHERE ordinal > at_least or at_least is null
      ) as droplow
      WHERE droplow.round_id = partition_name.round_id
        AND partition_name.pool_id = this_pool
        AND dropit = 0;
  43. 43. SELECT round_id,
        CASE WHEN ( ( row_number() OVER
            (partition by team_id order by team_id, total_points) )
            <= ( drop_lowest ) ) THEN 0 ELSE 1 END as dropit
      FROM (
        SELECT team.team_id, round.round_id, month_points as total_points,
          row_number() OVER (
            partition by team.team_id, kal.positions
            order by team.team_id, kal.positions, month_points desc
          ) as ordinal
  44. 44. stream processing SQL
      ● replace multiple queries with a single query
        ● avoid scanning large tables multiple times
      ● replace pages of application code
        ● and MB of data transmission
      ● SQL alternative to map/reduce
        ● (for some data mining tasks)
  45. 45. How do I partition my tables?
  46. 46. Postgres partitioning
      ● based on table inheritance and constraint exclusion
        ● partitions are also full tables
        ● explicit constraints define the range of the partition
        ● triggers or RULEs handle insert/update
  47. 47. CREATE TABLE sales (
        sell_date TIMESTAMPTZ NOT NULL,
        seller_id INT NOT NULL,
        item_id INT NOT NULL,
        sale_amount NUMERIC NOT NULL,
        narrative TEXT
      );
  48. 48. CREATE TABLE sales_2011_06 (
        CONSTRAINT partition_date_range
        CHECK (sell_date >= '2011-06-01'
          AND sell_date < '2011-07-01' )
      ) INHERITS ( sales );
  49. 49. CREATE FUNCTION sales_insert ()
      RETURNS trigger
      LANGUAGE plpgsql AS $f$
      BEGIN
        -- route each new row to the partition for its sell_date
        CASE
          WHEN NEW.sell_date < '2011-06-01' THEN
            INSERT INTO sales_2011_05 VALUES (NEW.*);
          WHEN NEW.sell_date < '2011-07-01' THEN
            INSERT INTO sales_2011_06 VALUES (NEW.*);
          WHEN NEW.sell_date >= '2011-07-01' THEN
            INSERT INTO sales_2011_07 VALUES (NEW.*);
          ELSE
            INSERT INTO sales_overflow VALUES (NEW.*);
        END CASE;
        -- returning NULL keeps the row out of the parent table
        RETURN NULL;
      END;
      $f$;
      CREATE TRIGGER sales_insert BEFORE INSERT ON sales
      FOR EACH ROW EXECUTE PROCEDURE sales_insert();
  50. 50. Postgres partitioning
      Good for:
      ● “rolling off” data
      ● DB maintenance
      ● queries which use the partition key
      ● under 300 partitions
      ● insert performance
      Bad for:
      ● administration
      ● queries which do not use the partition key
      ● JOINs
      ● over 300 partitions
      ● update performance
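“Rolling off” data is the case where inheritance partitioning pays for itself most directly. A sketch, assuming the `sales_2011_05` partition from the trigger slide has aged past the retention window:

```sql
-- Detach the expired partition from the parent. From this point on,
-- queries against "sales" no longer see its rows.
ALTER TABLE sales_2011_05 NO INHERIT sales;

-- The detached table is still an ordinary table, so it can be dumped
-- or archived first. Dropping it removes the whole month at once,
-- without a row-by-row DELETE and without leaving dead rows to vacuum.
DROP TABLE sales_2011_05;
```

The routing trigger has to be updated in the same maintenance step so that it stops sending rows to the dropped partition.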
  51. 51. you need a data expiration policy
      ● you can't plan your DW otherwise
        ● sets your storage requirements
        ● lets you project how queries will run when database is “full”
      ● will take a lot of meetings
        ● people don't like talking about deleting data
  52. 52. you need a data expiration policy
      ● raw import data: 1 month
      ● detail-level transactions: 3 years
      ● detail-level web logs: 1 year
      ● rollups: 10 years
  53. 53. What's a materialized view?
  54. 54. query results as table
      ● calculate once, read many times
        ● complex/expensive queries
        ● frequently referenced
      ● not necessarily a whole query
        ● often part of a query
      ● manually maintained in PostgreSQL
        ● automagic support not complete yet
  55. 55. SELECT page,
        COUNT(*) as total_hits
      FROM hit_counter
      WHERE date_trunc('day', hit_date)
        BETWEEN now() - INTERVAL '7 days' AND now()
      GROUP BY page
      ORDER BY total_hits DESC LIMIT 10;
  56. 56. CREATE TABLE page_hits (
        page TEXT,
        hit_day DATE,
        total_hits INT,
        CONSTRAINT page_hits_pk
          PRIMARY KEY(hit_day, page)
      );
  57. 57. each day:
      INSERT INTO page_hits
      SELECT page,
        date_trunc('day', hit_date) as hit_day,
        COUNT(*) as total_hits
      FROM hit_counter
      WHERE date_trunc('day', hit_date)
        = date_trunc('day', now() - INTERVAL '1 day')
      GROUP BY page, date_trunc('day', hit_date)
      ORDER BY total_hits DESC;
  58. 58. SELECT page, total_hits
      FROM page_hits
      WHERE hit_day BETWEEN
        now() - INTERVAL '7 days' AND now();
  59. 59. maintaining matviews
      BEST: update matviews at batch load time
      GOOD: update matview according to clock/calendar
      BAD for DW: update matviews using a trigger
  60. 60. matview tips
      ● matviews should be small
        ● 1/10 to ¼ of RAM
      ● each matview should support several queries
        ● or one really really important one
      ● truncate + insert, don't update
      ● index matviews like crazy
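The “truncate + insert” tip can be sketched against the `page_hits` table from the earlier slides. This assumes `hit_counter.hit_date` is a timestamp column and that the matview only needs the last seven complete days; the index name is a placeholder.

```sql
-- Rebuild the matview from scratch at batch load time.
BEGIN;

-- TRUNCATE is transactional in PostgreSQL: readers block until the
-- commit and never see an empty table.
TRUNCATE page_hits;

INSERT INTO page_hits (page, hit_day, total_hits)
SELECT page,
       date_trunc('day', hit_date)::date AS hit_day,
       count(*)::int                     AS total_hits
FROM hit_counter
WHERE hit_date >= date_trunc('day', now()) - INTERVAL '7 days'
  AND hit_date <  date_trunc('day', now())
GROUP BY page, date_trunc('day', hit_date)::date;

COMMIT;

-- Refresh statistics after the reload.
ANALYZE page_hits;

-- One-time setup, not part of each refresh: an extra index for
-- lookups by page (the primary key already covers hit_day, page).
CREATE INDEX page_hits_page_idx ON page_hits (page);
```

Rebuilding the whole table avoids the dead rows that in-place updates would accumulate, and because the matview is small the rebuild is cheap.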
  61. 61. Contact
      ● Josh Berkus:
        ● blog:
      ● PostgreSQL:
        ● pgexperts:
      ● Upcoming Events
        ● pgCon: Ottawa: May 17-20
        ● OpenSourceBridge: Portland: June
      This talk is copyright 2010 Josh Berkus and is licensed under the Creative Commons Attribution license.
      Special thanks for materials to: Elein Mustain (PL/R), Hitoshi Harada and David Fetter (windowing functions), Andrew Dunstan (file_FDW)