Leveraging Hadoop in your PostgreSQL Environment


This talk begins with a discussion of the strengths of PostgreSQL and Hadoop. It then moves to a high-level overview of Hadoop and its community of projects, such as Hive, Flume and Sqoop. Finally, it digs into various use cases detailing how you can leverage Hadoop technologies for your PostgreSQL databases today. The use cases range from using HDFS for simple database backups to using PostgreSQL and Foreign Data Wrappers for low-latency analytics on your Big Data.

  1. Postgres & Hadoop
  2. Who am I?
      - Jim Mlodgenski
      - CTO, OpenSCG
      - Co-organizer, NYCPUG
      - Co-organizer, Philly PUG
      - Co-chair, PGConf US
      - jim@openscg.com, @jim_mlodgenski
  3. Agenda
      - Strengths of PostgreSQL
      - Strengths of Hadoop
      - Hadoop Community
      - Use Cases
  4. Best of Both Worlds
      - Postgres
        - World’s most advanced open source database solution
        - Enterprise class, including MVCC, streaming replication and rich data type support (to name a few!)
        - Robust transaction support with strong ANSI SQL compliance
      - Hadoop
        - Big data distributed framework
        - Reliable, massively scalable and proven
        - Failures are handled at the application layer, allowing the use of commodity hardware
  5. Strengths of PostgreSQL
      - Strong data types
      - Concurrency
      - Transactions
      - Security
      - Indexes
      - Connectors
  6. Components of PostgreSQL
      - Database
      - Connectors
        - JDBC
        - ODBC
        - libpq
      - Foreign Data Wrappers
      - And more...
  7. Strengths of Hadoop
      - Parallelism
      - Flexibility
      - Redundancy
      - Scalability
  8. Components of Hadoop
      - HDFS
      - Hive
      - Flume
      - Sqoop
      - ZooKeeper
      - HBase
      - And many more...
  9. HDFS
      - Hadoop Distributed File System
  10. HBase
      - Modeled after Google Bigtable
      - Column-oriented database on top of HDFS
  11. ZooKeeper
      - Distributed configuration service
      - Supports synchronization and distributed locking
      - Automatic leader election
  12. Hive
      - Adds SQL on Hadoop
      - Converts SQL (HiveQL) to MapReduce jobs
  13. Flume
      - Streams data into HDFS
      - Distributed and highly available
  14. Sqoop
      - Allows bulk transfers of data between Hadoop and an RDBMS
  15. Hadoop Community
      - Much more like the Linux community than the PostgreSQL community
      - Competing commercial interests make the direction unclear to some
  16. Use Cases
  17. Hive Metastore
      - All of the metadata for Hive tables resides in an RDBMS
      - The default is Derby, which limits the metastore to a single connection
  18. Hive Metastore (cont.)
      - Use PostgreSQL for scalability and reliability
      - Supports many concurrent users
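The following is a rough sketch of switching the metastore from Derby to PostgreSQL. The Python below renders Hive's four standard JDO connection properties into the <configuration>/<property> layout used by hive-site.xml. The host, port, database name and credentials are placeholders. The sketch also assumes the PostgreSQL JDBC driver jar is on Hive's classpath.

```python
import xml.etree.ElementTree as ET


def metastore_properties(host: str, port: int, database: str,
                         user: str, password: str) -> dict:
    """JDO settings that point the Hive metastore at PostgreSQL instead of Derby."""
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:postgresql://{host}:{port}/{database}",
        "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def hive_site_xml(properties: dict) -> str:
    """Render name/value pairs in the <configuration><property> layout of hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder connection details (hypothetical); substitute your own metastore database.
print(hive_site_xml(metastore_properties("pg-host", 5432, "metastore", "hive", "secret")))
```

The metastore schema still has to be created in the PostgreSQL database. Recent Hive releases ship a schematool command for this (schematool -dbType postgres -initSchema).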
  19. PostgreSQL Backups
      - PostgreSQL's WAL archiving and Point-in-Time Recovery are powerful
        - But they require a lot of storage
      - Typically used with some sort of NFS
  20. PostgreSQL Backups (cont.)
      - Use HDFS for redundancy and scalability
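  21. PostgreSQL Backups (cont.)
      - Archive command in postgresql.conf (with archive_mode = on):

        archive_command = 'hadoop dfs -copyFromLocal %p /user/postgres/wal/%f'

      - On Hadoop 2 and later, hdfs dfs is the preferred form; hadoop dfs still works but prints a deprecation warning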
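PostgreSQL treats a zero exit status from archive_command as proof that the WAL segment is safely stored. Its documentation also recommends that the command refuse to overwrite an existing archive file. Below is a minimal Python wrapper along those lines. It assumes the hdfs CLI is on the PATH of the postgres user and reuses the /user/postgres/wal directory from the slide. The script name archive_wal.py is hypothetical.

```python
import subprocess
import sys
from pathlib import PurePosixPath

WAL_ARCHIVE_DIR = "/user/postgres/wal"  # HDFS directory used on the slide


def hdfs_target(wal_file_name: str) -> str:
    """HDFS path a WAL segment (PostgreSQL's %f) is archived to."""
    return str(PurePosixPath(WAL_ARCHIVE_DIR) / wal_file_name)


def archive_steps(local_path: str, wal_file_name: str) -> tuple:
    """Return (existence check, copy) command lines for one segment."""
    target = hdfs_target(wal_file_name)
    exists_check = ["hdfs", "dfs", "-test", "-e", target]  # exit status 0 if the path exists
    copy = ["hdfs", "dfs", "-put", local_path, target]
    return exists_check, copy


def archive(local_path: str, wal_file_name: str) -> int:
    """Return 0 only if the segment was copied; PostgreSQL retries on any other status."""
    exists_check, copy = archive_steps(local_path, wal_file_name)
    if subprocess.run(exists_check).returncode == 0:
        print(f"{exists_check[-1]} already exists; refusing to overwrite", file=sys.stderr)
        return 1
    return subprocess.run(copy).returncode


# PostgreSQL always passes %p and %f, so do nothing when imported or run without arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    if len(sys.argv) != 3:
        sys.exit("usage: archive_wal.py <%p> <%f>")  # non-zero exit: archiving is retried
    sys.exit(archive(sys.argv[1], sys.argv[2]))
```

It would be wired in with archive_command = 'python3 /path/to/archive_wal.py %p %f', where the path is a placeholder. PostgreSQL runs archive_command from the data directory, so the relative %p path resolves correctly.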
  22. Log Files
      - Log files are maintained for months or years
      - Syslog may be used to consolidate the logs of multiple databases
      - Turning on query logging makes the log files huge
  23. Log Files (cont.)
      - Use Flume to consolidate logs across databases
      - MapReduce allows for parallel analysis
  24. Log Files (cont.)
      - With PostgreSQL logging to syslog (log_destination = 'syslog'), set up syslog to forward messages to Flume in rsyslog.conf:

        *.* @

      - Configure Flume to act as a syslog server:

        pglogs.sources.sl.type = syslogudp
        pglogs.sources.sl.port = 5140
        pglogs.sources.sl.host =
  25. Log Files (cont.)
      - MapReduce jobs can quickly analyze the logs:

        public static class MapClass extends MapReduceBase
                implements Mapper<StatementOffset, Text, Text, LongWritable> {

            private final static String STATEMENT_DELIM = "statement: ";
            private final static String SYSLOG_IDENT = "postgres";
            private final static LongWritable one = new LongWritable(1);

            public void map(StatementOffset key, Text value,
                            OutputCollector<Text, LongWritable> output,
                            Reporter reporter) throws IOException {
                String line = value.toString();
                // Count only PostgreSQL syslog lines that carry a logged statement
                if (line.startsWith(SYSLOG_IDENT) && line.contains(STATEMENT_DELIM)) {
                    output.collect(getStatementType(line), one);
                }
            }
            ...
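The same counting job can also be sketched in Python for Hadoop Streaming, which pipes tab-separated key/value lines through any executable. The statement_type() helper is a hypothetical stand-in for the slide's elided getStatementType(); it simply takes the first SQL keyword after "statement: ". Sample log lines used with it are illustrative only.

```python
import sys
from itertools import groupby

STATEMENT_DELIM = "statement: "
SYSLOG_IDENT = "postgres"


def statement_type(line: str) -> str:
    """Hypothetical getStatementType(): first SQL keyword after the delimiter."""
    sql = line.split(STATEMENT_DELIM, 1)[1].strip()
    return sql.split(None, 1)[0].upper() if sql else "UNKNOWN"


def mapper(lines):
    """Emit (statement type, 1) for every PostgreSQL statement log line."""
    for line in lines:
        if line.startswith(SYSLOG_IDENT) and STATEMENT_DELIM in line:
            yield statement_type(line), 1


def reducer(pairs):
    """Sum counts per key; like Hadoop, expects its input sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def streaming_map():
    """Entry point for -mapper: read raw log lines, write 'key<TAB>1'."""
    for key, count in mapper(sys.stdin):
        print(f"{key}\t{count}")


def streaming_reduce():
    """Entry point for -reducer: read sorted 'key<TAB>count' lines, write totals."""
    rows = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for key, total in reducer((k, int(v)) for k, v in rows):
        print(f"{key}\t{total}")
```

Locally, reducer(sorted(mapper(lines))) reproduces what the framework does between the two phases. On a cluster, streaming_map and streaming_reduce would be called from small scripts passed to the Hadoop Streaming jar's -mapper and -reducer options.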
  26. Transaction History
      - History tables grow very rapidly
      - Maintaining the tables over time is a huge undertaking
      - Partitioning is frequently used
  27. Transaction History (cont.)
      - Use Sqoop
        - Add a sequence to the table for fast incremental loads
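Sqoop's incremental append mode imports only rows whose check column is greater than the last value already loaded. This is why the table needs a monotonically increasing, sequence-backed column. A small Python helper that assembles such an import might look like this. The connection URL, user, table, column and HDFS directory are hypothetical placeholders.

```python
import shlex


def sqoop_incremental_import(connect: str, username: str, table: str,
                             check_column: str, last_value: int,
                             target_dir: str) -> list:
    """Build a 'sqoop import' command that copies only rows added since last_value."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "-P",                            # prompt for the password instead of passing it
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",       # fetch only rows with check_column > last_value
        "--check-column", check_column,  # the sequence-backed column added to the table
        "--last-value", str(last_value),
    ]


cmd = sqoop_incremental_import(
    "jdbc:postgresql://db-host:5432/sales",  # hypothetical database
    "etl", "order_history", "history_id", 1000000, "/user/postgres/order_history")
print(shlex.join(cmd))
```

After each run, Sqoop reports the new maximum of the check column to pass as the next --last-value. A saved job (sqoop job --create) can track that value automatically.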
  28. OLAP Cubes
      - Can take a very long time to build
      - PostgreSQL uses only a single CPU per query (parallel query arrived later, in 9.6)
      - Drilling down to the details can be a very long query
  29. OLAP Cubes
      - Use a Foreign Data Wrapper
        - Looks like a native table to reporting tools
        - Drill-down takes place on Hadoop
  30. OLAP Cubes (cont.)
      - Create a foreign server:

        CREATE EXTENSION hadoop_fdw;
        CREATE SERVER hadoop_server FOREIGN DATA WRAPPER hadoop_fdw
            OPTIONS (address '', port '10000');
        CREATE USER MAPPING FOR PUBLIC SERVER hadoop_server;

  31. OLAP Cubes (cont.)
      - Create a foreign table:

        CREATE FOREIGN TABLE order_line (
            ol_w_id        integer,
            ol_d_id        integer,
            ol_o_id        integer,
            ol_number      integer,
            ol_i_id        integer,
            ol_delivery_d  timestamp,
            ol_amount      decimal(6,2),
            ol_supply_w_id integer,
            ol_quantity    decimal(2,0),
            ol_dist_info   varchar(24)
        ) SERVER hadoop_server
          OPTIONS (table 'order_line');
  32. OLAP Cubes (cont.)
      - Loading PostgreSQL aggregate tables is a simple SQL statement
      - Use Hive views for more complex aggregations

        INSERT INTO item_sale_month
        SELECT ol_i_id AS i_id,
               EXTRACT(YEAR FROM ol_delivery_d) AS year,
               EXTRACT(MONTH FROM ol_delivery_d) AS month,
               sum(ol_amount) AS amount
          FROM order_line
         GROUP BY 1, 2, 3;
  33. OLAP Cubes (cont.)
      - Drill-downs pass the processing down to Hive: the WHERE clause is pushed into the Remote SQL, while sum() is still computed in PostgreSQL

        postgres=# explain verbose select sum(ol_amount) from order_line where ol_i_id = 34928;
                                             QUERY PLAN
        ------------------------------------------------------------------------------------
         Aggregate  (cost=11002.50..11002.51 rows=1 width=14)
           Output: sum(ol_amount)
           ->  Foreign Scan on public.order_line  (cost=10000.00..11000.00 rows=1000 width=14)
                 Output: ol_w_id, ol_d_id, ol_o_id, ol_number, ol_i_id, ol_delivery_d, ol_amount, ol_supply_w_id, ol_quantity, ol_dist_info
                 Remote SQL: SELECT * FROM order_line WHERE ((ol_i_id = 34928))
        (5 rows)
  34. Audit History
      - All database access should be audited and autonomously logged
      - The audit trail must be maintained for years
  35. Audit History (cont.)
      - Use the Hadoop Foreign Data Wrapper to write to Flume
  36. Audit History (cont.)
      - Create a writable foreign table ("table" and "user" are reserved words in PostgreSQL, so those column names must be quoted):

        CREATE FOREIGN TABLE audit (
            audit_id bigint,
            event_d  timestamp,
            "table"  varchar,
            action   varchar,
            "user"   varchar
        ) SERVER hadoop_server
          OPTIONS (table 'audit', flume_port '44444');
  37. Message Queue
      - Queue tables have a lot of churn, with many updates and deletes
      - This causes a lot of table and index bloat in PostgreSQL
        - AKA a vacuuming nightmare
  38. Message Queue (cont.)
      - Use an FDW to HBase
      - HBase is not an “Eventually Consistent” architecture, so it is ideal for message queues
  39. Message Queue (cont.)
      - Create a writable foreign table:

        CREATE FOREIGN TABLE hbase_table (
            key   varchar,
            value varchar
        ) SERVER hadoop_server
          OPTIONS (table 'hbase_table',
                   hbase_address 'localhost',
                   hbase_port '9090',   -- 9090 is the default HBase Thrift server port
                   hbase_mapping ':key,cf:val');

        INSERT INTO hbase_table VALUES ('key1', 'value1');
        INSERT INTO hbase_table VALUES ('key2', 'value2');
        UPDATE hbase_table SET value = 'update' WHERE key = 'key2';
        DELETE FROM hbase_table WHERE key = 'key1';
        SELECT * FROM hbase_table;
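The hbase_mapping option pairs each foreign-table column, in order, with a location in HBase. This sketch assumes it follows the syntax of Hive's hbase.columns.mapping, where :key denotes the row key and family:qualifier names a cell column. Under that assumption, the pairing for the table above can be spelled out with a small Python parser. HBaseColumn and parse_hbase_mapping are hypothetical names used for illustration.

```python
from typing import List, NamedTuple, Optional


class HBaseColumn(NamedTuple):
    is_row_key: bool
    family: Optional[str] = None     # column family, e.g. "cf"
    qualifier: Optional[str] = None  # column qualifier, e.g. "val"


def parse_hbase_mapping(mapping: str, columns: List[str]) -> dict:
    """Pair foreign-table columns, in order, with entries of a Hive-style mapping."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(columns):
        raise ValueError(f"{len(columns)} columns but {len(entries)} mapping entries")
    result = {}
    for column, entry in zip(columns, entries):
        if entry == ":key":
            result[column] = HBaseColumn(is_row_key=True)
        else:
            family, _, qualifier = entry.partition(":")
            result[column] = HBaseColumn(False, family, qualifier)
    return result


print(parse_hbase_mapping(":key,cf:val", ["key", "value"]))
```

Read this way, the key column is the HBase row key and the value column lives in column family cf under qualifier val.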
  40. High Availability
      - When setting up replication for high availability, many necessary components are not provided by PostgreSQL:
        - Failure detection
        - Split-brain prevention
        - Replica promotion
        - Notification to clients of failover
  41. High Availability (cont.)
      - ZooKeeper with a custom background worker can handle all of the missing components
  42. High Availability (cont.)
      - Failure detection: replicas watch an ephemeral lock created by the master

        void watch_master()
        {
            ...
            sprintf(root_path, "%s/lock", zookeeper_path);
            while (!found_master && !got_sigterm)
            {
                elog(DEBUG1, "Looking for the master lock...");
                rc = zoo_get_children(zh,
  43. High Availability (cont.)
      - Split-brain prevention: the master grabs an exclusive ZooKeeper lock on startup and shuts down immediately if unsuccessful

        char *create_lock()
        {
            char path[PATH_LEN];
            char *buffer;
            int rc;

            buffer = (char *) palloc(PATH_LEN);
            ensure_connected();
  44. High Availability (cont.)
      - Replica promotion: use ZooKeeper to hold the ballots of an election; the highest LSN wins

        void elect_master()
        {
            ...
            recptr = GetWalRcvWriteRecPtr(NULL, NULL);
            /* Zero-pad both halves so strcmp() orders ballots like the 64-bit LSNs;
               the usual "%X/%08X" display form does not sort correctly as text. */
            sprintf(lsn, "%08X/%08X", (uint32) (recptr >> 32), (uint32) recptr);
            elog(DEBUG1, "Entering a ballot with an LSN of: %s", lsn);

            /* Cast an ephemeral ballot holding this replica's LSN */
            sprintf(path, "%s/lock/%s", zookeeper_path, replica_id);
            rc = zoo_create(zh, path, lsn, strlen(lsn), &ZOO_OPEN_ACL_UNSAFE,
                            ZOO_EPHEMERAL, buffer, sizeof(buffer) - 1);
            if (rc)
            {
                elog(FATAL, "Failure creating zooKeeper path: %s", path);
            }
            elog(DEBUG1, "Created a zooKeeper ephemeral path at: %s", buffer);

            /* Wait until every registered replica has cast a ballot */
            all_votes_in = false;
            while (!all_votes_in && !got_sigterm)
            {
                sprintf(path, "%s/replica", zookeeper_path);
                rc = zoo_get_children(zh, path, 0, &replicas);
                if (rc == ZOK)
                {
                    sprintf(path, "%s/lock", zookeeper_path);
                    rc = zoo_get_children(zh, path, 0, &ballots);
                    if (rc == ZOK)
                    {
                        all_votes_in = true;
                        for (i = 0; i < replicas.count; i++)
                        {
                            found = false;
                            for (j = 0; j < ballots.count; j++)
                            {
                                if (strcmp(replicas.data[i], ballots.data[j]) == 0)
                                {
                                    found = true;
                                    break;
                                }
                            }
                            if (!found)
                            {
                                all_votes_in = false;
                                break;
                            }
                        }
                    }
                }
                …
            }

            /* Compare my LSN against every other ballot */
            for (j = 0; j < ballots.count; j++)
            {
                if (strcmp(ballots.data[j], replica_id) != 0)
                {
                    sprintf(path, "%s/lock/%s", zookeeper_path, ballots.data[j]);
                    memset(buffer, 0, sizeof(buffer));
                    bufferlen = sizeof(buffer);
                    rc = zoo_get(zh, path, 0, buffer, &bufferlen, NULL);
                    if (rc != ZOK)
                    {
                        elog(LOG, "Unable to get %s. New master probably already found...", path);
                        continue;   /* buffer holds no LSN, so skip the comparison */
                    }
                    elog(DEBUG1, "Comparing the LSN: %s", buffer);
                    if (strcmp(lsn, buffer) < 0)
                    {
                        elog(DEBUG1, "Found an LSN greater than mine. I am not the winner.");
                        return;
                    }
                    else if (strcmp(lsn, buffer) == 0)
                    {
                        elog(DEBUG1, "Found an LSN equal to mine. See if I was the first to the start.");
                        if (strcmp(replica_id, ballots.data[j]) > 0)
                        {
                            elog(DEBUG1, "Found an LSN equal to mine and a sequence earlier than mine. I am not the winner.");
                            return;
                        }
                    }
                }
            }

            elog(LOG, "Becoming the new master. Acquiring the proper locks.");
            lock = create_lock();

            /* Remove every ballot; build each path here rather than reusing a stale one */
            for (j = 0; j < ballots.count; j++)
            {
                sprintf(path, "%s/lock/%s", zookeeper_path, ballots.data[j]);
                elog(DEBUG1, "Removing ballot at %s", path);
                rc = zoo_delete(zh, path, -1);
                if (rc != ZOK)
                {
                    elog(LOG, "Unable to delete %s", path);
                }
            }

            if (!has_lock(lock))
            {
                elog(LOG, "Unable to acquire a zooKeeper lock. Shutting down to prevent a split brain scenario");
                do_stop();
            }
            else
            {
                elog(LOG, "Promoting to become the new master.");
                do_promote();
            }
            publish_master_info();
        }
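The election rule is that the highest LSN wins and ties go to the replica that registered first. That rule is easier to check against numbers than against strings. The Python sketch below parses the textual hi/lo LSN form into a single 64-bit integer and picks a winner. It assumes ballot names are ZooKeeper sequential node names, whose zero-padded counters sort in creation order. parse_lsn and elect are hypothetical helpers, not part of the original background worker.

```python
def parse_lsn(lsn: str) -> int:
    """Convert PostgreSQL's 'hi/lo' hexadecimal LSN text into one 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def elect(ballots: dict) -> str:
    """Return the ballot with the highest LSN; on a tie, the smallest (earliest) name."""
    return min(ballots, key=lambda name: (-parse_lsn(ballots[name]), name))


votes = {
    "replica-0000000001": "A/00000010",
    "replica-0000000002": "10/00000000",  # 0x10 > 0xA, though "10/..." < "A/..." as text
}
print(elect(votes))  # prints replica-0000000002
```

This is also why LSN ballots compared with strcmp() need both halves zero-padded to a fixed width (for example "%08X/%08X"). The usual "%X/%08X" display form stops sorting correctly as text once the high half grows past one hex digit.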
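  45. High Availability (cont.)
      - Client notification: Python (or other languages) can watch the master and act appropriately

        from kazoo.client import KazooClient

        def __init__(self, zkHosts, pathName):
            # Keep the settings on the instance so the watcher can use them later
            self.zkHosts = zkHosts
            self.pathName = pathName
            self.watchPath = pathName + "/master"
            self.zk = KazooClient(hosts=zkHosts)
            self.zk.start()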
  46. Getting the Components
      - http://hadoop.apache.org/
      - http://hive.apache.org/
      - http://flume.apache.org/
      - http://sqoop.apache.org/
      - http://zookeeper.apache.org/
      - http://hbase.apache.org/
      - http://www.postgresql.org/
      - http://jdbc.postgresql.org/
      - http://openjdk.java.net/
      - http://openscg.com/se/hadoop-fdw/
  47. Or... BigSQL.org
  48. Questions?