This talk will begin with a discussion of the strengths of PostgreSQL and Hadoop. We will then lead into a high-level overview of Hadoop and its community of projects like Hive, Flume and Sqoop. Finally, we will dig into various use cases detailing how you can leverage Hadoop technologies for your PostgreSQL databases today. The use cases range from using HDFS for simple database backups to using PostgreSQL and Foreign Data Wrappers for low-latency analytics on your Big Data.
2. Who am I?
Jim Mlodgenski
CTO, OpenSCG
Co-organizer, NYCPUG
Co-organizer, Philly PUG
Co-chair, PGConf US
jim@openscg.com
@jim_mlodgenski
3. Agenda
Strengths of PostgreSQL
Strengths of Hadoop
Hadoop Community
Use Cases
4. Best of Both Worlds
Postgres
World’s most advanced open
source database solution
Enterprise class including MVCC,
streaming replication & rich data
type support (to name a few!)
Robust transaction support with
strong ANSI-SQL compliance
Hadoop
Big data distributed framework
Reliable, massively scalable &
proven
Failures handled at the application
layer allowing commodity
hardware
5. Strengths of PostgreSQL
Strong Data Types
Concurrency
Transactions
Security
Indexes
Connectors
6. Components of PostgreSQL
Database
Connectors
– JDBC
– ODBC
– libpq
Foreign Data Wrappers
And more...
14. Sqoop
Allows for bulk
transfers of data
between Hadoop
and an RDBMS
15. Hadoop Community
Much more like the Linux community than the
PostgreSQL community
Competing commercial interests make
the direction unclear to some
17. Hive Metastore
All of the metadata for the Hive tables resides
in an RDBMS
The default is to use Derby
– Limits to a single connection
18. Hive Metastore (cont.)
Use PostgreSQL for scalability and reliability
Many concurrent users
19. PostgreSQL Backups
PostgreSQL's WAL archiving and Point-In-Time
Recovery are powerful
– But it requires a lot of storage
Typically used with some sort of NFS
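One way to move the WAL archive off NFS is to point `archive_command` at a small script that copies each completed segment into HDFS. The following is a minimal sketch under stated assumptions, not a tested backup tool: it assumes the `hdfs` command-line client is on the PATH, and the script name, the `/pgbackup/wal` target directory and the function names are all hypothetical.

```python
import subprocess


def build_put_command(wal_path, wal_name, hdfs_dir):
    """Return the argv that copies one WAL segment into HDFS.

    Without -f, `hdfs dfs -put` fails if the destination already exists,
    so an archived segment is never silently overwritten.
    """
    return ["hdfs", "dfs", "-put", wal_path, f"{hdfs_dir.rstrip('/')}/{wal_name}"]


def archive_segment(wal_path, wal_name, hdfs_dir="/pgbackup/wal", run=subprocess.call):
    """Copy the segment and return the exit status (0 means archived)."""
    return run(build_put_command(wal_path, wal_name, hdfs_dir))


def main(argv, run=subprocess.call):
    """Entry point: expects the %p (path) and %f (file name) arguments."""
    if len(argv) != 3:
        return 2  # a non-zero status tells PostgreSQL to retry the segment later
    return archive_segment(argv[1], argv[2], run=run)
```

To use it as a script, it would end with `sys.exit(main(sys.argv))` and be wired up in postgresql.conf along the lines of `archive_command = 'python3 /usr/local/bin/wal_to_hdfs.py %p %f'` (hypothetical path).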
22. Log Files
Maintain log files for
months or years
May use Syslog to
consolidate multiple
database logs
Turning on query logging
makes the log file huge
23. Log Files (cont.)
Use Flume
Consolidates
logs across
databases
MapReduce
allows for parallel
analysis
24. Log Files (cont.)
Set up Syslog to forward messages to Flume
rsyslog.conf:
*.* @127.0.0.1:5140
Configure Flume to act as a Syslog server
pglogs.sources.sl.type = syslogudp
pglogs.sources.sl.port = 5140
pglogs.sources.sl.host = 0.0.0.0
25. Log Files (cont.)
MapReduce jobs can quickly analyze the logs
public static class MapClass extends MapReduceBase implements
        Mapper<StatementOffset, Text, Text, LongWritable> {
    private final static String STATEMENT_DELIM = "statement: ";
    private final static String SYSLOG_IDENT = "postgres";
    private final static LongWritable one = new LongWritable(1);

    public void map(StatementOffset key, Text value,
            OutputCollector<Text, LongWritable> output,
            Reporter reporter)
            throws IOException {
        String line = value.toString();
        if (line.startsWith(SYSLOG_IDENT) &&
                line.contains(STATEMENT_DELIM)) {
            output.collect(getStatementType(line), one);
        }
    }
    ...
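The mapper above emits one count per logged statement, keyed by statement type. The slide does not show `getStatementType`, so the Python sketch below assumes the type is simply the first keyword after `statement: `; it runs the same filter and count locally on a handful of lines and is only meant to illustrate the idea, not to reproduce the MapReduce job.

```python
from collections import Counter

STATEMENT_DELIM = "statement: "
SYSLOG_IDENT = "postgres"


def statement_type(line):
    """Return the upper-cased first keyword after 'statement: ', or None.

    Assumption: the first word of the SQL text identifies the statement type.
    """
    if not (line.startswith(SYSLOG_IDENT) and STATEMENT_DELIM in line):
        return None
    sql = line.split(STATEMENT_DELIM, 1)[1].strip()
    return sql.split(None, 1)[0].upper() if sql else None


def count_statement_types(lines):
    """Count statements per type, skipping lines that are not query logs."""
    counts = Counter()
    for line in lines:
        kind = statement_type(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```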
26. Transaction History
History Tables grow very
rapidly
Maintaining the tables
over time is a huge
undertaking
Partitioning is frequently
used
28. OLAP Cubes
Can take a very long
time to build
PostgreSQL will use
only a single CPU
Drilling down to the
details can be a very
long-running query
29. OLAP Cubes
Use a Foreign Data Wrapper
Looks like a native table to reporting tools
Drill down takes place on Hadoop
30. OLAP Cubes (cont.)
Create a Foreign Server
CREATE EXTENSION hadoop_fdw;
CREATE SERVER hadoop_server
FOREIGN DATA WRAPPER hadoop_fdw
OPTIONS (address '127.0.0.1', port '10000');
CREATE USER MAPPING
FOR PUBLIC SERVER hadoop_server;
32. OLAP Cubes (cont.)
Loading PostgreSQL aggregate tables is a
simple SQL statement
Use Hive views for more complex aggregations
INSERT INTO item_sale_month
SELECT ol_i_id as i_id,
EXTRACT(YEAR FROM ol_delivery_d) as year,
EXTRACT(MONTH FROM ol_delivery_d) as month,
sum(ol_amount) as amount
FROM order_line
GROUP BY 1, 2, 3;
33. OLAP Cubes (cont.)
Drill downs pass the processing down to Hive
postgres=# explain verbose select sum(ol_amount) from order_line where ol_i_id = 34928;
                                   QUERY PLAN
 Aggregate  (cost=11002.50..11002.51 rows=1 width=14)
   Output: sum(ol_amount)
   ->  Foreign Scan on public.order_line  (cost=10000.00..11000.00 rows=1000 width=14)
         Output: ol_w_id, ol_d_id, ol_o_id, ol_number, ol_i_id, ol_delivery_d, ol_amount, ol_supply_w_id, ol_quantity, ol_dist_info
         Remote SQL: SELECT * FROM order_line WHERE ((ol_i_id = 34928))
(5 rows)
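The plan shows how the work is divided: the WHERE clause travels to Hive as remote SQL, while the sum is computed in PostgreSQL over the rows that come back. The sketch below mimics that split in Python. It is not hadoop_fdw's code; both function names are hypothetical, and only integer equality filters are handled.

```python
def build_remote_sql(table, quals):
    """Render pushed-down equality filters like the plan's 'Remote SQL' line.

    `quals` is a list of (column, integer_value) pairs; values are forced to
    int so the sketch does not have to deal with quoting.
    """
    sql = f"SELECT * FROM {table}"
    if quals:
        rendered = " AND ".join(f"({col} = {int(val)})" for col, val in quals)
        sql += f" WHERE ({rendered})"
    return sql


def local_sum(rows, column):
    """Aggregate step that stays local: sum one column of the fetched rows."""
    return sum(row[column] for row in rows)
```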
34. Audit History
All database access
should be audited and
autonomously logged
Must be maintained for
years
36. Audit History (cont.)
Create a writable foreign table
CREATE FOREIGN TABLE audit (
    audit_id bigint,
    event_d timestamp,
    "table" varchar,
    action varchar,
    "user" varchar
) SERVER hadoop_server
OPTIONS (table 'audit',
    flume_port '44444');
37. Message Queue
Tables have a lot of
churn with many
updates and deletes
Causes a lot of table
and index bloat in
PostgreSQL
AKA a vacuuming
nightmare
38. Message Queue (cont.)
Use an FDW to HBase
HBase is not an “Eventually Consistent”
architecture, so it is ideal for message queues
39. Message Queue (cont.)
Create a writable foreign table
CREATE FOREIGN TABLE hbase_table (
    key varchar,
    value varchar
) SERVER hadoop_server
OPTIONS (table 'hbase_table',
    hbase_address 'localhost',
    hbase_port '9090',
    hbase_mapping ':key,cf:val');
INSERT INTO hbase_table VALUES ('key1', 'value1');
INSERT INTO hbase_table VALUES ('key2', 'value2');
UPDATE hbase_table SET value = 'update' WHERE key = 'key2';
DELETE FROM hbase_table WHERE key = 'key1';
SELECT * FROM hbase_table;
40. High Availability
When setting up
replication for high
availability many
necessary components
are not provided by
PostgreSQL
Failure detection
Split brain prevention
Replica promotion
Notification to clients of
failover
41. High Availability (cont.)
ZooKeeper with
a custom
background worker
can handle all of
the missing
components
42. High Availability (cont.)
Failure Detection – Replicas watch an
ephemeral lock created by the master
void watch_master() {
    ...
    sprintf(root_path, "%s/lock", zookeeper_path);
    while (!found_master && !got_sigterm) {
        elog(DEBUG1, "Looking for the master lock...");
        rc = zoo_get_children(zh,
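The idea behind failure detection can be approximated in Python as a polling loop over the children of the lock node. This is a sketch, not the background worker's logic: it accepts any client object exposing `exists(path)` and `get_children(path)` (the method names kazoo's `KazooClient` uses), takes the `<zookeeper_path>/lock` layout from the slide, and the function names and poll interval are hypothetical.

```python
import time


def master_lock_held(client, zookeeper_path):
    """True while at least one child exists under <zookeeper_path>/lock."""
    lock_root = f"{zookeeper_path}/lock"
    if client.exists(lock_root) is None:
        return False
    return len(client.get_children(lock_root)) > 0


def wait_for_master_loss(client, zookeeper_path, poll_seconds=1.0, sleep=time.sleep):
    """Block until the master's ephemeral lock is gone; return the poll count."""
    polls = 1
    while master_lock_held(client, zookeeper_path):
        sleep(poll_seconds)
        polls += 1
    return polls
```

Because the lock is ephemeral, ZooKeeper removes it when the master's session ends, so a replica does not need its own health check of the master. A production worker would more likely register a watch than poll.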
43. High Availability (cont.)
Split brain prevention – the master grabs an
exclusive ZooKeeper lock on startup and shuts
down immediately if unsuccessful
char *create_lock() {
    char path[PATH_LEN];
    char *buffer;
    int rc;
    buffer = (char *) palloc(PATH_LEN);
    ensure_connected();
44. High Availability (cont.)
Replica promotion – use ZooKeeper for the
ballots of an election. The highest LSN wins
void elect_master() {
    ...
    recptr = GetWalRcvWriteRecPtr(NULL, NULL);
    sprintf(lsn, "%X/%08X", (uint32) (recptr >> 32), (uint32) recptr);
    elog(DEBUG1, "Entering a ballot with an LSN of: %s", lsn);
    sprintf(path, "%s/lock/%s", zookeeper_path, replica_id);
    rc = zoo_create(zh, path, lsn, strlen(lsn), &ZOO_OPEN_ACL_UNSAFE,
                    ZOO_EPHEMERAL, buffer, sizeof(buffer)-1);
    if (rc) {
        elog(FATAL, "Failure creating zooKeeper path: %s", path);
    }
    elog(DEBUG1, "Created a zooKeeper ephemeral path at: %s", buffer);

    /* Wait until every registered replica has cast a ballot */
    all_votes_in = false;
    while (!all_votes_in && !got_sigterm) {
        sprintf(path, "%s/replica", zookeeper_path);
        rc = zoo_get_children(zh, path, 0, &replicas);
        if (rc == ZOK) {
            sprintf(path, "%s/lock", zookeeper_path);
            rc = zoo_get_children(zh, path, 0, &ballots);
            if (rc == ZOK) {
                all_votes_in = true;
                for (i = 0; i < replicas.count; i++) {
                    found = false;
                    for (j = 0; j < ballots.count; j++) {
                        if (strcmp(replicas.data[i], ballots.data[j]) == 0) {
                            found = true;
                            break;
                        }
                    }
                    if (!found) {
                        all_votes_in = false;
                        break;
                    }
                }
            }
        }
        …
    }

    /* Compare this replica's LSN against every other ballot */
    for (j = 0; j < ballots.count; j++) {
        if (strcmp(ballots.data[j], replica_id) != 0) {
            sprintf(path, "%s/lock/%s", zookeeper_path, ballots.data[j]);
            memset(buffer, 0, sizeof(buffer));
            bufferlen = sizeof(buffer);
            rc = zoo_get(zh, path, 0, buffer, &bufferlen, NULL);
            if (rc != ZOK) {
                elog(LOG, "Unable to get %s. New master probably already found...", path);
            }
            elog(DEBUG1, "Comparing the LSN: %s", buffer);
            /* LSNs are compared as text, which orders correctly only while
             * the high words have the same number of hex digits */
            if (strcmp(lsn, buffer) < 0) {
                elog(DEBUG1, "Found an LSN greater than mine. I am not the winner.");
                return;
            } else if (strcmp(lsn, buffer) == 0) {
                elog(DEBUG1, "Found an LSN equal to mine. See if I was the first to the start.");
                if (strcmp(replica_id, ballots.data[j]) > 0) {
                    elog(DEBUG1, "Found an LSN equal to mine and a sequence earlier than mine. I am not the winner.");
                    return;
                }
            }
        }
    }

    elog(LOG, "Becoming the new master. Acquiring the proper locks.");
    lock = create_lock();
    for (j = 0; j < ballots.count; j++) {
        /* Build the path of the ballot being removed */
        sprintf(path, "%s/lock/%s", zookeeper_path, ballots.data[j]);
        elog(DEBUG1, "Removing ballot at %s", path);
        rc = zoo_delete(zh, path, -1);
        if (rc != ZOK) {
            elog(LOG, "Unable to delete %s", path);
        }
    }
    if (!has_lock(lock)) {
        elog(LOG, "Unable to acquire a zooKeeper lock. Shutting down to prevent a split brain scenario");
        do_stop();
    } else {
        elog(LOG, "Promoting to become the new master.");
        do_promote();
    }
    publish_master_info();
}
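The election rule itself is small: the highest LSN wins, and a tie goes to the replica whose id sorts first. The sketch below restates that rule in Python with LSNs parsed to integers, which avoids ordering surprises when ballots written with `%X/%08X` have high words of different lengths. The ballot dictionary is a hypothetical stand-in for the ballot nodes in ZooKeeper, and the function names are illustrative.

```python
def parse_lsn(text):
    """Convert 'HI/LO' hexadecimal LSN text into a single integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def elect_master(ballots):
    """Return the winning replica id from a {replica_id: lsn_text} mapping.

    Highest LSN wins; on a tie the lexicographically smallest replica id
    wins, mirroring the strcmp tie-break in the C code.
    """
    if not ballots:
        raise ValueError("no ballots cast")
    best_lsn = max(parse_lsn(lsn) for lsn in ballots.values())
    return min(rid for rid, lsn in ballots.items() if parse_lsn(lsn) == best_lsn)
```

For example, `10/00000000` is ahead of `A/00000010` numerically even though it sorts lower as text, so the replica holding `10/00000000` is elected here.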
45. High Availability (cont.)
Client notification – Python (or other) clients can
watch the master and act appropriately
def __init__(self, zkHosts, pathName):
    self.zkHosts = zkHosts
    self.pathName = pathName
    self.watchPath = pathName + "/master"
    self.zk = KazooClient(hosts=zkHosts)
    self.zk.start()
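On the client side, the remaining piece is a callback that reacts when the master node changes. The sketch below is written in the `(data, stat)` callback shape that kazoo's `DataWatch` recipe uses, so it can be exercised without a ZooKeeper server. The slides do not show what `publish_master_info()` writes, so the `host:port` payload is an assumption, and `MasterTracker` is a hypothetical name.

```python
class MasterTracker:
    """Remember the current master and report when it changes."""

    def __init__(self, on_failover):
        self.current = None
        self.on_failover = on_failover

    def handle(self, data, stat):
        """Callback in the (data, stat) shape used by kazoo's DataWatch."""
        if data is None:
            # The node is missing: no master is published right now.
            self.current = None
            return
        # Assumed payload format: b"host:port"
        host, _, port = data.decode("utf-8").rpartition(":")
        master = (host, int(port))
        if master != self.current:
            previous, self.current = self.current, master
            self.on_failover(previous, master)
```

With kazoo installed and a started client, the wiring would look roughly like `zk.DataWatch(path_to_master_node, tracker.handle)`, with `on_failover` reconnecting the application's database pool to the new address.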