Greenplum Database on HDFS
Jun 20, 2012
Transcript of "Greenplum Database on HDFS"
Greenplum Database on HDFS (GOH)
Presenter: Lei Chang firstname.lastname@example.org
© Copyright 2012 EMC Corporation. All rights reserved.
Outline
• Introduction
• Architecture
• Features
• Performance study
EMC Greenplum Unified Analytics Platform
GOH use cases
• All Greenplum customers who want to minimize the duplicate storage they must buy for analytics: managing scale is much easier when you grow a single storage pool rather than many fragmented pools.
• Customers who want the functionality of GPDB together with the generality and storage provided by their HBase store.
• Potential ability to plug various storage systems, such as Isilon, Atmos, MapR Filesystem, CloudStore, GPFS, Lustre, PVFS, and Ceph, into the GPDB/Hadoop software stack.
[Architecture diagram: a GPDB master host and multiple segment hosts (segments and mirrors) connected by the interconnect; segments perform metadata operations and read/write tables in an HDFS filespace via the NameNode, with block replication across DataNodes in Rack1 and Rack2.]
GOH features
• A pluggable storage layer: if a file system supports the full semantics of the HDFS interface, it can be added as GPDB AO (append-only) table storage.
• Attributed filespaces
• HDFS filespaces are natively supported
• Full transaction support for AO tables on HDFS
• HDFS truncation capability to support the transaction capability of GOH
• HDFS native C interface to eliminate the concurrency limitation of the current Java JNI based client
• All current GPDB functionality: fault tolerance, etc.
Pluggable storage: user interface

CREATE FUNCTION open_func AS ( obj_file, link_symbol )

CREATE FILESYSTEM filesystemname [OWNER ownername]
( connect = connect_func,
  open = open_func,
  close = close_func,
  read = read_func,
  write = write_func,
  seek = seek_func,
  ... )
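The interface above suggests that each filesystem registers a table of operation callbacks which the storage layer dispatches through. A minimal sketch of that idea, with all names and the in-memory backing store purely illustrative (this is not the actual GPDB implementation):

```python
# Hypothetical sketch of a pluggable-filesystem dispatch table, mirroring
# CREATE FILESYSTEM (connect, open, close, read, write, seek).
import io

REGISTERED_FILESYSTEMS = {}

def create_filesystem(name, **callbacks):
    """Register a filesystem by name with its operation callbacks."""
    required = {"connect", "open", "close", "read", "write", "seek"}
    missing = required - callbacks.keys()
    if missing:
        raise ValueError(f"missing callbacks: {sorted(missing)}")
    REGISTERED_FILESYSTEMS[name] = callbacks

# An in-memory "filesystem" standing in for HDFS or a local FS.
_store = {}

create_filesystem(
    "memfs",
    connect=lambda uri: _store,
    open=lambda fs, path: fs.setdefault(path, io.BytesIO()),
    close=lambda f: None,
    read=lambda f, n: f.read(n),
    write=lambda f, data: f.write(data),
    seek=lambda f, off: f.seek(off),
)

def fs_op(name, op, *args):
    """Dispatch an operation through the registered callback table."""
    return REGISTERED_FILESYSTEMS[name][op](*args)
```

With this shape, adding a new backend (Isilon, Lustre, ...) is just another `create_filesystem` call, which is the point of the pluggable layer.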
Attributed filespaces
• The number of replicas for the tables in the filespace
• Whether mirroring is supported for the tables stored in the filespace
• Other attributes…
Example SQL

CREATE FILESPACE goh ON HDFS (
  1: hdfs://name-node/users/changl1/gp-data/gohmaster/gpseg-1,
  2: hdfs://name-node/users/changl1/gp-data/goh/gpseg0,
  3: hdfs://name-node/users/changl1/gp-data/goh/gpseg1
) WITH (NUMREPLICA = 3, MIRRORING = false);
Transaction support
• When a load transaction is aborted, some garbage data is left at the end of the file. In HDFS-like systems, data cannot be truncated or overwritten, so we need a method of handling the partial data in order to support transactions.
  – Option 1: Load data into a separate HDFS file. Requires an unlimited number of files.
  – Option 2: Use metadata to record the boundary of the garbage data, and implement a vacuum mechanism.
  – Option 3: Implement HDFS truncation.
HDFS C client: why
• libhdfs (the current HDFS C client) is based on JNI, which makes it difficult for GOH to support a large number of concurrent queries.
• Example:
  – 6 segments on each segment host
  – 50 concurrent queries
  – each query may have 12 or more QE processes that do scans
  – so there will be about 600 processes starting 600 JVMs to access HDFS
  – if each JVM uses 500MB of memory, the JVMs will consume 600 * 500MB = 300GB of memory
  – thus naïve use of libhdfs is not suitable for GOH. We currently have three options to solve this problem.
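The estimate in the example above reduces to two multiplications, using the slide's own figures:

```python
# Back-of-the-envelope JVM cost of naive libhdfs (JNI) use,
# with the numbers from the slide.
concurrent_queries = 50
qe_processes_per_query = 12                 # scanning QE processes per query
processes = concurrent_queries * qe_processes_per_query  # one JVM each

jvm_mem_mb = 500
total_gb = processes * jvm_mem_mb / 1000    # 600 JVMs * 500MB
```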
HDFS client: three options
• Option 1: use HDFS FUSE. FUSE introduces some performance overhead, and its scalability is not yet verified.
• Option 2: implement a C RPC interface that communicates directly with the NameNode and DataNodes. This requires many changes whenever the RPC protocol changes.
• Option 3: implement a webhdfs-based C client. webhdfs is based on HTTP, which also introduces some cost; performance should be benchmarked. The webhdfs-based method has several benefits, such as ease of implementation and low maintenance cost.
• Currently, we have implemented options 2 and 3.
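To make Option 3 concrete: in the WebHDFS REST API a file read maps to an HTTP GET with `op=OPEN` and optional `offset`/`length` parameters. A sketch of building such a request URL (the host, port, and path below are placeholders, not a real cluster):

```python
# Sketch of the HTTP request a webhdfs-based C client would issue for a
# read: GET http://<host>:<port>/webhdfs/v1/<path>?op=OPEN[&offset=..][&length=..]
from urllib.parse import urlencode

def webhdfs_open_url(host, port, path, offset=None, length=None):
    """Build a WebHDFS OPEN (read) request URL."""
    params = {"op": "OPEN"}
    if offset is not None:
        params["offset"] = offset
    if length is not None:
        params["length"] = length
    return f"http://{host}:{port}/webhdfs/v1{path}?{urlencode(params)}"
```

Because the protocol is plain HTTP, a C client only needs an HTTP library rather than a JVM or a hand-maintained copy of the Hadoop RPC protocol, which is the maintenance advantage the slide refers to.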
HDFS truncate
• API
  – truncate (DistributedFileSystem): truncate a file to a specified length
  – void truncate(Path src, long length) throws IOException;
• Semantics
  – Only a single writer/appender/truncater is allowed. Users can only call truncate on closed files.
  – HDFS guarantees the atomicity of a truncate operation: it either succeeds or fails, and does not leave the file in an undefined state.
  – Concurrent readers may read the content of a file that is being truncated by a concurrent truncate operation, but they must be able to read all the data that is not affected by it.
HDFS truncate implementation (HDFS-3107)
• Get the lease of the to-be-truncated file (F)
• If truncate is at a block boundary:
  – Delete the tail blocks as an atomic operation.
• If truncate is not at a block boundary:
  – Copy the last (partial) block (B) of the result file (R) to a temporary file (T).
  – Remove the tail blocks of file F (including B, B+1, …), then concat F and T to get R.
• Release the lease on the file
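The boundary test that picks between the two paths above is simple block arithmetic. A sketch (function name and sizes are illustrative, not HDFS code):

```python
# Sketch of the block arithmetic behind the truncate steps: does the
# target length fall on a block boundary, how many whole blocks stay
# untouched, and how many bytes of the last kept block (B) must be
# copied to the temporary file (T)?
def truncate_plan(file_len, new_len, block_size):
    assert 0 <= new_len <= file_len
    full_blocks = new_len // block_size   # blocks kept untouched
    tail_bytes = new_len % block_size     # bytes of B copied to T
    return {
        "at_boundary": tail_bytes == 0,   # 0 => just drop tail blocks
        "full_blocks_kept": full_blocks,
        "tail_bytes_to_copy": tail_bytes,
    }
```

When `tail_bytes_to_copy` is 0 the tail blocks can simply be deleted atomically; otherwise the partial block must be rewritten via the temporary file and concatenated back, which is why the non-boundary path needs more steps.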
Performance study (to be added)
Thank you!