External tables are typically used for fast, parallel data loading. Once an external table is defined, you can query its data directly (and in parallel) using SQL commands.
2. External Tables
External tables let you access external files as if they were regular database tables. They
are often used to move data into and out of a Greenplum database.
Used with gpfdist, the Greenplum parallel file distribution program, external tables
provide full parallelism by using the resources of all Greenplum segments to load or
unload data.
You can query external table data directly and in parallel using SQL, for example to
select, join, or sort external table data, and you can create views on external
tables.
This makes it easy for Greenplum users to load massive amounts of data by just writing
SQL!
4. External table to load data
Log in to a machine that has gpfdist installed.
ssh <host>
Start gpfdist as a background process
gpfdist -d <DIRPATH> -p 8080 &
Log in to the Greenplum database to create the tables.
5. External table to load data
Create a table in the Greenplum database
CREATE TABLE foo
(
bar_id int not null,
bar text,
bar_description text
)
DISTRIBUTED BY (bar_id);
6. External table to load data
Create an External Table in Greenplum
CREATE EXTERNAL TABLE ext_foo (LIKE foo) LOCATION ('gpfdist://<HOST>:8080/demo/foo.txt')
FORMAT 'TEXT' (DELIMITER AS '|' NULL AS 'null');
Note:
You could spell out all of the columns, but “LIKE foo” is a shortcut that copies the column definitions from foo.
The location indicates that the table uses gpfdist on host <HOST>, port 8080.
In this case the file is located at <DIR>/demo/foo.txt, but gpfdist is serving <DIR> (the <DIRPATH> passed to gpfdist -d), so we
only need to specify the demo subdirectory and then the file name.
You can pick TEXT or CSV format.
Here a pipe (|) is used as the delimiter, and null values are spelled out as the string null.
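The path rule in the note above (only the part of the path below the served directory goes into the LOCATION URL) can be sketched in shell. This is an illustration only: gpfdist_url is a hypothetical helper, not part of Greenplum, and the directory /data/load and host etl-host are made-up stand-ins for <DIR> and <HOST>.

```shell
# Hypothetical helper: build the gpfdist:// URL for a file, given the
# directory that gpfdist serves (the value passed to gpfdist -d).
gpfdist_url() {
    served_dir=${1%/}    # served directory, trailing slash removed
    host=$2
    port=$3
    file=$4
    case "$file" in
        "$served_dir"/*) ;;    # file is inside the served directory
        *) echo "error: $file is not under $served_dir" >&2; return 1 ;;
    esac
    # Drop the served-directory prefix; only the relative path goes in the URL.
    printf 'gpfdist://%s:%s/%s\n' "$host" "$port" "${file#"$served_dir"/}"
}

# Made-up example values standing in for <DIR>, <HOST> and the port.
gpfdist_url /data/load etl-host 8080 /data/load/demo/foo.txt
# prints: gpfdist://etl-host:8080/demo/foo.txt
```

A file outside the served directory is rejected, because gpfdist could not serve it at all.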
7. External table to load data
Insert the data into the file <DIR>/demo/foo.txt.
1|foo|bar
2|blah|blah
3|this|that
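Before loading, it can help to confirm that every row has the number of fields the table expects. The following is a minimal shell sketch: check_fields is a hypothetical helper, the temporary directory stands in for <DIR>/demo, and the check is deliberately naive (it does not account for escaped delimiters inside a field).

```shell
# Stand-in for <DIR>/demo; a temporary directory is used for illustration.
demo_dir=$(mktemp -d)/demo
mkdir -p "$demo_dir"

# Write the three sample rows, one per line, pipe-delimited.
printf '%s\n' '1|foo|bar' '2|blah|blah' '3|this|that' > "$demo_dir/foo.txt"

# Hypothetical helper: report every line whose pipe-delimited field count
# differs from the expected count; exit status is non-zero if any line is bad.
check_fields() {
    awk -F'|' -v want="$2" '
        NF != want { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}

check_fields "$demo_dir/foo.txt" 3 && echo "foo.txt: all rows have 3 fields"
```

The expected count of 3 matches the three columns of foo (bar_id, bar, bar_description).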
Run a query to fetch the data from the external table and insert it into the regular table.
INSERT INTO foo SELECT * FROM ext_foo;
Check that the data exists in the regular table
SELECT * FROM foo;
9. External web table to load data
Create an External Web Table in Greenplum
CREATE EXTERNAL WEB TABLE foo_web
(
bar_id int,
bar text,
bar_description text
)
EXECUTE E'/usr/bin/sudo -u gpload /bin/cat /data/dhh_core/foo.txt' ON HOST
FORMAT 'csv' (delimiter ',' null '' escape '"' quote '"')
ENCODING 'UTF8';
10. External web table to load data
Insert the data into the file that the EXECUTE command reads (/data/dhh_core/foo.txt in the definition above).
1,foo,bar
2,blah,blah
3,this,that
Run a query to fetch the data from the external web table and insert it into the regular table.
INSERT INTO foo(bar_id, bar, bar_description)
SELECT bar_id, bar, bar_description FROM foo_web;
Check that the data exists in the regular table
SELECT * FROM foo;