An overview of many details of Oracle's external table syntax, which lets you use SQL to access files that reside outside the database
1. External Tables - Not *Just* Loading a CSV File
Kim Berg Hansen
Senior Consultant
2. About me
External Tables - Not *Just* Loading a CSV File2 9/21/2018
• Danish geek
• SQL & PL/SQL developer since 2000
• Developer at Trivadis since 2016
http://www.trivadis.dk
• Oracle Certified Expert in SQL
• Oracle ACE Director
• Blogger at http://www.kibeha.dk
• SQL quizmaster at
http://devgym.oracle.com
• Likes to cook
• Reads sci-fi
• Member of Danish Beer Enthusiasts
4. About Trivadis
Trivadis is a market leader in IT consulting, system integration, solution engineering
and the provision of IT services, focusing on Oracle and Microsoft
technologies in Switzerland, Germany, Austria and Denmark.
We offer our services in the following strategic business fields:
OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
5. With over 600 specialists and IT experts in your region
COPENHAGEN, MUNICH, LAUSANNE, BERN, ZURICH, BRUGG, GENEVA, HAMBURG, DÜSSELDORF, FRANKFURT, STUTTGART, FREIBURG, BASEL, VIENNA
14 Trivadis branches and more than
600 employees
260 Service Level Agreements
Over 4,000 training participants
Research and development budget:
EUR 5.0 million
Financially self-supporting and
sustainably profitable
Experience from more than 1,900
projects per year at over 800
customers
6. External Tables - Not *Just* Loading a CSV File
1. Access Drivers, Parameters, Locations
2. Definition versus Runtime
3. Error Handling, Logging Files
4. Flat Files input
5. Preprocessor
6. Multiple Files, Parallelism, Partition Pruning
7. Trusted Relied Constraints
8. SQL*Loader as Generator
9. External Table with Datapump Dump Files
10. HDFS / HIVE
7. Access Drivers, Parameters, Locations
9. External Tables
A way to treat a file outside of the database as a rowsource
Enables SELECT from the file with all the power of SQL
– Without necessarily loading the data into a table in the database
Different filetypes supported with different Access Drivers
select t1.col1, t2.col2
from db_tab t1
join ext_tab t2
on t2.fk = t1.pk
where t1.grp = 'FOO';
10. Creation
Definition created in data dictionary* like normal table (only data is outside DB)
(* in 18c not necessarily - more on that later)
Specify type (access driver), directory and location (file)
Specify access parameters depending on access driver
create table ext_tab (fk number, col2 varchar2(10))
organization external (
type oracle_loader
access parameters (
records delimited by newline
fields terminated by ";" optionally enclosed by '"'
( fk integer external(6), col2 char(10) )
)
location (ext_dir:'file.txt')
);
11. Access Driver
Keyword TYPE specifies which access driver to use
ORACLE_LOADER
– Flat files - alternative to SQL*Loader
ORACLE_DATAPUMP
– Data Pump dump files - can also write files (once - at creation time)
ORACLE_HDFS (12.2) Oracle Big Data SQL
– Read datafiles from HDFS (by creating a HIVE table)
ORACLE_HIVE (12.2) Oracle Big Data SQL
– Read datafiles from HDFS by querying a HIVE catalog
12. Access Parameters
Specific for each Access Driver type
Tells DB the metadata of the file, how to get the values of each column
The 18c documentation states that opaque_format_spec is quoted for INLINE EXTERNAL and
EXTERNAL MODIFY, and unquoted for CREATE TABLE
– This appears to be a doc bug - unquoted seems to work everywhere
Alternatively, a subquery can return the access parameters
13. Location
Keyword LOCATION contains one or more filenames
For ORACLE_LOADER and ORACLE_DATAPUMP files in filesystem
– DIRECTORY object must be created and privileges granted
– DIRECTORY object specified for file: DIRNAME:'file.txt'
– Or DEFAULT DIRECTORY specifies directory for files where dir. is omitted
– (12.1) Location supports wildcards * and ?
For ORACLE_HDFS location specifies hdfs:/... style URI
For ORACLE_HIVE location unused - access parameters specifies cluster/table
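The directory setup described above could look roughly like the following sketch. The file system path, the user name app_user and the table name ext_tab_dirs are hypothetical; the wildcard in LOCATION relies on the version support mentioned on the slide.

```sql
-- Run as a suitably privileged user; path and grantee are assumptions
create directory ext_dir as '/data/ext';
grant read, write on directory ext_dir to app_user;

-- DEFAULT DIRECTORY is used for every file that has no directory prefix
create table ext_tab_dirs (fk number, col2 varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    fields terminated by ";" optionally enclosed by '"'
    ( fk integer external(6), col2 char(10) )
  )
  -- first entry uses the default directory and a wildcard,
  -- second entry names its directory object explicitly
  location ('file_*.txt', ext_dir:'extra.txt')
);
```

WRITE on the directory is granted here because log and bad files are written to the default directory unless specified otherwise.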
14. Definition versus Runtime
16. Definition in Data Dictionary
Define with CREATE TABLE
Change with ALTER TABLE
– Often useful to change LOCATION
– Some restrictions on what can be altered - see manual of each version
Change the projection with ALTER TABLE
– PROJECT COLUMN ALL / PROJECT COLUMN REFERENCED
- The latter may cause inconsistencies if errors in un-referenced columns
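A minimal sketch of these ALTER TABLE changes, using the ext_tab table from the Creation slide. The file name file2.txt is a hypothetical example.

```sql
-- Point the existing definition at another file in the same directory object
alter table ext_tab location (ext_dir:'file2.txt');

-- Parse only the fields for columns a query references; faster, but bad data
-- in an unreferenced column will then not reject the row
alter table ext_tab project column referenced;

-- Parse all fields for every query, so the same rows are always rejected
alter table ext_tab project column all;
```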
17. Overrides at Runtime (12.2)
SELECT ... FROM EXT_TAB EXTERNAL MODIFY (...)
– Modify default directory and/or location
- Allows each session/query to read own (identically structured) file(s)
– Modify reject limit
– Modify badfile / logfile / discardfile
Careful with your security
– A user with SELECT privilege on the external table can potentially read all files in
the DIRECTORY objects on which that user has READ privilege
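As a sketch of a runtime override, assuming the ext_tab table from the Creation slide and a hypothetical second file file2.txt with the same structure:

```sql
-- The dictionary definition is unchanged; only this query reads file2.txt
-- and tolerates any number of rejected rows
select fk, col2
from ext_tab external modify (
  location (ext_dir:'file2.txt')
  reject limit unlimited
);
```

Because the override is per query, two sessions can read different files through the same table definition at the same time.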
18. Everything at Runtime (18.1)
Inline definition of External Table
Nothing in data dictionary (hence also less information for the optimizer)
select fk, col2
from external (
(fk number, col2 varchar2(10))
type oracle_loader
access parameters (
records delimited by newline
fields terminated by ";" optionally enclosed by '"'
( fk integer external(6), col2 char(10) )
)
location (ext_dir:'file.txt')
);
19. Error Handling, Logging Files
21. Errors in the Data
Errors in the data may or may not return an error
– REJECT LIMIT 0 (default) = first occurrence of bad data throws error
– REJECT LIMIT {int} = error is thrown once more than {int} rows have been rejected
– REJECT LIMIT UNLIMITED = no errors thrown
Bad rows of data are copied to the BADFILE
Note: If you have ALTER TABLE ... PROJECT COLUMN REFERENCED
– When column with bad data is in SELECT list => row goes to BADFILE
– When column with bad data is not in SELECT list => row is selected
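The reject limit can be set when the table is created or changed afterwards. A short sketch against the ext_tab table from the Creation slide (the value 10 is arbitrary):

```sql
-- Tolerate up to 10 rejected rows per query; the 11th raises an error
alter table ext_tab reject limit 10;

-- Never raise an error for bad rows; they only go to the bad file
alter table ext_tab reject limit unlimited;

-- Back to the default: the first bad row raises an error
alter table ext_tab reject limit 0;
```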
22. Logging Files
Three parameter pairs
– NOLOGFILE / LOGFILE dir_obj:'ext.log'
– NOBADFILE / BADFILE dir_obj:'ext.bad'
– NODISCARDFILE / DISCARDFILE dir_obj:'ext.dcs'
Can use symbol substitution for uniqueness
- %p = Process id of user process doing the SELECT
- %a = Agent number of the slave process in parallel access
Each of them defaults to {table_name}_%p.{ext}
BADFILE contains those rows that could not be imported
DISCARDFILE contains those rows that were skipped by LOAD WHEN clause
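A sketch of how the logging parameters fit into the access parameters. The table name ext_tab_logged and the directory object log_dir are hypothetical; log_dir is assumed to exist with WRITE privilege granted.

```sql
create table ext_tab_logged (fk number, col2 varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    -- %p makes the file names unique per process
    badfile log_dir:'ext_tab_%p.bad'
    logfile log_dir:'ext_tab_%p.log'
    -- no LOAD WHEN clause is used here, so no discard file is needed
    nodiscardfile
    fields terminated by ";" optionally enclosed by '"'
    ( fk integer external(6), col2 char(10) )
  )
  location ('file.txt')
)
reject limit unlimited;
```

Keeping log and bad files in a separate directory object means the data directory itself can stay read-only for the database user.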
23. Flat Files Input
25. Overall file characteristics
CHARACTERSET
– Which character set the file uses (default is the DB character set, not the client's)
LANGUAGE
– Which language is used for month names, AM/PM, etc. in the file
TERRITORY
– How are decimal / thousand separators, week numbers, etc. in the file
DATA IS BIG ENDIAN / DATA IS LITTLE ENDIAN
– Which endianness the platform that produced the file uses
26. Records
FIXED
– Each record a fixed length (in bytes)
VARIABLE
– Start of each record contains the record's length, written as a string of digits
DELIMITED BY
– Each record ends with a given string
XMLTAG
– Each record is the content within a given XML tag: <MYTAG>....</MYTAG>
27. Fields
The field list for the file need not match the table's column list directly; fields can be mapped differently
ALL FIELDS OVERRIDE - states that the field list does match the table columns directly
– Then list only the fields that need extra info, such as a non-default date format
FIELD NAMES clause tells how to handle a first line that contains field names
– It can be ignored, or fields can be mapped automatically by field name
TERMINATED BY / [OPTIONALLY] ENCLOSED BY
FIELDS CSV
– WITH / WITHOUT EMBEDDED - does file contain record delim within string fields
– TERMINATED / ENCLOSED - override default , and "
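A sketch of the FIELDS CSV shorthand with its default comma terminator and double-quote enclosure. The table name ext_csv and the file name are hypothetical, and the clause needs a database version that supports FIELDS CSV (12.2 or later, to the best of my knowledge).

```sql
create table ext_csv (fk number, col2 varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    -- WITHOUT EMBEDDED: no record delimiters occur inside quoted strings
    fields csv without embedded
    ( fk integer external(6), col2 char(10) )
  )
  location ('file.csv')
);
```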
28. Specifying Field Positions (when not delimited)
Start position
– Digit is position directly
– * means the start is the char after the end of previous field
– *+{offset} or *-{offset} means plus or minus offset chars after end of previous field
End can be specified as position (Digit) or as length (+Digit)
STRING SIZES ARE IN
– Parameter says if positions are measured in bytes or chars (for multibyte charsets)
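A sketch of positional fields, assuming a hypothetical file fixed.txt in which characters 1 to 6 hold the key and the next 10 characters hold the text. Both fields are read as CHAR and converted by the database to the column types.

```sql
create table ext_fixed (fk number, col2 varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    characterset al32utf8
    -- measure positions in characters, not bytes (matters for multibyte data)
    string sizes are in characters
    fields (
      fk   position(1:6)   char(6),   -- absolute start and end position
      col2 position(*:+10) char(10)   -- starts after previous field, length 10
    )
  )
  location ('fixed.txt')
);
```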
29. Datatypes
INTEGER, DECIMAL, FLOAT, DOUBLE
– Specifying EXTERNAL means the numbers are represented as strings in the file
– Without EXTERNAL means they are binary, in the format a C program would use
- Access parameter DATA IS BIG / LITTLE ENDIAN used here
RAW, VARRAW, VARRAWC
– Binary data, fixed length or variable with first bytes indicating length
ORACLE_DATE, ORACLE_NUMBER
– Binary representations of Oracle DATE or NUMBER datatype
30. Datatypes (continued)
CHAR, VARCHAR, VARCHARC
– Character data, fixed length or variable with first bytes indicating length
– VARCHAR length indicator is bytes, VARCHARC length indicator is characters
– CHAR also used for DATE, TIMESTAMP, INTERVAL:
– DATE_FORMAT {type} MASK "{format mask}"
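As a sketch of the DATE_FORMAT clause, assuming a hypothetical file emp_dates.txt whose second field holds dates such as 2018-09-21:

```sql
create table ext_emp_dates (emp_id number, hiredate date)
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    fields terminated by ";"
    ( emp_id   integer external(6),
      -- the field is read as characters and converted with the given mask
      hiredate char(10) date_format date mask "yyyy-mm-dd" )
  )
  location ('emp_dates.txt')
);
```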
31. COLUMN TRANSFORMS
{column_name} FROM {transformation}
– NULL - sets column in all rows to NULL
– CONSTANT - sets column in all rows to specified literal
– CONCAT - sets column to concatenation of field(s) and/or literal(s)
– STARTOF - sets column to a substring from the start of a field
– LOBFILE - sets column to a LOB loaded from another file
directory object / filename can be a field or literal
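A sketch of two of these transformations, written from memory of the ORACLE_LOADER syntax and worth checking against the Utilities guide for your version. Table, column, field and file names are all hypothetical.

```sql
create table ext_people (full_name varchar2(41), source varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    fields terminated by ";"
    -- the file has two fields that do not exist as table columns
    ( first_name char(20), last_name char(20) )
    column transforms (
      -- build the column from two fields and a literal space
      full_name from concat (first_name, constant ' ', last_name),
      -- same literal in every row
      source    from constant 'people'
    )
  )
  location ('people.txt')
);
```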
32. Preprocessor
34. Preprocessor
PREPROCESSOR [{directory}:]{script_or_exe_file}
Must have EXECUTE privilege on directory object
Can be different directory than the datafile - this is recommended for security
Preprocessor script/exe will be called with filename from LOCATION as parameter
Standard output from script/exe will become the input for the EXTERNAL TABLE
Cannot specify arguments directly
– if executable requires arguments, must wrap it in a script
Windows script (batch file) must have suffix .bat or .cmd
Windows batch file must start with @echo off
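A sketch of a preprocessor that uncompresses a gzipped file on the fly. The directory object exec_dir, its path, the user app_user, the table name and the file name are hypothetical; zcat is assumed to be available in that path on the database server.

```sql
-- Separate directory object for executables, with EXECUTE but no WRITE
create directory exec_dir as '/usr/bin';
grant execute on directory exec_dir to app_user;

create table ext_gz (fk number, col2 varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    -- zcat is called with the LOCATION file as its only argument;
    -- its standard output becomes the input of the external table
    preprocessor exec_dir:'zcat'
    fields terminated by ";" optionally enclosed by '"'
    ( fk integer external(6), col2 char(10) )
  )
  location ('file.txt.gz')
);
```

If the executable needs extra arguments, exec_dir would instead hold a small wrapper script that passes them.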
35. Uses
Uncompress (gunzip / zcat)
– Process compressed file and stream uncompressed data as external table input
Directory listing
– Preprocessor script does ls / dir
Changing file content
– Do transformations with sed before the data is used for external table input
curl calls
– get http resources and feed them to external table input
Your imagination is the limit
36. Multiple Files, Parallelism, Partition Pruning
38. Multiple Files
LOCATION can contain multiple files, with or without directory specification
– If without, directory specified in DEFAULT DIRECTORY is used
Selecting from the external table reads all the files (except by partition pruning)
If field names are in the first row, they can be in either just the first file or in all files
– Specify which with FIELD NAMES FIRST FILE / ALL FILES
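A sketch of a multi-file definition in which every file starts with a header row. The table name, the file names and the second directory object other_dir are hypothetical; the header row of each file is assumed to name the fields FK and COL2.

```sql
create table ext_multi (fk number, col2 varchar2(10))
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    -- every file has a first row with field names, used for the mapping
    field names all files
    fields terminated by ";" optionally enclosed by '"'
  )
  -- two files from the default directory, one from another directory object
  location ('file1.txt', 'file2.txt', other_dir:'file3.txt')
);
```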
39. Parallelism
Multiple files
– Each file specified in LOCATION is handled by its own slave process
- setting parallel degree higher than the number of files does not help
– This also means PREPROCESSOR is called once per file, by the slave process handling it
Large files
– ORACLE_LOADER parallel select can attempt to assign file chunks to slaves
– Cannot always be done, for example not with:
- Named pipes as input
- Multibyte charactersets (unless fixed byte length records)
- Variable length records with length indicator bytes
40. Partition Pruning (12.2)
Can be partitioned with RANGE, INTERVAL, LIST or composites of them
Each partition has one or more files in LOCATION clause
When optimizer does partition pruning, for an external table that means it only scans
the file(s) of that partition
DB trusts that the files of each partition contain only the specified partition key value(s)
If key values are wrong in the files:
– you can get output that does not match WHERE clause
– you may have data you cannot query with WHERE clause
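A sketch of a list-partitioned external table with one file per partition. Table, partition and file names are hypothetical, and each file is assumed to contain only rows for its own region.

```sql
create table ext_sales (region varchar2(10), amount number)
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    fields terminated by ";"
    ( region char(10), amount char(20) )
  )
)
reject limit unlimited
partition by list (region) (
  partition p_north values ('NORTH') location ('sales_north.txt'),
  partition p_south values ('SOUTH') location ('sales_south.txt')
);

-- With pruning, only sales_north.txt should be scanned
select sum(amount) from ext_sales where region = 'NORTH';
```

If sales_north.txt also contained SOUTH rows, the last query would still return them in its sum, which is the mismatch the slide warns about.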
41. Trusted Relied Constraints
43. Purposes of Constraints
On regular tables integrity constraints can be enforced
– Not possible to enforce on external tables - data comes from elsewhere
- Except NOT NULL constraint can be enforced - nulls go to bad file
– But you can say "trust me" and use RELY DISABLE on constraints (12.2)
- can do that for primary key, foreign key, unique constraints
- but not check constraint
With knowledge of the constraints, the optimizer can make assumptions
that enable it to choose better access plans
– This also works with the trusted constraints on external tables
- QUERY_REWRITE_INTEGRITY = trusted or stale_tolerated
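A sketch of declaring a trusted constraint on the ext_tab table from the Creation slide. The constraint name is hypothetical, and the file is assumed to really contain unique, non-null fk values, since the database does not check.

```sql
-- RELY DISABLE: not enforced, but the optimizer is told it may trust it
alter table ext_tab
  add constraint ext_tab_pk primary key (fk) rely disable;

-- Let the optimizer use trusted (RELY) constraints in this session
alter session set query_rewrite_integrity = trusted;
```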
44. SQL*Loader as Generator
46. SQL*Loader for Creating External Tables
You have a SQL*Loader control file?
You want to do the same load (or almost) with an external table?
Use SQL*Loader parameter EXTERNAL_TABLE=GENERATE_ONLY
SQL*Loader will not load the data but instead writes the external table code to its log file
This code you can execute or edit as you wish
47. External Table with Data Pump Dump Files
49. Write (once) to Dump File
CTAS for ORACLE_DATAPUMP access driver
The external table created this way can be read, but not modified
create table ext_emp_tab
organization external (
type oracle_datapump
default directory ext_dir
location ('ext_emp.dmp')
)
as select * from emp;
50. Driver Parameters for Write
COMPRESSION
– ENABLED BASIC / LOW / MEDIUM / HIGH
- requires Advanced Compression option
ENCRYPTION
– ENABLED / DISABLED
VERSION
– COMPATIBLE / LATEST / version number
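A sketch of passing a write parameter to the ORACLE_DATAPUMP driver, reusing the emp table from the previous slide. The table and file names are hypothetical, and the compression setting is subject to the licensing note above.

```sql
create table ext_emp_comp
organization external (
  type oracle_datapump
  default directory ext_dir
  access parameters (
    -- compress the dump file while it is written
    compression enabled basic
    nologfile
  )
  location ('ext_emp_comp.dmp')
)
as select * from emp;
```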
51. Parallel Write to Multiple Files
CTAS for ORACLE_DATAPUMP access driver
Parallel degree and number of files should match
– If number of files > parallel, extra files unused
– If parallel > number of files, parallel is reduced to number of files
create table ext_emp_tab
organization external (
type oracle_datapump
default directory ext_dir
location ('ext_emp1.dmp', 'ext_emp2.dmp', 'ext_emp3.dmp')
)
parallel 3
as select * from emp;
52. External Table to Read Dump File
Create external table on an existing Dump File (for example from other DB)
Dump file can come from a DB with a different character set or endianness
Reading from multiple files requires that all were written with identical metadata
– Ext. table name, column names/types, charset, timezone must be identical
create table ext_emp_tab (
emp_id number, ename varchar2(20)
) organization external (
type oracle_datapump
default directory ext_dir
location ('ext_emp1.dmp', 'ext_emp2.dmp', 'ext_emp3.dmp')
);
53. HDFS / HIVE
55. Oracle Big Data SQL
External HDFS / HIVE tables for Oracle Big Data SQL (licensed product)
– Hadoop Clusters on Oracle Big Data Appliance
– Database on Exadata
HIVE metadata exposed to database
– ORACLE_HIVE external tables can just specify columns and HIVE cluster/table
– Can override mappings if desired
With ORACLE_HDFS you specify HIVE-style metadata directly; no table in the HIVE catalog is needed
56. Advantages
Big Data SQL Engine
– SmartScan on Hadoop
– Fast direct reads
– Oracle PQ => Hadoop parallelism
Advantages of Hadoop data directly in SQL
– Immediate use by anything that uses SELECT
– Fine-grained access control of Hadoop
– Data redaction, data masking
57. Questions & Answers
Kim Berg Hansen
Senior Consultant
email kim.berghansen@trivadis.com
twitter @kibeha
blog http://www.kibeha.dk
Editor's Notes
“Our focus as IT consultants and system integrators lies on the business fields of Business Intelligence, Application Development, Infrastructure Engineering and Training. We have a separate division – Trivadis Services – which takes over the operation, maintenance and ongoing development of individual systems such as databases and specific applications, or we can also take over the responsibility for more complex environments. We provide our services throughout Switzerland, Germany, Austria and Denmark and concentrate on Oracle and Microsoft technologies.”
“We are a non-affiliated and profitable company with over 600 employees. Regional proximity to our customers is one of our key considerations. We achieve this by operating 14 branch operations in Switzerland, Germany, Austria and Denmark. We successfully completed more than 1900 customer projects during the last business year. Additionally, we also support our customers with over 200 Service Level Agreements. The basis for this sustained technological excellence is reflected in our research and development budget. Every year we invest around 5 million Swiss francs in analyzing and evaluating new technologies and developing our methods and products.”