SlideShare a Scribd company logo
Session id: 40043

Data Pump in Oracle Database 10g :
Foundation for Ultra-High Speed Data
Movement Utilities
George H. Claborn
Data Pump Technical Project Leader
Oracle Corporation
New England Development Center
Data Pump: Overview







What is it?
Main Features
Architecture
Performance
Things to keep in mind
Some thoughts on original exp / imp
Data Pump: What is it?
 Server-based facility for high performance
loading and unloading of data and metadata
 Callable: DBMS_DATAPUMP. Internally uses
DBMS_METADATA
 Data written in Direct Path stream format. Metadata
written as XML
 New clients expdp and impdp: Supersets of original
exp / imp.
 Foundation for Streams, Logical Standby, Grid,
Transportable Tablespaces and Data Mining initial
instantiation.
Features: Performance!!
 Automatic, two-level parallelism
–
–
–
–
–






Direct Path for inter-partition parallelism
External Tables for intra-partition parallelism
Simple: parallel=<number of active threads>
Dynamic: Workers can be added and removed from a running
job in Enterprise Edition
Index builds automatically “parallelized” up to degree of job

Simultaneous data and metadata unload
Single thread of data unload: 1.5-2X exp
Single thread of data load: 15X-40X imp
With index builds: 4-10X imp
Features: Checkpoint / Restart
 Job progress recorded in a “Master Table”
 May be explicitly stopped and restarted later:
–

Stop after current item finishes or stop immediate

 Abnormally terminated job is also restartable
 Current objects can be skipped on restart if
problematic
Features: Network Mode
 Network import: Load one database
directly from another
 Network export: Unload a remote database to a local
dumpfile set
–

Allows export of read-only databases

 Data Pump runs locally, Metadata API runs remotely.
 Uses DB links / listener service names, not pipes. Data
is moved as ‘insert into <local table> select from
<remote table>@service_name’
 Direct path engine is used on both ends
 It’s easy to swamp network bandwidth: Be careful!
Features: Fine-Grained Object Selection
 All object types are supported
for both operations: export and import
 Exclude: Specified object types
are excluded from the operation
 Include: Only the specified object types are included.
E.g, just retrieve packages, functions and procedures
 More than one of each can be specified, but use of both
is prohibited by new clients
 Both take an optional name filter for even finer
granularity:
–
–

INCLUDE PACKAGE: “LIKE ‘PAYROLL%’ “
EXCLUDE TABLE: “IN (‘FOO’,’BAR’, … )’ “
Features: Monitoring
 Flexible GET_STATUS call
 Per-worker status showing current object and
percent done
 Initial job space estimate and overall percent
done
 Job state and description
 Work-in-progress and errors
Features: Dump File Set Management
 Directory based: E.g, DMPDIR:export01.dmp where
DMPDIR created as:
SQL> create directory dmpdir as ‘/data/dumps’

 Multiple, wildcarded file specifications supported:
dumpfile=dmp1dir:full1%u.dmp, dmp2dir:full2%u.dmp
–

Files are created as needed on a round-robin basis from
available file specifications

 File size can be limited for manageability
 Dump file set coherency automatically maintained
New Clients – expdp / impdp
 Similar (but not identical) look and feel to exp / imp
 All modes supported: full, schema, table, tablespace,
transportable. Superset of exp / imp
 Flashback is supported
 Query supported by both expdp and impdp… and on a
per-table basis!
 Detach from and attach to running jobs
 Multiple clients per job allowed; but a single client can
attach to only one job at a time
 If privileged, attach to and control other users’ jobs
New Clients – expdp / impdp
 Interactive mode entered via Ctl-C:
–
–
–
–

–
–
–
–

ADD_FILE: Add dump files and wildcard specs. to job
PARALLEL: Dynamically add or remove workers
STATUS: Get detailed per-worker status and change
reporting interval
STOP_JOB{=IMMEDIATE}: Stop job, leaving it
restartable. Immediate doesn’t wait for workers to finish
current work items… they’ll be re-done at restart
START_JOB: Restart a previously stopped job
KILL_JOB: Stop job and delete all its resources (master
table, dump files) leaving it unrestartable
CONTINUE: Leave interactive mode, continue logging
EXIT: Exit client, leave job running
Features: Other Cool Stuff…
 DDL transformations are easy with XML:
–
–
–
–

REMAP_SCHEMA
REMAP_TABLESPACE
REMAP_DATAFILE
Segment and storage attributes can be suppressed

 Can extract and load just data, just metadata or both
 SQLFILE operation generates executable DDL script
 If a table pre-exists at load time, you can: skip it
(default), replace it, truncate then load or append to it.
 Space estimates based on allocated blocks (default) or
statistics if available
 Enterprise Manager interface integrates 9i and 10g
 Callable!
Architecture: Block Diagram
expdp

impdp

Other Clients:
Data Mining, etc

Enterprise
Manager

Data Pump
DBMS_DATAPUMP
Data/Metadata movement engine

Oracle_
Loader

Oracle_
DataPump

External Table API

Direct Path
API

Metadata
API:
DBMS_METADATA
Architecture:
Flow Diagram
User A:
expdp

Data, metadata & master table

Master
table

User A’s
shadow
process

Worker A:
metadata

Status queue:
Work-in-progress
and errors
Master
Control
Process
Log File

User B:
OEM

Dump File Set:

User B’s
shadow
process

Worker B:
Direct path
Parallel
Proc. 01
Worker C:
Ext. Table

Dynamic commands Command and
(stop, start, parallel etc) control queue

Parallel
Proc. 02
No Clients Required !

Dump File Set:
Data, metadata & master table

Master
table
Worker A:
metadata

Master
Control
Process
Log File

Worker B:
Direct path
Parallel
Proc. 01
Worker C:
Ext. Table

Parallel
Proc. 02
Data Pump: Performance Tuning
 Default initialization parameters are fine!
–

Make sure disk_asynch_io remains TRUE

 Spread the I/O!
 Parallel= no more than 2X number of CPUs:
Do not exceed disk spindle capacity.
–

Corollary:

SPREAD THE I/O !!!

 Sufficient SGA for AQ messaging and metadata API
queries
 Sufficient rollback for long running queries

That’s it!
Large Internet Company
2 Fact Tables: 16.2M rows, 2 Gb
Program

Elapsed

exp out of the box: direct=y

0 hr 10 min 40 sec

exp tuned: direct=y buffer=2M recordlength=64K

0 hr 04 min 08 sec

expdp out of the box: Parallel=1

0 hr 03 min 12 sec

imp out of the box

2 hr 26 min 10 sec

imp tuned: buffer=2M recordlength=64K

2 hr 18 min 37 sec

impdp out of the box: Parallel=1

0 hr 03 min 05 sec

With one index per table
imp tuned: buffer=2M recordlength=64K

2 hr 40 min 17 sec

impdp: Parallel=1

0 hr 25 min 10 sec
Oracle Applications Seed Database:
 Metadata intensive: 392K objects, 200 schemas,
10K tables, 2.1 Gb of data total
 Original exp / imp total:
32 hrs 50 min
–

exp:

2 hr 13 min

imp:

30 hrs 37 min.

 Data Pump expdp / impdp total:
–
–

15 hrs 40 min

expdp: 1 hr 55 min impdp: 13 hrs 45 min.
Parallel=2 for both expdp and impdp
Keep in Mind:
 Designed for *big* jobs with lots of data.
–
–

Metadata performance is about the same
More complex infrastructure, longer startup

 XML is bigger than DDL, but much more flexible
 Data format in dump files is ~15% more
compact than exp
 Import subsetting is accomplished by pruning
the Master Table
Original exp and imp

 Original imp will be supported forever to allow
loading of V5 – V9i dump files
 Original exp will ship at least in 10g, but may
not support all new functionality.
 9i exp may be used for downgrades from 10 g
 Original and Data Pump dump file formats are
not compatible
10g Beta Feedback
 British Telecom:
Ian Crocker, Performance & Storage Consultant

“We have tested Oracle Data Pump, the new Oracle10g Export
and Import Utilities. Data Pump Export performed twice as fast
as Original Export, and Data Pump Import performed ten times
faster than Original Import. The new manageability features
should give us much greater flexibility in monitoring job status.”

 Airbus Deutschland:
Werner Kawollek, Operation Application Management
“We have tested the Oracle Data Pump Export and Import
utilities and are impressed by their rich functionality. First tests
have shown tremendous performance gains in comparison to the
original export and import utilities"
Please visit our Demo!
Oracle Database 10g Data Pump: Faster and Better
Export / Import

Try out Data Pump’s Tutorial in the Oracle By Example
(OBE) area:
“Unloading and Loading Data Base Contents”
Q
&

Q U E S T I O N S
A N S W E R S

More Related Content

What's hot

Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contention
Tanel Poder
 
Oracle ebs capacity_analysisusingstatisticalmethods
Oracle ebs capacity_analysisusingstatisticalmethodsOracle ebs capacity_analysisusingstatisticalmethods
Oracle ebs capacity_analysisusingstatisticalmethods
Ajith Narayanan
 
Spring batch for large enterprises operations
Spring batch for large enterprises operations Spring batch for large enterprises operations
Spring batch for large enterprises operations
Ignasi González
 
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIESORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
Ludovico Caldara
 
SAP HANA Distributed System Scaleout and HA
SAP HANA Distributed System Scaleout and HASAP HANA Distributed System Scaleout and HA
SAP HANA Distributed System Scaleout and HA
Linh Nguyen
 
An introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAn introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methods
Ajith Narayanan
 
nginx + ansible로 점검모드 만들기
nginx + ansible로 점검모드 만들기nginx + ansible로 점검모드 만들기
nginx + ansible로 점검모드 만들기
June Kim
 
Sql server logshipping
Sql server logshippingSql server logshipping
Sql server logshipping
Zeba Ansari
 
Flashback - The Time Machine..
Flashback - The Time Machine..Flashback - The Time Machine..
Flashback - The Time Machine..
Navneet Upneja
 
The Top 12 Features new to Oracle 12c
The Top 12 Features new to Oracle 12cThe Top 12 Features new to Oracle 12c
The Top 12 Features new to Oracle 12c
David Yahalom
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
Robert Sanders
 
Spring batch
Spring batchSpring batch
Spring batch
Chandan Kumar Rana
 
twp-integrating-hadoop-data-with-or-130063
twp-integrating-hadoop-data-with-or-130063twp-integrating-hadoop-data-with-or-130063
twp-integrating-hadoop-data-with-or-130063
Madhusudan Anand
 
Les 17 sched
Les 17 schedLes 17 sched
Les 17 sched
Femi Adeyemi
 
SAP HANA System Replication - Setup, Operations and HANA Monitoring
SAP HANA System Replication - Setup, Operations and HANA MonitoringSAP HANA System Replication - Setup, Operations and HANA Monitoring
SAP HANA System Replication - Setup, Operations and HANA Monitoring
Linh Nguyen
 
Database upgradation
Database upgradationDatabase upgradation
Database upgradation
santosh kodandapani
 
Long live to CMAN!
Long live to CMAN!Long live to CMAN!
Long live to CMAN!
Ludovico Caldara
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
Rupak Roy
 
RMAN - New Features in Oracle 12c - IOUG Collaborate 2017
RMAN - New Features in Oracle 12c - IOUG Collaborate 2017RMAN - New Features in Oracle 12c - IOUG Collaborate 2017
RMAN - New Features in Oracle 12c - IOUG Collaborate 2017
Andy Colvin
 

What's hot (19)

Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contention
 
Oracle ebs capacity_analysisusingstatisticalmethods
Oracle ebs capacity_analysisusingstatisticalmethodsOracle ebs capacity_analysisusingstatisticalmethods
Oracle ebs capacity_analysisusingstatisticalmethods
 
Spring batch for large enterprises operations
Spring batch for large enterprises operations Spring batch for large enterprises operations
Spring batch for large enterprises operations
 
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIESORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
ORACLE 12C DATA GUARD: FAR SYNC, REAL-TIME CASCADE STANDBY AND OTHER GOODIES
 
SAP HANA Distributed System Scaleout and HA
SAP HANA Distributed System Scaleout and HASAP HANA Distributed System Scaleout and HA
SAP HANA Distributed System Scaleout and HA
 
An introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methodsAn introduction to_rac_system_test_planning_methods
An introduction to_rac_system_test_planning_methods
 
nginx + ansible로 점검모드 만들기
nginx + ansible로 점검모드 만들기nginx + ansible로 점검모드 만들기
nginx + ansible로 점검모드 만들기
 
Sql server logshipping
Sql server logshippingSql server logshipping
Sql server logshipping
 
Flashback - The Time Machine..
Flashback - The Time Machine..Flashback - The Time Machine..
Flashback - The Time Machine..
 
The Top 12 Features new to Oracle 12c
The Top 12 Features new to Oracle 12cThe Top 12 Features new to Oracle 12c
The Top 12 Features new to Oracle 12c
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
 
Spring batch
Spring batchSpring batch
Spring batch
 
twp-integrating-hadoop-data-with-or-130063
twp-integrating-hadoop-data-with-or-130063twp-integrating-hadoop-data-with-or-130063
twp-integrating-hadoop-data-with-or-130063
 
Les 17 sched
Les 17 schedLes 17 sched
Les 17 sched
 
SAP HANA System Replication - Setup, Operations and HANA Monitoring
SAP HANA System Replication - Setup, Operations and HANA MonitoringSAP HANA System Replication - Setup, Operations and HANA Monitoring
SAP HANA System Replication - Setup, Operations and HANA Monitoring
 
Database upgradation
Database upgradationDatabase upgradation
Database upgradation
 
Long live to CMAN!
Long live to CMAN!Long live to CMAN!
Long live to CMAN!
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
 
RMAN - New Features in Oracle 12c - IOUG Collaborate 2017
RMAN - New Features in Oracle 12c - IOUG Collaborate 2017RMAN - New Features in Oracle 12c - IOUG Collaborate 2017
RMAN - New Features in Oracle 12c - IOUG Collaborate 2017
 

Viewers also liked

Oracle RDS Data pump
Oracle RDS Data pumpOracle RDS Data pump
Oracle RDS Data pump
Jongwon
 
Oracle
OracleOracle
Comandos Editor VI
Comandos Editor VIComandos Editor VI
Comandos Editor VI
Usa
 
Less17 moving data
Less17 moving dataLess17 moving data
Less17 moving data
Amit Bhalla
 
Mastering the Oracle Data Pump API
Mastering the Oracle Data Pump APIMastering the Oracle Data Pump API
Mastering the Oracle Data Pump API
Enkitec
 
Oracle data pump
Oracle data pumpOracle data pump
Oracle data pump
marcxav72
 
SQL Developer for DBAs
SQL Developer for DBAsSQL Developer for DBAs
SQL Developer for DBAs
Leighton Nelson
 
Less17 Util
Less17  UtilLess17  Util
Less17 Util
vivaankumar
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Amazon Web Services
 

Viewers also liked (9)

Oracle RDS Data pump
Oracle RDS Data pumpOracle RDS Data pump
Oracle RDS Data pump
 
Oracle
OracleOracle
Oracle
 
Comandos Editor VI
Comandos Editor VIComandos Editor VI
Comandos Editor VI
 
Less17 moving data
Less17 moving dataLess17 moving data
Less17 moving data
 
Mastering the Oracle Data Pump API
Mastering the Oracle Data Pump APIMastering the Oracle Data Pump API
Mastering the Oracle Data Pump API
 
Oracle data pump
Oracle data pumpOracle data pump
Oracle data pump
 
SQL Developer for DBAs
SQL Developer for DBAsSQL Developer for DBAs
SQL Developer for DBAs
 
Less17 Util
Less17  UtilLess17  Util
Less17 Util
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
 

Similar to 40043 claborn

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
jade_22
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
Dr Anjan Krishnamurthy
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Christian Gohmann
 
Datastage free tutorial
Datastage free tutorialDatastage free tutorial
Datastage free tutorial
tekslate1
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_Features
Alfredo Abate
 
Optimizing your Database Import!
Optimizing your Database Import! Optimizing your Database Import!
Optimizing your Database Import!
Nabil Nawaz
 
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitterApache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
Kaxil Naik
 
Watch Re-runs on your SQL Server with RML Utilities
Watch Re-runs on your SQL Server with RML UtilitiesWatch Re-runs on your SQL Server with RML Utilities
Watch Re-runs on your SQL Server with RML Utilities
dpcobb
 
Dataflow.pptx
Dataflow.pptxDataflow.pptx
Dataflow.pptx
Sadeka Islam
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
Dori Waldman
 
Pentaho etl-tool
Pentaho etl-toolPentaho etl-tool
Pentaho etl-tool
Sreenivas Kappala
 
Synapse 2018 Guarding against failure in a hundred step pipeline
Synapse 2018 Guarding against failure in a hundred step pipelineSynapse 2018 Guarding against failure in a hundred step pipeline
Synapse 2018 Guarding against failure in a hundred step pipeline
Calvin French-Owen
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
Tao Feng
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
Jeff McQuigg
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
Databricks
 
Von neumann workers
Von neumann workersVon neumann workers
Von neumann workers
riccardobecker
 
Less18 moving data
Less18 moving dataLess18 moving data
Less18 moving data
Imran Ali
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
Databricks
 
Exploring plsql new features best practices september 2013
Exploring plsql new features best practices   september 2013Exploring plsql new features best practices   september 2013
Exploring plsql new features best practices september 2013
Andrejs Vorobjovs
 

Similar to 40043 claborn (20)

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTS
 
Datastage free tutorial
Datastage free tutorialDatastage free tutorial
Datastage free tutorial
 
COUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_FeaturesCOUG_AAbate_Oracle_Database_12c_New_Features
COUG_AAbate_Oracle_Database_12c_New_Features
 
Optimizing your Database Import!
Optimizing your Database Import! Optimizing your Database Import!
Optimizing your Database Import!
 
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitterApache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
Watch Re-runs on your SQL Server with RML Utilities
Watch Re-runs on your SQL Server with RML UtilitiesWatch Re-runs on your SQL Server with RML Utilities
Watch Re-runs on your SQL Server with RML Utilities
 
Dataflow.pptx
Dataflow.pptxDataflow.pptx
Dataflow.pptx
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
Pentaho etl-tool
Pentaho etl-toolPentaho etl-tool
Pentaho etl-tool
 
Synapse 2018 Guarding against failure in a hundred step pipeline
Synapse 2018 Guarding against failure in a hundred step pipelineSynapse 2018 Guarding against failure in a hundred step pipeline
Synapse 2018 Guarding against failure in a hundred step pipeline
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
 
Von neumann workers
Von neumann workersVon neumann workers
Von neumann workers
 
Less18 moving data
Less18 moving dataLess18 moving data
Less18 moving data
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
 
Exploring plsql new features best practices september 2013
Exploring plsql new features best practices   september 2013Exploring plsql new features best practices   september 2013
Exploring plsql new features best practices september 2013
 

Recently uploaded

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 

Recently uploaded (20)

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 

40043 claborn

  • 1.
  • 2. Session id: 40043 Data Pump in Oracle Database 10g : Foundation for Ultra-High Speed Data Movement Utilities George H. Claborn Data Pump Technical Project Leader Oracle Corporation New England Development Center
  • 3. Data Pump: Overview       What is it? Main Features Architecture Performance Things to keep in mind Some thoughts on original exp / imp
  • 4. Data Pump: What is it?  Server-based facility for high performance loading and unloading of data and metadata  Callable: DBMS_DATAPUMP. Internally uses DBMS_METADATA  Data written in Direct Path stream format. Metadata written as XML  New clients expdp and impdp: Supersets of original exp / imp.  Foundation for Streams, Logical Standby, Grid, Transportable Tablespaces and Data Mining initial instantiation.
  • 5. Features: Performance!!  Automatic, two-level parallelism – – – – –     Direct Path for inter-partition parallelism External Tables for intra-partition parallelism Simple: parallel=<number of active threads> Dynamic: Workers can be added and removed from a running job in Enterprise Edition Index builds automatically “parallelized” up to degree of job Simultaneous data and metadata unload Single thread of data unload: 1.5-2X exp Single thread of data load: 15X-40X imp With index builds: 4-10X imp
  • 6. Features: Checkpoint / Restart  Job progress recorded in a “Master Table”  May be explicitly stopped and restarted later: – Stop after current item finishes or stop immediate  Abnormally terminated job is also restartable  Current objects can be skipped on restart if problematic
  • 7. Features: Network Mode  Network import: Load one database directly from another  Network export: Unload a remote database to a local dumpfile set – Allows export of read-only databases  Data Pump runs locally, Metadata API runs remotely.  Uses DB links / listener service names, not pipes. Data is moved as ‘insert into <local table> select from <remote table>@service_name’  Direct path engine is used on both ends  It’s easy to swamp network bandwidth: Be careful!
  • 8. Features: Fine-Grained Object Selection  All object types are supported for both operations: export and import  Exclude: Specified object types are excluded from the operation  Include: Only the specified object types are included. E.g, just retrieve packages, functions and procedures  More than one of each can be specified, but use of both is prohibited by new clients  Both take an optional name filter for even finer granularity: – – INCLUDE PACKAGE: “LIKE ‘PAYROLL%’ “ EXCLUDE TABLE: “IN (‘FOO’,’BAR’, … )’ “
  • 9. Features: Monitoring  Flexible GET_STATUS call  Per-worker status showing current object and percent done  Initial job space estimate and overall percent done  Job state and description  Work-in-progress and errors
  • 10. Features: Dump File Set Management  Directory based: E.g, DMPDIR:export01.dmp where DMPDIR created as: SQL> create directory dmpdir as ‘/data/dumps’  Multiple, wildcarded file specifications supported: dumpfile=dmp1dir:full1%u.dmp, dmp2dir:full2%u.dmp – Files are created as needed on a round-robin basis from available file specifications  File size can be limited for manageability  Dump file set coherency automatically maintained
  • 11. New Clients – expdp / impdp  Similar (but not identical) look and feel to exp / imp  All modes supported: full, schema, table, tablespace, transportable. Superset of exp / imp  Flashback is supported  Query supported by both expdp and impdp… and on a per-table basis!  Detach from and attach to running jobs  Multiple clients per job allowed; but a single client can attach to only one job at a time  If privileged, attach to and control other users’ jobs
  • 12. New Clients – expdp / impdp  Interactive mode entered via Ctl-C: – – – – – – – – ADD_FILE: Add dump files and wildcard specs. to job PARALLEL: Dynamically add or remove workers STATUS: Get detailed per-worker status and change reporting interval STOP_JOB{=IMMEDIATE}: Stop job, leaving it restartable. Immediate doesn’t wait for workers to finish current work items… they’ll be re-done at restart START_JOB: Restart a previously stopped job KILL_JOB: Stop job and delete all its resources (master table, dump files) leaving it unrestartable CONTINUE: Leave interactive mode, continue logging EXIT: Exit client, leave job running
  • 13. Features: Other Cool Stuff…  DDL transformations are easy with XML: – – – – REMAP_SCHEMA REMAP_TABLESPACE REMAP_DATAFILE Segment and storage attributes can be suppressed  Can extract and load just data, just metadata or both  SQLFILE operation generates executable DDL script  If a table pre-exists at load time, you can: skip it (default), replace it, truncate then load or append to it.  Space estimates based on allocated blocks (default) or statistics if available  Enterprise Manager interface integrates 9i and 10g  Callable!
  • 14. Architecture: Block Diagram expdp impdp Other Clients: Data Mining, etc Enterprise Manager Data Pump DBMS_DATAPUMP Data/Metadata movement engine Oracle_ Loader Oracle_ DataPump External Table API Direct Path API Metadata API: DBMS_METADATA
  • 15. Architecture: Flow Diagram User A: expdp Data, metadata & master table Master table User A’s shadow process Worker A: metadata Status queue: Work-in-progress and errors Master Control Process Log File User B: OEM Dump File Set: User B’s shadow process Worker B: Direct path Parallel Proc. 01 Worker C: Ext. Table Dynamic commands Command and (stop, start, parallel etc) control queue Parallel Proc. 02
  • 16. No Clients Required ! Dump File Set: Data, metadata & master table Master table Worker A: metadata Master Control Process Log File Worker B: Direct path Parallel Proc. 01 Worker C: Ext. Table Parallel Proc. 02
  • 17. Data Pump: Performance Tuning  Default initialization parameters are fine! – Make sure disk_asynch_io remains TRUE  Spread the I/O!  Parallel= no more than 2X number of CPUs: Do not exceed disk spindle capacity. – Corollary: SPREAD THE I/O !!!  Sufficient SGA for AQ messaging and metadata API queries  Sufficient rollback for long running queries That’s it!
  • 18. Large Internet Company 2 Fact Tables: 16.2M rows, 2 Gb Program Elapsed exp out of the box: direct=y 0 hr 10 min 40 sec exp tuned: direct=y buffer=2M recordlength=64K 0 hr 04 min 08 sec expdp out of the box: Parallel=1 0 hr 03 min 12 sec imp out of the box 2 hr 26 min 10 sec imp tuned: buffer=2M recordlength=64K 2 hr 18 min 37 sec impdp out of the box: Parallel=1 0 hr 03 min 05 sec With one index per table imp tuned: buffer=2M recordlength=64K 2 hr 40 min 17 sec impdp: Parallel=1 0 hr 25 min 10 sec
  • 19. Oracle Applications Seed Database:  Metadata intensive: 392K objects, 200 schemas, 10K tables, 2.1 Gb of data total  Original exp / imp total: 32 hrs 50 min – exp: 2 hr 13 min imp: 30 hrs 37 min.  Data Pump expdp / impdp total: – – 15 hrs 40 min expdp: 1 hr 55 min impdp: 13 hrs 45 min. Parallel=2 for both expdp and impdp
  • 20. Keep in Mind:  Designed for *big* jobs with lots of data. – – Metadata performance is about the same More complex infrastructure, longer startup  XML is bigger than DDL, but much more flexible  Data format in dump files is ~15% more compact than exp  Import subsetting is accomplished by pruning the Master Table
  • 21. Original exp and imp  Original imp will be supported forever to allow loading of V5 – V9i dump files  Original exp will ship at least in 10g, but may not support all new functionality.  9i exp may be used for downgrades from 10 g  Original and Data Pump dump file formats are not compatible
  • 22. 10g Beta Feedback  British Telecom: Ian Crocker, Performance & Storage Consultant “We have tested Oracle Data Pump, the new Oracle10g Export and Import Utilities. Data Pump Export performed twice as fast as Original Export, and Data Pump Import performed ten times faster than Original Import. The new manageability features should give us much greater flexibility in monitoring job status.”  Airbus Deutschland: Werner Kawollek, Operation Application Management “We have tested the Oracle Data Pump Export and Import utilities and are impressed by their rich functionality. First tests have shown tremendous performance gains in comparison to the original export and import utilities"
  • 23. Please visit our Demo! Oracle Database 10g Data Pump: Faster and Better Export / Import Try out Data Pump’s Tutorial in the Oracle By Example (OBE) area: “Unloading and Loading Data Base Contents”
  • 24. Q & Q U E S T I O N S A N S W E R S

Editor's Notes

  1. Server, not client Write your own export and import! DP stream format mirrors what’s on disk… minimal conversions required. Caveat the “superset” comment
  2. Describe req. gathering: Perf. Was customer reqs 1, 2 &amp; 3. 3. 1.5-2X exp direct path! Even faster against exp conventional 4. See me after for some tricks on how to increase original imp performance
  3. 2nd most requested enhancement
  4. Ask, “How many do the pipe trick?” Both operations support a “network mode” where the source is a remote instance.
  5. 1. Original exp: grants, indexes, triggers and constraints *only*.
  6. Explain security issue. If dmpdir1 pts to data1 and dmpdir2 to data2, then files will be created as: /data1/full101.dmp, /data2/full201.dmp, /data1/full102.dmp, /data2/full202.dmp 3. However, specified size must be large enough to hold entire Master Table at end of export. 4. File specs. Must span entire dump set at import time.
  7. 2. XML schemas and views are not yet supported in Data Pump 5. Start a job up when leaving the office and monitor it from home.
  8. A whole slide to the very cool interactive mode…
  9. Explain XML 2. Original exp can’t do just data. Caveat on “truncate”: Target of ref. constraints. Append is default for data-only.
  10. 1. Important on those few platforms that don’t support asynch I/O by default.