2. Introduction to TERADATA
Flow through TERADATA UTILITIES with Lab
Examples
BTEQ
FastLoad
FastExport
MultiLoad
TPump
Comparative study of the Teradata loading
utilities.
Agenda
3. Teradata is a Relational Database Management System
(RDBMS) that drives a company’s data warehouse.
Teradata is an open system, compliant with industry ANSI
standards.
It is currently available for the following operating systems:
· UNIX MP-RAS
· Windows 2000
The ability to manage terabytes of data is accomplished using
the concept of parallelism, wherein many individual processors
perform smaller tasks concurrently to accomplish an operation
against a huge repository of data.
To date, only parallel architectures can handle databases of this
size.
What is Teradata?
4. There are many reasons to choose Teradata as the preferred
platform for enterprise data warehousing:
Supports easy scalability from a small (10 GB) to a massive
(100+TB) database.
Automatic and even data distribution eliminates complex
indexing schemes or time-consuming reorganizations.
Designed and built with parallelism from day one.
Single operational view of the entire MPP system and single
point of control for the DBA (Teradata Manager).
Teradata has been doing data warehousing longer than any
other vendor.
Why Teradata?
5. Start Smaller and Grow: One Experience
200-300 users Over 7500 users
30 concurrent users Over 2000 concurrent users
300 GB disk space Over 50 TB user data
1.7 billion-row table Over 7.5 billion-row table
200 queries per day Over 20,000 queries per day
30M-row batch per night Over 500M-row batch per night
1 main application Over 30 applications
BUT ONE THING REMAINS CONSTANT
Scalability in a Production Environment
6. i.e., throughput.
ADVANTAGE of TERADATA
7. Ease of setup and maintenance
No reorganization of data needed
Most robust utilities in the industry
Low cost of disk to data ratio
Ease in expanding the system
ADVANTAGE of TERADATA
8. Each Teradata node is made up of hardware and software.
Each node has CPUs, a system disk, memory, and adapters.
Each node runs its own copy of the operating system and the database software.
Node Architecture (Shared Nothing)
9. BYNET
The BYNET performs the internal communication of the
Teradata RDBMS
All communication between PEs and AMPs is done via the
BYNET
BYNET in TERADATA
Boardless BYNET
Single-node SMP systems use Boardless BYNET (virtual BYNET)
software to simulate the BYNET hardware driver.
10. Disk Arrays
A Disk Array is a configuration of disk drives that utilizes
specialized controllers to manage and distribute data and
parity across the Disks while providing fast access and data
integrity.
Clique
A Clique is a set of Teradata nodes that share a common set
of disk arrays.
In the event of failure, all virtual processors can migrate to
another available node in the clique.
All nodes in the clique must have access to the same disk
arrays.
Disk Arrays & Clique
11. Hot Standby Nodes
The hot standby Node feature allows spare nodes to be
incorporated into the production environment so that Teradata
Database can take advantage of the presence of the spare nodes
to improve availability and maintain performance levels.
A hot standby node:
Is a member of a clique.
Does not normally participate in the trusted parallel
application (TPA).
Can be brought into the TPA to compensate for the loss of a
node in the clique.
12. Virtual Processors
The versatility of Teradata Database is based on virtual
processors (vprocs) that eliminate dependency on specialized
physical processors. Vprocs are a set of software processes that
run on a node under Teradata Parallel Database Extensions (PDE)
within the multitasking environment of the operating system.
The two types of vprocs are:
PE (Parsing Engine)
AMP (Access Module Processor)
13. AMP
The AMP is a virtual processor designed for and dedicated to managing a
portion of the entire database.
An AMP will control some portion of each table on the system.
It performs all database management functions such as sorting,
aggregating and formatting data.
The AMP receives data from the PE, formats rows and distributes them to
the disk storage units it controls.
The AMP also retrieves the rows requested by the PE.
Vproc
PE
A Parsing Engine (PE) is a virtual processor that manages the dialogue
between the client application and the RDBMS.
It interprets the SQL requests, receives input records and passes data.
It is made of the following software components: Session Control, the
Parser, the Optimizer and the Dispatcher
14. [Figure: rows of Table A and Table B spread across four AMPs]
• The rows of every table are distributed among all AMPs
• Each AMP is responsible for a subset of the rows of each table.
• Ideally, each table will be evenly distributed among all AMPs.
• Evenly distributed tables result in evenly distributed workloads.
• The uniformity of distribution of the rows of a table depends on the choice
of the Primary Index.
Data Store on Disks
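The dependence of row distribution on the Primary Index can be illustrated with a small Python sketch. This is not Teradata's actual hashing algorithm: the MD5-based hash, the four-AMP system, and the names `amp_for` and `rows_per_amp` are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # assumed system size, matching the four AMPs in the figure


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number (illustrative hash only)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")
    return row_hash % num_amps


def rows_per_amp(primary_index_values, num_amps=NUM_AMPS):
    """Count how many rows each AMP would own for the given index values."""
    counts = Counter(amp_for(value, num_amps) for value in primary_index_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# A unique primary index (employee number) spreads rows across all AMPs.
unique_index = rows_per_amp(range(10_000))

# A primary index with only two distinct values can reach at most two AMPs.
skewed_index = rows_per_amp(["M", "F"] * 5_000)

print("unique index:", unique_index)
print("skewed index:", skewed_index)
```

With the unique index every AMP receives roughly a quarter of the rows; with the two-valued index at least two AMPs stay empty, so the workload is uneven.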
15. Request processing
[Figure: within a node, an SQL request enters a Parsing Engine, passes over the BYNET to four AMPs and their disk storage, and the answer set response is returned.]
16. The major Teradata utilities that assist in data
warehousing management and maintenance alongside
the Teradata RDBMS are:
BTEQ
FASTLOAD
FASTEXPORT
MULTILOAD
TPUMP
TERADATA UTILITIES INTRODUCTION
17. General-purpose, command-based program that allows users
on a workstation to communicate with one or more Teradata
Database systems.
Submits SQL statements that insert, update, or delete rows in
Teradata tables.
Imports data into a Teradata database from a file.
Exports data from a table, formats the results, and returns
them to the screen, a file, or a designated printer.
Reports errors as they occur but does not capture them in a log.
BTEQ - Basic Teradata Query
18. Enter Teradata SQL statements to view, add, modify, and
delete data.
Enter operating system commands.
Create and use Teradata stored procedures.
BTEQ supports Teradata-specific SQL functions for doing
complex analytical querying and data mining.
All database requests in BTEQ are expressed in Teradata SQL.
BTEQ also supports conditional logic (i.e., ".IF ... THEN ...")
based on activity count or error code. It is useful for batch
mode export / import processing.
Error handling is available in BTEQ. An error level can be assigned
to each error code, and decisions made based on the level
assigned.
Capabilities in BTEQ
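The idea of assigning an error level to each error code and branching on it can be sketched in Python. This models the decision logic only and is not BTEQ syntax; the error codes, severity values, thresholds, and function names are all hypothetical.

```python
# Hypothetical severity levels keyed by error code; not real BTEQ settings.
SEVERITY_BY_ERROR_CODE = {
    1001: 0,  # tolerated, treated as success
    1002: 4,  # warning
    1003: 8,  # failure
}
DEFAULT_SEVERITY = 8  # unknown non-zero error codes count as failures here


def error_level(error_code):
    """Return the severity level assigned to an error code (0 means no error)."""
    if error_code == 0:
        return 0
    return SEVERITY_BY_ERROR_CODE.get(error_code, DEFAULT_SEVERITY)


def next_step(error_code, activity_count):
    """Decide what a batch script does after one request."""
    if error_level(error_code) >= 8:
        return "exit"
    if activity_count == 0:
        return "skip export"
    return "continue"


clean_run = next_step(0, 25)
tolerated_error = next_step(1001, 0)
hard_failure = next_step(1003, 25)
print(clean_run, tolerated_error, hard_failure)
```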
19. Interactive mode
You start a BTEQ session by entering bteq at the system
prompt on your terminal, logging on, and submitting SQL commands to the
database as needed.
Format of the logon command: bteq, then .logon server_name/user_name,password
Batch mode
In batch mode, you prepare BTEQ scripts or macros, and then
submit them to BTEQ from a scheduler or manually for
processing.
A BTEQ script is a set of SQL statements and BTEQ commands
saved in a file with the extension ".bteq".
The BTEQ script can be run using the following command (in
UNIX or Windows)
OPERATING MODES in BTEQ
20. Export
By default, BTEQ delivers a response to every SQL
query that includes a status message along with
diagnostic information about the time taken to perform the
query.
If all of this information is captured in a single output file, the
mixed output is typically unsuitable for
other purposes.
The .EXPORT feature therefore writes the
report or output data to a separate file.
The standard output of the script then contains only the messages and
not the data. The data is exported to a file that can be used for
other purposes.
Export types are EXPORT REPORT, EXPORT DATA, EXPORT INDICDATA, and
EXPORT DIF; EXPORT RESET ends the export.
EXPORT in BTEQ
21. Imports data from the host to Teradata as a series of inserts, updates,
and deletes.
Import types supported are
import data
import report
import indicdata
IMPORT in BTEQ
22. All BTEQ commands must be preceded by a dot ‘.’
character; a BTEQ command may or may not end with a
semicolon ‘;’.
There are four types of BTEQ commands:
Session control
File control
Format control
Sequence control commands
BTEQ COMMANDS
23. Report formatting.
Ad hoc query tool.
Database administration.
Best for small data volumes.
BTEQ Advantages
24. Lab.sh
#!/bin/sh
# Feed the BTEQ commands below to bteq through a here-document.
bteq <<EOF
.logon tdprd/username,pwd;
.export report file=lab.txt;
.import vartext '|' file=/ngs/app/asrdedwp/SCRIPTS/emp.txt;
.set underline off;
.set titledashes off;
.set errorout stdout;
.set width 4000;
select * from table_name where …;
delete from table_name where…….;
using (empno varchar(50), empname varchar(50), doj varchar(30))
insert into table_name values(….);
update table_name set where…..;
Call macro, procedure etc…
.if errorcode <> 0 then .exit 2;
.export reset;
.logoff;
.quit;
EOF
Bteq Lab Exercise
25. FastLoad (also called FLoad or FL) is a multi-sessioned parallel load utility
for the initial bulk load of a table on a Teradata Database.
It is a command-driven utility that loads large volumes of data into an empty
table with no secondary indexes on a Teradata RDBMS.
It uses multiple database sessions to load data.
FL-FASTLOAD
26. Full restart capability.
Checkpoints are provided for restart.
Checkpoints slow FastLoad processing. Set the checkpoint interval large
enough that a checkpoint is taken only every 10 to 15 minutes.
Two error tables and error limits, accessible using SQL.
One error table holds rows that failed due to constraint violations or
translation errors. The other captures duplicate rows
for unique primary indexes (UPIs).
The error tables are loaded one row at a time, so errors slow
down FastLoad performance.
FASTLOAD Capability
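The "every 10 to 15 minutes" guideline can be turned into a row count with simple arithmetic. The sketch below assumes the checkpoint interval is expressed as a number of rows and that the job's load rate is known from an earlier run; the function name and the 5,000 rows-per-second figure are illustrative assumptions.

```python
def checkpoint_rows(rows_per_second, minutes_between_checkpoints=10):
    """Return the number of rows to load between two checkpoints."""
    if rows_per_second <= 0 or minutes_between_checkpoints <= 0:
        raise ValueError("rate and interval must be positive")
    return int(rows_per_second * minutes_between_checkpoints * 60)


# Assumed load rate of 5,000 rows per second.
lower_bound = checkpoint_rows(5_000, 10)
upper_bound = checkpoint_rows(5_000, 15)
print(lower_bound, upper_bound)
```

At that assumed rate, a checkpoint value between 3,000,000 and 4,500,000 rows would fall inside the recommended window.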
27. Phase 1
FastLoad uses one SQL session to define AMP steps.
The PE sends a block to each AMP.
The AMPs hash each record and redistribute it to the AMP responsible for
the hash value.
Records are written to the target table in unsorted blocks.
Phase 2
Phase 2 starts after the .END LOADING command. If this command is not specified,
FastLoad is paused rather than terminated.
When loading completes, each AMP sorts the target table, puts the rows
into blocks, and writes the blocks to disk.
Fallback rows are then generated if required.
Fast Load Operates in two phases
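The two phases can be pictured with a toy Python model. It is a sketch under stated assumptions, not FastLoad's implementation: CRC32 stands in for the row hash, blocks are dealt to AMPs round-robin, and `phase1`, `phase2`, and the four-AMP system are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # assumed number of AMPs


def row_hash(key):
    """Stand-in for the row hash; CRC32 is used only for illustration."""
    return zlib.crc32(str(key).encode("utf-8"))


def phase1(records, block_size=2, num_amps=NUM_AMPS):
    """Deal blocks to AMPs, then redistribute every row to its owning AMP."""
    received = defaultdict(list)
    for start in range(0, len(records), block_size):
        receiving_amp = (start // block_size) % num_amps
        received[receiving_amp].extend(records[start:start + block_size])

    unsorted_rows = defaultdict(list)
    for rows in received.values():
        for row in rows:
            owning_amp = row_hash(row[0]) % num_amps
            unsorted_rows[owning_amp].append(row)
    return unsorted_rows


def phase2(unsorted_rows):
    """Sort the rows held by each AMP into row-hash order."""
    return {
        amp: sorted(rows, key=lambda row: row_hash(row[0]))
        for amp, rows in unsorted_rows.items()
    }


records = [(empno, f"emp{empno}") for empno in range(1, 21)]
target_table = phase2(phase1(records))
for amp in sorted(target_table):
    print(amp, [row[0] for row in target_table[amp]])
```

Phase 1 leaves each AMP with the rows it owns in arrival order; only Phase 2 puts them in hash order, which is why the load is incomplete until Phase 2 runs.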
28. Interactive mode
In interactive mode, Teradata FastLoad uses the terminal
screen and keyboard as the standard output and input
streams.
For interactive mode: fastload, then .logon tdprd/user_id,pwd
Batch mode
In batch mode, FastLoad uses > and < to redirect the
standard output / input streams.
For batch mode: fastload [options] < infile > outfile
Here, infile is a Teradata FastLoad job script file and
outfile is the FastLoad output stream file.
OPERATING MODES in Fast Load
29. CREATE TABLE
Defines the columns, index and other qualities of a table
DATABASE
Changes the default database
DELETE
Deletes rows from a table
DROP TABLE
Removes a table and all of its rows from a database
INSERT
Inserts rows into a table
SQL Statements Supported in FastLoad
30. .LOGON TDP/username,pwd;
errlimit 1;
tenacity 4;
sleep 6;
DROP TABLE ;
SET RECORD UNFORMATTED;
.begin loading table_name errorfiles filename_ref1, filename_ref2 checkpoint 0;
Define
fields
file=stg_sref_service_price.out;
show;
INSERT INTO table_name
(
)
VALUES
(
) ;
.end loading;
.logoff;
Lab Excercise FastLoad
31. Teradata FastExport, also called "FastExport" or "FE," is a
multi-sessioned command-driven utility for export in bulk mode
from tables and views of the Teradata Database to a client-
based application.
It is the reverse of the Teradata FastLoad utility.
Teradata FastExport processes a series of FastExport
commands and Teradata SQL statements written in a batch
mode job script or interactively entered.
The FastExport commands provide the session control and data
handling specifications for the data transfer operations, and the
Teradata SQL statements perform the actual data export
functions on the Teradata RDBMS tables and views
FE—Fast Export
32. Fully automated restart.
Export from multiple tables.
There are two techniques for providing variable input to FastExport
for selection controls:
ACCEPT from a parameter file; values are accepted from a single
record only.
IMPORT from a data file; each import record is applied to
every SELECT.
Capability of Fast Export
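The difference between the two input techniques can be sketched in Python. This only models the behaviour described above (ACCEPT reads one record, IMPORT applies every record to every SELECT); FastExport itself uses bind variables rather than string formatting, the order in which it issues requests is not modeled, and the function names, templates, and sample values are hypothetical.

```python
def accept_values(parameter_file_lines):
    """Model ACCEPT: values come from the first record only."""
    return parameter_file_lines[0].split() if parameter_file_lines else []


def apply_import(select_templates, import_records):
    """Model IMPORT: every import record is applied to every SELECT."""
    return [
        template.format(**record)
        for record in import_records
        for template in select_templates
    ]


# Hypothetical SELECT templates and input values.
selects = [
    "SELECT EmpNo FROM Charges WHERE Proj_ID = '{proj_id}'",
    "SELECT Hours FROM Charges WHERE Proj_ID = '{proj_id}'",
]
import_records = [{"proj_id": "P1"}, {"proj_id": "P2"}, {"proj_id": "P3"}]

accepted = accept_values(["P1 2004-01-02", "this second record is ignored"])
requests = apply_import(selects, import_records)
print(accepted)
for request in requests:
    print(request)
```

Three import records and two SELECT statements therefore produce six requests, while ACCEPT yields a single set of values.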
33. Interactive mode
In interactive mode, Teradata FastExport uses the terminal screen and keyboard
as the standard output and input streams.
Interactive mode for Microsoft Windows: c:ncrfexq
Batch mode
Batch mode for Microsoft Windows: c:ncrfexq [options] < infile >
outfile
In batch mode, FastExport uses > and < to redirect the standard output /
input streams.
Operating Modes in Fast Export
34. CREATE TABLE
Defines the columns, index and other qualities of a table
DATABASE
Changes the default database
DELETE
Deletes rows from a table
DROP TABLE
Removes a table and all of its rows from a database
INSERT
Inserts rows into a table
SQL Statements Supported in Fast Export
35. ALTER TABLE
Changes the column configuration or options of an existing table
COLLECT STATISTICS
Collects statistical data for one or more columns of a table
COMMENT
Stores or retrieves comment string associated with a database object
CREATE DATABASE,MACRO,TABLE,VIEW
Creates a new database, macro, table, or view
DATABASE
Specifies a new default database for the current session
DELETE
Removes rows from a table
SQL Statements Supported in Fast Export
36. DELETE DATABASE
Removes all tables, views, and macros from a database
DROP DATABASE
Drops the definition for an empty database from the Data Dictionary
DROP TABLE
Removes a table from the database
GIVE
Transfers ownership of a database to another user
GRANT
Grants access privileges to a database object
INSERT
Inserts new rows to a table
SQL Statements Supported in FastExport
37. RENAME
Changes the name of an existing table, view, or macro
REPLACE MACRO,VIEW
Redefines an existing macro or view
REVOKE
Rescinds access privileges to a database object
UPDATE
Changes the column values of an existing row in a table
SQL Statements Supported in FastExport
40. .LOGTABLE utillog;
.LOGON tdpz/user,pswd;
.BEGIN EXPORT
SESSIONS 20;
.LAYOUT UsingData;
.FIELD ProjId * Char(8);
.FIELD WkEnd * Date;
.IMPORT INFILE ddname1
LAYOUT UsingData;
.EXPORT OUTFILE ddname2;
SELECT
EmpNo,
Hours
FROM CHARGES
WHERE WkEnd = :WkEnd
AND Proj_ID = :ProjId /* these input variables are read from the imported input file */
ORDER BY EmpNo;
.END EXPORT;
.LOGOFF;
Lab Exercise Fast Export
41. MultiLoad (also called MLoad or ML) is a command-driven parallel
load utility for high-volume batch maintenance on
multiple tables and views of the Teradata Database.
Multi Load
42. Teradata MultiLoad executes a series of MultiLoad commands
and Teradata SQL statements written in a batch mode job
script or entered interactively.
Supports up to five populated tables.
FastLoad-like technology with TPump-like functionality.
Multiple operations with one pass of input files
Conditional logic for applying changes
Supports INSERTs, UPDATEs, DELETEs and UPSERTs
Full restart capability
Error reporting via error tables
Support for INMODs
Features of MultiLoad
43. ALTER TABLE
Changes the column configuration or options of an existing table
COLLECT STATISTICS
Collects statistical data for one or more columns of a table
COMMENT
Stores or retrieves comment string associated with a database object
CREATE DATABASE,MACRO,TABLE,VIEW
Creates a new database, macro, table, or view
DATABASE
Specifies a new default database for the current session
DELETE
Removes rows from a table
DELETE DATABASE
Removes all tables, views, and macros from a database
SQL Statements Supported in MultiLoad
44. DROP DATABASE
Drops the definition for an empty database from the Data Dictionary
DROP TABLE
Removes a table from the database
GIVE
Transfers ownership of a database to another user
GRANT
Grants access privileges to a database object
INSERT
Inserts new rows to a table
RENAME
Changes the name of an existing table, view, or macro
REPLACE MACRO,VIEW
Redefines an existing macro or view
REVOKE
Rescinds access privileges to a database object
UPDATE
Changes the column values of an existing row in a table
SQL Statements Supported in MultiLoad
45. Interactive mode
In Interactive mode, Teradata MultiLoad uses terminal screen and keyboard
as the standard output and input streams.
Interactive mode for Microsoft Windows: c:ncrbinMultiLoad
Batch mode
In batch mode MultiLoad uses > and < to redirect the standard output /
input streams.
Batch mode for Microsoft Windows : c:ncrbinMultiLoad [options] <
infile > outfile
infile is a Teradata MultiLoad job script file and the outfile is the output
stream file.
Operating Modes in MultiLoad
46. IMPORT task
An import task intermixes a number of different SQL/DML
statements and applies them to up to five different tables, depending on the
APPLY conditions.
Import tasks are always primary index operations, but they are not allowed to
change the value of a table’s primary index.
Restart and checkpoint are allowed during each operating phase.
Import tasks cannot be run on tables with USIs, referential integrity, join
indexes, hash indexes, or triggers.
The phases involved in this task are:
Preliminary – Basic setup
DML phase – Send the DML steps to the AMPs
Acquisition phase – Send the input data to the AMPs and sort it
Application phase – Apply the input data to the target tables
End phase – Basic cleanup
MULTILOAD TASKS
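How APPLY conditions route records to different DML statements in a single pass over the input can be sketched in Python. This illustrates the idea only and is not MultiLoad syntax; the record layout, the rule labels, and `run_import_task` are hypothetical.

```python
def run_import_task(records, apply_rules):
    """Read the input once; hand each record to every rule whose condition holds."""
    routed = {label: [] for label, _ in apply_rules}
    for record in records:
        for label, condition in apply_rules:
            if condition(record):
                routed[label].append(record)
    return routed


# Hypothetical input: an operation code decides which DML a record feeds.
records = [
    {"op": "I", "empno": 1},
    {"op": "U", "empno": 2},
    {"op": "D", "empno": 3},
    {"op": "I", "empno": 4},
]
apply_rules = [
    ("insert_emp", lambda r: r["op"] == "I"),
    ("update_emp", lambda r: r["op"] == "U"),
    ("delete_emp", lambda r: r["op"] == "D"),
]

routed = run_import_task(records, apply_rules)
for label, rows in routed.items():
    print(label, [row["empno"] for row in rows])
```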
47. Basic setup involves validating all SQL, starting all sessions, creating
work tables (one per target), error tables (two per target), and a
restart log table (one per table), and applying locks to the target tables (to
prevent access to the target while loading).
Basic cleanup involves session logoff, dropping the error and work
tables, and releasing table locks.
DELETE task
These are tasks which execute a single DELETE statement on a
single table.
MULTILOAD TASKS
48. Each MultiLoad import task can do multiple data insert, update,
and delete functions on up to five different tables or views;
Each MultiLoad import task can have up to 100 DML steps;
Each MultiLoad delete task can remove large numbers of rows
from a single table.
Multi Load Advantage
50. Teradata TPump, short for "Teradata Parallel Data Pump," is a continuous
data-loading utility used to move data into the Teradata Database without
locking the affected tables.
Instead of updating Teradata Databases overnight, or in batches throughout
the day, TPump updates information in near real-time or real time, acquiring
data from the client system with low processor utilization.
This parallel utility loads in stream mode, using a protocol that is SQL-based
rather than block-based.
TPUMP
51. DATABASE
Changes the default database qualification for all DML statements.
DELETE
Removes specified rows from a table
EXECUTE
Specifies a user-created (predefined) macro for execution. The macro
named in this statement resides in the Teradata Database and
specifies the type of DML statement (INSERT, UPDATE, or DELETE) being
handled by the macro.
INSERT
Adds new rows to a table by directly specifying the row data to be
inserted
UPDATE
Changes field values in existing rows of a table.
SQL Statements Supported in TPump
52. Interactive mode
In interactive mode, Teradata TPump uses the terminal screen and keyboard as
the standard output and input streams, involving the more or less
continuous participation of the user.
Batch mode
In batch mode, Teradata TPump processes data in discrete groups of
previously scheduled operations, typically in a separate operation, rather
than interactively or in real time.
Operation Modes in TPump
53. Its setup does not require staging of data, intermediary files, or special
hardware.
Its operation is not affected by database restarts, dirty data, or network
slowdowns.
Its jobs restart without intervention.
Fast, scalable continuous data loads.
Row-hash locks enable concurrent queries.
Dynamic throttling feature.
Best for small data volumes.
Multiple sessions and multistatement requests are typically used to increase
throughput.
TPump also provides a dynamic throttling feature that enables it to run “all
out” during batch windows, but within limits when it may impact other
business uses of the Teradata RDBMS. Operators can specify the number of
statements run per minute, or may alter throttling minute-by-minute, if
necessary.
TPUMP Advantages
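The throttling idea, a cap on statements per minute that an operator can change while the job runs, can be sketched as a small rate limiter in Python. This is not how TPump is implemented; the class, its fixed one-minute windows, and the simulated timestamps are assumptions chosen to keep the example deterministic.

```python
class StatementThrottle:
    """Allow at most `statements_per_minute` sends in each one-minute window."""

    def __init__(self, statements_per_minute):
        self.set_rate(statements_per_minute)
        self.window_start = None
        self.sent_in_window = 0

    def set_rate(self, statements_per_minute):
        """Change the limit while the job is running."""
        if statements_per_minute < 1:
            raise ValueError("rate must be at least 1 statement per minute")
        self.statements_per_minute = statements_per_minute

    def delay_before_send(self, now):
        """Return the seconds to wait before sending a statement at time `now`."""
        if self.window_start is None or now - self.window_start >= 60.0:
            self.window_start = now
            self.sent_in_window = 0
        if self.sent_in_window < self.statements_per_minute:
            self.sent_in_window += 1
            return 0.0
        # The window is full: wait for the next one and count this send there.
        wait = self.window_start + 60.0 - now
        self.window_start += 60.0
        self.sent_in_window = 1
        return wait


# Simulated clock values in seconds; a real caller would pass time.monotonic().
throttle = StatementThrottle(statements_per_minute=2)
delays = [throttle.delay_before_send(t) for t in (0.0, 1.0, 2.0)]

throttle.set_rate(1)  # the operator tightens the limit mid-run
later_delay = throttle.delay_before_send(61.0)
print(delays, later_delay)
```

Raising the rate for a batch window and lowering it during business hours corresponds to calling `set_rate` at the appropriate times.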