
Teradata introduction - A basic introduction to Teradata system architecture


- What’s a Teradata DWH System ?
- SMP vs MPP
- Shared-Everything vs Shared-Nothing architecture
- Hardware architecture
- Node architecture
- SQL Request Processing

Published in: Technology


  1. INTRODUCTION TO TERADATA DATA WAREHOUSE SYSTEM ARCHITECTURE. PREPARED BY: MOHAMED TAHOON
  2. Outline
     1. What’s a Teradata DWH System?
     2. SMP vs MPP
     3. Shared-Everything vs Shared-Nothing architecture
     4. Hardware architecture
        4.1 Cliques
        4.2 Hot standby nodes
     5. Node architecture
        5.1 PDE
        5.2 Virtual Processors
        5.3 Parsing Engine
        5.4 Access Module Processor
        5.5 Disk Arrays
     6. Request Processing
  3. 1 What is a Teradata DWH system?
     • An RDBMS designed to run the world’s largest databases
     • Nodes built on the latest Intel technology
     • Standard access language (SQL)
     • Massively Parallel Processing (MPP) system
     • A “shared-nothing” architecture
     • Parallel-aware optimizer that allows concurrent complex queries
     • Linear scalability
  4. 2. SMP VS MPP
     Symmetric Multiprocessing (SMP)
     - Multiple CPUs serve separate processes simultaneously
     - Shared everything
     - All CPUs share the same memory
     - Mostly hosted on a shared SAN
     Massively Parallel Processing (MPP)
     - Multiple CPUs run in parallel to serve a single process
     - Shared nothing
     - Each CPU has its own memory and disk space
     - High-speed connection between nodes (BYNET)
  6. 3. SHARED-EVERYTHING VS SHARED-NOTHING
     Shared-everything
     - Disk controllers and bandwidth are shared
     - Synchronization is required across nodes
     - Scalability issues at large scale
     - Best for many small statements
     Shared-nothing
     - Controllers are dedicated to nodes
     - No cache synchronization is necessary
     - Linear scalability
     - Best for heavy statements
  8. 4 Teradata Hardware Architecture
     • SMP nodes
       > Latest Intel SMP CPUs
       > Configured in 2+1 node cliques
       > Linux or Windows
     • BYNET interconnect
       > Fully scalable bandwidth
       > 1 to 1024 nodes
     • Storage
       > Independent I/O per node
       > Scales per node
     • Server management
       > One console for the entire system
     [Diagram: four SMP nodes (Node1 to Node4), each running PE and AMP vprocs, joined by the BYNET interconnect and overseen by server management]
  9. 4.1 CLIQUES
     • A clique groups nodes together through multiported access to common disk array units.
     • Inter-node disk array connections are made using Fibre Channel (FC) buses.
     • FC paths provide redundancy, so that the loss of a processor node or disk controller does not limit data availability.
     • A clique is the mechanism that supports migration of vprocs under PDE following a node failure.
     • If a node in a clique fails, its vprocs migrate to other nodes in the clique and continue to operate while recovery occurs on their home node.
  10. 4.2 HOT STANDBY NODES
      A hot standby node improves availability and maintains performance levels in the event of a node failure.
      What is a hot standby node?
      • It is a member of a clique in the system.
      • It does not normally participate in Teradata Database operations.
      • It is used to compensate for the loss of a node in the clique.
      Using a hot standby node eliminates:
      • Restarts that are required to bring a failed node back into service.
      • Degraded service when vprocs have migrated to other nodes in a clique.
      How hot standby node failover works:
      • At node failure, all AMPs and LAN-attached PEs on the failed node migrate to the hot standby node.
      • The hot standby node becomes the production node.
      • When the failed node returns to service, it becomes the new hot standby node.
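The difference between plain clique migration and hot standby failover can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function `fail_node`, the node and AMP names, and the round-robin placement of migrated vprocs are hypothetical and are not how Teradata itself assigns vprocs.

```python
from collections import Counter

def fail_node(placement, failed, clique_nodes, hot_standby=None):
    """Return a new vproc -> node placement after `failed` goes down.

    placement:    dict mapping a vproc name to the node it runs on
    clique_nodes: production nodes in the same clique as `failed`
    hot_standby:  name of the clique's hot standby node, or None
    """
    survivors = [n for n in clique_nodes if n != failed]
    new_placement = {}
    moved = 0
    for vproc, node in placement.items():
        if node != failed:
            new_placement[vproc] = node            # unaffected vproc stays put
        elif hot_standby is not None:
            new_placement[vproc] = hot_standby     # standby takes over the whole node
        else:
            # Assumption: spread migrated vprocs round-robin over surviving nodes.
            new_placement[vproc] = survivors[moved % len(survivors)]
            moved += 1
    return new_placement

# A 2+1 clique: two production nodes, optionally one hot standby.
nodes = ["node1", "node2"]
placement = {f"amp{i}": nodes[i % 2] for i in range(8)}

without_standby = fail_node(placement, "node1", nodes)
with_standby = fail_node(placement, "node1", nodes, hot_standby="node3")

print(Counter(without_standby.values()))  # node2 now carries every AMP
print(Counter(with_standby.values()))     # load per node is unchanged
```

Without the standby, the surviving node carries twice its normal number of AMPs, which is the degraded service the slide refers to; with the standby, the per-node load stays the same.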
  12. 5 Node Architecture (‘Shared Nothing’)
      Each Teradata node is made up of hardware and software:
      • Each node runs a copy of the OS, the database software, and virtual processors (vprocs).
      • Each node has CPUs, a system disk, memory, and adapters.
      [Diagram: a node running PE vprocs and eight AMP vprocs, each AMP with its own vdisk, on top of PDE and the operating system (labeled UNIX)]
  13. 5.1 PARALLEL DATABASE EXTENSIONS - PDE
      PDE is a software interface layer between the operating system and Teradata Database. It enables the database to:
      • Run in a parallel environment
      • Execute vprocs
      • Apply a flexible priority scheduler to Teradata Database sessions
      • Consistently manage memory, I/O, and messaging system interfaces across multiple OS platforms
      PDE provides a series of parallel operating system services, which include:
      • Facilities to manage parallel execution of database operations on multiple nodes
      • Dynamic distribution of database tasks
      • Coordination of task execution within and between nodes
  14. 5.2 VIRTUAL PROCESSORS – WHAT ARE THEY
      What they are:
      • A set of software processes that run on a node under Teradata Parallel Database Extensions (PDE).
      • Vprocs eliminate the dependency on specialized physical processors.
      Vproc characteristics:
      • Multiple vprocs can run on an SMP platform or a node.
      • Vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from one another.
      • This message communication is done using the BYNET hardware and the BYNET driver.
      • Maximum number of vprocs: 16,384 per system and 128 per node.
  15. 5.2 VIRTUAL PROCESSORS – VPROC TYPES
      GTW
      • Gateway vprocs provide a socket interface to Teradata Database.
      PE
      • Parsing Engines perform session control, query parsing, security validation, and query optimization.
      AMP
      • Access Module Processors perform database functions, such as executing database queries.
      • Database storage is distributed across the AMPs.
      TVS
      • Manages Teradata Database storage.
      • AMPs acquire their portions of database storage through the TVS vproc.
      NODE
      • The node vproc handles PDE and operating system functions not directly related to AMP and PE work.
      • Node vprocs cannot be externally manipulated and do not appear in the output of the Vproc Manager utility.
      RSG
      • The Relay Services Gateway provides a socket interface for the replication agent.
      • It relays dictionary changes to the Teradata Meta Data Services utility.
  16. 5.3 PARSING ENGINE ‘PE’
      • Communicates with the client system on one side and with the AMPs (via the BYNET) on the other side.
      • Each PE executes the database software that manages sessions, decomposes SQL statements into steps, possibly in parallel, and returns the answer rows to the requesting client.
      Parsing Engine elements:
      • Parser: decomposes SQL into relational data management processing steps.
      • Optimizer: determines the most efficient path to access data.
      • Generator: generates and packages steps.
      • Dispatcher: receives processing steps from the parser and sends them to the appropriate AMPs via the BYNET. Monitors the completion of steps and handles errors encountered during processing.
      • Session Control: manages session activities, such as logon, password validation, and logoff. Recovers sessions following client or server failures.
  17. 5.4 ACCESS MODULE PROCESSOR ‘AMP’
      • The AMP vproc manages Teradata Database interactions with the disk subsystem.
      • Each AMP manages a share of the disk storage.
      Database management tasks:
      • Accounting
      • Journaling
      • Locking tables, rows, and databases
      • Output data conversion
      During query processing:
      • Sorting
      • Joining data rows
      • Aggregation
      File system management:
      • Disk space management
  18. 5.4 THE AMPS – REQUEST PROCESSING
      • The BYNET transmits messages to and from the AMPs and PEs.
      • An AMP step can be sent to one of the following:
        • One AMP
        • Multicast (a selected set of AMPs)
        • All AMPs in the system
      PE communication with AMPs during request processing:
      1. PE 1: access is through a primary index and the request is for a single row, so the PE transmits the steps to a single AMP.
      2. PE 2: the request is for many rows (an all-AMP request), so the PE has the BYNET broadcast the steps to all AMPs.
      ** To minimize system overhead, the PE can send a step to a subset of AMPs when appropriate.
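The three delivery modes above (one AMP, a selected set, all AMPs) can be sketched as a small routing function. This is an illustration only: `NUM_AMPS`, `amp_for`, and `target_amps` are hypothetical names, and `zlib.crc32` is a stand-in for Teradata's own row-hashing scheme, which the slides do not describe.

```python
import zlib

NUM_AMPS = 8  # hypothetical system size

def amp_for(primary_index_value: str) -> int:
    """Map a primary index value to one AMP number.

    Assumption: CRC32 modulo the AMP count stands in for the real hash.
    """
    return zlib.crc32(primary_index_value.encode("utf-8")) % NUM_AMPS

def target_amps(primary_index_value=None, subset=None):
    """Return the AMP numbers a step should be delivered to."""
    if primary_index_value is not None:
        # Single-row access through the primary index: exactly one AMP.
        return [amp_for(primary_index_value)]
    if subset is not None:
        # Multicast: only the selected AMPs, to limit system overhead.
        return sorted(set(subset))
    # All-AMP request: broadcast to every AMP in the system.
    return list(range(NUM_AMPS))

print(target_amps(primary_index_value="customer-42"))  # one AMP
print(target_amps(subset=[3, 1, 3]))                   # a selected set
print(target_amps())                                   # every AMP
```

Because the same primary index value always hashes to the same AMP, a single-row request never needs to involve the rest of the system.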
  19. 5.5 DISK ARRAYS
      Logical units (LUNs)
      • The RAID Manager uses drive groups (DGs).
      • A DG is a set of drives that have been configured into one or more LUNs.
      • The OS recognizes a LUN as a disk and is not aware that it is writing to multiple disk drives.
      Vdisk
      • The group of cylinders currently assigned to an AMP.
      • The actual physical storage may derive from several different storage devices.
  21. 6 REQUEST PROCESSING – “LIFETIME OF A QUERY”
      1. The Parser: checks the Request cache to determine if the request is already there.
      2. The Syntaxer: checks the syntax of an incoming request.
      3. The Resolver: adds information from the Data Dictionary to convert database, table, view, stored procedure, and macro names to internal identifiers.
      4. The Security module: checks privileges in the Data Dictionary.
      5. The Optimizer: determines the most effective way to implement the SQL request.
      6. The Optimizer: scans the request to determine where to place locks, then passes the optimized parse tree to the Generator.
      7. The Generator: transforms the optimized parse tree into plastic steps, caches the steps if appropriate, and passes them to gncApply.
      8. gncApply: takes the plastic steps produced by the Generator, binds in parameterized data if it exists, and transforms them into concrete steps.
      9. The Dispatcher
  22. P.45 REQUEST PROCESSING – “1. THE PARSER”
      1. The Parser checks whether the request is in the Request cache:
         • If it is in the cache:
           • The Parser reuses the plastic steps found in the cache and passes them to gncApply.
           • Processing goes to the privilege check (step 4) and then to gncApply (step 8).
         • If it is a new request, processing goes to step 2, the Syntaxer.
      Plastic steps are directives to the database management system that do not contain data values.
      Steps 2 to 9 are as listed in the overview (slide 21).
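The cache check decides which of the numbered stages a request passes through before it reaches gncApply. The sketch below traces that path. It assumes, for illustration only, that the cache is keyed by the raw request text and that every new request is cached; the function name `stages_for` and its parameters are hypothetical.

```python
def stages_for(request_text, user, cache, privileged_users):
    """Return the ordered list of stages a request visits up to gncApply."""
    stages = ["parser"]
    cached = request_text in cache

    if not cached:
        # New request: full front end, starting with the Syntaxer (step 2).
        stages += ["syntaxer", "resolver"]

    # Privileges are checked on both paths (step 4).
    stages.append("security")
    if user not in privileged_users:
        raise PermissionError(f"{user} lacks privileges for this request")

    if not cached:
        stages += ["optimizer", "generator"]
        # Assumption: the Generator always caches; the slides say "if appropriate".
        cache[request_text] = f"plastic steps for: {request_text}"

    # Both paths end by handing plastic steps to gncApply (step 8).
    stages.append("gncApply")
    return stages

cache = {}
first = stages_for("SELECT * FROM t WHERE id = :id", "alice", cache, {"alice"})
second = stages_for("SELECT * FROM t WHERE id = :id", "alice", cache, {"alice"})
print(first)   # full path through Syntaxer, Resolver, Optimizer, Generator
print(second)  # cache hit: Parser, privilege check, gncApply
```

A repeated request skips parsing, resolution, and optimization, but it is never exempt from the privilege check.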
  23. P.45 REQUEST PROCESSING – “2. SYNTAXER, 3. RESOLVER”
      2. The Syntaxer checks the syntax of a new request:
         • If the syntax is wrong, it passes an error message back to the requestor and stops.
         • If the syntax is correct, it converts the request to a parse tree and passes it to the Resolver (step 3).
      3. The Resolver adds information from the Data Dictionary (or a cached copy of the information) to convert database, table, view, stored procedure, and macro names to internal identifiers.
      The other steps are as listed in the overview (slide 21).
  24. P.45 REQUEST PROCESSING – “4. SECURITY MODULE, 5. - 6. OPTIMIZER”
      4. The Security module checks the requestor's privileges on the accessed objects:
         • If the privileges do not match, it returns a privilege error message.
         • If the requestor is privileged, it passes the request to the Optimizer.
      5. The Optimizer determines the most effective way to implement the request (the execution plan).
      6. The Optimizer determines which type of object locks to place and where to place them.
      The other steps are as listed in the overview (slide 21).
  25. P.45 REQUEST PROCESSING – “7. GENERATOR, 8. GNCAPPLY”
      7. The Generator:
         • Transforms the optimized parse tree into plastic steps.
         • Caches the steps if appropriate.
         • Passes them to gncApply.
      8. gncApply:
         • Binds in parameterized data if it exists and transforms the plastic steps into concrete steps.
         • Passes the concrete steps to the Dispatcher.
      Concrete steps are directives to the AMPs that contain the needed user- or session-specific values and the needed data parcels.
      The other steps are as listed in the overview (slide 21).
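The distinction between plastic and concrete steps comes down to late binding of data values. The following sketch assumes a made-up step layout (a dict with `op` and `args`, with placeholders written as `:name`); the function `gnc_apply` only mirrors the role described above and is not Teradata's internal interface.

```python
def gnc_apply(plastic_steps, params):
    """Bind parameter values into plastic steps, returning concrete steps.

    The plastic steps are left untouched so they can stay in the cache
    and be reused with different values.
    """
    concrete_steps = []
    for step in plastic_steps:
        bound_args = [
            # Replace ":name" placeholders with the supplied value.
            params[arg[1:]] if isinstance(arg, str) and arg.startswith(":") else arg
            for arg in step["args"]
        ]
        concrete_steps.append({"op": step["op"], "args": bound_args})
    return concrete_steps

# Hypothetical plastic step: no data values, only a placeholder.
plastic = [{"op": "retrieve_by_primary_index", "args": ["table_1042", ":cust_id"]}]

concrete = gnc_apply(plastic, {"cust_id": 7})
print(concrete)  # the placeholder is replaced by 7
print(plastic)   # the cached plastic step still holds ":cust_id"
```

Keeping values out of the cached form is what lets one set of plastic steps serve many executions of the same parameterized request.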
  26. P.45 DISPATCHER – REQUEST PROCESSING
      • The Dispatcher controls the sequence in which steps are executed.
      • It also passes the steps to the BYNET to be distributed to the AMPs:
        1. The Dispatcher receives concrete steps from gncApply.
        2. The Dispatcher places the first step on the BYNET, tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs, and waits for a completion response. Whenever possible, Teradata Database performs steps in parallel to enhance performance.
        3. The Dispatcher receives a completion response from all expected AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with the request are done.
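The Dispatcher's send-and-wait loop can be sketched as follows. This is a simplified, strictly sequential illustration: the step layout, `dispatch`, and `fake_bynet_send` are hypothetical, and the parallel execution of independent steps mentioned above is not modeled.

```python
def dispatch(concrete_steps, bynet_send):
    """Send steps one at a time and wait for every expected AMP to respond.

    bynet_send(step) must return the AMP numbers that reported completion.
    Returns the names of the completed steps in execution order.
    """
    completed = []
    for step in concrete_steps:
        expected = set(step["targets"])      # one AMP, several AMPs, or all AMPs
        responded = set(bynet_send(step))    # wait for completion responses
        if responded != expected:
            missing = sorted(expected - responded)
            raise RuntimeError(f"step {step['name']} incomplete, missing AMPs {missing}")
        completed.append(step["name"])       # only now move on to the next step
    return completed

def fake_bynet_send(step):
    """Stand-in for the BYNET: every targeted AMP completes immediately."""
    return list(step["targets"])

steps = [
    {"name": "lock_table", "targets": [0, 1, 2, 3]},  # all-AMP step
    {"name": "fetch_row", "targets": [2]},            # single-AMP step
]
print(dispatch(steps, fake_bynet_send))  # ['lock_table', 'fetch_row']
```

The next step is placed on the BYNET only after the full set of expected completion responses has arrived, which is what keeps the steps of a request in order.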