4. Teradata Company Highlights
• Founded 1979 – West LA
• First product to market – 1984
• First Terabyte system – 1987
• Acquired by AT&T and merged with acquired NCR – 1992
• Tri-vested as part of NCR - 1997
• Teradata Corporation – (re)Launched October 1, 2007
– Global Leader in Enterprise Data Warehousing
• EDW/ADW Database Technology
• Analytic Solutions
– Positioned in Gartner’s Leaders Quadrant in data warehousing since 1999
• Top 10 U.S. publicly-traded software company
– S&P 500 Member
– Listed NYSE: “TDC”
– 2007 - $1.7B revenue
7. Continuous (R)evolution
• Sell the HW, give everything else away
• Sell the SW with some HW to run on
• Sell solving business problems – and technology to solve them
• Sell applications with consulting, SW and HW inside
9. Scale
• Every dimension of the technology must scale to meet today’s requirements
– Data, data model complexity, users, performance, queries, data loading, …
• What is a big Data Warehouse?
• Total spinning disk?
– 2.5 Petabytes
• Big table?
– 150 billion rows
• Number of tables?
– 300,000
• Insert/Update per day?
– 5 billion records
• Identified users?
– 100,000
• Queries per day?
– 5 million
• Data Turnover rate?
– 1TB per 5 seconds
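To put a few of these figures side by side, here is a quick back-of-the-envelope check in Python. This is my own arithmetic on the numbers above, using a binary 1 TB = 1024 GB convention, and is illustrative only.

    # Back-of-the-envelope arithmetic on the scale figures above (illustrative only).
    GB_PER_TB = 1024                              # binary convention; 1000 is also defensible

    turnover_gb_per_sec = 1 * GB_PER_TB / 5       # "1TB per 5 seconds"
    print(f"Turnover rate: {turnover_gb_per_sec:.0f} GB/sec")                   # ~205 GB/sec

    inserts_per_day = 5_000_000_000               # "5 billion records per day"
    print(f"Insert/update rate: {inserts_per_day / 86_400:,.0f} records/sec")   # ~57,870/sec

    queries_per_day = 5_000_000                   # "5 million queries per day"
    print(f"Query rate: {queries_per_day / 86_400:.1f} queries/sec")            # ~57.9/sec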
10. The Problem
Operational Systems: Accts. Payable, Accts. Receivable, Invoicing, Sales/Orders, Finance G/L, Customer Support, HR, Payroll, Purchasing, Order Fulfillment, Manufacturing, Inventory, …
Decision Makers: Marketing, Supply Chain, Finance, Risk Management, Maintenance, Sales, Operations, Inventory, Call Center, …
11. The EDW Solution
Operational Systems: Accts. Payable, Accts. Receivable, Invoicing, Sales/Orders, Finance G/L, Customer Support, HR, Payroll, Purchasing, Order Fulfillment, Manufacturing, Inventory, …
→ Enterprise Data Warehouse (EDW) →
Decision Makers: Marketing, Supply Chain, Finance, Risk Management, Maintenance, Sales, Operations, Inventory, Call Center, …
12. Active Enterprise Intelligence™
An Obvious Trend: More Speed, More Users
Strategic Intelligence (timescale: days)
• Enterprise Data Warehouse
• BI Tools & reports
• Analysis & visualization
• Predictive Analytics
Operational Intelligence (timescale: seconds)
• EDW Enterprise Integration
• Mixed workload management
• SOA, BPMS, IDEs
• Portals/composite applications
13. Active Enterprise Intelligence™ enabled by an Active Data Warehouse™
[Diagram: the Teradata Warehouse sits between strategic intelligence (Business Intelligence tools and applications serving Executive, Product/Services, Finance, Marketing, and Logistics users) and operational intelligence (workflow & applications serving Suppliers, Customers, and the Call Center). Surrounding capabilities: Active Access, Active Events, Active Load, Active Workload Management, Active Availability, and Active Enterprise Integration.]
14. Active Enterprise Intelligence™ in Retail
Detecting Retail Fraud
Situation
Thieves make copies of cash register receipts, walk into the store, pick up merchandise, and return the items for cash.
Problem
Associates in the returns department did not have historical POS receipt retrieval access to verify against previously “returned” receipts or to handle returns without receipts.
Solution
Associates query Teradata to quickly check whether a return has already occurred on that receipt number. Analysts also use it to understand and prevent excessive returns.
Impact (for a 500-store chain)
• 100% ROI in 5 months
• Stopped a crime ring on the first day of rollout
• “Cost savings have been huge”
15. Active Enterprise Intelligence™ in Retail
Single View of the Customer Across All Channels
Situation
Needed to add a Web channel for selling shoes.
Problem
Too much time and cost to keep multiple customer systems synchronized. Realized they needed just one customer database, not one more for the Web in addition to the Call Center and POS/Store databases.
Solution
Adopted an ADW strategy: moved all customer data to one Teradata system, revised data models to cover all channels, added a web channel for commerce, used web services, and added TASM to handle multiple workload types.
Impact
• 1M tactical hits to the EDW per day from the POS, Call Center, and Web, with 0.11 sec response time
• Runs simultaneously with back-office BI, reports, and ETL workloads
• Eliminated all other customer data systems
16. What is the Measure of a Great Architecture?
Handle huge changes of underlying technologies and dependent components while continuing to deliver the key value proposition.
17.
18. Processor Roadmap – CPU power radically increasing
Process technology: 90nm (2003), 65nm (2005), 45nm (2007), 32nm (2009), 22nm (2011)
Architecture: Hyper-Threading → Dual Core → Multi Core
[Chart: SPECint2000 performance, 2000–2008+; single-core performance grew roughly 5X through about 2004, with dual/multi-core performance continuing the growth beyond that.]
20. Teradata MPP Server Architecture
• Nodes
– Incrementally scalable to 1024 nodes
• Operating System
– Linux, Windows, Unix
• Storage
– Independent I/O
– Scales per node
• BYNET Interconnect
– Fully scalable bandwidth
• Connectivity
– Fully scalable
– Channel – ESCON/FICON
– LAN, WAN
• Server Management
– One console to view the entire system
[Diagram: four SMP nodes, each with its own CPUs, memory, and operating system, connected by dual BYNET interconnects and a server management console.]
21. Shared Nothing - Dividing the Work
• “Virtual processors” (vprocs) do the work
• Two types
– AMP: owns and operates on the data
– PE: handles SQL and external interaction
• Configure multiple vprocs per hardware node
– Take full advantage of SMP CPU and memory
• Each vproc has many threads of execution
– Many operations executing concurrently
– Each thread can do work for any user, transaction
• Software is equivalent regardless of configuration
– No user changes as system grows from small SMP to huge MPP
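As a rough illustration of this division of work, the sketch below models a PE dispatching the same plan step to every AMP, each of which scans only its own slice of the data. This is a minimal Python sketch under my own simplifying assumptions, not Teradata's actual internals.

    from concurrent.futures import ThreadPoolExecutor

    # Illustrative model of the vproc split described above (not Teradata internals):
    # a PE handles SQL/external interaction, AMPs own disjoint slices of the data.

    class AMP:
        def __init__(self, amp_id, rows):
            self.amp_id = amp_id
            self.rows = rows                      # this AMP's slice of a table

        def scan(self, predicate):
            # Each AMP reads only its own slice; no shared buffers or locks.
            return [r for r in self.rows if predicate(r)]

    class PE:
        def __init__(self, amps):
            self.amps = amps

        def execute(self, predicate):
            # Dispatch the same plan step to every AMP in parallel, then merge results.
            with ThreadPoolExecutor(max_workers=len(self.amps)) as pool:
                partials = list(pool.map(lambda amp: amp.scan(predicate), self.amps))
            return [row for rows in partials for row in rows]

    amps = [AMP(i, [(i, n) for n in range(5)]) for i in range(4)]
    pe = PE(amps)
    print(pe.execute(lambda row: row[1] >= 3))    # rows gathered from all 4 AMPs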
22. Shared Nothing - Dividing the Work
• Basis of Teradata scalability
– Each AMP owns an equal slice of the disk
– Only that AMP reads that slice
• No single point of control for any operation
– I/O, Buffers, Locking, Logging, Dictionary
– Nothing centralized
– Exponential communication costs avoided
[Diagram: each AMP privately owns its own I/O, buffers, locks, and logs; a chart of coordination cost vs. number of nodes shows Teradata's coordination cost staying flat as nodes are added.]
23. Teradata Data Distribution
• Rows automatically distributed evenly by hash partitioning
– Even distribution results in scalable performance
– Done in real-time as data are loaded, appended, or changed.
– Hash map defined and maintained by the system
• 2**32 hash codes, 64K buckets distributed to AMPs
– Primary Index (PI) column(s) are hashed
– Hash is always the same - for the same values
– No reorgs, repartitioning, space management
[Diagram: rows from Tables A, B, and C pass through the Teradata parallel hash function on their Primary Index; each row (RowHash/hash bucket plus data fields) is stored on one of AMP1 … AMPn.]
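The hash-distribution mechanism described above can be sketched as follows. The hash function, bucket count, and bucket-to-AMP mapping below are illustrative stand-ins (Teradata's actual hash function and hash maps differ), but they show why the same Primary Index value always lands on the same AMP and why data spreads evenly with no reorgs or repartitioning.

    import hashlib
    from collections import Counter

    # Illustrative hash distribution (Teradata's actual hash function is proprietary).
    N_BUCKETS = 65_536          # "64K buckets" from the slide
    N_AMPS = 8

    def row_hash(pi_value) -> int:
        # 32-bit row hash of the Primary Index value; same value -> same hash, always.
        digest = hashlib.md5(repr(pi_value).encode()).digest()
        return int.from_bytes(digest[:4], "big")

    def amp_for(pi_value) -> int:
        bucket = row_hash(pi_value) % N_BUCKETS       # hash bucket
        return bucket % N_AMPS                        # hash map: bucket -> AMP

    # Even, deterministic spread: placement happens as rows arrive, in real time.
    placement = Counter(amp_for(customer_id) for customer_id in range(100_000))
    print(placement)                     # roughly 12,500 rows per AMP
    print(amp_for(42) == amp_for(42))    # True: placement is repeatable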
24. Disk Capacity Exploding with Little Increase in Performance
Disk drive capacity | Drive bandwidth (MB/sec) | Performance per capacity (MB/sec/GB)
36 GB | 5.5 | 0.155
73 GB | 6.0 | 0.080
146 GB | 6.4 | 0.044
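The performance-per-capacity column is simply bandwidth divided by capacity; a quick check of the slide's figures (which round slightly differently) follows.

    # Performance per capacity = drive bandwidth / drive capacity (values from the slide).
    drives = [(36, 5.5), (73, 6.0), (146, 6.4)]   # (capacity GB, bandwidth MB/sec)

    for capacity_gb, bandwidth_mb_s in drives:
        ratio = bandwidth_mb_s / capacity_gb
        print(f"{capacity_gb:>4} GB: {ratio:.3f} MB/sec/GB")
    # 36 GB: 0.153, 73 GB: 0.082, 146 GB: 0.044 -- each generation roughly halves
    # the bandwidth available per gigabyte stored, which is the slide's point.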
25. Platform Change
• Focus used to be
– Optimization of expensive CPU cycles
– Micro-management of precious disk space
• Now
– Manage I/O
– Balance CPU power to the I/O capacity
– Find new ways to optimize I/O, trading for CPU use as necessary
– Pulling 2.5 GB/sec per node continuously
• Discontinuity coming
– SSDs become price competitive and reliable
26. File System
• Teradata wrote a new rule book
– Old one written by IBM 35 years ago, used by all mainstream DBMSs today - except Teradata
• File system built of raw slices
• Rows stored in blocks
– Variable length
– Grow and shrink on demand
– Rows located dynamically
• May be moved to reclaim space, defrag
– Maximum block size is configurable
• System default or per table
• 8K to 128K
• Change dynamically
• Indexes are just rows in tables
• Has evolved from direct management of single spindles to completely virtualized storage, not even knowing spindle location
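Below is a minimal sketch of the variable-length block behavior described above, assuming a simple append-and-split policy. It is illustrative only and not the actual Teradata file system; the block size limit and allocation logic are placeholders.

    # Illustrative variable-length block behavior (not the real Teradata file system).
    DEFAULT_MAX_BLOCK = 128 * 1024      # configurable per system or per table, 8K to 128K

    class Block:
        def __init__(self, max_size=DEFAULT_MAX_BLOCK):
            self.max_size = max_size
            self.rows = []              # variable-length rows
            self.size = 0

        def insert(self, row: bytes) -> bool:
            # Block grows on demand; caller allocates a new block when this one is full.
            if self.size + len(row) > self.max_size:
                return False
            self.rows.append(row)
            self.size += len(row)
            return True

        def delete(self, index: int):
            # Block shrinks; space is reclaimed without an offline reorg.
            self.size -= len(self.rows.pop(index))

    blocks = [Block()]
    for i in range(10_000):
        row = f"row-{i}".encode() * 40          # variable-length payload
        if not blocks[-1].insert(row):
            blocks.append(Block())              # allocate a new block dynamically
            blocks[-1].insert(row)
    print(len(blocks), "blocks,", sum(b.size for b in blocks), "bytes stored")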
27. Workload Management Evolution
• 1984 – pure timeshare
• 1987 – 4 priorities, defined by user
• 1995 – multiple priorities in multiple partitions
• 2000 – weighted workload groups
• 2004 – queuing, reserved resources, focus on tactical work
• 2009 – Visualization and detailed workgroup management
• Future – Set service level goals, our job to deliver
28. Active Workload Management
• Manage workloads
– Reduce server congestion
• Dynamically adjust in-flight task priority
– Turn the dial – change priorities
• Fast active access queries
– Performance, performance, performance
• Get maximum throughput
[Diagram: the Active Data Warehouse assigns each workload its own "speed limit" (resource limit); Active Events, Active Access, Query and Reporting, and Active Load are each shown with their own limit (values 10, 25, 60, and 75 in the original graphic).]
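The "speed limits" above are resource limits. The toy sketch below shows weighted resource sharing among workload groups so that short tactical queries keep getting their share even while heavy batch work runs. It is not TASM or the Priority Scheduler itself, and the weights simply reuse the numbers from the diagram; the pairing of numbers to workloads is an assumption made for illustration.

    # Toy weighted resource allocation among workload groups (not TASM itself):
    # each active group gets CPU in proportion to its weight, so short tactical
    # queries are not starved by long-running batch work.
    workloads = {
        "active_access": 75,       # tactical, sub-second lookups
        "query_reporting": 60,
        "active_load": 25,
        "active_events": 10,
    }

    def cpu_shares(active_groups):
        # Only groups with work on the system split the machine; shares re-balance
        # dynamically as groups go idle or become active.
        total = sum(workloads[g] for g in active_groups)
        return {g: workloads[g] / total for g in active_groups}

    print(cpu_shares(["active_access", "active_load"]))
    print(cpu_shares(workloads))       # all groups active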
30. Availability Requirements
[Diagram: user populations on a log scale from 10 to 1,000,000, spanning strategic to operational intelligence. Small strategic populations – IT, finance, planners, power users, data miners; executives, middle managers, marketing. Larger operational populations – category managers, line managers, service managers; operational employees; B2B suppliers; consumers. As populations grow and usage becomes more operational, availability requirements move toward mission critical and dual active.]
31. “Always ON” – An Elusive Challenge
• Unplanned downtime
– Hardware faults
– Software faults
– Hangs
• Planned downtime
– Software upgrade
– Hardware upgrade
– Data center maintenance
• “Disasters”
– Multi-component failures
– Building disasters
– Area disasters
• And optimize resource value to the business
• And avoid hidden costs and surprises
– e.g., major performance variations
• Major opportunity for research – but must be holistic
– Reaches far beyond core database
32. Real Time Operational Actions
1. Customer makes a multi-segment travel reservation.
2. Flight is rerouted, causing missed connections; the event reaches the “Active” Enterprise Data Warehouse via message queuing (WebSphere MQ, Oracle AQ, Microsoft MSMQ).
3. What is the customer’s flying history?
4. How profitable is each customer?
5. Which customers experienced delays or other problems in the last 6 months?
6. Customer is re-booked and notified.
7. Airport operations are adjusted.
(Steps 3–5 combine operational and strategic intelligence queries against the EDW; steps 6–7 are the operational actions they drive.)
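The scenario above follows a generic event-driven pattern: an event arrives on a queue, the application runs tactical queries against the active data warehouse for context, and then acts. The sketch below uses placeholder interfaces throughout; the queue, the lookup, and the decision thresholds are hypothetical and do not represent a real airline system or a specific Teradata or MQ API.

    # Sketch of the event-driven pattern above; queue, SQL, and handlers are placeholders.
    import queue

    events = queue.Queue()               # stands in for WebSphere MQ / Oracle AQ / MSMQ
    events.put({"type": "flight_rerouted", "flight": "TD123", "passengers": ["C1", "C2"]})

    def edw_lookup(customer_id):
        # Placeholder for the tactical EDW queries: flying history, profitability,
        # problems in the last 6 months.
        return {"customer": customer_id, "tier": "gold", "recent_problems": 2}

    def handle(event):
        for customer_id in event["passengers"]:
            profile = edw_lookup(customer_id)
            if profile["tier"] == "gold" or profile["recent_problems"] > 1:
                print(f"re-book and notify {customer_id} first")        # step 6
            else:
                print(f"queue {customer_id} for standard re-booking")
        print("notify airport operations")                              # step 7

    while not events.empty():
        handle(events.get())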
33. Real Time Customer Management
1. Customer inserts a Total Rewards card at a slot machine.
2. What is the customer’s past spending history in all our casinos?
3. What is a significant loss for this person based on market segment, past and predicted behavior?
4. Is this customer approaching the predicted loss rate for their segment?
5. What offers are available for this customer?
6. Message sent via TIBCO to a floor Luck Ambassador with a customer offer to prevent additional losses.
(Steps 2–5 combine operational and strategic intelligence queries against the “Active” Enterprise Data Warehouse.)
34. That’s a Wrap!
• Business requires a new level of decision making
– Many more decisions by many more people, much faster
– A current representation of the state of the enterprise
• The Data Warehouse must evolve to support the requirements of Active Enterprise Intelligence
• Technology must evolve to deal with the new requirements
– Rich area for research and innovation
– Changes the view of what data warehouse/BI means
• Teradata is driving an aggressive roadmap to meet real business requirements
35.
36. For more information, click the link below:
Follow Us on:
http://vibranttechnologies.co.in/teradata-classes-in-mumbai.html
Thank You !!!
Editor's Notes
Retail fraud is a $16B-per-year problem in the USA alone. With web receipts and better copying capabilities, thieves can make multiple copies of a single receipt and make multiple returns for cash or other merchandise. Or they can bring back shoplifted items and try to exchange them for cash.
The problem is that the associates in the Returns department often don't have access to past sales information and can't easily keep track of returned merchandise. This is especially problematic if the policy is to allow returns without receipts.
So the solution is straightforward: hook up the point-of-sale systems so that, within seconds, the Teradata data warehouse is updated with sales, return, exchange, and void data, and provide the Returns department with the entire history of purchases by that customer, so they can ensure that a sold product can only be returned once.
The impact? Huge, according to one Teradata customer who has already built this system. They stopped a crime ring in the first day of their rollout, a group that had defrauded the company of thousands of dollars. They saw a 100% payback on their investment in just 5 months, and continue to reap the benefits of this example use of Active Enterprise Intelligence.
In this chart, we have 3 different disk drive sizes, and you can see that per generation, disk drive bandwidth hasn’t increased very much.
As disk capacities get larger (36 GB 73 GB 146 GB) the performance per capacity ratio (Capacity vs. Disk Bandwidth on right side of chart) declines significantly.
The key metric on this slide is performance per capacity (MB/ SEC/ GB)
Look at this slide! Capacity is doubling, but throughput per gigabyte is diminishing! If you fill all the drives up with data, you will not have enough I/O bandwidth!
Choosing twice as much storage capacity in a configuration, but not increasing the number of physical disks (to keep I/O constant), will result in performance degradation.
Assuming workloads are categorized, this illustration shows “speed limits” which are actually resource limits for each workload. Each workload is allowed to consume a limited amount of resources at any given time to ensure other workloads get their rightful share.
Dynamic Resource Prioritization
Inside every fully utilized active data warehouse, there’s a major turf battle going on. Each job in the database is engaged in an ongoing struggle for more and more resources for its own work, often competing against other diverse activities. In most databases, these me-first conflicts result in short, resource-light queries falling victim to the heavier jobs. Those batch fraud-detection reports and long-running market share analysis queries essentially take ownership of the database and all it has to give. But Teradata Database lets your specific business needs determine how your precious database resources are divided. Once a definition for equitable sharing of database assets is in place, it automatically controls what percent of the CPU and disk I/O those batch reports and complex queries, as well as those vulnerable short queries, will receive. When there’s a handful of users on the system, Teradata Database spreads available resources out relative to the priorities and assignments that have been made to those particular users, without a single sub-second of CPU being wasted.
Teradata Database has made job scheduling and prioritization of the work a core competency since 1988. And recently, that technology has deepened and matured offering even more flexibility. Teradata’s Priority Scheduler can be used to ensure that the event-driven work coming from the web is allowed to cut into line to grab the CPU it needs to get that promotion back to the client quickly. For example, if the tactical query that comes up with that promotion returns an answer in 1 second when running alone in the database, that same query, if armed with a high Teradata Database priority, can maintain a similar turnaround even if multiple complex inventory adjustment queries begin executing at the same time. For the active data warehouse, it will be critical to keep more resource-hungry complex queries from dominating the resources in the system, starving out the shorter tactical work. Teradata’s Dynamic Workload Manager will play a big role in enabling favored work to be as near to real time as it needs to be.
While no 2-dimensional drawing can accurately portray such complex issues, this graphic frames the discussion around when to move to mission-critical and dual-active solutions. In general, the type of users often correlates with the population of users. For example, we know that the consumer population for many industries can mean tens of thousands to millions of possible users via the internet. Similarly, for some industries, the population of supplier employees who access your data warehouse can be enormous, maybe not always in concurrent users but certainly in potential users. At the other end of the spectrum, planning, analysis, and power users tend to be a small community, albeit an influential one. In the middle of the graphic we see overlaps of many kinds, because line managers (category managers, sales managers, service managers, etc.) often bounce between strategic decisions and operational decisions, with probably more time spent on the operational tasks.
Business critical is not a well-defined term in our industry. It tends to mean anything less than mission critical. These users can often tolerate downtime, from a few hours to perhaps an entire day. But many data warehouse sites have become so dependent on the EDW that they have "hardened" the server, software, and procedures to a mission-critical level. This means the executives realize how many daily decisions depend on BI-tool-based reporting, and they are willing to fund projects to increase system availability.
Mission critical can begin in the EDW and certainly extends all the way to the end of the graphic. These clients understand that large populations of front-line users will demand 24x7 data availability. With operational employees you MIGHT be able to tolerate a 10-20 minute outage every month. It depends very much on the business use of the EDW. As the EDW evolves to larger populations and more operational ACTIVE tasks, outages become increasingly expensive, so additional investments in availability become mandatory. In some cases, an active data warehouse becomes so critical to the operational employee that it becomes necessary to step up to a dual-active configuration. This is particularly true in retail, with hundreds of concurrent employees and suppliers using the data, but it may also occur with large call centers or sales staff.
Finally, we hope it is obvious that when consumers gain access to the data warehouse, it is typically for eCommerce purchasing. No downtime is tolerated in this case because the loss of revenue is unacceptable.
Problem:
Lack of ability to track customer gaming behavior and Comp redemption.
No mechanism to communicate or react to specific behaviors and trends
Solution:
Player Contact System - when a patron swipes his/her card at a casino that information is sent to Teradata.
The player profile is accessed and it is determined if the casino should make personal contact with that player.
Allows Harrah’s to provide real-time offers to customers at each gaming point
Enables Harrah’s to track the redemption of any comp provided to a guest as the comp is redeemed or partially redeemed. Allows them not to “over-comp” guests.
Future:
“Marketing At The Slots” initiative. This implementation has a BusinessWorks process receiving inbound card-swipes from the Slot Data System and building an EDW query. It then makes a Request/Reply call to Teradata to solicit and compile an XML message which is then published back out on the TIB for consumption by other applications.
This will drive CRM to a new “real-time” level allowing interaction with the customer while they are gaming.