teradata tools

Published on January 2017


Teradata
Architecture, Technology, Scalability, Performance and Vision for Active Enterprise Data Warehousing
Dr. Barbara Schulmeister Teradata – a Division of NCR
[email protected]

28. 6. 2005

Agenda
• History
• Definitions
• Hardware
• Architecture
• Fault Tolerance and High Availability
• Coexistence
• Operational System
• Tools and Utilities
• Data Distribution
• SQL Parser
• Active Data Warehouse
• Scalability

Teradata Timeline Overview

Born to be parallel!
DBC Model 1: First MPP System! “Product of the Year” – Forbes DBC Model 3 “Fastest Growing Small Company” – INC Magazine “Fastest Growing Electronic Company” – Electronic Business “Leader in Commercial Parallel Processing” – Gartner Group

1979...

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994...

Teradata Corp. Founded

First 100GB System!

First 500GB System!

First 700GB System!

First Terabyte System!

DBC Model 4

3+ TB System!

First Beta system shipped Christmas to Wells Fargo Bank

Initial public offering on Wall Street

Joint Venture with NCR for next generation systems


Teradata Timeline (II)
“#1 in MPP” – IDC Survey in Computerworld DB Expo Realware Award w/ Union Pacific: “Data Warehouse Innovations” Over 500 Production Data Warehouses Worldwide! DWI VLDB Best Practice Award w/ ATT BMD: “Data Warehouse and the Web”

Only Vendor to Publish Multi-user TPC-Ds!

First Vendor to Publish 1TB TPC-D Benchmark!

Teradata V2 on WorldMark 4300

...1995

1996

1997

...

Teradata Version 2 on NCR 3555 SMP

Teradata V2 on WorldMark 5100 SMP & MPP

“...only NCR’s Teradata V2 RDBMS has proven it can scale…” – Gartner Group

Demonstrated World’s Largest Data Warehouse Database at 11TB!

100GB TPC-D Benchmark Leader!

24TB Data Warehouse in Production!


Teradata Timeline (III)
Teradata V2 ported to Microsoft Windows NT Industry leading TPC-D benchmark for all volumes Industry leading TPCH at 1TB and 3TB Teradata attains 99.98% availability 64 bit Teradata

Largest Data Warehouse system (176 node, 130 TB disk)

...1998

1999

2000

2001

2002

2003

2004

2005

Database Programming and Design Award

IT Award of Excellence

V2R5 Teradata

V2R6 Teradata

Linux

• TDWI Solution Provider Best Practices in Data Warehousing • TDWI Leadership in Data Warehousing Award • DM Review World-Class Solution Award for business Intelligence • IT Times Award • DM Review 100 Award • DM Review Readership Award • Intelligent Enterprise Real Ware Award

the commitment continues…

Alternative Approaches to Enterprise Analytics
Data Mart Centric Sources Marts Users Users Sources Middleware Virtual, Distributed, Federated Sources DW Marts Users Independent Data Marts P r o s C o n s • Easy to Build Organizationally • Limit Scope • Easy to Build Technically • Business Enterprise view unavailable • Redundant data costs • High ETL costs • High App costs • High DBA and operational costs Leave Data Where it Lies • No need for ETL • No need for separate platform Hub-and-Spoke Data Warehouse • Allows easier customization of user interfaces & reports Centralized Integrated Data With Direct Access • Single Enterprise “Business” View • Data reusability • Consistency • Low Cost of Ownership • Requires corporate leadership and vision Hub-andSpoke Data Warehouse Sources DW Users Enterprise Data Warehouse

• Only viable for low volume access • Meta data issues • Network bandwidth and join complexity issues • Workload typically placed on workstation

• Business Enterprise view challenging • Redundant data costs • High DBA and operational costs • Data latency

A Spectrum of Data Warehouse Architectures
Virtual, Distributed, Federated
Sources Middleware Users

Data Mart Centric
Sources Marts Users

Hub-and-Spoke Data Warehouse
Sources DW Marts Users

Enterprise Data Warehouse
Sources DW Users

The goal: Any question, on any data, at any time.

Teradata’s Advocated Data Warehouse Approach for 20 years, Since 1984!

Differentiating OLTP - DSS

Most time-consuming steps:
• Full scan of big tables • Complex joins • Aggregation • Sorting
[Chart: frequency of these steps in OLTP or DSS]

NCR Server
• Provide customers with growth opportunities and investment protection
> Coexistence is enabled across five generations
– NCR 5400E & 5400H Servers – NCR 4980 & 5380 Servers – NCR 4950 & 5350 Servers – NCR 4900 & 5300 Servers – NCR 485X & 525X Servers BYNET V2 / V3

485X & 525X

4900 & 5300

4950 & 5350

4980 & 5380

5400E & 5400H

NCR Server Generations

NCR 5400 Server SMP
• 5400E
Ethernet Switches

> 1 - 4 nodes > BYNET V2 > ESCON & FICON for 3 and 4 node configurations > Field Upgradeable to 5400H
[Cabinet diagram: up to 4 nodes (1st to 4th node) within each cabinet, with server management (3GSM), internal BYNET switches and three UPS modules]

NCR 5400 Server MPP
• Continued rapid adoption of latest Intel® Technology
> Dual Intel Pentium Xeon EM 64T 3.6 GHz processors with Hyper-Threading (32-bit and 64-bit capability) > 800 MHz front side bus
[Cabinet diagram: Ethernet switches and BYNET V3 switches]
• Industry Standard Form Factor
> Up to 10 nodes per cabinet > Integrated BYNET V3 (provides the capability to physically separate systems by 300-600 meters) > Integrated Server Management > N+1 UPS > Dual AC

[Cabinet diagram: up to 10 nodes within each cabinet, with server management and five UPS modules]

• Multi-Generation Coexistence
> Investment protection

Relative CPU Performance per Core
[Chart: industry CPU performance per core by year, 2004 to 2007, on a relative scale of 0 to 3000, for the Xeon, Itanium, Power and Sparc families. Labelled points: Power 4+ 1.45 GHz 130 nm; UltraSPARC 3 1.6 GHz 130 nm; Power 5 ~1.9 GHz 130 nm; Itanium 2 1.6 GHz 130 nm; Itanium 2 9M 130 nm; Xeon 3.0 GHz 1M 130 nm; Xeon 3.6 GHz 90 nm; Xeon 2M L2 3.6 GHz 90 nm; Xeon 2M L2 >3.6 GHz 90 nm; Power 5+ ~2.5 GHz 90 nm; Montecito 90 nm; Rock 90 nm; Power 6 ~3 GHz 65 nm; Tukwila Common Platform 65 nm; Dual Core 65 nm; Next Gen Arch. Dual Core 65 nm; Multi Core 45 nm. Technology phases: Symmetric Multi Threading (Hyper Threading); Dual Core; Multicore, Multithreaded.]

Relative CPU Performance based on multi-threading and multi-core roadmap capabilities

www.spec.org: benchmarks SPECint2000 and SPECint_rate2000

Gartner Product Ranking 2004 ASEM

FUJITSU Primepower

HP HP9000

HP Integrity

HP Proliant

IBM pSeries

NCR Teradata

SUN Sunfire 40

PRODUCT

43

45

46

29

45

54

The Product category (which was called Technology in previous ASEM updates) focuses on the performance and reliability/availability aspects of each platform. In this category Teradata received a very strong 93.5% of total possible points and leads the IBM pSeries with 74.35% by 44 points or 19%.
Source Gartner 2004 ASEM Report

NCR Enterprise Storage 6842
• NCR Enterprise Storage 6842 Features
> Two array modules per cabinet > 56, 73GB, 15K drives
– greater than 8 Terabytes of spinning disk per cabinet

> Dual Quad Fibre Channel Controllers per array for performance and availability > Typical configuration is 4 NCR 5400 Server nodes per 3 – 6842 arrays
– 1.2 Terabytes of database space per node (RAID 1)

> Supports RAID 1 and RAID 5 > Support for MP-RAS and Microsoft Windows Server 2003 environments
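The capacity figures on this slide can be checked with a little arithmetic. The sketch below assumes that the "56 drives" figure is per array module and that terabytes are decimal (1 TB = 1000 GB); neither is stated on the slide, and the helper names are illustrative only.

```python
# Capacity arithmetic for the NCR Enterprise Storage 6842 figures quoted above.
DRIVE_GB = 73            # drive size from the slide
DRIVES_PER_ARRAY = 56    # assumption: the "56 drives" figure is per array module
ARRAYS_PER_CABINET = 2   # from the slide


def raw_gb(arrays: int) -> int:
    """Raw (unmirrored) capacity in GB for a number of array modules."""
    return arrays * DRIVES_PER_ARRAY * DRIVE_GB


def mirrored_gb_per_node(arrays: int, nodes: int) -> float:
    """RAID 1 keeps two copies of every block, so usable space is half of raw."""
    return raw_gb(arrays) / 2 / nodes


cabinet_gb = raw_gb(ARRAYS_PER_CABINET)
per_node_gb = mirrored_gb_per_node(arrays=3, nodes=4)

print(f"raw capacity per cabinet: {cabinet_gb} GB")
print(f"RAID 1 capacity per node (3 arrays, 4 nodes): {per_node_gb:.0f} GB")
```

Under these assumptions the cabinet total comes out just above 8 TB, in line with the slide. The mirrored per-node figure is somewhat higher than the 1.2 TB of database space quoted; the slide does not break down where the difference goes (spares and system overhead are plausible candidates).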

EMC Symmetrix DMX
• Enterprise Fit • Storage Standardization • Extended storage life through Redeployment

EMC Model | Disks | Teradata Use | RAID Options | Operating Environment | Maximum Teradata disks
DMX 1000 M2 | 73GB – 15K RPM | MPP: supports 1 or 2 nodes per cabinet | RAID-1 only | MP-RAS and Windows | 96
DMX 2000 M2 | 73GB – 15K RPM | MPP: supports 2, 3, or 4 nodes per cabinet | RAID-1 only | MP-RAS and Windows | 192

Assumption: Compute and Storage Balance
• A balanced configuration is one where the storage I/O subsystem for each compute node is configured with enough disk spindles, disk controllers, and connectivity so that the disk subsystem can satisfy the CPU demand from that node. • A supersaturated configuration also can satisfy the CPU demand from that node although the extra I/O may be underutilized.
> This is useful for investment protection on certain upgrade paths.

• All system configurations discussed in this presentation are based on balanced or supersaturated compute nodes.

Node CPU and Storage I/O Balance
[Chart: Node/Storage Balance and Response Time. Horizontal axis: # of Disk Drives/Storage Capacity. Curves: Query Response Time, I/O Bandwidth (MB/sec), Effective Node Utilization. Marked points: Optimum, 95% Node Utilization, Best Query Response Time.]
Industry-wide, disk drive capacity is increasing at a faster rate than disk drive performance.

Common Upgrades Applied
GROW RAW DATA VOLUME

Performance more than adequate: Add more data to all nodes Query Response Time

SYSTEM with CURRENT Nodes

Query Response Time Increases because you didn’t add more compute power to support the additional raw data volume.

Raw Data Volume

Typical System Expansion
LINEAR GROWTH

Maintain Query Performance with more nodes Query Response Time
SYSTEM with Current Nodes SYSTEM with More or Faster Nodes Scale out with Teradata by adding compute nodes, interconnect, storage arrays, and disks. aka “horizontal scalability”

Query Response Time Remains Constant because you add compute power in proportion to the additional raw data volume.

Raw Data Volume

Common Upgrades Applied
GROW QUERY PERFORMANCE

Raw Data Volume adequate: Upgrade to faster CPUs Query Response Time
SYSTEM with Current Nodes SYSTEM with More or Faster Nodes

Query Response Time Decreases because you didn’t add more raw data volume to offset the increase in compute power. “Scale vertically” with Teradata by increasing compute power.

Raw Data Volume

Combo: Upgrade Nodes and Increase Storage Per Node
Adjust Query Performance and Data Volume to match service level agreement Query Response Time
SYSTEM with Current Nodes

SYSTEM with More or Faster Nodes

Scale to Target query performance and data volume by increasing compute power and adding storage.

Raw Data Volume

Scaling by Reconfiguration and Expansion
Improve Query Performance and Adjust Data Volume to match service level agreement Query Response Time
SYSTEM with Current Nodes SYSTEM with More Nodes

Improve query performance and adjust data volume by reducing storage per node and adding more nodes.

Raw Data Volume
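The upgrade scenarios on the preceding slides can be illustrated with a deliberately simple model. The sketch below assumes that query response time is proportional to raw data volume divided by compute power; this is an idealization chosen for illustration, not a formula from the presentation, and the function name and units are hypothetical.

```python
def response_time(data_tb: float, compute_units: float, k: float = 1.0) -> float:
    """Idealized scan-bound model: time grows with data and shrinks with compute."""
    if compute_units <= 0:
        raise ValueError("compute_units must be positive")
    return k * data_tb / compute_units


baseline = response_time(data_tb=10, compute_units=4)

# Grow raw data volume only: response time increases.
more_data_only = response_time(data_tb=20, compute_units=4)

# Linear growth: data and compute grow in proportion, response time is unchanged.
linear_growth = response_time(data_tb=20, compute_units=8)

# Grow query performance only: response time decreases.
faster_nodes_only = response_time(data_tb=10, compute_units=8)

for label, value in [
    ("baseline", baseline),
    ("more data only", more_data_only),
    ("linear growth", linear_growth),
    ("faster nodes only", faster_nodes_only),
]:
    print(f"{label:>18}: {value:.2f}")
```

In this model the "combo" and "reconfiguration" slides correspond to choosing the data volume and compute power independently so that the ratio lands on the response time the service level agreement requires.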

Architecture Determines Scalability
CPU(s) Cache Memory Disk Storage

BYNET Fabrics

CPU uses independent direct I/O path to Disk All memory accesses are local

CPU(s) CPU(s) CPU(s) CPU(s) CPU(s) CPU(s) CPU(s)

Cache Cache Cache Cache Cache Cache Cache

Memory Memory Memory Memory Memory Memory Memory

Disk Storage Disk Storage Disk Storage Disk Storage Disk Storage Disk Storage Disk Storage

Interconnect used only for database messages, no I/O or memory traffic

Teradata Shared-Nothing MPP
• Designed for Slope of 1 Linear Scaling • Optimized for very high data rates to/from disk • Excellent performance and efficiency for data warehousing

Teradata MPP Architecture
• Nodes
> Incrementally scalable to 1024 nodes > Windows or Unix > Independent I/O > Scales per node
Dual BYNET Interconnects

• Storage

SMP Node1
CPU1 CPU2

SMP Node2
CPU1 CPU2

SMP Node3
CPU1 CPU2

SMP Node4
CPU1 CPU2

• BYNET Interconnect • Connectivity

Memory

Memory

Memory

Memory

> Fully scalable bandwidth

• Server Management
> One console to view the entire system

> Fully scalable > Channel – ESCON/FICON > LAN, WAN

Server Management

Node Software Architecture
Perfectly Tuned Nodes Working in Parallel for Scalability and Availability
Teradata Node SW Architecture (SMP)
Parsing Engine Virtual Processors (VPROCS)

4-Node MPP Clique
BYNET

PE1 PE2

AMP1 AMP2 AMP3

AMP5
Access Module Processor VPROCS

VPROCS

AMP6 AMP7 AMP8

VPROCs AMP & PE

VPROCs AMP & PE

VPROCs AMP & PE

VPROCs AMP & PE

LAN Gateway

Communication Interfaces

AMP4

Disk Array

Channel Gateway

Parallel Database Extensions (PDE)

UNIX, Windows 2000

PEs receive the queries and figure out the query plan. AMPs interact with the disk arrays and process the data.

PE VProc
Parser Optimizer Session Control

VAMP
Relational Database Management File System / Data Management

Dispatcher

The Scalable BYNET Interconnect
Specifically designed for data warehousing workloads
Multiple Simultaneous Point-to-Point Messaging
Node Node Node Node Node Node Node Node

Broadcast Messaging

Node Node

The Teradata Optimizer chooses between Point-to-Point and Broadcast Messaging to select the most effective communication.

• Bandwidth scales linearly to 1,024 nodes • Redundant, fault tolerant network • Guaranteed message delivery

Built-In Integrated Fail Over
• Teradata provides built-in node failover.
> Cost effective > Easy to deploy

• Work migrates to the remaining nodes in the cliques. • System performance degradation up to 33%.
Traditional Configuration

X

Large Cliques
• Double the number of nodes in a clique up to 8. • Work distributed across a greater number of nodes. • Minimize system performance impacts – may not be noticeable to end-users.

X

Node

Node

Node

Node

Node

Node

Node

Node

86% System Performance Continuity
Fibre Channel Switches

Disk Array

Disk Array

Disk Array

Disk Array

Disk Array

Disk Array
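The slides do not say how the 33% and 86% figures were derived. One reading that reproduces both is sketched below: when one node in a clique fails, its work is spread evenly over the surviving nodes, and the figure quoted is the extra load each survivor takes on. The function name and the even-spread assumption are illustrative, not taken from the presentation.

```python
def extra_load_per_survivor(clique_nodes: int, failed: int = 1) -> float:
    """Fraction of additional work each surviving node absorbs after a failure."""
    survivors = clique_nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return failed / survivors


for n in (4, 8):
    extra = extra_load_per_survivor(n)
    print(f"{n}-node clique: +{extra:.1%} load per surviving node, "
          f"100% - {extra:.1%} = {1 - extra:.1%}")
```

With a 4-node clique each survivor picks up one third more work, which matches "degradation up to 33%"; with an 8-node clique the extra load is one seventh, and 100% minus that is roughly the 86% continuity shown for large cliques.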

Hot Standby Nodes
• Work re-directed to a Hot Standby Node. • No system performance impacts. • Teradata restart can be postponed to a maintenance window.

X
Node Disk Array

Node

Node

Hot Standby

100% System Performance Continuity

Disk Array

Disk Array

Large Clique + Hot Standby Node
• Same performance benefits of Hot Standby node. • Reduced costs for larger system implementations.

X
Node

Node

Node

Node

Node

Node

Node

Hot Standby

Fibre Channel Switches

100% System Performance Continuity

Disk Array Disk Array Disk Array Disk Array Disk Array Disk Array

High Availability

Case | Protection (Hardware or Teradata)
Power Failure | UPS (redundant), Dual AC
Node Failure | VPROC Migration (VAMP, PE)
BYNET Failure | Redundant BYNET
Disc Failure | RAID-1/-5/-S in Disc Subsystem
More than one Disc Failure | Fallback-Option
Clique Failure | Fallback-Option

BYNET

Server Nodes

DiskArray Subsystem Clique

Coexistence Considerations

Generation x VAMPS

Generation x VAMPS
AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP

Generation x VAMPS
AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP

AMP AMP

AMP AMP

AMP AMP

AMP AMP

AMP AMP

Performance Factor

1x

1.5x

2.0x

• VAMPs manage the same amount of data • Coexistence lets faster nodes realize their performance by running more VAMPs per node

System Expansion with Teradata Coexistence
• The utilization of multiple generations of hardware within a single Teradata MPP system
Dual BYNET Interconnects

SMP Node1
AMP AMP AMP

SMP Node2
AMP AMP AMP

SMP Node3
AMP AMP AMP

SMP Node4
AMP AMP AMP

SMP Node1
AMP AMP AMP

SMP Node2
AMP AMP AMP

SMP Node3
AMP AMP AMP

SMP Node4
AMP AMP AMP

5380 - 9 AMPs per node

5400 - 12 AMPs per node
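Because every AMP manages the same amount of data, the share of data and work that lands on each hardware generation follows directly from the AMP counts. The sketch below works this through for the configuration in the diagram, assuming four nodes of each generation as drawn; the variable and function names are illustrative.

```python
from fractions import Fraction

# (generation, node count, AMPs per node); node counts follow the diagram above.
generations = [("5380", 4, 9), ("5400", 4, 12)]

total_amps = sum(nodes * amps for _, nodes, amps in generations)


def data_share(nodes: int, amps_per_node: int) -> Fraction:
    """Share of all rows held by a group of nodes, one equal slice per AMP."""
    return Fraction(nodes * amps_per_node, total_amps)


for name, nodes, amps in generations:
    share = data_share(nodes, amps)
    print(f"{name}: {nodes * amps} AMPs, {float(share):.1%} of the data")

# Each 5400 node carries 12/9 of the load of a 5380 node.
print(f"per-node load ratio 5400 : 5380 = {Fraction(12, 9)}")
```

In this example the newer nodes hold four sevenths of the data, so the per-node load ratio of 4:3 mirrors the assumed difference in node performance.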

Server Management

Customer Example (72 Nodes, 4 Generations)

11/2000 Original Footprint: 8 Node 5250
1/2001 Expansion 1: 12 Node 5250
11/2001 Expansion 2: 8 Node 5255
6/2002 Expansion 3: 12 Node 5300
6/2003 Expansion 4: 16 Node 5350
6/2004 Expansion 5: 16 Node 5380
2005 Future: 5400 Expansion

Generation “A”

Generation “B”

Generation “C”

Generation “D”

Generation “E”

Database and Operating System
V2R5.0.3, V2R5.1.X, V2R6 on MP-RAS 3.03 V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.02

485x/525x

4900/5300

4950/5350

4980/5380

5400

V2R6 on WS03 (2Q 2005) V2R5.0.3, V2R5.1.X, V2R6 on W2K (2Q 2005) • Database > Teradata V2R6 > Support one Release Back V2R5.1.X (Current Exception in Place V2R5.0.3) • Unix > MP-RAS 3.03 required for Teradata Database on 5400 > MP-RAS 3.02 still supported on previous server generations • Microsoft Windows > Microsoft Windows Server 2003 recommended for new and expanding 5400 motions > Microsoft Windows 2000 supported in 2Q 2005

V2R5.0 Features + V2R5.1 Features
Strategic Decision Making
• Analytic extensions such as Extended Windows Functions & Multiple Aggregate Distincts • Random Stratified Sampling • Join Elimination • Extended Transitive Closure • Partial Group By • Early Group By • Derived Table Rewrite • Very Large SQL • Extended Grouping • Inner Join Optimization • Eliminate Unnecessary Outer Joins • Hash Joins • UDFs for Complex Analysis and Unstructured Data

Tactical & Event-Driven Decision Making
• Partial Covering Join Index • Global Index • Sparse Index • Join Index Extensions • ODS Workload Optimization • Stored Procedures Enhancements • Enhancements to Triggers • Extra FK-PK Joins in Join Index • UDFs for XML Processing etc.

Trusted, Integrated Environment
• Index Wizard • Statistics Wizard & Collect Statistics Improvements • Query Log • Extreme Workload Management & Administration • Roles and Profiles • SQL Assistant/ Web Edition • Availability • Performance Dashboard & Reporting

Single Version of the Truth

• Security enhancements (Encryption) • DBQL enhancements • Database Object Level Use Count • ROLES enhancements • Priority Scheduler enhancements • TDQM enhancements • No Auto Restart After Disk-Array Power Failure • Cancel Rollback • Incompatible Package Warning • Disk I/O Integrity Check

Data Freshness

• Cylinder Read • Partitioned Tables (PPI) • Value List Compression • 2000 Columns, 64 Columns per Index • Identity Column • Enhancement to Identity Column • UTF16 Support • PPI Dynamic Partition Elimination • Large Objects (LOBs)

• Continuous Update Performance & Manageability • Faster Join Index Update • Join Update Performance • Bulk Update Performance • Teradata Warehouse Builder Full Functionality & Platform Support • UDFs for Data Transformation and Scoring

V2R6.0 Feature List
Strategic Decision Making
• Remove 1MB limit on plan cache size • Increase response buffer to 1MB • Table header expansion • Improve Random AMP Sampling • Top N Row Operation • Recursive Queries

• Improve Primary Index Operations • Improved IN-list processing • External Stored Procedures • Trigger calling a Stored Procedure • Stored Procedure Internals Enhancements • Queue tables

Tactical & Event-Driven Decision Making

Trusted, Integrated Environment
• Teradata Dynamic Workload Management • Extensible User Authentication • Directory Integration • Global deadlock logging • Faster Rollbacks

Single View of Your Business

Data Freshness

• Stored Procedure LOB support • External Table Function • Partition level BAR • Eliminate indexed row IDs (PPI) • PPI Join performance improvement • DBS Information consolidation

• Replication Services • Array support • Priority Scheduler enhancements • Reduce restart time

Teradata Tools & Utilities (1)
Load/Unload
Teradata Warehouse Builder FastLoad, MultiLoad & FastExport Teradata TPump Access Modules Teradata Database

Database Management
Teradata Manager Teradata Dynamic Query Manager Teradata System Emulation Tool Teradata Visual Explain Teradata Index Wizard Teradata Statistics Wizard

Teradata Utility Pak
Teradata Administrator Teradata SQL Assistant Teradata SQL Assistant/Web Edition BTEQ ODBC JDBC CLI OLE DB Provider

Metadata
Teradata Meta Data Services

Mainframe Connectivity
Mainframe Channel Connect TS/API, CICS, HUTCNS & IMS/DC
Any Query, Any Time

Technical Differentiator: Database Utilities
Teradata Utilities Are Robust and Mature

• Teradata utilities are fully parallel. • Teradata utilities have checkpoint restart capability. • Data loads directly from the source into the database.
> No manual data partitioning. > No file splitting. > No intermediary file transfers. > No separate data conversion step.
Parallel In Parallel Out

Teradata Warehouse

Teradata Tools & Utilities (2)
• Teradata Data Profiler • Teradata CRM • Teradata Warehouse Miner • Teradata Demand Chain Management • Teradata Supply Chain Management • .....
(LDM = Logical Data Model)
• Financial Solution LDM • Retail LDM • Communication LDM • Insurance/Healthcare LDM • Manufacturing LDM • Government LDM • Media and Entertainment LDM • Travel/Transportation LDM

Two Basic Software Architecture Models Task Centric and Data Centric

Request

Request

Request

Request

Task

Shared Memory

Task

Parallel Optimizer

Parallel Unit

Parallel Unit Data

DATA

DATA
Data Data

Data

Uniform and shared access to all platform resources (disk, etc) is REQUIRED

Exclusive access to a subset of resources

Data Centric Software: Teradata Virtual AMP
Tables
28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

AMP - Balanced collection of three abstracted platform resources
P

Processor
D

Memory

M

Disk

AMP 1
P

AMP 2
P

AMP 3
P

AMP 4
P

M

D

M

D

M

D

M

D

25 21 17 13 9 5 1

26 22 18 14 10 6 2

27 23 19 15 11 7 3

28 24 20 16 12 8 4

Table A

Table B

Table C

• Each virtual AMP has rows from every table • Each virtual AMP works independently on its rows • Goal: The rows of every table are equally distributed across the virtual AMPs

Data distribution by Primary Index
Primary Index value for a row

Hashing Algorithm Destination Selection Word (DSW) – first 16 bits

Row Hash (32 bits) Current configuration Primary Current configuration Fallback Reconfiguration Primary Reconfiguration Fallback

Hash Map

Dual BYNET Interconnects

Node1

Node2

Node3

Node4

Node1

Node2

Node3

Node4

Node1

Node2

Node3

Node4

Node1

Node2

Node3

Node4

Teradata Hashing
Table ORDER
Order Number | Customer Number | Order Date | Order Status

PK UPI 7325 7324 7415 7103 7225 7384 7402 7188 7202 2 3 1 1 2 1 3 1 2 4/13 4/13 4/13 4/10 4/15 4/12 4/16 4/13 4/09 O O C O C C C C C

(Hexadecimal)

HASH MAP 6 07 07 07 07 07 07 7 08 08 08 08 08 08 8 01 01 01 01 01 01 9 02 02 02 02 02 02 A 03 03 03 03 03 03 B 04 04 04 04 04 04 C 05 05 05 05 05 05 D 0 0 0 0 0 0

SELECT * FROM ORDER WHERE order_number = 7225;

000 001 002 003 004 005

0 01 01 01 01 01 01

1 02 02 02 02 02 02

2 03 03 03 03 03 03

3 04 04 04 04 04 04

4 05 05 05 05 05 05

5 06 06 06 06 06 06

7225 Hashing Algorithm

AMP 1

AMP 2

AMP 3

AMP x

32 bit Row Hash
DSW (first 16 bits): 0000 0000 0001 1010 = hexadecimal 001A (bucket number)
Remaining 16 bits: 1100 0111 0101 1011
7225 2 4/13 O
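The lookup for order number 7225 can be retraced in a few lines. The sketch below takes the 32-bit row hash from the slide as given (it does not reproduce the Teradata hashing algorithm), splits off the DSW, and looks the bucket up in a hypothetical hash map that assigns buckets round-robin to 8 AMPs. The round-robin rule and the AMP count are assumptions inferred from the excerpt of the hash map shown on the slide.

```python
from typing import Tuple

ROW_HASH = 0x001AC75B  # 32-bit row hash shown on the slide for order number 7225
N_AMPS = 8             # assumption: inferred from AMP numbers 01 to 08 in the hash map


def split_row_hash(row_hash: int) -> Tuple[int, int]:
    """Return (DSW, remaining 16 bits) of a 32-bit row hash."""
    return row_hash >> 16, row_hash & 0xFFFF


def amp_for_bucket(bucket: int, n_amps: int = N_AMPS) -> int:
    """Hypothetical hash map: buckets assigned round-robin, AMPs numbered from 1."""
    return bucket % n_amps + 1


dsw, rest = split_row_hash(ROW_HASH)
print(f"DSW = {dsw:04X}, remaining bits = {rest:04X}, AMP = {amp_for_bucket(dsw):02d}")
```

For this row the DSW is 001A, which the slide's hash map (row 001, column A) resolves to AMP 03; the round-robin stand-in gives the same answer. A real hash map is a configuration table rather than a formula, which is what allows it to be rewritten during reconfiguration.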

Primary Index Choice Criteria
ACCESS
Maximize one-AMP operations: choose the column most frequently used for access

DISTRIBUTION
Optimize parallel processing: choose a column that provides good distribution

Volatility
Reduce maintenance resource overhead (I/O): choose a column with stable data values
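The distribution criterion can be made concrete by counting rows per AMP for two candidate columns. The sketch below is an illustration only: it uses CRC-32 as a stand-in hash (not the Teradata hashing algorithm), a modulo in place of the hash map, and made-up column data for a hypothetical 8-AMP system.

```python
import zlib
from collections import Counter

N_AMPS = 8  # hypothetical system size


def amp_of(value) -> int:
    """Stand-in for hashing plus hash map lookup; NOT Teradata's algorithm."""
    row_hash = zlib.crc32(str(value).encode("utf-8"))  # unsigned 32-bit value
    bucket = row_hash >> 16                            # first 16 bits
    return bucket % N_AMPS


def rows_per_amp(values) -> list:
    """Number of rows that would land on each AMP for the given index values."""
    counts = Counter(amp_of(v) for v in values)
    return [counts.get(amp, 0) for amp in range(N_AMPS)]


order_numbers = range(7000, 17000)  # 10,000 distinct values
order_status = ["O", "C"] * 5000    # 10,000 rows, only two distinct values

print("by order number:", rows_per_amp(order_numbers))
print("by order status:", rows_per_amp(order_status))
```

A column with only two distinct values can occupy at most two AMPs no matter how the hash behaves, so most of the system sits idle during a scan. A column with many distinct values gives the hash the chance to spread rows over all AMPs.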

Data distribution by Primary Index - 2

SQL request Parser algorithm

48 bit table ID

32 bit row hash value

Index value

Dual BYNET Interconnects

Node1

Node2

Node3

Node4

Node1

Node2

Node3

Node4

Node1

Node2

Node3

Node4

Node1

Node2

Node3

Node4

Logical block identifier

SQL Parser Overview
Request Parcel

Cached?

Syntaxer
DBase, AccRights TVM, TVFields Indexes

DD

Resolver Security
Serial steps Parallel steps Individual and common steps (MSR) Additional: Triggers, check constraints, references, foreign keys, join indexes collected statistics or dynamic sampling

Statistics

Optimizer
Costs

Generator Data Parcel GNCApply AMP Steps

Statistics Summary
Collect statistics on
• all non-unique indexes
• UPI of any table with less than x rows per AMP (dependent on available number of AMPs)
• all indexes of a join index
• any non-indexed column used for join constraints
• indexes of global temporary tables
Collected statistics are not automatically updated by the system.
Refresh statistics when 5-10% of the table rows have changed.
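Since statistics are not refreshed automatically, the 5-10% rule has to be applied by whoever maintains the tables. A minimal sketch of that rule follows; the function name and the idea of tracking a changed-row count per table are hypothetical, and the default threshold simply takes the lower end of the range quoted above.

```python
def needs_refresh(rows_changed: int, total_rows: int, threshold: float = 0.05) -> bool:
    """Return True when the share of changed rows reaches the refresh threshold."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    if total_rows <= 0:
        # An empty table that has received rows has no meaningful statistics yet.
        return rows_changed > 0
    return rows_changed / total_rows >= threshold


# 4% changed: below both ends of the 5-10% range.
print(needs_refresh(40_000, 1_000_000))
# 7% changed: refresh at the 5% threshold, not yet at the 10% threshold.
print(needs_refresh(70_000, 1_000_000))
print(needs_refresh(70_000, 1_000_000, threshold=0.10))
```

Which end of the range to use is a judgment call that the slide leaves open; a lower threshold costs more collection runs, a higher one risks plans based on stale demographics.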

Database Workload Continuum
Transactional (OLTP)
• User Profiles • Customers • Clerks • Services: • Transactions • Bookkeeping • Access Profile: • Frequent updates • Occasional lookup • Data: • Current “state” data • Limited history • Narrow Scope

Tactical (ODS)
• User Profiles • Front Line Services • Customers - Indirectly • Services: • Lookups • Tactical decisions • Analytics (e.g. scoring) • Access Profile: • Continuous updates • Frequent lookups • Data Model: • Current “state” data • Recent history • Integrated business areas

Strategic (EDW)
• User Profiles • Back Office Services • Management • Trading Partners • Services: • Strategic decisions • Analytics (e.g. scoring) • Access Profile: • Bulk Inserts – Some Updates • Frequent complex analytics • Data Model: • Periodic “state” data • Deep history • Enterprise integrated view

Workload Continuum

OLTP1

•••

OLTPi

•••

OLTPn

ODS1

• • •

ODS2

Enterprise Data Warehouse
Strategic Decision Repositories

Transactional Repositories

Tactical Decision Repositories

Database Workload Continuum

Workload Continuum

OLTP1

OLTPi

OLTPn

Active Data Warehouse
Tactical and Strategic Decision Repositories

Transactional Repositories

Data Warehouse Needs Will Evolve
ACTIVATING MAKE it happen!

Workload Complexity

• Query complexity grows • Workload mixture grows • Data volume grows • Schema complexity grows • Depth of history grows • Number of users grows • Expectations grow
Simultaneous Workloads: Strategic, tactical, loading
Increasing depth and breadth of users and queries ANALYZING WHY did it happen? REPORTING WHAT happened?
Increase in ad hoc analysis Primarily batch and some ad hoc reports

OPERATIONALIZING WHAT IS happening?

PREDICTING WHAT WILL happen?

Event-based triggering takes hold

Continuous update and time-sensitive queries become important

Analytical modeling grows

Batch Ad Hoc Analytics Continuous Update/Short Queries Event-Based Triggering

Increasing depth and breadth of data

Data Sophistication

Single View of the Business – Better, Faster Decisions – Drive Business Growth

Data Warehouse Needs Will Evolve
ACTIVATING MAKE it happen! Automate

Workload Complexity

• Query complexity grows • Workload mixture grows • Data volume grows • Schema complexity grows • Depth of history grows • Number of users grows • Expectations grow

OPERATIONALIZING WHAT IS happening?

PREDICTING WHAT WILL happen?

Event-based triggering takes hold

ANALYZING WHY did it happen? REPORTING WHAT happened? Understand
Increase in ad hoc analysis Primarily batch and some ad hoc reports Analytical modeling grows

Execute
Continuous update and time-sensitive queries become important

Optimize
Batch

Measure

Chasm from static to dynamic decision making

Ad Hoc Analytics Continuous Update/Short Queries Event-Based Triggering

Data Sophistication

Single View of the Business – Better, Faster Decisions – Drive Business Growth

Data Warehouse Needs Will Evolve
Database Requirement: Data Warehouse Foundation must handle multi-dimensional growth! OPERATIONALIZING WHAT IS happening? ACTIVATING MAKE it happen!

Workload Complexity

• Query complexity grows • Workload mixture grows • Data volume grows • Schema complexity grows • Depth of history grows • Number of users grows • Expectations grow

PREDICTING WHAT WILL happen?

Event-based triggering takes hold

ANALYZING WHY did it happen? REPORTING WHAT happened?
Increase in ad hoc analysis Primarily batch and some ad hoc reports Analytical modeling grows

Continuous update and time-sensitive queries become important

Batch Ad Hoc Analytics Continuous Update/Short Queries Event-Based Triggering

Data Sophistication

Single View of the Business – Better, Faster Decisions – Drive Business Growth

The Multi-Temperature Warehouse

• Customers desire deep historical data in the warehouse.
> The access frequency or average temperature of data varies.
– HOT, WARM, COOL, dormant

> Seamless management required.

• Teradata systems can address this need through a combination of technologies, such as:
> Partitioned primary index (PPI). > Multi-value compression. > Priority scheduler.

Three Tiers of Workload Management

• Pre-Execution: Teradata Dynamic Query Manager
> Control what and how much is allowed to begin execution
• Priority Scheduler (ADW, prioritized queues)
> Manage the level of resources allocated to different priorities of executing work
• Post-Execution: Database Query Log
> Analyse query performance and behavior after completion

Teradata Dynamic Query Manager

Indexes
• PI (UPI and NUPI) • SI (USI and NUSI) • Join Index
> single table index > multi table index > aggregated index > sparse index (where clause used) > partial covering > global
• Materialized Views (join index)

An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension
Data Volume (Raw, User Data) Mixed Workload Query Concurrency

Data Freshness

Query Complexity

Query Freedom Query Data Volume

Schema Sophistication

An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension
Data Volume (Raw, User Data) Mixed Workload Query Concurrency

Data Freshness

The Teradata Difference

Query Complexity

Query Freedom Query Data Volume

Schema Sophistication

The Teradata Difference “Multi-dimensional Scalability”
Data Volume
(Raw, User Data)

Mixed Workload
Customers Need to Evaluate “Real Life” Workloads

Query Concurrency
Good Example? TPC-H Benchmark

Data Freshness

Query Complexity

Query Freedom

Query Data Volume

Schema Sophistication

Teradata Experience in the Communications Industry

Companies generating >80% of the industry revenue utilize Teradata Data Warehousing

Some of Teradata’s Retail Customers Worldwide

Teradata is Well-Positioned in the Top Global 3000 Industries
80% of Top Global Telco Firms
70% of Top Global Airlines
65% of Top Global Retailers
60% of Top Most Admired Global Companies
50% of the Top Transportation Logistic Firms
• Leading industries
> Banking > Government > Insurance & Healthcare > Manufacturing > Retail > Telecommunications > Transportation Logistics > Travel

• World class customer list
> More than 750 customers > Over 1200 installations

• Global presence
> Over 100 countries

FORTUNE Global Rankings, April 2005

Industry Leaders Use Teradata
Teradata Global 400 Customers
54% of Retailers
50% of Telco Industry
50% of Transportation Industry
32% of Financial Services Industry
19% of Manufacturers
• Leading industries
> Banking > Government > Insurance & Healthcare > Manufacturing > Retail > Telecommunications > Transportation Logistics > Travel

• World class customer list
> More than 750 customers > Over 1200 installations

• Global presence
> Over 100 countries

www.teradata.com
