Teradata
Architecture, Technology, Scalability, Performance and Vision for Active Enterprise Data Warehousing
Dr. Barbara Schulmeister Teradata – a Division of NCR
[email protected]
28. 6. 2005
Agenda
• History
• Definitions
• Hardware
• Architecture
• Fault Tolerance and High Availability
• Coexistence
• Operational System
• Tools and Utilities
• Data Distribution
• SQL Parser
• Active Data Warehouse
• Scalability
Teradata Timeline Overview
Born to be parallel!
DBC Model 1: First MPP System! “Product of the Year” – Forbes DBC Model 3 “Fastest Growing Small Company” – INC Magazine “Fastest Growing Electronic Company” – Electronic Business “Leader in Commercial Parallel Processing” – Gartner Group
1979...
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994...
Teradata Corp. Founded
First 100GB System!
First 500GB System!
First 700GB System!
First Terabyte System!
DBC Model 4
3+ TB System!
First Beta system shipped Christmas to Wells Fargo Bank
Initial public offering on Wall Street
Joint Venture with NCR for next generation systems
more
Teradata Timeline (II)
“#1 in MPP” – IDC Survey in Computerworld DB Expo Realware Award w/ Union Pacific: “Data Warehouse Innovations” Over 500 Production Data Warehouses Worldwide! DWI VLDB Best Practice Award w/ ATT BMD: “Data Warehouse and the Web”
Only Vendor to Publish Multi-user TPC-Ds!
First Vendor to Publish 1TB TPC-D Benchmark!
Teradata V2 on WorldMark 4300
...1995
1996
1997
...
Teradata Version 2 on NCR 3555 SMP
Teradata V2 on WorldMark 5100 SMP & MPP
“...only NCR’s Teradata V2 RDBMS has proven it can scale…” – Gartner Group
Demonstrated World’s Largest Data Warehouse Database at 11TB!
100GB TPC-D Benchmark Leader!
24TB Data Warehouse in Production!
more
Teradata Timeline (III)
Teradata V2 ported to Microsoft Windows NT Industry leading TPC-D benchmark for all volumes Industry leading TPCH at 1TB and 3TB Teradata attains 99.98% availability 64 bit Teradata
Largest Data Warehouse system (176 node, 130 TB disk)
...1998
1999
2000
2001
2002
2003
2004
2005
Database Programming and Design Award
IT Award of Excellence
V2R5 Teradata
V2R6 Teradata
Linux
• TDWI Solution Provider Best Practices in Data Warehousing • TDWI Leadership in Data Warehousing Award • DM Review World-Class Solution Award for business Intelligence • IT Times Award • DM Review 100 Award • DM Review Readership Award • Intelligent Enterprise Real Ware Award
the commitment continues…
Alternative Approaches to Enterprise Analytics
Virtual, Distributed, Federated (Sources, Middleware, Users): "Leave Data Where it Lies"
Pros
• No need for ETL
• No need for separate platform
Cons
• Only viable for low volume access
• Meta data issues
• Network bandwidth and join complexity issues
• Workload typically placed on workstation
Data Mart Centric (Sources, Marts, Users): Independent Data Marts
Pros
• Easy to Build Organizationally
• Limit Scope
• Easy to Build Technically
Cons
• Business Enterprise view unavailable
• Redundant data costs
• High ETL costs
• High App costs
• High DBA and operational costs
Hub-and-Spoke Data Warehouse (Sources, DW, Marts, Users)
Pros
• Allows easier customization of user interfaces & reports
Cons
• Business Enterprise view challenging
• Redundant data costs
• High DBA and operational costs
• Data latency
Enterprise Data Warehouse (Sources, DW, Users): Centralized Integrated Data With Direct Access
Pros
• Single Enterprise “Business” View
• Data reusability
• Consistency
• Low Cost of Ownership
Cons
• Requires corporate leadership and vision
A Spectrum of Data Warehouse Architectures
Virtual, Distributed, Federated
Sources Middleware Users
Data Mart Centric
Sources Marts Users
Hub-and-Spoke Data Warehouse
Sources DW Marts Users
Enterprise Data Warehouse
Sources DW Users
The goal: Any question, on any data, at any time.
Teradata’s Advocated Data Warehouse Approach for 20 years, Since 1984!
Differentiating OLTP - DSS
Most time consuming steps:
OLTP
DSS
• Full scan of big tables
• Complex joins
• Aggregation
• Sorting
Frequency of steps OLTP or DSS
NCR Server
• Provide customers with growth opportunities and investment protection
> Coexistence is enabled across five generations
– NCR 5400E & 5400H Servers
– NCR 4980 & 5380 Servers
– NCR 4950 & 5350 Servers
– NCR 4900 & 5300 Servers
– NCR 485X & 525X Servers
BYNET V2 / V3
485X & 525X
4900 & 5300
4950 & 5350
4980 & 5380
5400E & 5400H
NCR Server Generations
NCR 5400 Server SMP
• 5400E
Ethernet Switches
> 1 - 4 nodes > BYNET V2 > ESCON & FICON for 3 and 4 node configurations > Field Upgradeable to 5400H
[Figure: 5400E cabinet with up to 4 nodes (1st to 4th node), Server Management (3GSM), internal BYNET switches, and three UPS modules]
NCR 5400 Server MPP
• Continued rapid adoption of latest Intel® Technology
> Dual Intel Pentium Xeon EM 64T 3.6 GHz processors with Hyper-Threading (32-bit and 64-bit capability) > 800 MHz front side bus
[Figure: Ethernet switches and BYNET V3 switches]
• Industry Standard Form Factor
> Up to 10 nodes per cabinet
> Integrated BYNET V3 (allows systems to be physically separated by 300-600 meters)
> Integrated Server Management
> N+1 UPS
> Dual AC
[Figure: cabinet with up to 10 nodes, Server Management, and five UPS modules]
• Multi-Generation Coexistence
> Investment protection
Relative CPU Performance per Core
[Chart: industry CPU performance per core, relative scale 0 to 3000, by year 2004 to 2007, for the Xeon, Itanium, Power, and Sparc families]
Processors plotted: Xeon 3.0 GHz 1M 130 nm; Xeon 3.6 GHz 90 nm; Xeon 2M L2 3.6 GHz 90 nm; Xeon 2M L2 >3.6 GHz 90 nm; Dual Core 65 nm; Next Gen Arch. Dual Core 65 nm; Multi Core 45 nm; Itanium 2 1.6 GHz 130 nm; Itanium 2 9M 130 nm; Montecito 90 nm; Tukwila Common Platform 65 nm; Power 4+ 1.45 GHz 130 nm; Power 5 ~1.9 GHz 130 nm; Power 5+ ~2.5 GHz 90 nm; Power 6 ~3 GHz 65 nm; UltraSPARC 3 1.6 GHz 130 nm; Rock 90 nm
Roadmap phases: Symmetric Multi Threading (Hyper Threading), Dual Core, Multicore and Multithreaded
Relative CPU Performance based on multi-threading and multi-core roadmap capabilities
www.spec.org: benchmarks SPECint2000 and SPECint_rate2000
Gartner Product Ranking 2004 ASEM
FUJITSU Primepower
HP HP9000
HP Integrity
HP Proliant
IBM pSeries
NCR Teradata
SUN Sunfire 40
PRODUCT
43
45
46
29
45
54
The Product category (which was called Technology in previous ASEM updates) focuses on the performance and reliability/availability aspects of each platform. In this category Teradata received a very strong 93.5% of total possible points and leads the IBM pSeries with 74.35% by 44 points or 19%.
Source Gartner 2004 ASEM Report
NCR Enterprise Storage 6842
• NCR Enterprise Storage 6842 Features
> Two array modules per cabinet
> 56, 73GB, 15K drives
– greater than 8 Terabytes of spinning disk per cabinet
> Dual Quad Fibre Channel Controllers per array for performance and availability
> Typical configuration is 4 NCR 5400 Server nodes per three 6842 arrays
– 1.2 Terabytes of database space per node (RAID 1)
> Supports RAID 1 and RAID 5
> Support for MP-RAS and Microsoft Windows Server 2003 environments
EMC Symmetrix DMX
• Enterprise Fit • Storage Standardization • Extended storage life through Redeployment
EMC Model | Disks | Teradata Use | RAID Options | Operating Environment | Maximum Teradata Disks
DMX 1000 M2 | 73GB – 15K RPM | MPP: supports 1 or 2 nodes per cabinet | RAID-1 Only | MP-RAS and Windows | 96
DMX 2000 M2 | 73GB – 15K RPM | MPP: supports 2, 3, or 4 nodes per cabinet | RAID-1 Only | MP-RAS and Windows | 192
Assumption: Compute and Storage Balance
• A balanced configuration is one where the storage I/O subsystem for each compute node is configured with enough disk spindles, disk controllers, and connectivity so that the disk subsystem can satisfy the CPU demand from that node.
• A supersaturated configuration also can satisfy the CPU demand from that node, although the extra I/O may be underutilized.
> This is useful for investment protection on certain upgrade paths.
• All system configurations discussed in this presentation are based on balanced or supersaturated compute nodes.
Node CPU and Storage I/O Balance
Node/Storage Balance and Response Time
[Chart: query response time and I/O bandwidth (MB/sec) plotted against number of disk drives / storage capacity. Labels: Optimum, 95% Node Utilization, Effective Node Utilization, Bandwidth, Best Query Response Time]
Industry wide, disk drive capacity is increasing at a faster rate than disk drive performance
Query Response Time
Common Upgrades Applied
GROW RAW DATA VOLUME
Performance more than adequate: Add more data to all nodes Query Response Time
SYSTEM with CURRENT Nodes
Query Response Time Increases because you didn’t add more compute power to support the additional raw data volume.
Raw Data Volume
Typical System Expansion
LINEAR GROWTH
Maintain Query Performance with more nodes Query Response Time
SYSTEM with Current Nodes SYSTEM with More or Faster Nodes Scale out with Teradata by adding compute nodes, interconnect, storage arrays, and disks. aka “horizontal scalability”
Query Response Time Remains Constant because raw data volume and compute power grow in the same proportion.
Raw Data Volume
Common Upgrades Applied
GROW QUERY PERFORMANCE
Raw Data Volume adequate: Upgrade to faster CPUs Query Response Time
SYSTEM with Current Nodes SYSTEM with More or Faster Nodes
Query Response Time Decreases because you didn’t add more raw data volume to offset the increase in compute power. “Scale vertically” with Teradata by increasing compute power.
Raw Data Volume
Combo: Upgrade Nodes and Increase Storage Per Node
Adjust Query Performance and Data Volume to match service level agreement Query Response Time
SYSTEM with Current Nodes
SYSTEM with More or Faster Nodes
Scale to Target query performance and data volume by increasing compute power and adding storage.
Raw Data Volume
Scaling by Reconfiguration and Expansion
Improve Query Performance and Adjust Data Volume to match service level agreement Query Response Time
SYSTEM with Current Nodes SYSTEM with More Nodes
Improve query performance and adjust data volume by reducing storage per node and adding more nodes.
Raw Data Volume
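The upgrade scenarios above can be sketched with a deliberately simple model. It assumes that query response time is proportional to raw data volume divided by total compute power, which is one reading of the "slope of 1" linear scaling the slides describe. The function name and the growth factors are illustrative, not taken from the presentation.

```python
def relative_response_time(data_factor, compute_factor):
    """Response time relative to the current system (1.0 = unchanged).

    Assumption: response time ~ raw data volume / compute power.
    Both arguments are growth factors against the current system.
    """
    if data_factor <= 0 or compute_factor <= 0:
        raise ValueError("growth factors must be positive")
    return data_factor / compute_factor


# Hypothetical growth factors for three of the upgrade paths on the slides.
scenarios = {
    "grow raw data volume": (2.0, 1.0),    # more data, same nodes
    "linear growth": (2.0, 2.0),           # more data and more nodes
    "grow query performance": (1.0, 2.0),  # same data, faster nodes
}

for name, (data, compute) in scenarios.items():
    print(f"{name}: {relative_response_time(data, compute):.2f}x response time")
```

Under this assumption, doubling data on unchanged nodes doubles response time, doubling both keeps it constant, and doubling compute alone halves it.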
Architecture Determines Scalability
CPU(s) Cache Memory Disk Storage
BYNET Fabrics
CPU uses independent direct I/O path to Disk All memory accesses are local
[Figure: seven nodes side by side, each with its own CPU(s), cache, memory, and disk storage]
Interconnect used only for database messages, no I/O or memory traffic
Teradata Shared-Nothing MPP
• Designed for Slope of 1 Linear Scaling • Optimized for very high data rates to/from disk • Excellent performance and efficiency for data warehousing
Teradata MPP Architecture
• Nodes
> Incrementally scalable to 1024 nodes
> Windows or Unix
• Storage
> Independent I/O
> Scales per node
• BYNET Interconnect
> Fully scalable bandwidth
• Connectivity
> Fully scalable
> Channel – ESCON/FICON
> LAN, WAN
• Server Management
> One console to view the entire system
[Figure: four SMP nodes (Node1 to Node4), each with two CPUs and its own memory, joined by dual BYNET interconnects, with Server Management]
Node Software Architecture
Perfectly Tuned Nodes Working in Parallel for Scalability and Availability
Teradata Node SW Architecture (SMP)
Parsing Engine Virtual Processors (VPROCS)
4-Node MPP Clique
BYNET
PE1 PE2
AMP1 AMP2 AMP3
AMP5
Access Module Processor VPROCS
VPROCS
AMP6 AMP7 AMP8
VPROCs AMP & PE
VPROCs AMP & PE
VPROCs AMP & PE
VPROCs AMP & PE
LAN Gateway
Communication Interfaces
AMP4
Disk Array
Channel Gateway
Parallel Database Extensions (PDE)
UNIX, Windows 2000
PEs receive the queries and determine the query plan.
AMPs interact with the disk arrays and process the data.
PE VProc
Parser Optimizer Session Control
VAMP
Relational Database Management File System / Data Management
Dispatcher
The Scalable BYNET Interconnect
Specifically designed for data warehousing workloads
Multiple Simultaneous Point-to-Point Messaging
Node Node Node Node Node Node Node Node
Broadcast Messaging
Node Node
The Teradata Optimizer chooses between Point-to-Point and Broadcast Messaging to select the most effective communication.
• Bandwidth scales linearly to 1,024 nodes • Redundant, fault tolerant network • Guaranteed message delivery
Built-In Integrated Fail Over
• Teradata provides built-in node failover.
> Cost effective > Easy to deploy
• Work migrates to the remaining nodes in the cliques. • System performance degradation up to 33%.
Traditional Configuration
X
Large Cliques
• Double the number of nodes in a clique up to 8. • Work distributed across a greater number of nodes. • Minimize system performance impacts – may not be noticeable to end-users.
[Figure: eight-node clique with one failed node (X), connected through Fibre Channel switches to six disk arrays]
86% System Performance Continuity
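The two figures quoted for node failover ("degradation up to 33%" for the traditional configuration and "86% System Performance Continuity" for an eight-node clique) are both reproduced by one simple assumption: the work of the failed node is spread evenly over the surviving nodes of its clique, so each survivor carries 1/(N-1) extra load. The sketch below encodes that reading. It is an interpretation of the slide numbers, not a formula stated in the presentation, and it assumes the traditional clique has four nodes.

```python
def degradation_after_node_failure(nodes_in_clique):
    """Extra load per surviving node when one node of the clique fails.

    Assumption: the failed node's work is spread evenly over the
    remaining nodes, so each survivor takes on 1 / (N - 1) more work.
    """
    if nodes_in_clique < 2:
        raise ValueError("a clique needs at least two nodes to fail over")
    return 1.0 / (nodes_in_clique - 1)


def performance_continuity(nodes_in_clique):
    """Share of normal performance retained after one node failure."""
    return 1.0 - degradation_after_node_failure(nodes_in_clique)


for size in (4, 8):
    print(f"{size}-node clique: "
          f"{degradation_after_node_failure(size):.0%} degradation, "
          f"{performance_continuity(size):.0%} continuity")
```

A hot standby node sidesteps this calculation entirely, because the migrated work lands on a node that was otherwise idle.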
Hot Standby Nodes
• Work re-directed to a Hot Standby Node. • No system performance impacts. • Teradata restart can be postponed to a maintenance window.
[Figure: three nodes, one of them failed (X), plus a Hot Standby node, attached to three disk arrays]
100% System Performance Continuity
Large Clique + Hot Standby Node
• Same performance benefits of Hot Standby node. • Reduced costs for larger system implementations.
[Figure: seven nodes, one of them failed (X), plus a Hot Standby node, connected through Fibre Channel switches to six disk arrays]
100% System Performance Continuity
High Availability
Case | Hardware | Teradata
Power failure | UPS (redundant), Dual AC | -
Node failure | - | VPROC Migration (VAMP, PE)
BYNET failure | - | Redundant BYNET
Disc failure | - | RAID-1/-5/-S in Disc Subsystem
More than one disc failure | - | Fallback-Option
Clique failure | - | Fallback-Option
[Figure: BYNET, server nodes, and disk array subsystem, which together form a clique]
Coexistence Considerations
[Figure: three hardware generations with performance factors 1x, 1.5x, and 2.0x; the faster generations run more VAMPs per node]
• VAMPs manage the same amount of data
• Coexistence lets the extra performance of faster nodes be realized by running more VAMPs per node
System Expansion with Teradata Coexistence
• The utilization of multiple generations of hardware within a single Teradata MPP system
[Figure: two groups of four SMP nodes (Node1 to Node4), each node running several AMPs, joined by dual BYNET interconnects]
5380 - 9 AMPs per node
5400 - 12 AMPs per node
Server Management
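Because every AMP manages the same amount of data, the share of data (and of work) that lands on a node follows directly from its AMP count. The sketch below works this out for the AMP counts on the slide (9 per 5380 node, 12 per 5400 node). The node counts of four per generation are an assumption for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def data_share_per_node(amps_per_node, node_counts):
    """Fraction of all data held by one node of each generation.

    Assumption: every AMP holds an equal slice of the data, so a node's
    share is its AMP count divided by the total AMP count of the system.
    """
    total_amps = sum(amps_per_node[model] * count
                     for model, count in node_counts.items())
    return {model: Fraction(amps_per_node[model], total_amps)
            for model in node_counts}


amps_per_node = {"5380": 9, "5400": 12}   # from the slide
node_counts = {"5380": 4, "5400": 4}      # illustrative assumption

shares = data_share_per_node(amps_per_node, node_counts)
for model, share in shares.items():
    print(f"one {model} node holds {share} of the data")
print("5400 / 5380 load ratio:", shares["5400"] / shares["5380"])
```

The ratio of 4/3 shows how a newer generation absorbs proportionally more work without any AMP holding more data than its peers.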
Customer Example (72 Nodes, 4 Generations)
Date | Step | Configuration
11/2000 | Original Footprint | 8 Node 5250
1/2001 | Expansion 1 | 12 Node 5250
11/2001 | Expansion 2 | 8 Node 5255
6/2002 | Expansion 3 | 12 Node 5300
6/2003 | Expansion 4 | 16 Node 5350
6/2004 | Expansion 5 | 16 Node 5380
2005 | Future | 5400 Expansion
Generation “A”
Generation “B”
Generation “C”
Generation “D”
Generation “E”
Database and Operating System
V2R5.0.3, V2R5.1.X, V2R6 on MP-RAS 3.03 V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.02
485x/525x
4900/5300
4950/5350
4980/5380
5400
V2R6 on WS03 (2Q 2005)
V2R5.0.3, V2R5.1.X, V2R6 on W2K (2Q 2005)
• Database
> Teradata V2R6
> Support one Release Back V2R5.1.X (Current Exception in Place V2R5.0.3)
• Unix
> MP-RAS 3.03 required for Teradata Database on 5400
> MP-RAS 3.02 still supported on previous server generations
• Microsoft Windows
> Microsoft Windows Server 2003 recommended for new and expanding 5400 motions
> Microsoft Windows 2000 supported in 2Q 2005
V2R5.0 Features + V2R5.1 Features
Strategic Decision Making
• Analytic extensions such as Extended Windows Functions & Multiple Aggregate Distincts • Random Stratified Sampling • Join Elimination • Extended Transitive Closure • Partial Group By • Early Group By • Derived Table Rewrite • Very Large SQL • Extended Grouping • Inner Join Optimization • Eliminate Unnecessary Outer Joins • Hash Joins • UDFs for Complex Analysis and Unstructured Data
Tactical & Event-Driven Decision Making
• Partial Covering Join Index
• Global Index
• Sparse Index
• Join Index Extensions
• ODS Workload Optimization
• Stored Procedures Enhancements
• Enhancements to Triggers
• Extra FK-PK Joins in Join Index
• UDFs for XML Processing etc.
Trusted, Integrated Environment
• Index Wizard • Statistics Wizard & Collect Statistics Improvements • Query Log • Extreme Workload Management & Administration • Roles and Profiles • SQL Assistant/ Web Edition • Availability • Performance Dashboard & Reporting
Single Version of the Truth
• Security enhancements (Encryption)
• DBQL enhancements
• Database Object Level Use Count
• ROLES enhancements
• Priority Scheduler enhancements
• TDQM enhancements
• No Auto Restart After Disk-Array Power Failure
• Cancel Rollback
• Incompatible Package Warning
• Disk I/O Integrity Check
Data Freshness
• Cylinder Read • Partitioned Tables (PPI) • Value List Compression • 2000 Columns, 64 Columns per Index • Identity Column • Enhancement to Identity Column • UTF16 Support • PPI Dynamic Partition Elimination • Large Objects (LOBs)
• Continuous Update Performance & Manageability
• Faster Join Index Update
• Join Update Performance
• Bulk Update Performance
• Teradata Warehouse Builder Full Functionality & Platform Support
• UDFs for Data Transformation and Scoring
V2R6.0 Feature List
Strategic Decision Making
• Remove 1MB limit on plan cache size • Increase response buffer to 1MB • Table header expansion • Improve Random AMP Sampling • Top N Row Operation • Recursive Queries
• Improve Primary Index Operations • Improved IN-list processing • External Stored Procedures • Trigger calling a Stored Procedure • Stored Procedure Internals Enhancements • Queue tables
Tactical & Event-Driven Decision Making
Trusted, Integrated Environment
• Teradata Dynamic Workload Management • Extensible User Authentication • Directory Integration • Global deadlock logging • Faster Rollbacks
Single View of Your Business
Data Freshness
• Stored Procedure LOB support • External Table Function • Partition level BAR • Eliminate indexed row IDs (PPI) • PPI Join performance improvement • DBS Information consolidation
• Replication Services • Array support • Priority Scheduler enhancements • Reduce restart time
Teradata Tools & Utilities (1)
Load/Unload
Teradata Warehouse Builder FastLoad, MultiLoad & FastExport Teradata TPump Access Modules Teradata Database
Database Management
Teradata Manager Teradata Dynamic Query Manager Teradata System Emulation Tool Teradata Visual Explain Teradata Index Wizard Teradata Statistics Wizard
Teradata Utility Pak
Teradata Administrator Teradata SQL Assistant Teradata SQL Assistant/Web Edition BTEQ ODBC JDBC CLI OLE DB Provider
Metadata
Teradata Meta Data Services
Mainframe Connectivity
Mainframe Channel Connect TS/API, CICS, HUTCNS & IMS/DC
Any Query, Any Time
Technical Differentiator: Database Utilities
Teradata Utilities Are Robust and Mature
• Teradata utilities are fully parallel. • Teradata utilities have checkpoint restart capability. • Data loads directly from the source into the database.
> No manual data partitioning.
> No file splitting.
> No intermediary file transfers.
> No separate data conversion step.
Parallel In Parallel Out
Teradata Warehouse
Teradata Tools & Utilities (2)
• Teradata Data Profiler
• Teradata CRM
• Teradata Warehouse Miner
• Teradata Demand Chain Management
• Teradata Supply Chain Management
• .....
(LDM = Logical Data Model)
• Financial Solution LDM
• Retail LDM
• Communication LDM
• Insurance/Healthcare LDM
• Manufacturing LDM
• Government LDM
• Media and Entertainment LDM
• Travel/Transportation LDM
Two Basic Software Architecture Models Task Centric and Data Centric
[Figure: two models side by side. Task centric: requests are handled by tasks that work through shared memory on shared data. Data centric: requests pass through a parallel optimizer to parallel units, each with its own data.]
Uniform and shared access to all platform resources (disk, etc) is REQUIRED
Exclusive access to a subset of resources
Data Centric Software: Teradata Virtual AMP
Tables
[Figure: rows 1 to 28 of Tables A, B, and C spread across four AMPs]
AMP - Balanced collection of three abstracted platform resources: Processor (P), Memory (M), Disk (D)
AMP 1: rows 1, 5, 9, 13, 17, 21, 25
AMP 2: rows 2, 6, 10, 14, 18, 22, 26
AMP 3: rows 3, 7, 11, 15, 19, 23, 27
AMP 4: rows 4, 8, 12, 16, 20, 24, 28
• Each virtual AMP has rows from every table • Each virtual AMP works independently on its rows • Goal: Database rows are equally distributed across multiple tables
Data distribution by Primary Index
Primary Index value for a row → Hashing Algorithm → Row Hash (32 bits)
Destination Selection Word (DSW) – first 16 bits of the row hash → Hash Map
Hash maps: Current configuration Primary, Current configuration Fallback, Reconfiguration Primary, Reconfiguration Fallback
[Figure: dual BYNET interconnects linking four groups of nodes (Node1 to Node4)]
Teradata Hashing
Table ORDER
Order Number (PK, UPI) | Customer Number | Order Date | Order Status
7325 | 2 | 4/13 | O
7324 | 3 | 4/13 | O
7415 | 1 | 4/13 | C
7103 | 1 | 4/10 | O
7225 | 2 | 4/15 | C
7384 | 1 | 4/12 | C
7402 | 3 | 4/16 | C
7188 | 1 | 4/13 | C
7202 | 2 | 4/09 | C
HASH MAP (hexadecimal; column D is cut off in the source)
      0  1  2  3  4  5  6  7  8  9  A  B  C  D
000  01 02 03 04 05 06 07 08 01 02 03 04 05 0
001  01 02 03 04 05 06 07 08 01 02 03 04 05 0
002  01 02 03 04 05 06 07 08 01 02 03 04 05 0
003  01 02 03 04 05 06 07 08 01 02 03 04 05 0
004  01 02 03 04 05 06 07 08 01 02 03 04 05 0
005  01 02 03 04 05 06 07 08 01 02 03 04 05 0
SELECT * FROM ORDER WHERE order_number = 7225;
7225 Hashing Algorithm
AMP 1
AMP 2
AMP 3
AMP x
32 bit Row Hash
DSW (bucket number): 0000 0000 0001 1010 = 001A (hexadecimal)
Remaining 16 bits: 1100 0111 0101 1011
7225 2 4/13 O
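The lookup path on this slide (primary index value, 32-bit row hash, DSW from the first 16 bits, hash map entry, AMP) can be sketched as follows. Two things are assumptions: `zlib.crc32` stands in for Teradata's hashing algorithm, which the slides do not describe, and the hash map is filled round-robin over 8 AMPs because the visible part of the slide's map cycles through 01 to 08. All function names are hypothetical.

```python
import zlib

NUM_BUCKETS = 1 << 16  # one hash map entry per possible 16-bit DSW value


def build_hash_map(num_amps):
    """Assign buckets to AMPs round-robin (assumed from the slide's pattern)."""
    return [(bucket % num_amps) + 1 for bucket in range(NUM_BUCKETS)]


def row_hash(primary_index_value):
    """32-bit row hash. CRC-32 is only a stand-in for the real algorithm."""
    return zlib.crc32(str(primary_index_value).encode("utf-8")) & 0xFFFFFFFF


def dsw(row_hash_value):
    """Destination Selection Word: the first (high-order) 16 bits."""
    return row_hash_value >> 16


def amp_for_row_hash(row_hash_value, hash_map):
    """Look up the owning AMP for a row hash via the hash map."""
    return hash_map[dsw(row_hash_value)]


hash_map = build_hash_map(num_amps=8)

# Row hash from the slide: DSW 0000 0000 0001 1010, rest 1100 0111 0101 1011.
slide_row_hash = 0x001AC75B
print(f"DSW = {dsw(slide_row_hash):04X}")
print("AMP =", amp_for_row_hash(slide_row_hash, hash_map))

# The same path for an arbitrary value, using the stand-in hash.
print("AMP for order 7225 (stand-in hash):",
      amp_for_row_hash(row_hash(7225), hash_map))
```

For the slide's row hash, the DSW is 001A, which corresponds to row 001, column A of the hash map and therefore to entry 03. Because the hash map is a separate table, buckets can be reassigned to new AMPs on reconfiguration without changing the hashing algorithm.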
Primary Index Choice Criteria
ACCESS
Maximize one-AMP operations: choose the column most frequently used for access
DISTRIBUTION
Optimize parallel processing: choose a column that provides good distribution
VOLATILITY
Reduce maintenance resource overhead (I/O): choose a column with stable data values
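The distribution criterion can be checked before a primary index is chosen by hashing each candidate column and counting rows per AMP. The sketch below does this for the ORDER table from the hashing slide, on a hypothetical four-AMP system. As before, `zlib.crc32` is only a stand-in for the real hashing algorithm, so the exact placement is illustrative. What carries over is the principle: a column with few distinct values cannot spread rows over many AMPs.

```python
import zlib
from collections import Counter

# Rows of the ORDER table from the slide:
# (order_number, customer_number, order_date, order_status)
ORDERS = [
    (7325, 2, "4/13", "O"), (7324, 3, "4/13", "O"), (7415, 1, "4/13", "C"),
    (7103, 1, "4/10", "O"), (7225, 2, "4/15", "C"), (7384, 1, "4/12", "C"),
    (7402, 3, "4/16", "C"), (7188, 1, "4/13", "C"), (7202, 2, "4/09", "C"),
]


def amp_for(value, num_amps):
    """Map a value to an AMP (CRC-32 is a stand-in for the real hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_amps + 1


def rows_per_amp(rows, column, num_amps):
    """Row counts per AMP if `column` were the primary index."""
    counts = Counter(amp_for(row[column], num_amps) for row in rows)
    return [counts.get(amp, 0) for amp in range(1, num_amps + 1)]


def skew(counts):
    """Largest AMP load divided by the average load (1.0 = perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


for name, column in (("order_number", 0), ("order_status", 3)):
    counts = rows_per_amp(ORDERS, column, num_amps=4)
    print(f"{name}: rows per AMP {counts}, skew {skew(counts):.2f}")
```

Order status has only two distinct values, so at most two of the four AMPs receive rows, however good the hash is. The unique order number has no such ceiling.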
Data distribution by Primary Index - 2
SQL request Parser algorithm
48 bit table ID
32 bit row hash value
Index value
[Figure: dual BYNET interconnects linking four groups of nodes (Node1 to Node4)]
Logical block identifier
SQL Parser Overview
Request Parcel → Cached? → Syntaxer → Resolver → Security → Optimizer → Generator → GNCApply → AMP Steps
• Resolver input: Data Dictionary (DD): DBase, AccRights, TVM, TVFields, Indexes
• Optimizer input: statistics (collected statistics or dynamic sampling) and costs
• Optimizer output: serial steps, parallel steps, individual and common steps (MSR)
• Additional: triggers, check constraints, references, foreign keys, join indexes
• GNCApply input: Data Parcel
Statistics Summary
Collect statistics on
• all non-unique indexes
• UPI of any table with less than x rows per AMP (dependent on available number of AMPs)
• all indexes of a join index
• any non-indexed column used for join constraints
• indexes of global temporary tables
Collected statistics are not automatically updated by the system.
Refresh statistics when 5-10% of the table rows have changed.
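Since statistics are not refreshed automatically, the 5-10% rule has to be applied by whoever administers the tables. A minimal sketch of that check follows. The function name is hypothetical, the default threshold is the lower end of the slide's range, and the row counts are assumed to come from the site's own load bookkeeping.

```python
def needs_stats_refresh(total_rows, changed_rows, threshold=0.05):
    """True when the changed share of a table reaches the threshold.

    `threshold` is a fraction; the slide suggests 0.05 to 0.10.
    An empty table counts as stale as soon as any rows have changed.
    """
    if total_rows <= 0:
        return changed_rows > 0
    return changed_rows / total_rows >= threshold


# Illustrative numbers only.
print(needs_stats_refresh(total_rows=1_000_000, changed_rows=30_000))
print(needs_stats_refresh(total_rows=1_000_000, changed_rows=80_000))
print(needs_stats_refresh(1_000_000, 80_000, threshold=0.10))
```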
Database Workload Continuum
Transactional (OLTP)
• User Profiles: Customers, Clerks
• Services: Transactions, Bookkeeping
• Access Profile: Frequent updates, Occasional lookup
• Data: Current “state” data, Limited history, Narrow scope
Tactical (ODS)
• User Profiles: Front Line Services, Customers (indirectly)
• Services: Lookups, Tactical decisions, Analytics (e.g. scoring)
• Access Profile: Continuous updates, Frequent lookups
• Data Model: Current “state” data, Recent history, Integrated business areas
Strategic (EDW)
• User Profiles: Back Office Services, Management, Trading Partners
• Services: Strategic decisions, Analytics (e.g. scoring)
• Access Profile: Bulk inserts with some updates, Frequent complex analytics
• Data Model: Periodic “state” data, Deep history, Enterprise integrated view
Workload Continuum
OLTP1 ... OLTPi ... OLTPn
ODS1 ... ODS2
Enterprise Data Warehouse
Strategic Decision Repositories
Transactional Repositories
Tactical Decision Repositories
Workload Continuum
OLTP1
OLTPi
OLTPn
Active Data Warehouse
Tactical and Strategic Decision Repositories
Transactional Repositories
Data Warehouse Needs Will Evolve
[Chart: Workload Complexity (vertical axis) against Data Sophistication (horizontal axis), with increasing depth and breadth of users, queries, and data]
REPORTING: WHAT happened? Primarily batch and some ad hoc reports
ANALYZING: WHY did it happen? Increase in ad hoc analysis
PREDICTING: WHAT WILL happen? Analytical modeling grows
OPERATIONALIZING: WHAT IS happening? Continuous update and time-sensitive queries become important
ACTIVATING: MAKE it happen! Event-based triggering takes hold
Workload types: Batch, Ad Hoc, Analytics, Continuous Update/Short Queries, Event-Based Triggering
• Query complexity grows
• Workload mixture grows
• Data volume grows
• Schema complexity grows
• Depth of history grows
• Number of users grows
• Expectations grow
Simultaneous Workloads: Strategic, tactical, loading
Single View of the Business – Better, Faster Decisions – Drive Business Growth
Data Warehouse Needs Will Evolve
[Same chart as on the previous slide, with the added labels Understand, Measure, Optimize, Execute, and Automate, and the annotation “Chasm from static to dynamic decision-making”]
Data Warehouse Needs Will Evolve
Database Requirement: Data Warehouse Foundation must handle multi-dimensional growth!
[Same chart as on the previous two slides]
The Multi-Temperature Warehouse
• Customers desire deep historical data in the warehouse.
> The access frequency or average temperature of data varies.
– HOT, WARM, COOL, dormant
> Seamless management required.
• Teradata systems can address this need through a combination of technologies, such as:
> Partitioned primary index (PPI). > Multi-value compression. > Priority scheduler.
Three Tiers of Workload Management
Pre-Execution: Teradata Dynamic Query Manager
• Control what and how much is allowed to begin execution
Priority Scheduler (prioritized queues)
• Manage the level of resources allocated to different priorities of executing work
Post-Execution: Database Query Log
• Analyse query performance and behavior after completion
[Figure label: ADW]
Indexes
• PI (UPI and NUPI)
• SI (USI and NUSI)
• Join Index
> single table index
> multi table index
> aggregated index
> sparse index (where clause used)
> partial covering
> global
• Materialized Views (join index)
An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension
• Data Volume (Raw, User Data)
• Mixed Workload
• Query Concurrency
• Data Freshness
• Query Complexity
• Query Freedom
• Query Data Volume
• Schema Sophistication
The Teradata Difference “Multi-dimensional Scalability”
• Data Volume (Raw, User Data)
• Mixed Workload
• Query Concurrency
• Data Freshness
• Query Complexity
• Query Freedom
• Query Data Volume
• Schema Sophistication
Customers Need to Evaluate “Real Life” Workloads
Good Example? TPC-H Benchmark
Teradata Experience in the Communications Industry
Companies generating >80% of the industry revenue utilize Teradata Data Warehousing
Some of Teradata’s Retail Customers Worldwide
Teradata is Well-Positioned in the Top Global 3000 Industries
80% of Top Global Telco Firms
70% of Top Global Airlines
65% of Top Global Retailers
60% of Top Most Admired Global Companies
50% of the Top Transportation Logistic Firms
• Leading industries
> Banking > Government > Insurance & Healthcare > Manufacturing > Retail > Telecommunications > Transportation Logistics > Travel
• World class customer list
> More than 750 customers > Over 1200 installations
• Global presence
> Over 100 countries
FORTUNE Global Rankings, April 2005
Industry Leaders Use Teradata
Teradata Global 400 Customers
54% of Retailers
50% of Telco Industry
50% of Transportation Industry
32% of Financial Services Industry
19% of Manufacturers
• Leading industries
> Banking > Government > Insurance & Healthcare > Manufacturing > Retail > Telecommunications > Transportation Logistics > Travel
• World class customer list
> More than 750 customers > Over 1200 installations
• Global presence
> Over 100 countries
www.teradata.com