Teradata is a relational database management system that:
• Has been designed to run the world's largest commercial databases
• Is a preferred solution for enterprise data warehousing
• Executes on UNIX MP-RAS and Windows 2000 operating systems
• Is compliant with ANSI industry standards
• Runs on a single node or multiple nodes
• Is a "database server"
• Uses parallelism to manage terabytes of data
• Is capable of supporting many concurrent users from various client platforms
Teradata – A Brief History
1979 – Teradata Corp founded in Los Angeles, California; development begins on a massively parallel computer
1982 – YNET technology is patented
1984 – Teradata markets the first database computer, DBC/1012; first system purchased by Wells Fargo Bank of California; total revenue for the year: $3 million
1987 – First public offering of stock
1989 – Teradata and NCR partner on the next generation of the DBC
1991 – NCR Corporation is acquired by AT&T; Teradata revenues at $280 million
Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525
1992 – Teradata is merged into NCR
1996 – AT&T spins off NCR Corp. with the Teradata product
1997 – The Teradata Database becomes the industry leader in data warehousing
2000 – First 100+ terabyte system in production
2002 – Teradata V2R5 released 12/2002; major release including features such as PPI, roles and profiles, multi-value compression, and more
2003 – Teradata V2R5.1 released 12/2003; includes UDFs, BLOBs, CLOBs, and more
2005 – Teradata V2R6 released; Collect Statistics enhancement
2007 – Teradata TD12 released; Query Rewrite
2009 – Teradata TD13 released; Scalar Subquery, NoPI
Ongoing development – TD14; Temporal feature
Uses indexes for fast retrieval and handles millions of rows of data.
Teradata in the Enterprise

Large capacity database machine: The Teradata Database handles the large data storage requirements needed to process large amounts of detail data for decision support. This includes terabytes of detailed data stored in billions of rows, and thousands of millions of instructions per second (MIPS) to process the data.
Parallel processing: Parallel processing is the key feature that makes the Teradata RDBMS faster than other relational systems.
Single data store: The Teradata RDBMS can be accessed by network-attached and channel-attached systems. It also supports the requirements of many diverse clients.
Fault tolerance: The Teradata RDBMS automatically detects and recovers from hardware failures.

Data integrity: The Teradata RDBMS ensures that transactions either complete or roll back to a stable state if a fault occurs.
Scalable growth: The Teradata RDBMS allows expansion without sacrificing performance.

SQL: The Teradata RDBMS supports SQL, the standard access language that permits customers to control their data.
Teradata Architecture and Components:
The BYNET
At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. However, this view does an injustice to the BYNET, because the capabilities of the network range far beyond those of a simple system bus.
The BYNET also possesses high-speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.
A multinode system has at least two BYNETs. This creates a fault-tolerant environment and enhances interprocessor communication. Load-balancing software optimizes the transmission of messages over the BYNETs. If one BYNET should fail, the second can handle the traffic.
The total bandwidth for each network link to a processor node is 10 megabytes. The total throughput available for each node is 20 megabytes, because each node has two network links and the bandwidth is linearly scalable. For example, a 16-node system has 320 megabytes of bandwidth for point-to-point connections. The total available broadcast bandwidth for any size system is 20 megabytes. The BYNET software also provides a standard TCP/IP interface for communication among the SMP nodes. The following figure shows how the BYNET connects individual SMP nodes to create an MPP system.
Boardless BYNET
Single-node SMP systems use Boardless BYNET (or virtual BYNET) software to simulate the BYNET hardware driver. Both SMP and MPP machines run the set of software processes called vprocs on a node under the Parallel Database Extensions (PDE) software layer.
Parallel Database Extensions
Parallel Database Extensions (PDE) software is an interface layer on top of the operating system.
The PDE provides the ability to:
• Execute vprocs
• Run the Teradata RDBMS in a parallel environment
• Apply a flexible priority scheduler to Teradata RDBMS sessions
• Debug the operating system kernel and the Teradata RDBMS using resident debugging facilities

The PDE also enables an MPP system to:
• Take advantage of hardware features such as the BYNET and shared disk arrays
• Process user applications written for the underlying operating system on non-Trusted Parallel Application (non-TPA) nodes and disks different from those configured for the parallel database
PDE can be started, reset, and stopped on Windows systems using the Teradata MultiTool utility, and on UNIX MP-RAS systems using the xctl utility.
Virtual Processors:
The versatility of the Teradata RDBMS is based on virtual processors (vprocs) that eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE) within the multitasking environment of the operating system.
The two types of vprocs are:
PE: The PE performs session control and dispatching tasks as well as parsing functions.

AMP: The AMP performs database functions to retrieve and update data on the virtual disks (vdisks).
A single system can support a maximum of 16,384 vprocs. The maximum number of vprocs per node can be as high as 128.
Each vproc is a separate, independent copy of the processor software, isolated from other vprocs but sharing some of the physical resources of the node, such as memory and CPUs. Multiple vprocs can run on an SMP platform or a node.
Vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from one another. This message communication is done using the Boardless BYNET Driver software on single-node platforms, or BYNET hardware and BYNET Driver software on multinode platforms.
Parsing Engine (PE)

A Parsing Engine (PE) is a virtual processor (vproc) that manages the dialogue between a client application and the Teradata Database, once a valid session has been established. Each PE can support a maximum of 120 sessions.
The PE handles an incoming request in the following manner: The Session Control component verifies the request for session authorization (user names and passwords), and either allows or disallows the request.
The Parser does the following: interprets the SQL statement received from the application; verifies SQL requests for proper syntax and evaluates them semantically; and consults the Data Dictionary to ensure that all objects exist and that the user has authority to access them.
The Optimizer is cost-based and develops the least expensive plan (in terms of time) to return the requested response set. Processing alternatives are evaluated and the fastest alternative is chosen. This alternative is converted into executable steps to be performed by the AMPs, which are then passed to the Dispatcher.
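The plan the Optimizer produces can be inspected with the EXPLAIN modifier, which returns the plan as English text instead of executing the request. A minimal sketch (the Employee table and column names are illustrative):

```sql
-- Prefixing a request with EXPLAIN returns the Optimizer's plan
-- (lock levels, AMP steps, estimated rows and timings) without running it.
EXPLAIN
SELECT last_name, first_name
FROM   Employee
WHERE  department_number = 403;
```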
The Dispatcher controls the sequence in which the steps are executed and passes the steps received from the Optimizer onto the BYNET for execution by the AMPs. After the AMPs process the steps, the PE receives their responses over the BYNET. The Dispatcher builds a response message and sends the message back to the user.
Access Module Processor (AMP)
The AMP is a vproc in the Teradata Database's shared-nothing architecture that is responsible for managing a portion of the database. Each AMP manages some portion of each table on the system. AMPs do the physical work associated with generating an answer set (output), including sorting, aggregating, formatting, and converting. The AMPs retrieve and perform all database management functions on the required rows from a table.
An AMP accesses data from its single associated vdisk, which is made up of multiple ranks of disks. An AMP responds to Parser/Optimizer steps transmitted across the BYNET by selecting data from or storing data to its disks. For some requests, the AMPs may redistribute a copy of the data to other AMPs.
The Database Manager subsystem resides on each AMP. This subsystem will:
• Lock databases and tables
• Create, modify, or delete definitions of tables
• Insert, delete, or modify rows within the tables
• Retrieve information from definitions and tables
• Return responses to the Dispatcher
Teradata Director Program
The Teradata Director Program (TDP) is a Teradata-supplied program that must run on any client system that will be channel-attached to the Teradata RDBMS. The TDP manages the session traffic between the Call-Level Interface and the RDBMS.
Functions of the TDP include the following:
• Session initiation and termination
• Logging, verification, recovery, and restart
• Physical input to and output from the Teradata server, including session
The Call Level Interface (CLI) is a library of routines that resides on the client side. Client application programs use these routines to perform operations such as logging on and off, submitting SQL queries, and receiving responses that contain the answer set. These routines are 98% the same in a network-attached environment as in a channel-attached environment.
The Teradata ODBC™ (Open Database Connectivity) or JDBC (Java) drivers use open standards-based ODBC or JDBC interfaces to provide client applications access to Teradata across LAN-based environments.
The Micro Teradata Director Program (MTDP) is a Teradata-supplied program that must be linked to any application that will be network-attached to the Teradata RDBMS. The MTDP performs many of the functions of the channel-based TDP, including session management. The MTDP does not control session balancing across PEs; Connect and Assign Servers that run on the Teradata system handle this activity.
The Micro Operating System Interface (MOSI) is a library of routines providing operating system independence for clients accessing the RDBMS. By using MOSI, we only need one version of the MTDP to run on all network-attached platforms.
Trusted Parallel Applications
The PDE provides a series of parallel operating system services to a special class of tasks called a Trusted Parallel Application (TPA).
On an SMP or MPP system, the TPA is the Teradata RDBMS. TPA services include:
• Facilities to manage parallel execution of the TPA on multiple nodes
• Dynamic distribution of execution processes
• Coordination of all execution threads, whether on the same or on different nodes
• Balancing of the TPA workload within a clique
• Resident debugging facilities in addition to kernel and application debugging
• Software is equivalent regardless of configuration
  o No user changes as the system grows from a small SMP to a huge MPP
• Delivers linear scalability
  o Maximizes utilization of SMP resources
  o To any size configuration
  o Allows flexible configurations
  o Incremental upgrades
SMP vs. MPP:
A Teradata Database system contains one or more nodes. A node is a term for a processing unit under the control of a single operating system. The node is where the processing occurs for the Teradata Database. There are two types of Teradata Database systems:
Symmetric multiprocessing (SMP) - An SMP Teradata Database has a single node that contains multiple CPUs sharing a memory pool.

Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger, MPP implementation of a Teradata Database. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.
Benefits of Teradata: Shared Nothing - Dividing the Data
Data automatically distributed to AMPs via hashing
The Teradata Database virtual processors, or vprocs (which are the PEs and AMPs), share the components of the nodes (memory and CPU). The main feature of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk), and this space is not shared with other AMPs. Each AMP uses system resources independently of the other AMPs, so they can all work in parallel for high overall system performance.
Primary Index (PI) column(s) are hashed
The hash is always the same for the same value
No partitioning or repartitioning is ever required
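The hash-based distribution can be observed with Teradata's built-in hashing functions; a short sketch (the literal 1234 stands in for any Primary Index value):

```sql
-- HASHROW computes the 32-bit row hash of a value; the same value
-- always yields the same hash. HASHBUCKET extracts the hash bucket,
-- and HASHAMP maps that bucket to the AMP that owns it.
SELECT HASHROW(1234)                      AS row_hash,
       HASHBUCKET(HASHROW(1234))          AS hash_bucket,
       HASHAMP(HASHBUCKET(HASHROW(1234))) AS owning_amp;
```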
Space Allocation:

• Space allocation is entirely dynamic
  o No tablespaces, journal spaces, or any pre-allocation
  o Spool (temp) and tables share the space pool; no fixed reserved allocations
• If no cylinder is free, partial cylinders are combined
  o Dynamic and automatic
  o Background compaction based on a tunable threshold
• Quotas control disk space utilization
  o Increase a quota (a trivial online command) to allow a user to use more space
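Raising a quota is indeed a single online statement; a hedged sketch (the user name and sizes are illustrative):

```sql
-- PERM is the user's permanent-space quota; SPOOL caps the space
-- available for intermediate (spool) results.
MODIFY USER sales_user AS PERM = 50e9, SPOOL = 100e9;
```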
Locks

Exclusive

Exclusive locks are applied to databases or tables, never to rows. They are the most restrictive type of lock. With an exclusive lock, no other user can access the database or table. Exclusive locks are used when a Data Definition Language (DDL) command is executed (i.e., CREATE TABLE). An exclusive lock on a database or table prevents other users from obtaining any lock on the locked object.
Write
Write locks enable users to modify data while maintaining data consistency. While the data has a write lock on it, other users can only obtain an access lock. During this time, all other locks are held in a queue until the write lock is released.
Read
Read locks are used to ensure consistency during read operations. Several users may hold concurrent read locks on the same data, during which time no data modification is permitted. Read locks prevent other users from obtaining the following locks on the locked data: exclusive locks and write locks.
Access
Access locks can be specified by users unconcerned about data consistency. The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on tables that are updated only by small, single-row changes. Access locks are sometimes called "stale read" locks, because you may get "stale data" that has not been updated. Access locks prevent other users from obtaining the following locks on the locked data: exclusive locks.
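A lock level can be requested explicitly with the LOCKING modifier. The common "dirty read" pattern below is a sketch (the Employee table name is assumed):

```sql
-- Request an access ("stale read") lock so the query does not queue
-- behind writers; it may therefore read uncommitted data.
LOCKING TABLE Employee FOR ACCESS
SELECT department_number, COUNT(*)
FROM   Employee
GROUP BY 1;
```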
RAID 1 - Hardware Data Protection

RAID 1 is a data protection scheme that uses mirrored pairs of disks to protect data from a single drive failure.
RAID 1 requires double the number of disks because every drive has an identical mirrored copy. Recovery with RAID 1 is faster than with RAID 5. The highest level of data protection is RAID 1 with Fallback.
RAID 5 uses a data parity scheme to provide data protection.
Rank: For the Teradata Database, RAID 5 uses the concept of a rank, which is a set of disks working together. Note that the disks in a rank are not directly cabled to each other.
If one of the disk drives in the rank becomes unavailable, the system
uses the parity byte to calculate the missing data from the down drive so the system can remain operational. With a rank of 4 disks, if a disk fails, any missing data block may be reconstructed using the other 3 disks.
Disk Allocation in Teradata
The operating system, PDE, and the Teradata Database do not recognize the
physical disk hardware. Each software component recognizes and interacts with different components of the data storage environment:
Operating system: Recognizes a logical unit (LUN). The operating system recognizes the LUN as its "disk," and is not aware that it is actually writing to spaces on multiple disk drives. This technique enables the use of RAID technology to provide data availability without affecting the operating system.
PDE: Translates LUNs into vdisks using slices (in UNIX) or partitions (in Microsoft Windows and Linux) in conjunction with the Teradata Parallel Upgrade Tool.
Teradata Database: Recognizes a virtual disk (vdisk). Using vdisks instead of direct connections to physical disk drives enables the use of RAID technology with the Teradata Database.
Pdisks: User Data Space
Space on the physical disk drives is organized into LUNs. After a LUN is created, it is divided into partitions.

In UNIX systems, a LUN consists of one partition, which is further divided into slices:
o Boot slice (a very small slice, taking up only 35 sectors)
o User slices for storing data. These user slices are called "pdisks" in the Teradata Database.

In Microsoft Windows and Linux systems, pdisks are partitions, and are used for storage of the tables in a database. A LUN may have one or more pdisks.
Vdisks
The pdisks (user slices or partitions, depending on the operating system) are assigned to an AMP through the software. No cabling is involved.
The combined space on the pdisks is considered the AMP's vdisk. An AMP manages only its own vdisk (the disk space assigned to it), not the vdisk of any other AMP. All AMPs then work in parallel, processing their portion of the data.
Each AMP in the system is assigned one vdisk. Although numerous configurations are possible, generally all pdisks from a rank (RAID 5) or mirrored pair (RAID 1) are assigned to the same AMP for optimal performance.

However, an AMP recognizes only the vdisk. The AMP has no control over the physical disks or ranks that compose the vdisk.
Fallback
Fallback provides data protection at the table level by automatically storing a copy of each permanent data row of a table on a different, or "fallback," AMP. If an AMP fails, the Teradata Database can access the fallback copy and continue operation. If you cluster your AMPs, fallback also provides for automatic recovery of the down AMP once you bring it back online.
The benefits of fallback are:
• Permits access to table data when an AMP is offline
• Adds a level of data protection beyond disk array RAID
• Automatically applies changes to the offline AMP when it is back online
The disadvantage of fallback is that this method doubles the storage space and the I/O (on inserts, updates, and deletes) for tables.
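Fallback is selected per table at creation time and can be changed later; a minimal sketch with illustrative table and column names:

```sql
-- FALLBACK stores a second copy of every row on another AMP in the
-- cluster, at the cost of double the space and double the write I/O.
CREATE TABLE orders_fb, FALLBACK
  (order_id INTEGER NOT NULL,
   amount   DECIMAL(10,2))
UNIQUE PRIMARY INDEX (order_id);

-- Fallback can also be removed from (or added to) an existing table:
ALTER TABLE orders_fb, NO FALLBACK;
```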
Clique:
A clique is a collection of nodes with shared access to the same disk arrays. Each multi-node system has at least one clique.
Nodes are interconnected via the BYNET. Nodes and disks are interconnected via shared buses and thus can communicate directly. While the shared access is defined to the configuration, it is not actively used when the system is up and running. On a running system, each rank of disks is addressed by exactly one node.
The shared access allows the system to continue operating during a node failure. The vprocs remain operational and can access stored data.
If a node fails and then resets:
o The Teradata Database restarts across all the nodes.
o As the Teradata Database recovers, the BYNET redistributes the vprocs of the failed node to the other nodes within the clique.
o Processing continues while the node is being repaired.
Clustering
Clustering provides data protection at the system level. A cluster is a logical group of AMPs that provide fallback capability. If an AMP fails, the remaining AMPs in the same cluster do their own work plus the work of the down AMP. Teradata recommends a cluster size of 2.
Although AMPs are virtual processes and cannot experience a hardware failure, they can be "down" if the AMP cannot get to the data on the disk array. If two disks in a rank go down, an AMP will be unable to access its data; this is the only situation in which an AMP will stay down.
AMP Clustering and Fallback
If the primary AMP fails, the system can still access data on the fallback AMP. This ensures that one copy of a row is available if one or more hardware or software failures occur within an entire array, or an entire node.
The following figure illustrates eight AMPs grouped into two clusters of four AMPs each. In this configuration, if AMP 3 (or its vdisk) fails and stays offline, its data remains available on AMPs 1, 2, and 4. Even if AMPs 3 and 5 fail simultaneously and remain offline, the data for each remains available on the other AMPs in its cluster.
Down AMP Recovery Journal
The Down AMP Recovery Journal provides automatic data recovery on fallback-protected data tables when a clustered AMP is out of service. This journal consists of two system files stored in user DBC: DBC.ChangedRowJournal and DBC.OrdSysChngTable.
When a clustered AMP is out of service, the Down AMP Recovery Journal automatically captures changes to fallback-protected tables from the other AMPs in the cluster.
Each time a change is made to a fallback-protected row that has a copy that resides on a down AMP, the Down AMP Recovery Journal stores the table ID and row ID of the committed changes. When the AMP comes back online, Teradata Database opens the Down AMP Recovery Journal to update, or roll forward, any changes made while the AMP was down.
The recovery operation uses fallback rows to replace primary rows and primary rows to replace fallback rows. The journal ensures that the information on the fallback AMP and on the primary AMP is identical. Once the transfer of information is complete and verified, the Down AMP Recovery Journal is discarded automatically.
Transient Journal
The Teradata Database system offers a variety of methods to protect data. Some data protection methods require that you set options when you create tables, such as specifying fallback. Other methods are automatically activated when particular events occur in the system. Each data protection technique offers different types of advantages under different circumstances. The following list describes a few of the automatic data protection methods:
• The Transient Journal (TJ) automatically protects data by storing the image of an existing row before a change is made, or the ID of a new row after an insert is made. It enables the snapshot to be copied back to, or a new row to be deleted from, the data table if a transaction fails or is aborted. The TJ protects against failures that may occur during transaction processing. To safeguard the integrity of your data, the TJ stores:
  o A snapshot of a row before an UPDATE or DELETE
  o The row ID after an INSERT
  o A control record for each CREATE and DROP statement
  o Control records for certain operations
• The Permanent Journal is an optional, user-specified journal that:
  o Can contain "before" images, which permit rollback, or "after" images, which permit rollforward, or both before and after images
  o Provides rollforward recovery
  o Provides rollback recovery
  o Provides full recovery of nonfallback tables
  o Reduces the need for frequent, full-table archives
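The TJ works invisibly whenever a transaction fails. A sketch of an explicit Teradata-mode transaction whose abort would be undone from the TJ's before-images (table and column names are illustrative):

```sql
-- BT/ET delimit an explicit transaction in Teradata session mode.
-- If any statement fails, or ROLLBACK is issued, the TJ's snapshots
-- restore every row the transaction touched.
BT;
UPDATE employee
SET    salary_amount = salary_amount * 1.10
WHERE  dept_number = 401;
DELETE FROM employee WHERE employee_number = 1005;
ET;  -- on commit, the TJ entries for this transaction are discarded
```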
Teradata Storage and Retrieval Architecture
Request Processing
1. The SQL request is sent from the client to the appropriate component on the node:
   a. Channel-attached client: the request is sent to the Channel Driver (through the TDP).
   b. Network-attached client: the request is sent to the Teradata Gateway (through CLIv2 or ODBC).
2. The request is passed to the PE(s).
3. The PEs parse the request into AMP steps.
4. The PE Dispatcher sends the steps to the AMPs over the BYNET.
5. The AMPs perform operations on data on the vdisks.
6. The response is sent back to the PEs over the BYNET.
7. The PE Dispatcher receives the response.
8. The response is returned to the client (channel-attached or network-attached).
Parsing Engine Request Processing
The SQL parser handles all incoming SQL requests. It processes an incoming request as follows:
Stage 1: The Parser looks in the Request cache to determine if the request is already there.

  IF the request is…          THEN the Parser…
  in the Request cache        reuses the plastic steps found in the cache and passes them to gncApply. Go to step 8 after checking access rights (step 4).
  not in the Request cache    begins processing the request with the Syntaxer.
Stage 2: The Syntaxer checks the syntax of an incoming request.

  IF there are…    THEN the Syntaxer…
  no errors        converts the request to a parse tree and passes it to the Resolver.
  errors           passes an error message back to the requestor and stops.
Stage 3: The Resolver adds information from the Data Dictionary (or a cached copy of that information) to the parse tree.
Stage 4: The Security module checks the user's access rights.

  IF the access rights are…    THEN the Security module…
  valid                        passes the request to the Optimizer.
  not valid                    aborts the request, passes an error message back, and stops.
Stage 5: The Optimizer determines the most effective way to implement the SQL request.
Stage 6: The Optimizer scans the request to determine where locks should be placed, then passes the optimized parse tree to the Generator.
Stage 7: The Generator transforms the optimized parse tree into plastic steps and passes them to gncApply. Plastic steps are directives to the database management system that do not contain data values.
Stage 8: gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the AMPs that contain any needed user- or session-specific values and any needed data parcels.
Stage 9: gncApply passes the concrete steps to the Dispatcher.
The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET to be distributed to the AMP database management software, as follows:
Stage 1: The Dispatcher receives concrete steps from gncApply.
Stage 2: The Dispatcher places the first step on the BYNET; tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs; and waits for a completion response.
Whenever possible, the Teradata RDBMS performs steps in parallel to enhance performance. If there are no dependencies between a step and the following step, the following step can be dispatched before the first step completes, and the two will execute in parallel. If there is a dependency (for example, the following step requires as input data that is produced by the first step), then the following step cannot be dispatched until the first step completes.
Stage 3: The Dispatcher receives a completion response from all expected AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.
The AMPs are responsible for obtaining the rows required to process the requests (assuming that the AMPs are processing a SELECT statement). The BYNET system controls the transmission of messages to and from the AMPs. An AMP step can be sent to one of the following:
• One AMP
• A selected set of AMPs, called a dynamic BYNET group
• All AMPs in the system
Teradata SQL Reference.
Data Definition Language (DDL) – Defines database structures (tables, users, views, macros, triggers, etc.)
CREATE
REPLACE
DROP
ALTER
Data Manipulation Language (DML) – Manipulates rows and data values
SELECT
INSERT
UPDATE
DELETE
Data Control Language (DCL) – Grants and revokes access rights
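A hedged sketch of the two DCL statements (the database, table, and user names are illustrative):

```sql
-- Grant read access on a table to a user...
GRANT SELECT ON personnel.Employee TO hr_analyst;

-- ...and revoke it again later.
REVOKE SELECT ON personnel.Employee FROM hr_analyst;
```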
Example of a table definition (the first line of the statement was lost in the original; the table name employee is assumed):

CREATE TABLE employee
  (employee_number INTEGER NOT NULL,
   dept_number     SMALLINT,
   job_code        INTEGER COMPRESS,
   first_name      VARCHAR(20) NOT CASESPECIFIC,
   birth_date      DATE FORMAT 'YYYY-MM-DD',
   salary_amount   DECIMAL(10,2))
UNIQUE PRIMARY INDEX (employee_number)
INDEX (dept_number);
Views
Views are pre-defined subsets of existing tables consisting of specified columns and/or rows from the table(s).
A single table view:
is a window into an underlying table
allows users to read and update a subset of the underlying table
has no data of its own
CREATE VIEW Emp_403 AS
SELECT employee_number, department_number, last_name, first_name, hire_date
FROM Employee
WHERE department_number = 403;
CREATE VIEW EmpDept AS
SELECT last_name, department_name
FROM Employee E
INNER JOIN Department D
ON E.department_number = D.department_number;
MACRO
A macro is a predefined set of SQL statements that is logically stored in a database. Macros may be created for frequently occurring queries or sets of operations. Macros have many features and benefits:
• Simplify end-user access
• Control which operations may be performed by users
• May accept user-provided parameter values
• Are stored on the RDBMS, thus available to all clients
• Reduce query size, thus reducing LAN/channel traffic
• Are optimized at execution time
• May contain multiple SQL statements
To create a macro: CREATE MACRO Customer_List AS (SELECT customer_name FROM Customer;);
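Macros may also accept parameters, as noted above; a sketch with illustrative macro, table, and column names:

```sql
-- A parameterized macro: :dept is supplied at execution time.
CREATE MACRO Dept_Employees (dept SMALLINT) AS
  (SELECT employee_number, first_name
   FROM   employee
   WHERE  dept_number = :dept;);

-- Execute the macro with a parameter value:
EXEC Dept_Employees (401);
```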
Volatile Tables

Volatile tables have many of the advantages of derived tables, plus additional benefits:
Local to a session - it exists throughout the entire session, not just a single query.
It must be explicitly created using the CREATE VOLATILE TABLE syntax.
It is discarded automatically at the end of the session.
There is no data dictionary involvement.
Global Temporary Tables
The major difference between a global temporary table and a volatile temporary table is that the global table has a definition in the data dictionary, thus the definition may be shared by many users. Each user session can materialize its own local instance of the table. Attributes of a global temporary table include:
Local to a session, however each user session may have its own instance.
Uses CREATE GLOBAL TEMPORARY TABLE syntax.
Materialized instance of table discarded at session end.
Creates and keeps the table definition in the data dictionary.
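A sketch of a global temporary table (names illustrative); the definition persists in the Data Dictionary, while each session materializes its own instance on first use:

```sql
CREATE GLOBAL TEMPORARY TABLE gtt_dept_totals
  (dept_number SMALLINT,
   total_sal   DECIMAL(12,2))
ON COMMIT PRESERVE ROWS;

-- The first DML statement against the table materializes this
-- session's local (initially empty) instance.
INSERT INTO gtt_dept_totals
SELECT dept_number, SUM(salary_amount)
FROM   employee
GROUP BY 1;
```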
Example: a derived table. The goal is to get the top three selling items across all stores.
Solution:

SELECT t.prodid, t.sumsales, RANK(t.sumsales)
FROM (SELECT prodid, SUM(sales)
      FROM salestbl
      GROUP BY 1) AS t(prodid, sumsales)
QUALIFY RANK(sumsales) <= 3;

Result:

  prodid    sumsales     Rank
  A         170000.00    1
  C         115000.00    2
  D         110000.00    3
Some things to note about the above query include:
The name of the Derived table is 't'.
The Derived column names are 'prodid' and 'sumsales'.
The table is created in spool using the inner SELECT.
The SELECT statement is always in parentheses following the FROM clause.
Derived tables are a good choice if:
The temporary table is required for this query but no others.
The query will be run only one time with this data.
Volatile temporary tables are similar to derived tables in that they:
Are materialized in spool.
Require no Data Dictionary access or transaction locks.
Have a table definition that is kept in cache.
Are designed for optimal performance.
They are different from derived tables in that they:
Are local to the session, not the query.
Can be used with multiple queries in the session.
May be dropped manually at any time, or are dropped automatically at session end.
Must be explicitly created with the CREATE VOLATILE TABLE statement.
Example
CREATE VOLATILE TABLE vt_deptsal, LOG
 (deptno SMALLINT,
  avgsal DEC(9,2),
  maxsal DEC(9,2),
  minsal DEC(9,2),
  sumsal DEC(9,2),
  empcnt SMALLINT)
ON COMMIT PRESERVE ROWS;

In the example above, we stated ON COMMIT PRESERVE ROWS. This allows us to use the volatile table again for other queries in the session. The default is ON COMMIT DELETE ROWS, which means the data is deleted when the transaction is committed.
LOG indicates that a transaction journal is maintained, while NO LOG allows for better performance. LOG is the default. Volatile tables do not survive a system restart.
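Since volatile tables do not survive a restart anyway, NO LOG is a common choice for staging work. A minimal sketch (table and column names are illustrative assumptions):

```sql
-- NO LOG skips the transaction journal for faster inserts;
-- vt_staging and its columns are hypothetical names.
CREATE VOLATILE TABLE vt_staging, NO LOG
 (cust_id   INTEGER,
  tot_sales DECIMAL(12,2))
ON COMMIT PRESERVE ROWS;
```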
Note: A volatile table must be created in the logged-on user's own space; an error occurs if the database name specified is not the username.

Limitations on Volatile Tables
The following commands are not applicable to VT's:
COLLECT/DROP/HELP STATISTICS
CREATE/DROP INDEX
ALTER TABLE
GRANT/REVOKE privileges
DELETE DATABASE/USER (does not drop VT's)
VT's may not:
Use Access Logging.
Be Renamed.
Be loaded with Multiload or Fastload utilities.
VT's may be referenced in views and macros.
Example
CREATE MACRO vt1 AS (SELECT * FROM vt_deptsal;);

Session A: EXEC vt1;
Session B: EXEC vt1;

Each session has its own materialized instance of vt_deptsal, so each session may return different results. VT's may be dropped before the session ends.

Example
DROP TABLE vt_deptsal;

Global Temporary Tables
Global Temporary Tables are created using the CREATE GLOBAL TEMPORARY command. They require a base definition which is stored in the Data Dictionary (DD). A global temporary table is materialized by the first SQL statement in a session that accesses it, typically an INSERT.
Global Temporary Tables are different from Volatile Tables in that:
Their base definition is permanent and kept in the DD.
They require a privilege to materialize the table (e.g., the INSERT privilege).
Space is charged against the user's 'temporary space' allocation.
The User can materialize up to 32 global tables per session.
They can survive a system restart.
Global Temporary Tables are similar to Volatile Tables because:
Each instance of a global temporary table is local to a session.
Materialized tables are dropped automatically at the end of the session. (But the base definition is still in the DD)
They have LOG and ON COMMIT PRESERVE/DELETE options.
Materialized table contents are not sharable with other sessions.
Example
CREATE GLOBAL TEMPORARY TABLE gt_deptsal
 (deptno SMALLINT,
  avgsal DEC(9,2),
  maxsal DEC(9,2),
  minsal DEC(9,2),
  sumsal DEC(9,2),
  empcnt SMALLINT);

The ON COMMIT DELETE ROWS clause is the default, so it does not need to appear in the CREATE TABLE statement. If you want ON COMMIT PRESERVE ROWS, you must specify it in the CREATE TABLE statement. With global temporary tables, the base table definition is stored in the Data Dictionary.
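The definition above creates no rows by itself; the first INSERT in a session materializes a local instance in the user's temporary space. A hedged sketch, reusing the employee table columns shown later in this section, plus an ALTER TABLE statement changing a temporary-table default (exact option syntax may vary by Teradata release):

```sql
-- First access in the session materializes a local instance
-- of gt_deptsal, charged against temporary space.
INSERT INTO gt_deptsal
SELECT department_number,
       AVG(salary_amount),
       MAX(salary_amount),
       MIN(salary_amount),
       SUM(salary_amount),
       COUNT(*)
FROM employee
GROUP BY 1;

-- Sketch: change a default on the base definition in the DD.
ALTER TABLE gt_deptsal, ON COMMIT PRESERVE ROWS;
```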
ALTER TABLE may also be used to change the defaults.

Creating Tables Using Subqueries
Subqueries may be used to limit column and row selection for the target table. Consider the employee table:

SHOW TABLE employee;

CREATE SET TABLE Customer_Service.employee, FALLBACK,
 NO BEFORE JOURNAL, NO AFTER JOURNAL
 (employee_number INTEGER,
  manager_employee_number INTEGER,
  department_number INTEGER,
  job_code INTEGER,
  last_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
  first_name VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
  hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
  birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
  salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX (employee_number);
This example uses a subquery to limit the column choices.

CREATE TABLE emp1 AS
 (SELECT employee_number,
         department_number,
         salary_amount
  FROM employee)
WITH NO DATA;

SHOW TABLE emp1;

CREATE SET TABLE Customer_Service.emp1, NO FALLBACK,
 NO BEFORE JOURNAL, NO AFTER JOURNAL
 (employee_number INTEGER,
  department_number INTEGER,
  salary_amount DECIMAL(10,2) NOT NULL)
PRIMARY INDEX (employee_number);

Note: When the subquery form of CREATE AS is used:
Table attributes (such as FALLBACK) are not copied from the source table.
Table attributes are copied from standard system defaults (e.g., NO FALLBACK) unless otherwise specified.
Secondary indexes, if present, are not copied from the source table.
The first column specified (employee_number) is created as a NUPI unless otherwise specified.
There are some limitations on the use of subqueries for table creation:
The ORDER BY clause is not allowed.
All columns or expressions must have an assigned or defaulted name.
Renaming Columns
Columns may be renamed using the AS clause (the Teradata NAMED extension may also be used). Example
This example changes the column names of the subset of columns used for the target table.

CREATE TABLE emp1 AS
 (SELECT employee_number AS emp,
         department_number AS dept,
         salary_amount AS sal
  FROM employee)
WITH NO DATA;
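When the rows themselves are wanted as well as the renamed columns, WITH DATA can be used instead of WITH NO DATA. A brief sketch; the target name emp2 is a hypothetical choice:

```sql
-- WITH DATA copies the selected rows into the new table
-- along with the derived column definitions.
CREATE TABLE emp2 AS
 (SELECT employee_number AS emp,
         department_number AS dept,
         salary_amount AS sal
  FROM employee)
WITH DATA;
```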
The SHOW command displays the current Data Definition Language (DDL) of a database object (e.g., Table, View, Macro, Trigger, Join Index or Stored Procedure). The SHOW command is used primarily to see how an object was created.

EXPLAIN Command
The EXPLAIN function looks at a SQL request and describes in English how the optimizer plans to execute it. It does not execute the statement and is a good way to see what database resources will be used in processing your request. For instance, if you see that your request will force a full-table scan on a very large table or cause a Cartesian product join, you may decide to rewrite the request so that it executes more efficiently. EXPLAIN provides a wealth of information, including the following:
1. Which indexes, if any, will be used in the query.
2. Whether individual steps within the query may execute concurrently (i.e., parallel steps).
3. An estimate of the number of rows which will be processed.
4. An estimate of the cost of the query (in time increments).
EXPLAIN SELECT * FROM department;
*** Query completed. 10 rows found. 1 column returned. ***
Explanation
1. First, we lock a distinct CUSTOMER_SERVICE."pseudo table" for read.
2. Next, we lock CUSTOMER_SERVICE.department for read.
3. We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.department by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 4 rows. The estimated time for this step is 0.15 seconds.
4. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.15 seconds.
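For contrast with the all-rows scan above, EXPLAINing a query that supplies the unique primary index value of the employee table typically shows a single-AMP retrieve. A sketch; the literal 1001 is an illustrative value, and the exact plan wording varies by release:

```sql
-- Access by unique primary index: the optimizer can hash the value
-- directly to one AMP instead of scanning all rows on all AMPs.
EXPLAIN
SELECT *
FROM employee
WHERE employee_number = 1001;
-- The plan normally reports a single-AMP RETRIEVE step by way of
-- the unique primary index, with an estimate of a single row.
```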