Teradata Material2

What is Teradata?



Teradata

database

is

a

Relational

Database

Management

System(RDBMS). 

It has been designed to run the world‘s largest commercial databases.



Preferred solution for enterprise data warehousing



Executes on UNIX MP-RAS and Windows 2000 operating systems



It is compliant with ANSI industry standards st andards



Runs on a single or multiple nodes



It is a ―database server‖



Uses parallelism to manage ―terabytes‖ of data



Capable of supporting many concurrent users from various client platforms

Teradata – Teradata – A Brief History

1979 – Teradata Teradata Corp founded in Los Angeles, California Development begins on a massively parallel computer – Development 1982 – YNET YNET technology is patented 1984 – Teradata Teradata markets the first database d atabase computer DBC/1012 First system purchased by Wells Fargo Bank of Cal. – First Total revenue for year -$3 million – Total 1987 – First First public offering of stock 1989 – Teradata Teradata and NCR partner on next generation of DBC 1991 – NCR Corporation Corporation is acquired by AT&T Teradata revenues at $280 million – Teradata Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 1

1992 – Teradata Teradata is merged into NCR 1996 – AT&T AT&T spins off NCR Corp. with Teradata product 1997 – The The Teradata Database becomes the industry leader in data warehousing 2000 – First First 100+ Terabyte system in production 2002 – Teradata Teradata V2R5 released 12/2002; major release including featuressuch as PPI, roles and profiles, multi-value compression, and more. 2003 – Teradata Teradata V2R5.1 released 12/2003; includes UDFs, BLOBs, CLOBs, and more. 2005 – Teradata Teradata V2R6 Released Collect Statistics enhancement 2007 – Teradata Teradata Td12 Released Query Rewrite, 2009 – Teradata Teradata TD13 Released Scalar Subquery, NOPI Ongoing Development Development TD14 TD14 Temporal feature

How large is a Trillion?

1 Kilobyte

= 10^3 = 1000 bytes

1 Megabyte = 10^6 = 1,000,000 bytes 1 Gigabyte

= 10^9 = 1,000,000,000 bytes

1 Terabyte = 10^12 = 1,000,000,000,000 1,000,000,000,000 bytes

1 Petabyte

= 10^15 = 1,000,000,000,000,000 1,000,000,000,000,000 bytes

Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 2

1992 – Teradata Teradata is merged into NCR 1996 – AT&T AT&T spins off NCR Corp. with Teradata product 1997 – The The Teradata Database becomes the industry leader in data warehousing 2000 – First First 100+ Terabyte system in production 2002 – Teradata Teradata V2R5 released 12/2002; major release including featuressuch as PPI, roles and profiles, multi-value compression, and more. 2003 – Teradata Teradata V2R5.1 released 12/2003; includes UDFs, BLOBs, CLOBs, and more. 2005 – Teradata Teradata V2R6 Released Collect Statistics enhancement 2007 – Teradata Teradata Td12 Released Query Rewrite, 2009 – Teradata Teradata TD13 Released Scalar Subquery, NOPI Ongoing Development Development TD14 TD14 Temporal feature

How large is a Trillion?

1 Kilobyte

= 10^3 = 1000 bytes

1 Megabyte = 10^6 = 1,000,000 bytes 1 Gigabyte

= 10^9 = 1,000,000,000 bytes

1 Terabyte = 10^12 = 1,000,000,000,000 1,000,000,000,000 bytes

1 Petabyte

= 10^15 = 1,000,000,000,000,000 1,000,000,000,000,000 bytes


Page 2

Differences to Teradata RDBMS and Other RDBMS:

Teradata RDBMS

Other RDBMS

1 Supports unconditional parallelism

Supports conditional parallelism

2 Designed for DSS & DW systems

Designed for OLTP systems

3 Architecture is Shared Nothing.

Architecture is shared Everything

4 Supports Tera Bytes of data

Supports Giga Bytes of data

5

Index used for Better storage and fast retrieval

6 Handles Billions of Rows data

Index use for Fast Retrieval Handles Millions of Rows data

Teradata in the Enterprise Large capacity database machine: The Teradata Database handles the

large data storage requirements to process the large amounts of detail data for decision support. Thisincludes Terabytes of detailed data stored in billions of rows and Thousands of Millions of Instructions per Second (MIPS) to process data.

Parallel processing:Parallel processingis the key thing which makes

Teradata RDBMS faster than other relational systems.

Single data store: Teradata RDBMS can be accessed by network-attached

and channel-attached systems. It also supports the requirements of many Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 3

diverse clients.

Fault tolerance: Teradata RDBMS automatically detects and recovers from

hardware failures.

Data integrity: Teradata RDBMS ensures that transactions either complete

or rollback to a stable state if a fault occurs.

Scalable growth: Teradata RDBMS allows expansion without sacrificing

performance.

SQL: Teradata RDBMS serves as a standard access language that permits

customers to control data.

Teradata Architecture and Components:

The BYNET

At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. However, this view does an injustice to the BYNET, because the capabilities of the network range far beyond those of a simple system bus.

The BYNET also possesses high-speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.

Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 4

A multinode system has at leas two BYNETs. This creates a fault-tolerant environment and enhances interprocessor communication. Load-balancing software optimizes the transmission of messages over the BYNETs. If one BYNET should fail, the second can handle the traffic.

The total bandwidth for each network link to a processor node is ten megabytes. The total throughput available for each node is 20 megabytes, because each node has two network links and the bandwidth is linearly scalable. For example, a 16-node system has 320 megabytes of bandwidth for point-to-point connections. The total, available broadcast bandwidth for any size system is 20 megabytes.The BYNET software also provides a standard TCP/IP interface for communication among the SMP nodes.The following figure shows how the BYNET connects individual SMP nodes tocreate an MPP system.

Boardless BYNET

Single-node SMP systems use Boardless BYNET (or virtual BYNET) software tosimulate the BYNET hardware driver. Both the SMP and MPP machines run theset of software processes called vprocs on a node under the Parallel DatabaseExtensions (PDE) software layer.

Parallel Database Extensions

Parallel Database Extensions (PDE) software is an interface layer on top of theoperating system.

The PDE provides the ability to: Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 5

• Execute vprocs • Run the Teradata RDBMS in a parallel environment • Apply a flexible priority scheduler to Teradata RDBMS sessions •Debug the operating system kernel and the Teradata RDBMS using

resident debugging facilities

The PDE also enables an MPP system to: • Take advantage of hardware features such as the BYNET and shared disk

arrays • Process user applications written for the underlying operating system on

non-Trusted Parallel Application (non-TPA) nodes and disks different fromthose configured for the parallel database

PDE can be start, reset, and stop on Windows systems using the TeradataMultiTool utility and on UNIX MP-RAS systems using the xctl utility.

Virtual Processors:

The versatility of the Teradata RDBMS is based on virtual processors (vprocs)that eliminate dependency on specialized physical processors. Vprocs are a setof software processes that run on a node under the Teradata Parallel DatabaseExtensions (PDE) within the multitasking environment of the operatingsystem.

The two types of vprocs are Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 6

PE:

The PE performs session control and dispatching tasks as well as

parsing functions. AMP: The AMP performs database functions to retrieve and update data on

the virtual disks (vdisks).

A single system can support a maximum of 16,384 vprocs. The maximum number of vprocs per node can be as high as 128.

Each vproc is a separate, independent copy of the processor software, isolatedfrom other vprocs, but sharing some of the physical resources of the node, suchas memory and CPUs. Multiple vprocs can run on an SMP platform or a node.

Vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from one another. This messagecommunication is done using the Boardless BYNET Driver software on singlenodeplatforms or BYNET hardware and BYNET Driver software on multinodeplatforms.


Page 7

Parsing Engine:

A Parsing Engine (PE) is a virtual processor (vproc) that manages the dialogue between a client application and the Teradata Database, once a valid session has been established. Each PE can support a maximum of 120 sessions.

The PE handles an incoming request in the following manner: The Session Control component verifies the request for session authorization (user names and passwords), and either allows or disallows the request.

The Parser does the following: Interprets the SQL statement received from the application.Verifies SQL requests for the proper syntax and evaluates them semantically. Consults theData Dictionary to ensure that all objects exist and that the user has authority to access them.

The Optimizer is cost-based and develops the least expensive plan (in terms of time) to return the requested response set. Processing alternatives are evaluated and the fastest alternative is chosen. This alternative is converted into executable steps, to be performed by the AMPs , which are then passed to the Dispatcher.

The Dispatcher controls the sequence in which the steps are executed and passes the steps received from the optimizer onto the BYNET for execution Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 8

by the AMPs. After the AMPs process the steps, the PE receives their responses over the BYNET.The Dispatcher builds a response message and sends the message back to the user

Access Module Processor (AMP )

The AMP is a vproc in the Teradata Database's shared-nothing architecture that is responsible for managing a portion of the database. Each AMP will manage some portion of each table on the system. AMPs do the physical work associated with generating an answer set (output) including sorting, aggregating, formatting, and converting. The AMPs retrieve and perform

all database management functions on the required rows from a table.


Page 9

An AMP accesses data from its single associated vdisk , which is made up of multiple ranks of disks. An AMP responds to Parser/Optimizer steps transmitted across the BYNET by selecting data from or storing data to its disks. For some requests, the AMPs may redistribute a copy of the data to other AMPs.

Database Manager subsystem resides on each AMP. This subsystem will: Lock databases and tables. Create, modify, or delete definitions of tables. Insert, delete, or modify rows within the tables. Retrieve information from definitions and tables.

Return responses to the Dispatcher.

Teradata Directory Program

The Teradata Director Program (TDP) is a Teradata-supplied program that must run on any client system that will be channel-attached to the Teradata RDBMS. The TDP manages the session traffic between the Call-Level Interface and the RDBMS.

Functions of TDP include the following:

• Session initiation and termination • Logging, verification, recovery, and restart • Physical input to and output from the Teradata server, including session

balancing and queue maintenance • Security Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 10

The Call Level Interface (CLI) is a library of routines that resides on the client side. Client application programs use these routines to perform operations such as logging on and off, submitting SQL queries and receiving responses which contain the answer set. These routines are 98% the same in a network-attached environment as they are in a channel-attached.

The Teradata ODBC™ (Open Data base Connectivity) or JDBC (Java) drivers use open standards-based ODBC or JDBC interfaces to provide client applications access to Teradata across LAN-based environments.

The Micro Teradata Director Program (MTDP) is a Teradata-supplied program that must be linked to any application that will be network-attached to the Teradata RDBMS. The MTDP performs many of the functions of the channel based TDP including session management. The MTDP does not control session balancing across PEs. Connect and Assign Servers that run on the Teradata system handle this activity.

The Micro Operating System Interface (MOSI) is a library of routines providing operating system independence for clients accessing the RDBMS. By using MOSI, we only need one version of the MTDP to run on all network-attached platforms.

Trusted Parallel Applications

The PDE provide a series of parallel operating system services to a special classof tasks called a Trusted Parallel Application (TPA). Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 11

On an SMP or MPP system, the TPA is the Teradata RDBMS. TPA services include: • Facilities to manage parallel execution of the TPA on multiple nodes • Dynamic distribution of execution processes • Coordination of all execution threads, whether on the same or on different

nodes • Balancing of the TPA work load within a clique • Resident debugging facilities in addition to kernel and application

Debuggers Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 12

NODE:

Teradata Architecture: Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 13

Teradata MPP Architecture


Page 14



BYNET Interconnect 









Fully scalable bandwidth

Nodes 

Incrementally scalable to 1024 nodes



Windows or Unix

Storage 

Independent I/O



Scales per node

Connectivity 

Fully scalable



Channel – ESCON/FICON



LAN, WAN

Server Management 

One console to view the entire system

Shared Nothing Architecture 

―Virtual processors‖ (vprocs) do the work



Two types



o

AMP: owns and operates on the data

o

PE: handles SQL and external interaction

Configure multiple vprocs per hardware node o



Take full advantage of SMP CPU and memory

Each vproc has many threads of execution o

Many operations executing concurrently

o

Each thread can do work for any user or transaction


Page 15



Software is equivalent regardless of configuration o



No user changes as system grows from small SMP to huge MPP

Delivers linear scalability o

Maximizes utilization of SMP resources

o

To any size configuration

o

Allows flexible configurations

o

Incremental upgrades

SMP vs. MPP:

A Teradata Database system contains one or more nodes. A node is a term for a processing unit under the control of a single operating system. The node is where the processing occurs for the Teradata Database. There are two types of Teradata Database systems:

Symmetric multiprocessing (SMP) - An SMP Teradata Database has a

single node that contains multiple CPUs sharing a memory pool. Massively parallel processing (MPP) - Multiple SMP nodes working

together comprise a larger, MPP implementation of a Teradata Database. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.

Benefits of Teradata : Shared Nothing - Dividing the Data 

Data automatically distributed to AMPs via hashing



Even distribution results in scalable performance


Page 16



The Teradata Database virtual processors, or vprocs (which are the PEs and AMPs), share the components of the nodes (memory and cpu). The main component of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk) and this space is not shared with other AMPs. Each AMP uses system resources independently of the other AMPs so they can all work in parallel for high system performance overall.



Prime Index (PI) column(s) are hashes



Hash is always the same - for the same value



No partitioning or repartitioning ever required

Space Allocation: 

Space allocation is entirely dynamic o

o

No tablespaces or journal spaces or any pre-allocation Spool (temp) and tables share space pool, no fixed reserved allocations





If no cylinder free, combine partial cylinders o

Dynamic and automatic

o

Background compaction based on tunable threshold

Quotas control disk space utilization o

Increase quota (trivial online command) to allow user to use more space

Data Management - Bottom Line 

No reorgs o

Don‘t even have a reorg utility


Page 17



No index rebuilds



No re-partitioning



No detailed space manageme



Easy database and table definition



Minimum ongoing maintenance o

All performed automatically

Optimizer - Parallelization 

Cost based optimizer o

Parallel aware



Rewrites built-in and cost based



Parallelism is automatic



Parallelism is unconditional



Each query step fully parallelized



No single threaded operations o

Scans, Joins, Index access, Aggregation, Sort, Insert, Update, Delete


Page 18

Traditional ―Conditional Parallelism‖

Teradata ―Conditional Parallelism‖

Data Recovery and Protection: Locks 

Locks may be applied at three levels:



Database Locks: Apply to all tables and views in the database.



Table Locks: Apply to all rows in the table or view.



Row Hash Locks: Apply to a group of one or more rows in a table


Page 19

100 80 60

East

40

West

20

North

0 1st Qtr

2nd Qtr

3rd Qtr

4th Qtr

The four types of locks are described below. 

Exclusive

Exclusive locks are applied to databases or tables, never to rows. They are the mostrestrictive type of lock. With an exclusive lock, no other user can access the database ortable. Exclusive locks are used when a Data Definition Language (DDL) command isexecuted (i.e., CREATE TABLE). An exclusive lock on a database or table prevents otherusers from obtaining any lock on the locked object. 

Write

Write locks enable users to modify data while maintaining data consistency. While the datahas a write lock on it, other users can only obtain an access lock. During this time, all otherlocks are held in a queue until the write lock is released. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 20



Read

Read locks are used to ensure consistency during read operations. Several users may holdconcurrent read locks on the same data, during which time no data modification ispermitted. Read locks prevent other users from obtaining the following locks on the lockeddata: Exclusive locks and Write locks 

Access

Access locks can be specified by users unconcerned about data consistency. The use of anaccess lock allows for reading data while modifications are in process. Access locks aredesigned for decision support on tables that are updated only by small, single-row changes. Access locks are sometimes called "stale read" locks, because you may get "stale data"that has not been updated. Access locks prevent other users from obtaining the followinglocks on the locked data: Exclusive locks

Raid1 - Hardware Data Protection

RAID 1 is a data protection scheme that uses mirrored pairs of disks to protect data from a single drive failure

RAID 1 requires double the number of disks because every drive has an identical mirrored copy. Recovery with RAID 1 is faster than with RAID 5. The highest level of data protection is RAID 1 with Fallback.


Page 21

Raid5 - Hardware Data Protection 

RAID 5 uses a data parity scheme to provide data protection.



Rank: For the Teradata Database, RAID 5 uses the concept of a rank,

which is a set of disks working together. Note that the disks in a rank are not directly cabled to each other 

If one of the disk drives in the rank becomes unavailable, the system

uses the parity byte to calculate the missing data from the down drive so the system can remain operational. With a rank of 4 disks, if a disk fails, any missing data block may be reconstructed using the other 3 disks.

Disk Allocation in Teradata

The operating system, PDE, and the Teradata Database do not recognize the Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 22

physical disk hardware. Each software component recognizes and interacts withdifferent components of the data storage environment:

Operating system: Recognizes a logical unit (LUN). The operating system recognizes the LUN as its "disk," and is not aware that it is actually writing tospaces on multiple disk drives. This technique enables the use of RAIDtechnology to provide data availability without affecting the operating system.

PDE: Translates LUNs into vdisks using slices (in UNIX) or partitions (in MicrosoftWindows and Linux) in conjunction with the Teradata Parallel Upgrade Tool.

Teradata Database: Recognizes a virtual disk (vdisk). Using vdisks instead ofdirect connections to physical disk drives enables the use of RAID technologywith the Teradata Database.

Pdisks: User Data Space 

Space on the physical disk drives is organized into LUNs ,After a LUN iscreated, it is divided into partitions.



In UNIX systems, a LUN consists of one partition, which is further dividedinto slices: o

Boot slice (a very small slice, taking up only 35 sectors)

o

User slices for storing data. These user slices are called "pdisks" in theTeradata Database.


Page 23

o

In

summary,

pdisks

are

the

user

slices

(UNIX),

partitions(Microsoft Windows), or partitions (Linux) and are usedfor storage of the tables in a database. A LUN may haveone or more pdisks.

Vdisks

The pdisks (user slices or partitions, depending on the operating system) are assigned to an AMP through the software. No cabling is involved.

The combined space on the pdisks is considered the AMP's vdisk. AnAMP manages only its own vdisk (disk space assigned to it), not thevdisk of any other AMP. All AMPs then work in parallel, processing theirportion of the data.

Each

AMP

in

the

system

is

assigned

one

vdisk.

Although

numerousconfigurations are possible, generally all pdisks from a rank (RAID 5) ormirrored pair (RAID 1) are assigned to the same AMP for optimalperformance.

However, an AMP recognizes only the vdisk. The AMP has no controlover the physical disks or ranks that compose the vdisk

Fall Back

Fallback provides data protection at the table level by automatically storing a Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 24

copy of each permanent data row of a table on a different or ―fallback‖

AMP. If an AMP fails, the Teradata Database can access the fallback copy and continue operation. If you cluster your AMPs, fallback also provides for automatic recovery of the down AMP once you bring it back online

The benefits are • Permits access to table data when an AMP is offline. • Adds a level of data protection beyond disk array RAID. • Automatically applies changes to the offline AMP when it is back online.

The disadvantage of fallback is that this method doubles the storage space and the I/O (on inserts, updates, and deletes) for tables.

Clique:



A clique is a collection of nodes with shared access to the same disk

arrays. Each multi-nodesystem has at least one clique.



Nodes are interconnected via the BYNET. Nodes and disks are

interconnected via shared busesand thus can communicate directly.Whilethe shared access is defined to the configuration, it is not activelyusedwhen the Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 25

systemis up and running. On a running system, each rankof disks is addressed by exactly one node.



The shared access allows the system to continue operating during a node

failure. The vprocsremain operational and can access stored data.



If a node fails and then resets: o

Teradata Database restarts across all the nodes.

o

Teradata Database recovers, the BYNET redistributes the vprocs of the node to theothernodes within the clique.

o

Processing continues while the node is being repaired.

Clustering

Clustering provides data protection at the system level. A cluster is a logical group of AMPs that provide fallback capability . If an AMP fails, the remainingAMPs in the same cluster do their own work plus the work of the down AMP.Teradata recommends the cluster size of 2.


Page 26

Although AMPs are virtual processes and cannot experience a hardware failure, they can be ―down‖ if the AMP cannot get to the data on the disk

array. If two disks in a rank go down, an AMP will be unable to access its data, which is the only situation where an AMP will stay down.

AMP Clustering and Fallback

If the primary AMP fails, the system can still access data on the fallback AMP.This ensures that one copy of a row is available if one or more hardware orsoftware failures occur within an entire array, or an entire node.

The following figure illustrates eight AMPs grouped into two clusters of fourAMPs each. In this configuration, if AMP 3 (or its vdisk) fails and stays offline, itsdata remains available on AMPs 1, 2, and 4. Even if AMPs 3 and 5 failsimultaneously and remain offline, the data for each remains available on the other AMPs in its cluster.

Other AMPs in its cluster. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 27

Down AMP Recovery Journal

The DownAMP Recovery Journal provides automatic data recovery on fallback-protected data tables when a clustered AMP is out of service. This journal

consists

of

two

system

files

stored

in

user

DBC:

DBC.ChangedRowJournal and DBC.OrdSysChngTable.

When a clustered AMP is out of service, the Down AMP Recovery Journal automatically captures changes to fallback-protected tables from the other Amps in the cluster

Each time a change is made to a fallback-protected row that has a copy that resides on a down AMP, the Down AMP Recovery Journal stores the table ID and row ID of the committed changes. When the AMP comes back online, Teradata Database opens the Down AMP Recovery Journal to update, or roll forward, any changes made while the AMP was down.

The recovery operation uses fallback rows to replace primary rows and primary rows to replace fallback rows. The journal ensures that the information on the fallback AMP and on the primary AMP is identical. Once Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 28

the transfer of information is complete and verified, the Down AMP Recovery Journal is discarded automatically.

Transient Journal

The Teradata Database system offers a variety of methods to protect data.Some data protection methods require that you set options when you createtables such as specifying fallback. Other methods are automatically activated when particular events occur in the system. Each data protection techniqueoffers different types of advantages under different circumstances. The followinglist describes a few of automatic data protection methods:

• The Transient Journal (TJ) automatically protects data by storing the image

ofan existing row before a change is made, or the ID of a new row after an insertis made. It enables the snapshot to be copied back to, or a new row to bedeleted from, the data table if a transaction fails or is aborted.The TJ protects against failures that may occur during transaction processing.To safeguard the integrity of your data, the TJ stores:

• A snapshot of a row before an UPDATE or DELETE • The row ID after an INSERT • A control record for each CREATE and DROP statement • Control records for certain operations


Page 29

Permanent journal



Is active continuously



Is available for tables or databases



Can contain "before" images, which permit rollback, or after images, which permit rollforward, or both before and after images



Provides rollforward recovery



Provides rollback recovery



Provides full recovery of nonfallback tables



Reduces need for frequent, full-table archives

Teradata Storage and retrival Architectures.

Request Processing

1. SQL request is sent from the client to the appropriate component on the node: a. Channel-attached client: request is sent to Channel Driver (through the TDP). b. Network-attached client: request is sent to Teradata Gateway (through CLIv2 or ODBC). 2. Request is passed to the PE(s). 3. PEs parse the request into AMP steps. 4. PE Dispatcher sends steps to the AMPs over the BYNET. 5. AMPs perform operations on data on the vdisks. 6. Response is sent back to PEs over the BYNET. 7. PE Dispatcher receives response. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 30

8. Response is returned to the client (channel-attached or network-attached).

Parsing Engine Request Processing

The SQL parser handles all incoming SQL requests. It processes an incomingrequest as follows:

Stage 1: The Parser looks in the Request cache to determine if the requestis

already there. IF the request is…

THEN the Parser…

in the Request cache

Reuses the plastic steps found in thecache and passes them togncApply. Go to step 8 afterchecking access rights (step 4).

not in the Request Begins processing the request withthe Syntaxer. cache

Stage 2: The Syntaxer checks the syntax of an incoming request. IF there are…

THEN the Syntaxer…

no errors

converts the request to a parse treeand passes it to the Resolver.

errors

passes an error message back to therequestor and stops.

Stage 3 :The Resolver adds information from the Data Dictionary (or cached

copy ofthe information) to convert database, table, view, stored procedure, andmacro names to internal identifiers. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 31

Stage 4: The Parser looks in the Request cache to determine if the requestis

already there. IF the access rights are…

THEN the Security module…

valid

passes the request to the Optimizer

not valid

aborts

the

request and

passes

anerror

message and stops.

Stage 5: The Optimizer determines the most effective way to implement the

SQLrequest.

Stage 6: The Optimizer scans the request to determine where locks should

be placed,then passes the optimized parse tree to the Generator.

Stage 7: The Generator transforms the optimized parse tree into plastic steps

andpasses them to gncApply.Plastic steps are directives to the database management system that do notcontain data values.

Stage 8 :gncApply takes the plastic steps produced by the Generator and

transformsthem into concrete steps.Concrete steps are directives to the AMPs that contain any needed user- orsession-specific values and any needed data parcels.

Stage 9: gncApply passes the concrete steps to the Dispatcher.

The Dispatcher Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 32

The Dispatcher controls the sequence in which steps are executed. It also passesthe steps to the BYNET to be distributed to the AMP database managementsoftware as follows:

Stage 1: The Dispatcher receives concrete steps from gncApply.

Stage2:The Dispatcher places the first step on the BYNET; tells the BYNET

whetherthe step is for one AMP, several AMPS, or all AMPs; and waits for acompletion response.

Whenever possible, the Teradata RDBMS performs steps in parallel toenhance performance. If there are no dependencies between a step and thefollowing step, the following step can be dispatched before the first stepcompletes, and the two will execute in parallel. If there is a dependency, forexample, the following step requires as input data that is produced by thefirst step, then the following step can't be dispatched until the first stepcompletes.

Stage 3:

The Dispatcher receives a completion response from all expected AMPsand places the next step on the BYNET. It continues to do this until all theAMP steps associated with a request are done.

The AMPs


Page 33

The AMPs are responsible for obtaining the rows required to process therequests (assuming that the AMPs are processing a SELECT statement). TheBYNET system controls the transmission of messages to and from the AMPs.An AMP step can be sent to one of the following: 

One AMP



A selected set of AMPs, called a dynamic BYNET group



All AMPs in the system

Teradata SQL Reference.

Data Definition Language (DDL) – Defines database structures (tables, users, views, macros, triggers, etc.)

CREATE

REPLACE

DROP

ALTER

Data Manipulation Language (DML) – Manipulates rows and data values

SELECT

INSERT

UPDATE

DELETE

Data Control Language (DCL) – Grants and revokes access rights

GRANT

REVOKE

Teradata Extensions to SQL HELP

SHOW

EXPLAIN

CREATE SET TABLE Per_DB.Employee, FALLBACK , Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 34

NO BEFORE JOURNAL, NO AFTER JOURNAL (

employee_number INTEGER NOT NULL, dept_number SMALLINT, job_code INTEGER COMPRESS , first_name VARCHAR(20) NOT CASESPECIFIC, birth_date DATE FORMAT 'YYYY-MM-DD', salary_amount DECIMAL(10,2))

UNIQUE PRIMARY INDEX ( employee_number ) INDEX ( dept_number);

Views

Views are pre-defined subsets of existing tables consisting of specified columns and/or rows from the table(s).

A single table view: 

is a window into an underlying table



allows users to read and update a subset of the underlying table



has no data of its own

CREATE VIEW Emp_403 AS SELECT employee_number, epartment_number, last_name, first_name, hire_date ROM Employee WHERE department_number = 403.

CREATE VIEW EmpDept AS SELECT last_name, department_name FROM Employee E INNER JOIN Department D ON E.department_number = D.department_number ; Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 35

MACRO

A MACRO is a predefined set of SQL statements which is logically stored in a database. Macros may be created for frequently occurring queries of sets of operations. Macros have many features and benefits: •Simplify end-user access •Control which operations may be performed by users •May accept user -provided parameter values •Are stored on the RDBMS, thus available to all clients •Reduces query size, thus reduces LAN/channel traffic •Ar e optimized at execution time •May contain multiple SQL statements

To create a macro: CREATE MACRO Customer_List AS (SELECT customer_name FROM Customer;);

To Execute a macro: EXEC Customer_List;

To replace a macro: REPLACE

MACRO

Customer_List

AS

(SELECT

customer_name,

customer_number FROM Customer;);


Page 36

INSERT INTO target_table SELECT * FROM source_table;

INSERT INTO birthdays SELECT

employee_number, ast_name, irst_name, birthdate

FROM

employee;

UPDATE T1 FROM (SELECT t2_1, MIN(t2_2) from T2 group by 1) as D (D1,D2) SET Field2 = D2 WHERE Field1 = D1

Temporary Tables

There are three types of temporary tables implemented in Teradata: 

Global



Volatile



Derived

Derived Tables

Derived tables were introduced in Teradata V2R2. Some characteristics of a derived table include: 

Local to the query - it exists for the duration of the query.



When the query is done the table is discarded.



Incorporated into SQL query syntax.



Spool rows are also discarded when query finishes.



There is no data dictionary involvement - less system overhead.


Page 37

Volatile Temporary Tables

Volatile tables have a lot of the advantages of derived tables, and additional benefits such as: 

Local to a session - it exists throughout the entire session, not just a single query.



It must be explicitly created using the CREATE VOLATILE TABLEsyntax.



It is discarded automatically at the end of the session.



There is no data dictionary involvement.

Global Temporary Tables

The major difference between a global temporary table and a volatile temporary table is that the global table has a definition in the data dictionary, thus the definition may be shared by many users. Each user session can materialize its own local instance of the table. Attributes of a global temporary table include: 

Local to a session, however each user session may have its own instance.



Uses CREATE GLOBAL TEMPORARY TABLE syntax.



Materialized instance of table discarded at session end.

Creates and keeps table definition in data dictionary. Eg derived table

To get the top three selling items across all stores. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 38

Solution

SELECT t.prodid, t.sumsales, RANK(t.sumsales)FROM (SELECT prodid, SUM(sales) FROM salestblGROUP BY 1) AS t(prodid, sumsales)QUALIFY RANK(sumsales)<=3; Result

prodid Sumsales

Rank

A

170000.00

1

C

115000.00

2

D

110000.00

3

Some things to note about the above query include: 

The name of the Derived table is 't'.



The Derived column names are 'prodid' and 'sumsales'.



The table is created in spool using the inner SELECT.



The SELECT statement is always in parenthesis following the FROM clause.

Derived tables are a good choice if: 

The temporary table is required for this query but no others.

The query will be run only one time with this data.


Page 39

Volatile Temporary Tables

Volatile temporary tables are similar to derived tables in that they: 

Are materialized in spool.



Require no Data Dictionary access or transaction locks.



Have a table definition that is kept in cache.



Are designed for optimal performance.

They are different from derived tables in that they: 

Are local to the session, not the query.



Can be used with multiple queries in the session.



Are dropped manually anytime or automatically at session end.



Must be explicitly created with the CREATE VOLATILE TABLE statement.

Example

CREATE VOLATILE TABLE vt_deptsal, LOG (deptno SMALLINT,avgsal DEC(9,2),maxsal DEC(9,2) ,minsal DEC(9,2),sumsal DEC(9,2),empcnt SMALLINT) ON COMMIT PRESERVE ROWS; In the example above, we stated ON COMMIT PRESERVE ROWS. This statement allows us to use the Volatile table again for other queries in the session. The default statement is ON COMMIT DELETE ROWS, which means the data is deleted when the query is committed. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 40

LOG indicates that a transaction journal is maintained, while NO LOG allows for better performance. LOG is the default. Volatile tables do not survive a system restart.

Examples

CREATE VOLATILE TABLE username.table1 CREATE VOLATILE TABLE table1 CREATE

VOLATILE

(Explicit)

(Implicit) TABLE

databasename.table1

(Error if databasename not username) L imi tations on Volatil e Tables

The following commands are not applicable to VT's: 

COLLECT/DROP/HELP STATISTICS



CREATE/DROP INDEX



ALTER TABLE



GRANT/REVOKE privileges



DELETE DATABASE/USER (does not drop VT's)

VT's may not: 

Use Access Logging.



Be Renamed.



Be loaded with Multiload or Fastload utilities.

VT's may be referenced in views and macros Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 41

Example

CREATE MACRO vt1 AS (SELECT * FROM vt_deptsal;); Session A

Session B

EXEC vt1

EXEC vt1

Each session has its own materialized instance of vt_deptsal, so each session may return different results. VT's may be dropped before session ends Example

DROP TABLE vt_deptsal; Global Temporary Tables

Global Temporary Tables are created using the CREATE GLOBAL TEMPORARY command. They require a base definition which is stored in the Data Dictionary(DD). Global temporary tables are materialized by the first SQL statement from the following list to access the table: 

CREATE INDEX.... ON TEMPORARY.......



DROP INDEX.... ON TEMPORARY.......



COLLECT STATISTICS



DROP STATISTICS



INSERT



INSERT SELECT


Page 42

Global Temporary Tables are different from Volatile Tables in that: 

Their base definition is permanent and kept in the DD.



They require a privilege to materialize the table (see list above).



Space is charged against the user's 'temporary space' allocation.



The User can materialize up to 32 global tables per session.



They can survive a system restart.

Global Temporary Tables are similar to Volatile Tables because: 

Each instance of a global temporary table is local to a session.



Materialized tables are dropped automatically at the end of the session. (But the base definition is still in the DD)



They have LOG and ON COMMIT PRESERVE/DELETE options.



Materialized table contents are not sharable with other sessions.

Example

CREATE GLOBAL TEMPORARY TABLE gt_deptsal (deptno SMALLINT,avgsal DEC(9,2),maxsal DEC(9,2) ,minsal DEC(9,2),sumsal DEC(9,2),empcnt SMALLINT); The ON COMMIT DELETE ROWS clause is the default, so it does not need to appear in the CREATE TABLE statement. If you want to use the command ON COMMIT PRESERVE ROWS, you must specify that in the CREATE TABLE statement. With global temporary tables, the base table definition is stored in the Data Dictionary. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 43

ALTER TABLE may also be used to change the defaults. Creating Tables Using Subqueries

Subqueries may be used to limit column and row selection for the target table. Consider the employee table: SHOW TABLE employee; CREATE SET TABLE Customer_Service.employee ,FALLBACK , O BEFORE JOURNAL, O AFTER JOURNAL ( employee_number INTEGER, manager_employee_number INTEGER, department_number INTEGER, ob_code INTEGER, last_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC OT NULL, first_name

VARCHAR(30)

CHARACTER

SET

LATIN

NOT

CASESPECIFIC NOT NULL, hire_date DATE FORMAT 'YY/MM/DD' NOT NULL, birthdate DATE FORMAT 'YY/MM/DD' NOT NULL, salary_amount DECIMAL(10,2) NOT NULL) UNIQUE PRIMARY INDEX ( employee_number );


Page 44

Example

This example uses a subquery to limit the column choices. CREATE TABLE emp1 AS (SELECT employee_number ,department_number ,salary_amount FROM employee) WITH NO DATA; SHOW TABLE emp1; CREATE SET TABLE Customer_Service.emp1 ,NO FALLBACK , O BEFORE JOURNAL, O AFTER JOURNAL ( employee_number INTEGER, department_number INTEGER, salary_amount DECIMAL(10,2) NOT NULL) PRIMARY INDEX ( employee_number ); Note: When the subquery form of CREATE AS is used: 

Table attributes (such as FALLBACK) are not copied from the source table.



Table attributes are copied from standard system defaults (e.g., NO FALLBACK) unless otherwise specified.



Secondary indexes, if present, are not copied from the source table.


Page 45



The first column specified (employee_number) is created as a NUPI unless otherwise specified

There are some limitations on the use of subqueries for table creation: 

The ORDER BY clause is not allowed.



All columns or expressions must have an assigned or defaulted name.

Renami ng Colu mns

Columns may be renamed using the AS clause (the Teradata NAMED extension may also be used). Example

This example changes the column names of the subset of columns used for the target table. CREATE TABLE emp1 AS (SELECT employee_number AS emp ,department_number AS dept ,salary_amount AS sal FROM employee) WITH NO DATA;


Page 46

HELP Command

HELP DATABASE

databasename;

HELP USER

username;

HELP TABLE

tablename;

HELP VIEW

viewname;

HELP MACRO

macroname;

HELP COLUMN

table or viewname.*; (all columns)

HELP INDEX

tablename;

HELP STATISTICS

tablename;

HELP JOIN INDEX

join_indexname;

HELP TRIGGER

triggername;

Th e SH OW Command

The SHOW command displays the current Data Definition Language (DDL) of a database object (e.g., Table, View, Macro, Trigger, Join Index or Stored Procedure). The SHOW command is used primarily to see how an object was created. Command

Returns

SHOW TABLE tablename;

CREATE TABLE statement

SHOW VIEW viewname;

CREATE VIEW statement

SHOW MACRO macroname;

CREATE MACRO statement


Page 47

The EXPL AI N Command

The EXPLAIN function looks at a SQL request and responds in English how the optimizer plans to execute it. It does not execute the statement and is a good way to see what database resources will be used in processing your request. For instance, if you see that your request will force a full-table scan on a very large table or cause a Cartesian Product Join, you may decide to rewrite a request so that it executes more efficiently. EXPLAIN provides a wealth of information, including the following: 1.) Which indexes if any will be used in the query. 2.) Whether individual steps within the query may execute concurrently (i.e. parallel steps). 3.) An estimate of the number of rows which will be processed. 4.) An estimate of the cost of the query (in time increments).

EXPLAIN SELECT * FROM department;

***QUERY

COMPLETED.10

ROWS

FOUND.1

COLUMN

RETURNED.***

Explanation

1. First, we lock a distinct CUSTOMER_SERVICE."pseudo table" for read

on

a

RowHash

to

prevent

global

deadlock

for

CUSTOMER_SERVICE.department. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 48

2. Next, we lock CUSTOMER_SERVICE.department for read. 3. We

do

an

all-AMPs

RETRIEVE

step

from

CUSTOMER_SERVICE.department by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 4 rows. The estimated time for this step is 0.15 seconds. 4. Finally, we send out an END TRANSACTION step to all AMPs involved

in

processing

the

request.

-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.15 seconds.

BETWEEN

The BETWEEN operator looks for values between the given lower limit and given upper limit as well as any values that equal either or (BETWEEN is inclusive.) Example

Select the name and the employee's manager number for all employees whose job codes are in the 430000 range. SELECT

first_name

,last_name

,manager_employee_number FROM

employee job_code

WHERE

BETWEEN

430000 AND 439999;


Page 49

I N Clause

Use the IN operator as shorthand for when multiple values are to be tested. Select the name and department for all employees in either department 401 or 403. This query may also be written using the OR operator which we shall see shortly. SELECT

first_name

FROM

,last_name

employee

department_number

IN

,department_number

WHERE (401, 403);

NOT I N Clause

Use the NOT IN operator to locate rows for which a column does not match any of a set of values. Specify the set of values which disqualifies the row. SELECT

first_name

FROM

,last_name

employee

department_number

NOT

,department_number

WHERE IN

(401, 403);

Using NUL L

Use NULL in a SELECT statement, to define that a range of values either IS NULL or IS NOT NULL. SELECT

employee_number

WHERE

extension

FROM IS

employee_phone NULL;

L I KE Operator

The LIKE operator searches for patterns matching character data strings. String pattern example:

Meaning:


Page 50

LIKE 'JO%'

begins with 'JO'

LIKE '%JO%'

contains 'JO' anywhere

LIKE '__HN'

contains 'HN' in 3rd and 4th position

LIKE '%H_'

contains 'H' in next to last position

ADD_MONTHS

The ADD_MONTHS function allows the addition of a specified number of months to an existing date, resulting in a new date. Query

Results

SELECT DATE; /* March 20, 2001 */

01/03/20

SELECT ADD_MONTHS (DATE, 2)

2001-05-20

SELECT ADD_MONTHS (DATE, 12*14)

2015-03-20

SELECT ADD_MONTHS (DATE, -3)

2000-12-20

Data Conver sions Using CA ST

The CAST function allows you to convert a value or expression from one data type to another. SELECT CAST (50500.75 AS INTEGER); Result: 50500 (truncated). SELECT CAST (50500.75 AS DEC (6,0)); Result: 50501. (rounded). SELECT CAST(6.74 AS DEC(2,1)); Result: 6.7 (Drops precision) SELECT CAST(6.75 AS DEC(2,1)); Result: 6.8 (Rounds up to even number) Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 51

SELECT CAST(6.85 AS DEC(2,1)); Result: 6.8 (Rounds down to even number) Attributes and Functions 

Use TITLE to add a heading to your output that differs from the column or expression name.



Use AS to specify a name for a column or expression in a SELECT statement.



Use CHARACTERS to determine the number of characters in a string.



Use TRIM to Trim blank characters or binary zeroes from data.

Use FORMAT to alter the display of a column or expression. Attributes are characteristics which may be defined for columns, such as

titlesand

formats.

Functions are performed on columns to alter their contents in some way.

Expressions are columns and/or values combined with mathematical

operators.

(i.e.

Col1

+

Col2

+

3)

Attributes for columns and expressions include the following:

AS

Provides a new name for a column. ANSI

TITLE

Provides a title for a column.

Teradata Extension


Page 52

FORMAT Provides formatting for a column.

Teradata Extension

Functions for columns and expressions include the following:

CHARACTERS

TRIM

Count the number of characters in a Teradata column.

Extension

Trim the trailing or leading blanks or ANSI binary zeroes from a column.

Aggr egate Oper ators

Aggregate operators perform computations on values in a specified group. The five aggregate operators are: ANSI Standard

Teradata Supported

COUNT

COUNT

SUM

SUM

AVG

AVERAGE, AVG

MAX

MAXIMUM, MAX

MIN

MINIMUM, MIN

AGGREGATE operations ignore NULLs and produce ONLY single-line answers. Example

SELECT

COUNT

( salary_amount )

(TITLE 'COUNT')


Page 53

,SUM ( salary_amount )

(TITLE 'SUM SALARY')

,AVG ( salary_amount )

(TITLE 'AVG SALARY')

,MAX ( salary_amount )

(TITLE 'MAX SALARY')

,MIN

(TITLE 'MIN SALARY')

( salary_amount )

FROM employee ; Result

COUNTSUM SALARYAVG SALARYMAX SALARYMIN SALARY 6

213750.00

35625.00

49700.00

29250.00

NOTE: If one salary amount value had been NULL, the COUNT would

have returned a count of 5. In this case, the average would have reflected an average of only five salaries. To COUNT all table rows use COUNT (*), which will count rows regardless of the presence of NULLs. Aggr egation usin g GROUP B Y

To find the total amount of money spent by each department on employee salaries. Without the GROUP BY clause, we could attempt to get an answer by running a separate query against each department. GROUP BY provides the answer with a single query, regardless of how many departments there are. SELECT department_number

,SUM (salary_amount) FROM employee

GROUP BY department_number department_number

;

Sum(salary_amount)

401

74150.00

403

80900.00

301

58700.00


Page 54

GROUP BY and ORDER B Y

GROUP BY does not imply any ordering of the output. An ORDER BY clause is needed to control the order of the ouput. GROUP BY and HAV I NG Condition

HAVING is just like WHERE , except that it applies to groups rather than rows. HAVING qualifies and selects only those groups that satisfy a conditional expression. GROUP BY Summar y

Here is the order of evaluation within a SQL statement if all four clauses are present: WHERE 

Eliminates some or all rows immediately based on condition.



Only rows which satisfy a WHERE condition are eligible for inclusion in groups.

GROUP BY 

Puts qualified rows into desired groupings.

HAVING 

Eliminates some (or all) of the groupings based on condition.


Page 55

ORDER BY 

Sorts

final

groups

for

output.

(ORDER BY is not implied by GROUP BY) Using WITH...BY

The WITH...BY clause is a Teradata extension that creates subtotal lines for a detailed list. It differs from GROUP BY in that detail lines are not eliminated. The WITH...BY clause allows subtotal "breaks" on more than one column and generates an automatic sort on all "BY" columns. SELECT last_name AS NAME, salary_amount AS SALARY ,department_number AS DEPT FROM employee WHERE employee_number BETWEEN 1003 AND 1008 WITH SUM(salary)(TITLE 'Dept Total'), AVG(salary)(TITLE 'Dept Avg ')BY DEPT;

Result

NAME

SALARY

DEPT

Stein

29450.00

301

Kaniesk

29250.00

301

-----------Dept Total

58700.00

Dept Avg

29350.00


Page 56

Johnson

36300.00

401

Trader

37850.00

401

-----------Dept Total Dept Avg

74150.00 37075.00

CHARACTERS Function

The CHARACTERS function is a Teradata-specific function which counts the number of characters in a string. It is particularly useful for working with VARCHAR fields where the size of the string can vary from row to row. To find all employees who have more than five characters in their first name. Solution

SELECT

first_name

FROM

employee

WHERE

CHARACTERS (first_name) > 5; TRIM Function

Use the TRIM function to suppress leading and/or trailing blanks in a CHAR column or leading and/or trailing binary zeroes in a BYTE or VARBYTE column. TRIM is most useful when performing string concatenations. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 57

There are several variations of the TRIM function: TRIM ([expression])

leading and trailing blanks/binary

zeroes TRIM (BOTH FROM [expression])

leading and trailing blanks/binary

zeroes TRIM (TRAILING FROM[expression]) trailing blanks/binary zeroes TRIM (LEADING FROM[expression])

leading blanks/binary zeroes

Solution 1

SELECT

first_name

,last_name (TITLE 'last')

FROM

employee

WHERE

CHAR (TRIM (TRAILING FROM last_name)) = 4;

Solution 2

SELECT

first_name

,last_name (TITLE 'last')

FROM

employee

WHERE CHAR(TRIM(last_name))=4; TRIM with Concatenation

The || (double pipe) symbol is the concatenation operator that creates a new string from the combination of the first string followed by the second. Example

1:

Concatenating of literals without the TRIM function: SELECT

'Jones' || ',' || 'Mary' AS Name;

Name -----------------------------Jones , Mary Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 58

TRIM with Other Characters Example 1:

SELECT TRIM(BOTH '?' FROM '??????PAUL??????') AS Trim_String; Trim_String ---------------PAUL

Example 2:

SELECT

TRIM(LEADING

'?'

FROM

'??????PAUL??????')

AS

'?'

FROM

'??????PAUL??????')

AS

Trim_String; Trim_String ---------------PAUL?????? Example 3:

SELECT

TRIM(TRAILING

Trim_String; Trim_String ---------------??????PAUL Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 59

FORMAT Phrase

The FORMAT phrase can be used to format column output and override the default format. For example: SELECT

salary_amount (FORMAT '$$$,$$9.99')

WHERE

employee_number

=

FROM

employee

1004;

Some Examples

FORMAT

'999999'

Data: 08777

Result: 008777

FORMAT

'ZZZZZ9'

Data: 08777

Result: 8777

FORMAT

'999-9999'

Data: 6495252 Result: 649-5252

FORMAT

'X(3)'

Data: 'Smith'

Result: Smi

FORMAT

'$$9.99'

Data: 85.65

Result: $85.65

FORMAT

'999.99'

Data: 85.65

Result: 085.65

FORMAT

'X(3)'

Data: 85.65

Result: Error

String Functions

Several functions are available for working with strings in SQL. Also, the concatenation operator is provided for combining strings. The string functions and the concatenation operator are listed here. String Operator

||

Description

Concatenates (combines) character strings together.

SUBSTRING

Obtains a section of a character string.

INDEX

Locates a character position in a string.


Page 60

TRIM *

Trims blanks from a string.

UPPER

Converts a string to uppercase.

SELECT SUBSTRING ('catalog' FROM 5 for 3); Result 'log' SELECT SUBSTR ('catalog', 5,3); Result 'log'

SUBSTRING

SUBSTR

Result

Result

SUBSTRING(‗catalog‘

FROM

5 ‗log‘

‗log‘

FROM

0 ‗ca‘

‗ca‘

FOR 4) SUBSTRING(‗catalog‘

FOR 3) SUBSTRING(‗catalog‘ FROM -1 ‗c‘

‗c‘


FROM

8 0 length string 0 length string

FROM

1 0 length string 0 length string

FROM

5 error



error

FOR -2) SUBSTRING(‗catalog‘ FROM 0)

‗catalog‘

‗catalog‘


Page 61

SUBSTRING(‗catalog‘ FROM 10)

0 length string 0 length string

SUBSTRING(‗catalog‘ FROM -1)

0 length string 0 length string

SUBSTRING(‗catalog‘ FROM 3)

‗talog‘

‗talog‘

COALESCE Function

Normally, concatenation of any string with a null produces a null result. The COALESCE Function allows values to be substituted for nulls. (The COALESCE function is described in more detail in Level 3 Module 6.) Example: Assume col1 = 'a', col2 = 'b' SELECT

col1 | | col2 From tblx; Result is: 'ab'

If either column contains a null, the result is null. Solution: Assume col1 = 'a', col2 = null SELECT

col1 | | (COALESCE (col2,'x')) FROM tblx; Result is: 'ax'

INDEX Function

The INDEX function locates a character position in a string. SELECT INDEX ('cat', 't');

returns 3

SELECT INDEX ('Adams', 'a');

returns 1


Page 62

SELECT INDEX ('dog', 'e');

returns 0

DATE Formats SYNTAX

RESULT

FORMAT 'YYYY/MM/DD‘

1996/03/27

FORMAT 'DDbMMMbYYYY'

27 Mar 1996

FORMAT 'mmmBdd,Byyyy' 'mmmBdd,Byyyy'

Mar 27, 1996

FORMAT 'DD.MM.YYYY'

27.03.1996

SELECT

last_name

,first_name

,hire_date

(FORMAT

'mmmBdd,Byyyy') FROM

employee

ORDER BY

last_name;

last_name

first_name

hire_date

Johnson

Darlene

Oct 15, 1976

Kanieski

Carol

Feb 01, 1977

Ryan

Loretta

Oct 15, 1976

Extracting Portions of DATEs

The EXTRACT function allows for easy extraction of year, month and day from any DATE data type. The following examples demonstrate its usage. Query

Result

SELECT DATE; /* March 20,2001 */

01/03/20 (Default format)

SELECT EXTRACT(YEAR FROM DATE); 2001 Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 63

SELECT

EXTRACT(MONTH

FROM

DATE); SELECT EXTRACT(DAY FROM DATE);

03 20

Date arithmetic may be applied to the date prior to the extraction. Added values always represent days. Query

Result

SELECT EXTRACT(YEAR FROM DATE + 365);

2002

SELECT EXTRACT(MONTH FROM DATE + 30);

04

SELECT EXTRACT(DAY FROM DATE + 12);

01

Extracting From Current Time

The EXTRACT function may also be applied against the current time. It permits extraction extraction of hours, minutes minutes and seconds. Query

Result

SELECT TIME; /* 2:42 PM */

14:42:32 (Default format)

SELECT EXTRACT(HOUR FROM TIME);

14

SELECT EXTRACT(MINUTE FROM TIME);

42

SELECT EXTRACT(SECOND FROM TIME);

32

Set Operator s

The following are graphic representations of the three set operators, INTERSECT, UNION and EXCEPT Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 64

The INTERSECT operator returns rows from multiple sets which share some criteria in common. SELECT manager_employee_number FROM employee INTERSECT SELECT

manager_employee_number

FROM

department

ORDER

BY 1;

Results

manager_employee_number 801 1003 1005 1011 The UNION operator operator returns all rows from multiple sets, displaying duplicate rows only once. SELECT

first_name

,last_name

,'employee'

(TITLE

'employee//type') FROM employee

WHERE

manager_employee_number manager_employee_number = 1019

UNION SELECT

first_name

,last_name

WHERE

employee_number employee_number = 1019

,' manager '

FROM

ORDER BY

employee

2

The EXCEPT operator subtracts the contents of one set from the contents of another. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 65

SELECT manager_employee_number FROM department EXCEPT SELECT

manager_employee_number manager_employee_number

FROM

employee

ORDER BY

1;

Result

manager_employee_number 1016 1099 NOTE: Using the Teradata keyword keyword ALL in conjuction with the UNION

operator allows duplicate rows to remain in the result set. What is a Trigger?

A trigger is an object in a database, like a macro or view. A trigger is created with a CREATE TRIGGER statement and defines events that will happen when some other event, called a triggering event , occurs. A trigger consists of one or more SQL statements s tatements which are associated with a table and which are executed when the trigger is 'fired'. In summary, a Trigger is: 

One or more stored SQL statements associated with a table.



An event driven procedure attached to a table.



An object in a database, like tables, views and macros.


Page 66

Many of the DDL commands which apply to other database objects, also apply to triggers. All of the following statements are valid with triggers: 

CREATE TRIGGER



DROP TRIGGER



SHOW TRIGGER



ALTER TRIGGER



RENAME TRIGGER



REPLACE TRIGGER



HELP TRIGGER

Triggers may not be used in conjunction with: 

The FastLoad utility



The MultiLoad utility



Updatable Cursors (Stored Procedures or Preprocessor)



Join Indexes

To use the FastLoad or MultiLoad utilities, or to create stored procedures with updatable cursors (covered in a later module), you must first disable any triggers defined on the affected tables via an ALTER TRIGGER command. Join indexes are never permitted on tables which have defined triggers. You can drop all Triggers using: 

DELETE DATABASE



DELETE USER


Page 67

Privileges are required to CREATE and DROP Triggers: 

GRANT CREATE Trigger



GRANT DROP Trigger



REVOKE CREATE Trigger



REVOKE DROP Trigger

These

new

privileges

have

been

created

in

the

the

Data

Dictionary/Directory. Note: The Teradata implementation of triggers is updated with Release V2R5.1 (January 2004) to conform to the ANSI specification. The changes are fully demonstrated in Level 6, Module 15 of this SQL Webbased training. In the current module (Module 3), notation will be provided to indicated which features are no longer supported in V2R5.1. Triggered and Triggering Statements

A trigger is said to ‗fire‘ when the triggering event occurs and various conditions are met. When a trigger fires, it causes other events, called triggered events to occur. A triggered event consists of one or more

triggered statements. A triggering statement is an SQL statement which causes a trigger to fire. It is the 'launching' statement. Triggering statements may be any of the following: 

INSERT


Page 68



UPDATE



DELETE



INSERT SELECT

A triggered statement is the statement (or statements) which are executed as a result of firing the trigger. Triggered statements may be any of these: 

INSERT



UPDATE



DELETE



INSERT SELECT



ABORT/ROLLBACK



EXEC (macro)

A macro may only contain the approved DML statements. Triggered statements may never be any of these: 

BEGIN TRANSACTION



CHECKPOINT



COMMIT



END TRANSACTION



SELECT

You can do transaction processing in a triggered statement without using Begin Transaction/End Transaction (BTET). We will see how to do this later. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 69


Page 70

Trigger Types

There are two types of triggers: Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 71



ROW triggers



STATEMENT triggers

ROW triggers 

fire once for each row affected by the triggering statement.



reference OLD and NEW rows of the subject table.



permit only simple inserts, rollbacks, or macros containing them in a triggered statement.

STATEMENT triggers 

fire once per statement.



reference OLD_TABLE and NEW_TABLE subject tables.

Example 1 

CREATE TABLE tab1 (a INT, b INT, c INT);



CREATE TABLE tab2 (d INT, e INT, f INT);



CREATE TABLE tab3 (g INT, h INT, i INT);



Example 2



CREATE TRIGGER trig1 AFTER INSERT ON tab1



REFERENCING NEW_TABLE AS newtable



FOR EACH STATEMENT



(INSERT INTO tab2 SELECT a + 10, b + 10, c FROM newtable;);



Example 3



CREATE TRIGGER trig2 AFTER INSERT ON tab2



REFERENCING NEW_TABLE AS newtable



FOR EACH STATEMENT


Page 72

(INSERT INTO tab3 SELECT



d + 100, e + 100, f FROM

newtable;);

Example 4

INSERT INTO tab1 VALUES (1,2,3); SELECT * FROM tab1; a

b

c

-----------

-----------

-----------

1

2

3

SELECT * FROM tab2; d

e

f

-----------

-----------

-----------

11

12

3

SELECT * FROM tab3; g

h

i

-----------

-----------

-----------

111

112

3

RANDOM Function

The RANDOM function may be used to generate a random number between a specified range. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 73

RANDOM (Lower limit, Upper limit) returns a random number between the lower and upper limits inclusive. Both limits must be specified, otherwise a random number between 0 and approximately 4 billion is generated. Consider the department table, which consists of nine rows. SELECT department_number FROM department; department_number ----------------501 301 201 600 100 402 403 302 401 L imi tations tations On On U se Of RA ND OM 

RANDOM is non-ANSI standard



RANDOM may be used in a SELECT list or a WHERE clause, but not both



RANDOM may be used in Updating, Inserting or Deleting rows



RANDOM may not be used with aggregate or OLAP functions



RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause


Page 74

Join processing: Inner Join

Suppose we need to display employee number, last l ast name, and department name for all employees. The employee number and last name come from the employee table. The department name comes from the department table. A join, by definition, is necessary whenever data is needed from more than one table or view, In order to perform a join, we need to find a column that both tables have in common. Fortunately, both tables have a department number column, which may be used to join the rows of both tables. Solution

SELECT

employee.employee_number

,department.department_nam ,department.department_namee

,employee.last_name

FROM employee

INNER

JOIN

department ON

employee.department_number employee.department_number = department.department_number department.department_number;;

employee_number last_name department_name

1006

Stein

research and development

1008

Kanieski

research and development

1005

Ryan

education

1004

Johnson

customer support

1007

Villegas

education


Page 75

1003

Trader

customer support

We fully qualified every column referenced in our SELECT statement to include the table that the column is in ( e.g., employee.employer_number). employee.employer_number). It is only necessary to qualify columns that have identical names in both tables (i.e., department_number). department_number). The ON clause is used to define the join condition used to link the two tables Cross Joins

A Cross Join is a join that requires no join condition (Cross Join syntax does not allow an ON clause). Each participating row of one table is joined with each participating row of another table. The WHERE clause restricts which rows participate from either table. SELECTe.employee_number,d.dep SELECTe.employee _number,d.department_num artment_numberFROM berFROM employeeeCROSS employeeeCRO SS JOINdepartm JOINdepartmentd entd WHEREe.employee_number=1008;

employee_number department_number

1008

301

1008

501

1008

402

1008

201


Page 76

1008

302

1008

600

1008

401

1008

100

1008

403

The employee table has 26 rows. The department table has 9 rows. Without the WHERE clause, we would expect that 26 x 9 = 234 rows in our result set. With the constraint that the employee_number must equal 1008 (which only matches one row in the employee table), we now get 1 x 9 = 9 rows in our result set. Cross Joins by themselves often do not produce meaningful results. This result shows employee 1008 associated with each department. This is not meaningful output.

Self Joins

A self join occurs when a table is joined to itself. Which employees share the same surname Brown and to whom do they report? SELECT

emp.first_name

(TITLE 'Emp//First Name')

,emp.last_name

(TITLE 'Emp//Last Name')

,mgr.first_name

(TITLE 'Mgr//First Name')

,mgr.last_name

(TITLE 'Mgr//Last Name')


Page 77

FROM employee emp INNER JOIN

employeemgr

ON emp.manager_employee_number = mgr.employee_number WHERE emp.last_name = 'Brown'; Results

Emp First Name

Emp Last Name

Mgr First Name

Mgr Last Name

Allen

Brown

Loretta

Ryan

Alan

Brown

James

Trader

Join Processing:

Rows must be on the same AMP to be joined. •If necessary, the system creates spool copies of one or both rows and

Moves them to a common AMP. •Join processing NEVER moves or changes the original table rows.

Typical kinds of joins are: •Merge Join •Product Join • Nested Join •Exclusion Join

The Optimizer chooses the best join strategy based on: Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 78

•Available Indexes •Demographics (Collected STATISTICS or Dynamic Sample)

EXPLAIN shows what kind of join a query uses. Join Redistribution:

The Primary Index is the major consideration used by the Optimizer in determining how to join two tables and deciding which rows to move. Three general scenarios may occur when two tables are to be Merge Joined:

1. The Join column(s) is the Primary Index of both tables (best case). 2. The Join column is the Primary Index of one of the tables. 3. The Join column is not a Primary Index of either table (worst case).

Nested Joins:

This is a special join case. •This is the only join that doesn't always use all of the AMPs. •It is the most efficient in terms of system resources. •It is the best choice for OLTP applications.

To choose a Nested Join, the Optimizer must have: – An equality value for a unique index (UPI or USI) on Table1. – A join on a column of that single row to any index on Table2. •The system retrieves the single row from Table1. •It hashes the join column value to access matching Table2 row(s). Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 79

Utilities: Bteq: Steps for submitting SQL in BTEQ’s Batch Mode

1. Invoke BTEQ 2. Type in the input file name 3. Type in the location and output file name. BTEQ is invoked and takes instructions from BatchScript.txt. The output file is called Output.txt.

a

file

called

C:/>BTEQ < BatchScript.txt > Output.txt BatchScript.txt File

Using BTEQ Conditional Logic

Below is a BTEQ batch script example. The initial steps of the script will establish the logon, the database, and the delete all the rows from the Employee_Table. If the table does not exist, the BTEQ conditional logic will instruct Teradata to create it. However, if the table already exists, then Teradata will move forward and insert data. .RUN FILE = mylogon.txt

Logon to Teradata

DATABASE SQL_Class;

Make the default database SQL_Class

DELETE FROM Employee_Table;

Deletes all the records from the Employee_Table.

.IF ERRORCODE = 0 THEN .GOTOINSEMPS

BTEQ conditional logic that will check to ensure that the


Page 80

/* ERRORCODE is a reserved word that contains the outcome status for every SQL statement executed in BTEQ. A zero (0) indicates that statement worked. */

delete worked or if the table even existed.

.LABEL INSEMPS

The Label INSEMPS provides code so the BTEQ Logic can go directly to inserting records into the Employee_Table.

INSERT INTO Employee_Table (1232578, 'Chambers', 'Mandee', 48850.00, 100); INSERT INTO Employee_Table (1256349, 'Harrison' ,'Herbert', 54500.00, 400); .QUIT

Using BTEQ to Export Data

BTEQ allows data to be exported directly from Teradata to a file on a mainframe or network-attached computer. In addition, the BTEQ export function has several export formats that a user can choose depending on the desired output. Generally, users will export data to a flat file format that is composed of a variety of characteristics. These characteristics include: field mode, indicator mode, or dif mode. Below is an expanded explanation of the different mode options. Format of the EXPORT command: .EXPORT {FILE | DDNAME } = [, LIMIT=n] Record Mode: (also called DATA mode): This is set by .EXPORT DATA. This will bring data back as a flat file. Each parcel will contain a complete record. Since it is not a report, there are no headers or white space between the data contained in each column and the data is written to the file (e.g., disk drive file) in native format. For example, this means that INTEGER data is written as a 4-byte binary field. Therefore, it cannot be read and understood using a normal text editor. Field Mode (also called REPORT mode): This is set by .EXPORT REPORT. This is the default mode for BTEQ and brings the data back as if Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 81

it was a standard SQL SELECT statement. The output of this BTEQ export would return the column headers for the fields, white space, expanded packed or binary data (for humans to read) and can be understood using a text editor. Indicator Mode: This is set by .EXPORT INDICDATA. This mode writes the data in data mode, but also provides host operating systems with the means of recognizing missing or unknown data (NULL) fields. This is important if the data is to be loaded into another Relational Database System (RDBMS).

The issue is that there is no standard character defined to represent either a numeric or character NULL. So, every system uses a zero for a numeric NULL and a space or blank for a character NULL. If this data is simply loaded into another RDBMS, it is no longer a NULL, but a zero or space. To remedy this situation, INDICATA puts a bitmap at the front of every record written to the disk. This bitmap contains one bit per field/column. When a Teradata column contains a NULL, the bit for that field is turned on by setting it to a ―1‖. Likewise, if the data is not NULL, the bit remains a zero. Therefore, the loading utility reads these bits as indicators of NULL data and identifies the column(s) as NULL when data is loaded back into the table, where appropriate. Since both DATA and INDICDATA store each column on disk in native format with known lengths and characteristics, they are the fastest method of transferring data. However, it becomes imperative that you be consistent. When it is exported as DATA, it must be imported as DATA and the same is true for INDICDATA. Again, this internal processing is automatic and potentially important. Yet, on a network-attached system, being consistent is our only responsibility. However, on a mainframe system, you must account for these bits when defining the LRECL in the Job Control Language (JCL). Otherwise, your length is too short and the job will end with an error. To determine the correct length, the following information is important. As mentioned earlier, one bit is needed per field output onto disk. However, computers allocate data in bytes, not bits. Therefore, if one bit is needed a minimum of eight (8 bits per byte) are allocated. Therefore, for every eight fields, the LRECL becomes 1 byte longer and must be added. In other Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 82

words, for nine columns selected, 2 bytes are added even though only nine bits are needed. With this being stated, there is one indicator bit per field selected. INDICDATA mode gives the Host computer the ability to allocate bits in the form of a byte. Therefore, if one bit is required by the host system, INDICDATA mode will automatically allocate eight of them. This means that from one to eight columns being referenced in the SELECT will add one byte to the length of the record. When selecting nine to sixteen columns, the output record will be two bytes longer. When executing on non-mainframe systems, the record length is automatically maintained. However, when exporting to a mainframe, the JCL (LRECL) must account for this addition length. DIF Mode: Known as Data Interchange Format, which allows users to export data from Teradata to be directly utilized for spreadsheet applications like Excel, FoxPro and Lotus.

The optional limit is to tell BTEQ to stop returning rows after a specific number (n) of rows. This might be handy in a test environment to stop BTEQ before the end of transferring rows to the file. Determining Out Record Lengths

Some hosts, such as IBM mainframes, require the correct LRECL (Logical Record Length) parameter in the JCL, and will abort if the value is incorrect. The following page will discuss how to figure out the record lengths. There are three issues involving record lengths and they are: Fixed columns Variable columns NULL indicators Fixed Length Columns: For fixed length columns you merely count the length of the column. The lengths are: INTEGER

4 bytes


Page 83

SMALLINT

2 bytes

BYTEINT

1 byte

CHAR(10)

10 bytes

CHAR(4)

4 bytes

DATE

4 bytes

DECIMAL(7,2)

4 bytes (packed data, total digits / 2 +1 )

DECIMAL(12,2)

8 bytes

Variable columns: Variable length columns should be calculated as the maximum value plus two. This two bytes is for the number of bytes for the binary length of the field. In reality you can save much space because trailing blanks are not kept. The logical record will assume the maximum and add two bytes as a length field per column. VARCHAR(8) 10 Bytes VARCHAR(10) 12 Bytes Indicator columns: As explained earlier, the indicators utilize a single bit for each field. If your record has 8 fields (which require 8 bits), then you add one extra byte to the total length of all the fields. If your record has 9 -16 fields, then add two bytes. BTEQ Return Codes

Return codes are two-digit values that BTEQ returns to the user after completing each job or task. The value of the return code indicates the completion status of the job or task as follows: Return Code Description

00 Job completed with no errors. 02 User alert to log on to the Teradata DBS. 04 Warning error. 08 User error. 12 Severe internal error. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 84

You can over-ride the standard error codes at the time you terminate BTEQ. This might be handy for debug purposes. The error code or ―return code‖ can be any number you specify using one of the following:

F ast Export:

An Introduction to FastExport Why it is called “FAST”Export

FastExport is known for its lightning speed when it comes to exporting vast amounts of data from Teradata and transferring the data into flat files on either a mainframe or network-attached computer. In addition, FastExport has the ability to except OUTMOD routines, which provides the user the capability to write, select, validate, and preprocess the exported data. Part of this speed is achieved because FastExport takes full advantage of Teradata‘s parallelism. As the demand increases to store data, the ever-growing requirement for tools to export massive amounts of data. This is the reason why FastExport (FEXP) is brilliant by design. A good rule of thumb is that if you have more than half a million rows of data to export to either a flat file format or with NULL indicators, then FastExport is the best choice to accomplish this task. Keep in mind that FastExport is designed as a one-way utility — that is, the sole purpose of FastExport is to move data out of Teradata. It does this by harnessing the parallelism that Teradata provides. FastExport is extremely attractive for exporting data because it takes full advantage of multiple sessions, which leverages Teradata parallelism. FastExport can also export from multiple tables during a single operation. In addition, FastExport utilizes the Support Environment, which provides a job restart capability from a checkpoint if an error occurs during the process of executing an export job. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 85

How FastExport Works

When FastExport is invoked, the utility logs onto the Teradata database and retrieves the rows that are specified in the SELECT statement and puts them into SPOOL. From there, it must build blocks to send back to the client. In comparison, BTEQ starts sending rows immediately for storage into a file. If the output data is sorted, FastExport may be required to redistribute the selected data two times across the AMP processors in order to build the blocks in the correct sequence. Remember, a lot of rows fit into a 64K block and both the rows and the blocks must be sequenced. While all of this redistribution is occurring, BTEQ continues to send rows. FastExport is getting behind in the processing. However, when FastExport starts sending the rows back a block at a time, it quickly overtakes and passes BTEQ‘s row at time processing. The other advantage is that if BTEQ terminates abnormally, all of your rows (which are in SPOOL) are discarded. You must rerun the BTEQ script from the beginning. However, if FastExport terminates abnormally, all the selected rows are in worktables and it can continue sending them where it left off. Pretty smart and very fast! Also, if there is a requirement to manipulate the data before storing it on the computer‘s hard drive, an OUTMOD routine can be written to modify the result set after it is sent back to the client on either the mainframe or LAN. Just like the BASF commercial states, ―We don‘t make the products you buy, we make the products you buy better‖. FastExport is designed off the same premise, it does not make the SQL SELECT statement faster, but it does take the SQL SELECT statement and processes the request with lighting fast parallel processing! FastExport Fundamentals #1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is because it takes data off of Teradata (Exports Data). FastExport does not import data into Teradata. Additionally, like BTEQ it can output multiple files in a single run. #2: FastExport only supports the SELECT statement. The only DML statement that FastExport understands is SELECT. You SELECT the data you want exported and FastExport will take care of the rest. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 86

#3: Choose FastExport over BTEQ when Exporting Data of more than half a million+ rows . When a large amount of data is being exported, FastExport is recommended over BTEQ Export. The only drawback is the total number of FastLoads, FastExports, and MultiLoads that can run at the same time, which is limited to 15. BTEQ Export does not have this restriction. Of course, FastExport will work with less data, but the speed may not be much faster than BTEQ. #4: FastExport supports multiple SELECT statements and multiple tables in a single run. You can have multiple SELECT statements with FastExport and each SELECT can join information up to 64 tables. #5: FastExport supports conditional logic, conditional expressions, arithmetic calculations, and data conversions.FastExport is flexible and supports the above conditions, calculations, and conversions. #6: FastExport does NOT support error files or error limits. FastExport does not record particular error types in a table. The FastExport utility will terminate after a certain number of errors have been encountered. #7: FastExport supports user-written routines INMODs and OUTMODs. FastExport allows you write INMOD and OUTMOD routines so you can select, validate and preprocess the exported data.

Maximum of 15 Loads

The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or FastExport utility jobs. This maximum value is determined and configured by the DBS Control record. This value can be set from 0 to 15. When Teradata is initially installed, this value is set at 5. The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to transfer data. If more then 15 simultaneous jobs were supported, a saturation point could be reached on the availability of resources. In this case, Teradata does an excellent job of protecting system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that are attempting to connect. For example, if the maximum numbers of utilities on the Teradata system is reached and another job attempts to run that job does not start. This limitation should be viewed as a safety control feature. A tip for Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 87

remembering how the load limit applies is this, ― If the name of the load utility contains either the word “Fast” or the word “Load” , then there can be only a total of fifteen of them running at any one time”.

BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting data. However, if two many load jobs are running. BTEQ is an alternate choice for exporting data.

A FastExport in its Simplest Form

The hobby of racecar driving can be extremely frustrating, challenging, and rewarding all at the same time. I always remember my driving instructor coaching me during a practice session in a new car around a road course racetrack. He said to me, ―Before you can learn to run, you need to learn how to walk.‖ This same philosophy can be applied when working with FastExport. If FastExport is broken into steps, then several things that appear to be complicated are really very simple. With this being stated, FastExport can be broken into the following steps: Logging onto Teradata Retrieves the rows you specify in your SELECT statement Exports the data to the specified file or OUTMOD routine Logs off of Teradata

LOGTABLE sql01.SWA_Log;

Creates the logtable Required

.LOGON demo/usr01,demopwd;

Logon to Teradata

BEGIN EXPORT SESSIONS 12;

Begin the Export and set the number of sessions on Teradata

.EXPORT OUTFILE Student.txt

Defines the output file name.


Page 88

MODE RECORD FORMAT TEXT;

In addition, specifies the output mode and format (LAN – ONLY) The SELECT defines the column used to create the export file. NOTE: The selected columns for the export are being converted to character types. This will simplify the importing process into a different database.

/* Finish the Export Job and Write to File */ .END EXPORT; .LOGOFF;

End the Export and logoff Teradata.

FastExport Modes and Formats FastExport Modes

FastExport has two modes: RECORD or INDICATOR. In the mainframe world, only use RECORD mode. In the UNIX or LAN environment, RECORD mode is the default, but you can use INDICATOR mode if desired. The difference between the two modes is INDICATOR mode will set the indicator bits to 1 for column values containing NULLS. Both modes return data in a client internal format with variable-length records. Each individual record has a value for all of the columns specified by the SELECT statement. All variable-length columns are preceded by a two-byte control value indicating the length of the column data. NULL columns have a value that is appropriate for the column data type. Remember, INDICATOR mode will set bit flags that identify the columns that have a null value. FastExport Formats

FastExport has many possible formats in the UNIX or LAN environment. The FORMAT statement specifies the format for each record being exported which are: Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 89

FASTLOAD BINARY TEXT UNFORMAT The default FORMAT is FASTLOAD in a UNIX or LAN environment. FASTLOAD Format is a two-byte integer, followed by the data, followed by an end-of-record marker. It is called FASTLOAD because the data is exported in a format ready for FASTLOAD. BINARY Format is a two-byte integer, followed by data. TEXT is an arbitrary number of bytes followed by an end-of-record marker. UNFORMAT is exported as it is received from CLIv2 without any client modifications.

F ast load:

FastLoad Has Two Phases

Teradata is famous for its end-to-end use of parallel processing. Both the data and the tasks are divided up among the AMPs. Then each AMP tackles its own portion of the task with regard to its portion of the data. This same ―divide and conquer‖ mentality also expedites the load process. FastLoad divides its job into two phases, both designed for speed. They have no fancy names but are typically known simply as Phase 1 and Phase 2. Sometimes they are referred to as Acquisition Phase and Application Phase. PHASE 1: Acquisition

The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following: Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 90

When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request. It then opens a Teradata session from the FastLoad client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command. This capability is shown below. Simultaneously, all but one of the client sessions begins loading raw data in 64K blocks for transfer to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block. The result is that data rows arrive on different AMPs than those they would live, had they been hashed. So how do the rows get to the correct AMPs where they will permanently reside? Following the receipt of every data block, each AMP hashes its rows based on the Primary Index, and redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP but remain unsorted until Phase 1 is complete. Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping industry today. How do the key players in this industry handle a parcel? When the shipping company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that hub it is sent to the destination city. FastLoad‘s Phase 1 uses the AMPs in much the same way that the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP. This just gets them to a ―hub‖ somewhere in Teradata country. Second, each AMP forwards them to their true destination. This is like the shipping parcel being sent from a hub city to its destination city! PHASE 2: Application

Following the scenario described above, the shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must then be sorted by street and zip code, placed onto local trucks and be driven to their final, local destinations. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 91

Similarly, FastLoad‘s Phase 2 is mission critical for getting every row of data to its final address (i.e., where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it writes the rows into the table space on disks where they will permanently reside. Rows of a table are stored on the disks in data blocks. The AMP uses the block size as defined when the target table was created. If the table is Fallback protected, then the Fallback will be loaded after the Primary table has finished loading. This enables the Primary table to become accessible as soon as possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load utilities!.

Steps to wr i te F astexpor t scri pt: Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}]. Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are similar to those in BTEQ. FastLoad commands were designed from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot [―.‖] in front of them and therefore need a semi-colon. At this point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend this? We do because as FastLoad‘s capabilities get enhanced with newer versions, the syntax of the scripts may have to be revisited. Step Three: If the input file is not a FastLoad format, before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed by FastLoad. We have used VARTEXT in our example with a comma delimiter. The other options are FastLoad, TEXT, UNFORMATTED OR VARTEXT. You need to know this about your input file ahead of time. Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used as the input FILE, or source file for the load. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 92

Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what you want loaded. In the BEGIN LOADING statement, the script must name the target table and the two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error tables in this script? FastLoad will automatically create them for you once you name them in the script. In this instance, they are named ―Emp_Err1‖ and ―Emp_Err2‖. Phase 1 uses ―Emp_Err1‖ because it comes first and Phase 2 uses ―Emp_Err2‖. The names are arbitrary, of course. You may call them whatever you like. At the same time, they must be unique within a database, so using a combination of your userid and target table name helps insure this uniqueness between multiple FastLoad jobs occurring in the same database.

In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter. We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task with regard to the load. In the old days, children were always told to focus on the three ―R‘s‘ in grade school (―reading, ‗riting, and ‗rithmatic‖). There are two very diff erent, yet equally important, R‘s to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN means that the job is capable of running all the processing again from the beginning of the load. RESTART means that the job is capable of running the processing again from the point where it left off when the job was interrupted , causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will learn more about CHECKPOINT in the section on Restarting FastLoad. Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terrier‘s do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.

In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the start of the job, it prevents loading rows as they arrive from different time zones. However, to accomplish this processing, simply omit the END LOADING on the load job. Then, you can run the same FastLoad multiple times and continue loading the worktables until the last file is received. Then run the last FastLoad job with an END LOADING and you have partitioned your load jobs into smaller segments instead of one huge job. This makes FastLoad even faster! Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 93

Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or CREATE commands within the script. Additionally, every script is exactly the same with the exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase 2. That‘s a pretty clever way to do a partitioned type of data load. Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the last utility command in your script. At this point the table lock is released and if there are no rows in the error tables, they are dropped automatically. However, if a single row is in one of them, you are responsible to check it, take the appropriate action and drop the table manually.

Converting Data Types with FastLoad

Converting data is easy. Just define the input data types in the input file. Then, FastLoad will compare that to the column definitions in the Data Dictionary and convert the data for you! But the cardinal rule is that only one data type conversion is allowed per column . In the example below, notice how the columns in the input file are converted from one data type to another simply by redefining the data type in the CREATE TABLE statement. FastLoad allows six kinds of data conversions. Here is a chart that displays them: IN FASTLOAD YOU MAY CONVERT

CHARACTER DATA

TO

NUMERIC DATA

FIXED LENGTH DATA

TO

VARIABLE LENGTH DATA

CHARACTER DATA

TO

DATE

INTEGERS

TO

DECIMALS

DECIMALS

TO

INTEGERS

DATE

TO

CHARACTER DATA

NUMERIC DATA

TO

CHARACTER DATA

Figure 4-4 Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 94

When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum! When You Cannot RESTART FastLoad

There are two types of FastLoad scripts: those that you can restart and those that you cannot without modifying the script. If any of the following conditions are true of the FastLoad script that you are dealing with, it is NOT restartable: The Error Tables are DROPPED The Target Table is DROPPED The Target Table is CREATED Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job one half way through it. Maybe the mainframe went down. Well, it is not really a big deal because FastLoad is so lightning-fast that you could probably just RERUN the job for small data loads. However, when you are loading a billion rows, this is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you might want to make sure that the job is totally restartable. Let‘s see how this is done. When You Can RESTART FastLoad

If all of the following conditions are true, then FastLoad is ALWAYS restartable: The Error Tables are NOT DROPPED in the script The Target Table is NOT DROPPED in the script The Target Table is NOT CREATED in the script You have defined a checkpoint Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 95

So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you have a table whose data changes so much that you typically drop it monthly and build it again. Let‘s go back to the script we just reviewed above and see how we can break it into the two parts necessary to make it fully RESTARTABLE. It is broken up below. STEP ONE: Run the following SQL statements in Queryman or BTE Q before you start FastLoad: DROP TABLE SQL01.Department; DROP TABLE SQL01.Dept_Err1; DROP TABLE SQL01.Dept_Err2;

DROPS TARGET TABLE AND ERROR TABLES

CREATES THE DEPARTMENT TARGET TABLE IN THE SQL01 DATA BASE IN TERADATA Figure 4-6 First, you ensure that the target table and error tables, if they existed previously, are blown away. If there had been no errors in the error tables, they would be automatically dropped. If these tables did not exist, you have not lost anything. Next, if needed, you create the empty table structure needed to receive a FastLoad. STEP TWO: Run the FastLoad script

This is the portion of the earlier script that carries out these vital steps: Defines the structure of the flat file Tells FastLoad where to load the data and store the errors Specifies the checkpoint so a RESTART will not go back to row one Loads the data If these are true, all you need do is resubmit the FastLoad job and it starts loading data again with the next record after the last checkpoint. Now, with Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 96

that said, if you did not request a checkpoint, the output message will normally indicate how many records were loaded. You may optionally use the RECORD command to manually restart on the next record after the one indicated in the message. Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN LOADING and END LOADING. It will then restart right into Phase 2. What Happens When FastLoad Finishes You Receive an Outcome Status

The most important thing to do is verify that FastLoad completed successfully. This is accomplished by looking at the last output in the report and making sure that it is a return code or status code of zero (0). Any other value indicates that somethi ng wasn‘t perfect and needs to be fixed. The locks will not be removed and the error tables will not be dropped without a successful completion. This is because FastLoad assumes that it will need them for its restart. At the same time, the lock on the target table will not be released either. When running FastLoad, you realistically have two choices once it is started. First choice is that you get it to run to a successful completion, or lastly, rerun it from the beginning. As you can imagine, the best course of action is normally to get it to finish successfully via a restart. You Receive a Status Report

What happens when FastLoad finishes running? Well, you can expect to see a summary report on the success of the load. Following is an example of such a report. Line 1: Line 2: Line 3: Line 4: Line 5:

TOTAL RECORDS READ = 1000000 TOTAL ERRORFILE1 = 50 TOTAL ERRORFILE2 = 0 TOTAL INSERTS APPLIED = 999950 TOTAL DUPLICATE ROWS = 0

Figure 4-7 Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 97

The first line displays the total number of records read from the input file. Were all of them loaded? Not really. The second line tells us that there were fifty rows with constraint violations, so they were not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that there were zero entries into the second error table, indicating that there were no duplicate Unique Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the duplicates would only have been counted. They are not stored in the error tables anywhere. When FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the number of records read in line 1. Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate rows that are counted. This is due to the fact that a error seldom occurs on a checkpoint (quiet or quiescent point) when nothing is happening within FastLoad. Therefore, some number of rows will be sent to the AMPs again because the restart starts on the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic.

You Can Troubleshoot

In the example above, we know that the load was not entirely successful. But that is not enough. Now we need to troubleshoot in order identify the errors and correct them. FastLoad generates two error tables that will enable us to find the culprits. The first error table, which we named Errorfile1, contains just three columns: The column ErrorCode contains the Teradata FastLoad code number to a corresponding translation or constraint error. The second column, named ErrorField , specifies which column in the table contained the error. The third column, DataParcel , contains the row with the problem. Both error tables contain the same three columns; they just track different types of errors. As a user, you can select from either error table. To check errors in Errorfile1 you would use this syntax:


Page 98

Corrected rows may be inserted to the target table using another utility that does not require an empty table. To check errors in Errorfile2 you would the following syntax:

The definition of the second error table is exactly the same as the target table with all the same columns and data types. 2A0C022C00000

How the CHECKPOINT Option Works

CHECKPOINT option defines the points in a load job where the FastLoad utility pauses to record that Teradata has processed a specified number of rows. When the parameter ―CHECKPOINT [n]‖ is included in the BEGIN LOADING clause the system will stop loading momentarily at increments of [n] rows. At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly. Then FastLoad sends a checkpoint report (entry) to the SYSADMIN .Fastlog table. This log contains a list of all currently running FastLoad jobs and the last successfully reached checkpoint for each job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last successfully reported checkpoint prior to the error. It will then restart from the record immediately following that checkpoint and start building the next block of data to load. If such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row. Restarting with CHECKPOINT

Sometimes you may need to restart FastLoad. If the FastLoad script requests a CHECKPOINT (other than 0), then it is restartable from the last successful checkpoint. Therefore, if the job fails, simply resubmit the job. Here are the two options: Suppose Phase 1 halts prematurely; the Data Acquisition phase is incomplete. Resubmit the FastLoad script. FastLoad will begin from RECORD 1 or the first record past the last checkpoint. If you wish to manually specify where FastLoad should restart, locate the last successful checkpoint record by referring to the SYSADMIN.FASTLOG table. To specify where a restart will start from, use the RECORD command. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 99

Normally, it is not necessary to use the RECORD command — let FastLoad automatically determine where to restart from. If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the BEGIN and END LOADING Statements. This will restart in Phase 2 with the sort and building of the target table. Restarting without CHECKPOINT (i.e., CHECKPOINT 0)

When a failure occurs and the FastLoad Script did not utilize the CHECKPOINT (i.e., CHECKPOINT 0), one procedure is to DROP the target table and error tables and rerun the job. Here are some other options available to you: Resubmit job again and hope there is enough PERM space for all the rows already sent to the unsorted target table plus all the rows that are going to be sent again to the same target table. Other than using space, these rows will be rejected as duplicates. As you can imagine, this is not the most efficient way since it processes many of the same rows twice. If CHECKPOINT wasn‘t specified, then CHECKPOINT defaults to 100,000. You can perform a manual restart using the RECORD statement. If the output print file shows that checkpoint 100000 occurred, use something like the following command: [RECORD 100001;]. This statement will skip records 1 through 10000 and resume on record 100001.

Using INMODs with FastLoad

When you find that FastLoad does not read the file type you have or you wish to control the access for any reason, then it might be desirable to use an INMOD. An INMOD (Input Module), is fully compatible with FastLoad in either mainframe or LAN environments, providing that the appropriate programming languages are used. However, INMODs replace the normal mainframe DDNAME or LAN defined FILE name with the following statement: DEFINE INMOD=. For a more in-depth discussion of INMODs, see th e chapter of this book titled ―INMOD Processing‖. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 100

Multiload: Why it is called “Multi”Load

If we were going to be stranded on an island with a Teradata Data Warehouseand we could only take along one Teradata load utility, clearly, MultiLoad would be our choice. MultiLoad has the capability to load multiple tables at one time from either a LAN or Channel environment. This is in stark contrast to its fleet-footed cousin, FastLoad, which can only loadone table at a time. And it gets better, yet! This feature rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE and UPSERT on up to five (5) empty or populated target tables at a time. These DML functions may be run either solo or in combinations, against one or more tables. For these reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment. As the volume of data being loaded or updated in a single block, the performance of MultiLoad improves. MultiLoad shines when it can impact more than one row in every data block. In other words, MultiLoad looks at massive amounts o f data and says, ―Bring it on!‖ Leo Tolstoy once said, ―All happy families resemble each other.‖ Like happy families, the Teradata load utilities resemble each other, although they may have some differences. You are going to be pleased to find that you do not have to learn all new commands and concepts for each load utility. MultiLoad has many similarities to FastLoad. It has even more commands in common with TPump. The similarities will be evident as you work with them. Where there are some quirky differences, we will point them out for you.

Two MultiLoad Modes: IMPORT and DELETE

MultiLoad provides two types of operations via modes: IMPORT and DELETE. In MultiLoad IMPORT mode, you have the freedom to ―mix and match‖ up to twenty (20) INSERTs, UPDATEs or DEL ETEs on up to five target tables. The execution of the DML statements is not mandatory for all rows in a table. Instead, their execution hinges upon the conditions contained in the APPLY clause of the script. Once again, MultiLoad demonstrates its user-friendly flexibility. For UPDATEs or DELETEsto be Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 101

successful in IMPORT mode, they must reference the Primary Index in the WHERE clause. The MultiLoad DELETE mode is used to perform a global (all AM P) delete on just one table. The reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTed if an error causes it to terminate prior to finishing. When performing in DELETE mode, the DELETE SQL statement cannot referencethe Primary Index in the WHERE clause. This due to the fact that a primary index access is to a specific AMP; this is a global operation. The other factor that makes a DELETE mode operation so good is that it examines an entire block of rows at a time. Once all the eligible rows have been removed, the block is written one time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from the next block without a checkpoint. This is a smart way to continue. Remember, when using the TJ all deleted rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish then the delete. MultiLoad does not do a rollback; it does a restart. The Purpose of DELETE MLOAD

In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months, monthly data is rotated in and out . Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 102

At the end of every month, the oldest month of data is removed and the new month is added. The cycle is ―add a month, delete a month, add a month, delete a month.‖ In our illustration, that means that January data must be deleted to make room for May‘s data. Here is a question for you: What if there was another way to accomplish this same goal without con suming all of these extra resources? To illustrate, let‘s consider the following scenario: Suppose you have Table A that contains 12 billion rows. You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad DELETE to DELETE FROM Table A WHERE < ‗2002-02-01‘. The final step would be to INSERT the new rows for May using MultiLoad IMPORT. Block and Tackle Approach

MultiLoad never loses sight of the fact that it is designed for functionality, speed, and the ability to restart. It tackles the proverbial I/O bottleneck problem like FastLoad by assembling data rows into 64K blocks and writing them to disk on the AMPs. This is much faster than writing data one row at a time like BTEQ. Fallback table rows are written after the base table has been loaded. This allows users to access the base table immediately upon completion of the MultiLoad while fallback rows are being loaded in the background. The benefit is reduced time to access the data. Amazingly, MultiLoad has full RESTART capability in all of its five phases of operation. Once again, this demonstrates its tremendous flexibility as a load utility. Is it pure magic? No, but it almost seems so. MultiLoad makes effective useof two error tables to save different types of errors and a LOGTABLE that stores built-in checkpoint information for restarting. This is why MultiLoad does not use the Transient Journal, thus averting timeconsuming rollbacks when a job halts prematurely. Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access Module Processor) fails and the system administrators say that the AMP is ―down‖ or ―offline.‖ When using FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can RESTART when an AMP fails, if the table is fallback protected. As the same time, you can use the AMPCHECK option to make it work like FastLoad if you want. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 103

MultiLoad Imposes Limits Rule #1: Unique Secondary Indexes are not supported on a Target Table. LikeFastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). But unlike FastLoad, it does support the use of Non-Unique Secondary Indexes (NUSIs) because the index subtable row is on the same AMP as the data row. MultiLoad uses every AMP independently and in parallel. If two AMPs must communicate, they are not independent. Therefore, a NUSI (same AMP) is fine, but a USI (different AMP) is not. Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are defined with Referential Integrity (RI). Like a USI, this requires the AMPs to communicate with each other. So, RI constraints must be dropped from the target table prior to using MultiLoad. Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables based upon what happens in a target table. Again, this is a multi-AMP operation and to a different table. To keep MultiLoad running smoothly, disable all Triggers prior to using it. Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this because it could impact are restart if the files were concatenated in a different sequence or data was deleted between runs. Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If you need data conversions or math, you might be better off using an INMOD to prepare the data prior to loading it.

Error Tables, Work Tables and Log Tables

Besides target table(s), MultiLoad requires the use of four special tables in order to function. They consist of two error tables (per target table), one worktable (per target table), and one log table. In essence, the Er ror Tables will be used to store any conversion, constraint or uniqueness violations during a load. Work T ables are used to receive and sort data and SQL on each AMP prior to storing them permanently to disk. A L og Table (also called, ―Logtable‖) is used to store successful checkpoints during load processing in case a RESTART is needed. HINT: Sometimes a company wants all of these load support tables to be housed in a particular database. When these tables are to be stored in any Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 104

database other than the user‘s own default database, then you must give them a qualified name (.) in the script or use the DATABASE command to change the current database.

Where will you find these tables in the load script? The Logtable is generally identified immediately prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD statement. Do not underestimate the value of these tables. They are vital to the operation of MultiLoad. Without them a MultiLoad job can not run. Now that you have had the ―executive summary‖, let‘s look at each type of table individually. Two Er r or Tables: Here is another place where FastLoad and MultiLoad are

similar. Both require the use of two error tables per target table. MultiLoad will automatically create these tables. Rows are inserted into these tables only when errors occur during the load process. The first error table is the acquisition Error Table (ET). It contains all translation and constraint errors that may occur while the data is being acquired from the source(s). The second is the Uniqueness Violation (UV)table that stores rows with duplicate values for Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you might see a UPI error that shows a second employee number ―99.‖ In this case, if the name for employee ―99‖ is Kara Morgan, you will be glad that the row did not load since Kara Morgan is already in the Employee table. However, if the name showed up as David Jackson, then you know that further investigation is needed, because employee numbers must be unique. Each error table does the following: Identifies errors Provides some detail about the errors Stores the actual offending row for debugging You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_ and UV_. In either case, MultiLoad will not accept error table names that are the same as target table names. It does not matter what you name them. It is recommended that you Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 105

standardize on the naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled , ―Troubleshooting MultiLoad Errors.‖ L og Table: MultiLoad requires a LOGTABLE. This table keeps a record of

the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed step. Work Table(s): MultiLoad will automatically create one worktable for each

target table. This means that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only have one worktable since that mode only works on one target table. The purpose of worktables is to hold two things: The Data Manipulation Language (DML) tasks The input data that is ready to APPLY to the AMPs The worktables are created in a database using PERM space. They can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once. So, this is very important. However, there is no such thing as a free lunch, the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.

MultiLoad Has Five IMPORT Phases MultiLoad IMPORT has five phases, but don‘t be fazed by this! Here is the short list:

Phase 1: Preliminary Phase Phase 2: DML Transaction Phase Phase 3: Acquisition Phase Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 106

Phase 4: Application Phase Phase 5: Cleanup Phase Let‘s take a look at each phase and see what it contributes to the overall load process of this magnificent utility. Should you memorize every detail about each phase? Probably not. But it is important to know the essence of each phase because sometimes a load fails. When it does, you need to know in which phase it broke down since the method for fixing the error to RESTART may vary depending on the phase. And if you can picture what MultiLoad actually does in each phase, you will likely write better scripts that run more efficiently.

Phase 1: Preliminary Phase

The ancient oriental proverb says, “Measure one thousand times; Cut once.” MultiLoad uses Phase 1 to conduct several preliminary set-up activities whose goal is to provide a smooth and successful climate for running your load. The first task is to be sure that the SQL syntax and MultiLoad commands are valid. After all, why try to run a script when the system will just find out during the load process that the statements are not useable? MultiLoad knows that it is much better to identify any syntax errors, right up front. All the preliminary steps are automated. No user intervention is required in this phase. Second, all MultiLoad sessions with Teradata need to be established. The default is the number of available AMPs. Teradata will quickly establish this number as a factor of 16 for the basis regarding the number of sessions to create. The general rule of thumb for the number of sessions to use for smaller systems is the following: use the number of AMPs plus two more . For larger systems with hundreds of AMP processors, the SESSIONS option is available to lower the default. Remember, these sessions are running on your poor little computer as well as on Teradata. Each session loads the data to Teradata across the network or channel. Every AMP plays an essential role in the MultiLoad process. They receive the data blocks, hash each row and send the rows to the correct AMP. When the rows come to an AMP, it stores them in worktable blocks on disk. But, lest we get ahead of ourselves, suffice it to say that there is ample reason for multiple sessions to be established. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 107

What about the extra two sessions? Well, the first one is a control session to handle the SQL and logging. The second is a back up or alternate for logging. You may have to use some trial and error to find what works best on your system configuration. If you specify too few sessions it may impair performance and increase the time it takes to complete load jobs. On the other hand, too many sessions will reduce the resources available for other important database activities. Third, the required support tables are created. They are the following: Type of Table

Table Details

ERRORTABLES MultiLoad requires two error tables per target table. The first error table contains constraint violations, while the second error table stores Unique Primary Index violations. WORKTABLES Work Tables hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs. LOGTABLE

The LOGTABLE keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART.

Figure 5-2 The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent the opportunity for a user to request an exclusive lock. Although, these locks will still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2. Phase 2: DML Transaction Phase

In Phase 2, all of the SQL Data Manipulation Language (DML) statements are sent ahead to Teradata. MultiLoad allows the use of multiple DML functions. Teradata‘s Parsing Engine (PE) parses the DML and generates a step-by-step plan to execute the request. This execution plan is then communicated to each AMP and stored in the appropriate worktable for each target table. In other words, each AMP is going to work off the same page. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 108

Later, during the Acquisition phase the actual input data will also be stored in the worktable so that it may be applied in Phase 4, the Application Phase. Next, a match tag is assigned to each DML request that will match it with the appropriate rows of input data. The match tags will not actually be used until the data has already been acquired and is about to be applied to the worktable. This is somewhat like a student who receives a letter from the university in the summer that lists his courses, professor‘s names, and classroom locations for the upcoming semester. The letter is a ―match tag‖ for the student to his school schedule, although it will not be used for several months. This matching tag for SQL and data is the reason that the data is replicated for each SQL statement using the same data record. Phase 3: Acquisition Phase

With the proper set-up complete and the PE‗s plan stored on each AMP, MultiLoad is now ready to receive the INPUT data. This is where it gets interesting! MultiLoad now acquires the data in large, unsorted 64K blocks from the host and sends it to the AMPs. At this point, Teradata does not care about which AMP receives the data block. The blocks are simply sent, one after the other, to the next AMP in line. For their part, each AMP begins to deal with the blocks that they have been dealt. It is like a game of cards — you take the cards that you have received and then play the game. You want to keep some and give some away. Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table, just yet. The receiving AMP must first do some preparation before that happens. Don‘t you have to get ready before company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs into the worktables where it assembles them into the SQL. Why? Because once the rows are reblocked, they can be sorted into the proper order for storage in the target table. Now the utility places a load lock on each target table in preparation for the Application Phase. Of course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 109

Phase 4: Application Phase

The purpose of this phase is to write, or APPLY, the specified changes to both the target tables and NUSI subtables. Once the data is on the AMPs, it is married up to the SQL for execution. To accomplish this substitution of data into SQL, when sending the data, the host has already attached some sequence information and five (5) match tags to each data row. Those match tags are used to join the data with the proper SQL statement based on the SQL statement within a DMP label. In addition to associating each row with the correct DML statement, match tags also guarantee that no row will be updated more than once, even when a RESTART occurs. Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-sequence sorted block from Phase 3 and each block of the base table is read only once to reduce I/O operations to gain speed. Then, all matching rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why the match tags are so important. Changes are made based upon corresponding data and DML (SQL) based on the match tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very beginning if a RESTART is needed. What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the proper error table. Phase 5: Clean Up Phase

Those of you reading these paragraphs that have young children or teenagers will certainly appreciate this final phase! MultiLoad actually cleans up after itself . The utility looks at the final Error Code (&SYSRC). MultiLoad believes the adage, ―All is well that ends well.‖ If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has certainly ended well). This being the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata and MultiLoad, are released. The Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 110

statistics for the job are generated for output (SYSPRINT) and the system count variables are set. After this, each MultiLoad session is logged off. So what happens if the final error code is not zero? Stay tuned. Restarting MultiLoad is a topic that will be covered later in this chapter.

A Simple MultiLoad IMPORT Script

MultiLoad can be somewhat intimidating to the new user because there are many commands and phases. In reality, the load scripts are understandable when you think through what the IMPORT mode does: Setting up a Logtable Logging onto Teradata Identifying the Target, Work and Error tables Defining the INPUT flat file Defining the DML activities to occur Naming the IMPORT file Telling MultiLoad to use a particular LAYOUT Telling the system to start loading Finishing loading and logging off of Teradata Step One: Setting up a Logtable and Logging onto Teradata — MultiLoad requires you specify a log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once you name the Logtable, it will be automatically created for you. The Logtable may be placed in the same database as the target table, or it may be placed in another database. Immediately after this you log onto Teradata using the .LOGON command. The order of these two commands is interchangeable, but it is recommended to define the Logtable first and then to Log on, second. If you reverse the order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot in front of the command key word. Step Two: Identifying the Target, Work and Error tables — In this step of the script you must tell Teradata which tables to use. To do this, you use Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 111

the .BEGIN IMPORT MLOAD command. Then you will preface the names of these tables with the sub-commands TABLES, WORKTABLES AND ERROR TABLES. All you must do is name the tables and specify what database they are in. Work tables and error tables are created automatically for you. Keep in mind that you get to name and locate these tables. If you do not do this, Teradata might supply some defaults of its own! At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not specifically been named, the script would still execute and build these tables. They would have been built in the default database for the user. The name of the worktable would be WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and UV_EMPLOYEE_DEPT1, respectively. Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One customer calls this database CORP_WORK. This is where all of the logtables and worktables are normally created. You can use a DATABASE command to point all table creations to it or qualify the names of these tables individually. Step Three: Defining the INPUT flat file record structure — MultiLoad is going to need to know the structure the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous fields length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER (like above) to position to the cursor to the next field, or the ―*‖ on the Dept_No field could have been replaced with the number 132 ( CHAR(11)+CHAR(20)+CHAR(100)+1 ). Then, the .FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE can be used to automatically define all the .FIELDS for you. The LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is specified in the LAYOUT. Step Four: Defining the DML activities to occur — The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept table. The Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 112

VALUES come from the data in each FIELD because it is preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: Every label must be referenced in an APPLY clause of the .IMPORT clause. Step Five: Naming the INPUT file and its format type — This step is vital! Using the .IMPORT command, we have identified the INFILE data as being contained in a file called ―CDW_Join_Export.txt‖. Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the DML LABEL called INSERTS — that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT MLOAD command. If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file.

Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues until it finds the semi-colon to define the end of the command. This is how it determines one operation from another. Therefore, it is very important or it would have attempted to process the END LOADING as part of the IMPORT — it wouldn‘t work. Step Six: Finishing loading and logging off of Teradata — This is the closing ceremonies for the load. MultiLoad to wrap things up, closes the curtains, and logs off of the Teradata system.

Important note: Since the script above in Figure 5-7 does not DROP any tables, it is completely capable of being restarted if an error occurs. Compare this to the next script in Figure 5-8. Do you think it is restartable? If you said no, part yourself on the back. Error Treatment Options for the .DML LABEL Command

MultiLoad allows you to tailor how it deals with different types of errors that it encounters during the load process, to fit your needs. Here is a summary of the options available to you:

ERROR TREATMENT OPTIONS FOR .DML LABEL Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 113

Figure 5-9 In IMPORT mode, you may specify as many as f ive distin cter r or -treatment options for one .DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not specify IGNORE, then MultiLoad will MARK, or record all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the following syntax ―IGNORE DUPLICATE INSERT ROWS;‖ will keep them out of the error table. By ignoring those errors, you gain three benefits: 1. You do not need to see all the errors. 2. The error table is not filled up needlessly. 3. MultiLoad runs much faster since it is not conducting a duplicate row check. When doing an UPSERT, there are two rules to remember: The default is IGNORE MISSING UPDATE ROWS. Mark is the default for all operations. When doing an UPSERT, you anticipate that some rows are missing, otherwise, why do an UPSERT. So, this keeps these rows out of your error table. The DO INSERT FOR MISSING UPDATE ROWS is mandatory. This tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the update didn‘t find it. The table that follows shows you, in more detail, how flexible your options are:


Page 114

ERROR TREATMENT OPTIONS IN DETAIL DML LABEL OPTION

WHAT IT DOES

MARK DUPLICATE INSERT ROWS

This option logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates.

IGNORE DUPLICATE INSERT ROWS

This tells MultiLoad to IGNORE duplicate INSERT rows because ou do not want to see them.

MARK DUPLICATE UPDATE ROWS This logs the existence of every duplicate UPDATE row. IGNORE DUPLICATE UPDATE ROWS

This eliminates the listing of duplicate update row errors.

MARK MISSING UPDATE ROWS

This option ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE.

IGNORE MISSING UPDATE ROWS

This tells MultiLoad NOT to list UPDATE rows as an error. This is a good option when doing an UPSERT since UPSERT will INSERT a new row.

MARK MISSING DELETE ROWS

This option makes a note in the ET_Error Table that a row to be deleted is missing.

IGNORE MISSING DELETE ROWS

This option says, ―Do not tell me that a row to be deleted is missing‖.

DO INSERT for MISSING UPDATE ROWS

This is required to accomplish an UPSERT. It tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.


Page 115

An IMPORT Script with Error Treatment Options The command .DML LABEL names any DML options (INSERT, UPDATE OR DELETE) that immediately follow it in the script. Each label must be given a name. In IMPORT mode, the label will be referenced for use in the APPLY Phase when certain conditions are met.

/* Setup the MultiLoad Logtables, Logon Statements*/ .LOGTABLE SQL01.CDW_Log; .LOGON TDATA/SQL01,SQL01;

Sets up a Logtable and then logs on to Teradata.

DATABASE SQL01;

Specifies the database in which to find the target table.

/*Drop Error Tables */ DROP TABLE WORKDB.CDW_ET; DROP TABLE WORKDB.CDW_UV;

Drops Existing error tables in the work database.

/* Begin Import and Define Work and Error Tables */ .BEGIN IMPORT MLOAD TABLES Employee_Dept WORKTABLES WORKDB.CDW_WT ERRORTABLES WORKDB.CDW_ET WORKDB.CDW_UV;

Begins the Load Process by telling us first the names of the Target Table, Work table and error tables are in a work database. Note there is no comma between the names of the error tables (pair).

/* Define Layout of Input File */ .LAYOUT FILEIN; .FIELD Employee_No * CHAR(11) ; .FIELD First_Name * CHAR(14); .FIELD Last_Name * CHAR(20); .FIELD Dept_No * CHAR(6); .FIELD Dept_Name * CHAR(20);

Names the LAYOUT of the INPUT file.

/* Begin INSERT Process on Table */

Names the DML Label

.DML LABEL INSERTS IGNORE DUPLICATE INSERT ROWS;

Tells MultiLoad NOT TO LIST duplicate INSERT

Defines the structure of the INPUT file. Notice the dots before the FIELD command and the semicolons after each FIELD definition.


Page 116

rows in the error table; notice the option is placed AFTER the LABEL identification and immediately BEFORE the DML function. INSERT INTO SQL01.Employee_Dept ( Employee_No ,First_Name ,Last_Name ,Dept_No ,Dept_Name ) VALUES ( :Employee_No ,:First_Name, ,:Last_Name, ,:Dept_No, ,:Dept_Name );

Lists, in order, the VALUES to be INSERTed.

/* Specify IMPORT File and Apply Parameters */ .IMPORT INFILE CDW_Join_Export.txt FORMAT TEXT LAYOUT FILEIN APPLY INSERTS;

Names the Import File and States its Format type; names the Layout file to use and tells MultiLoad to APPLY the INSERTs.

.END MLOAD; .LOGOFF;

Ends MultiLoad and logs off of Teradata

An UPSERT Sample Script

The following sample script is provided to demonstrate how do an UPSERT — that is, to update a table and if a row from the data source table does not exist in the target table, then insert a new row. In this instance we are loading the Student_Profile table with new data for the next semester. The clause ―DO INSERT FOR MISSING UPDATE ROWS‖ indicates an UPSERT. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 117

/* Setup Logtable, Logon Statements*/ .LOGTABLE SQL01.CDW_Log; .LOGON CDW/SQL01,SQL01;

Sets Up a Logtable and then logs on to Teradata. Specifies the database to work in (optional).

DATABASE SQL01; Begins the Load Process by telling us first the names of the target table, work table and error tables. Names the LAYOUT of the INPUT file; An AL L CHARACTER based flat file. Defines the structure of the INPUT file; Notice the dots before the FIELD command and the semi-colons after each FIELD definition;

/* Begin INSERT and UPDATE Process on Table Names the DML Label */ Tells MultiLoad to INSERT a row if there .DML LABEL UPSERTER is not one to be DO INSERT FOR MISSING UPDATE UPDATED, i.e., ROWS; UPSERT. /* Without the above DO, one of these is guaranteed to fail on this same table. If the UPDATE fails because rows is missing, it corrects by doing the INSERT */ Defines the UPDATE. UPDATE SQL01.Student_Profile SET Last_Name = :Last_Name ,First_Name = :First_Name ,Class_Code = :Class_Code

Qualifies the UPDATE.


Page 118

,Grade_Pt = :Grade_Pt WHERE Student_ID = :Student_ID; INSERTINTO SQL01.Student_Profile VALUES ( :Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt );

Defines the INSERT. We recommend placing comma separators in front of the following column or value for easier debugging.

/* Specify IMPORT File and Apply Parameters */ Names the Import File and it names the .IMPORT INFILE CDW_EXPORT.DAT Layout file to use and LAYOUT FILEIN tells MultiLoad to APPLY UPSERTER; APPLY the UPSERTs. .END MLOAD; .LOGOFF;

Ends MultiLoad and logs off of Teradata

Troubleshooting MultiLoad Errors — More on the Error Tables

The output statistics in the above example indicate that the load was entirely successful. But that is not always the case. Now we need to troubleshoot in order identify the errors and correct them, if desired. Earlier on, we noted that MultiLoad generates two error tables, the Acquisition Error and the Application error table. You may select from these tables to discover the problem and research the issues. For the most part, the Acquisition error table logs errors that occur during that processing phase. The Application error table lists Unique Primary Index violations, field overflow errors on non-PI columns, and constraint errors that occur in the APPLY phase. MultiLoad error tables not only list the errors they encounter, they also have the capability to STORE those errors. Do you remember the MARK and IGNORE parameters? This is where they come into play. MARK will ensure that the error rows, along with some details about the errors are stored in the error table. IGNORE does neither; it is as if the error never occurred. THREE COLUMNS SPECIFIC TO THE ACQUI SI TI ON ERROR TABLE Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 119

ErrorCode

System code that identifies the error.

ErrorField

ame of the column in the target table where the error happened; is Left blank if the offending column cannot be identified.

HostData

The data row that contains the error.

Figure 5-19 THREE COLUMNS SPECIFIC TO THE APPL I CATI ON ERROR TABLE

Uniqueness

Contains a certain value that disallows duplicate row errors in this table; can be ignored, if desired.

DBCErrorCode

System code that identifies the error.

DBCErrorField

ame of the column in the target table where the error happened; is left blank if the offending column cannot be identified. NOTE: A copy of the target table column immediately follows this column.

RESTARTing MultiLoad Who hasn‘t experienced a failure at some time when attempting a load? Don‘t take it personally! Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine how it works: First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where it left off. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 120

Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing! Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot. Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run until complete. Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted. The results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data block is successfully written to its target table. HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT as 60 or less, minutes are assumed. If you specify the checkpoint at 61 or above, the number of records is assumed.

RELEASE MLOAD — When You DON'T Want to Restart MultiLoad

What if a failure occurs but you do not want to RESTART MultiLoad? Since MultiLoad has already updated the table headers, it assumes that it still ―owns‖ them. Therefore, it limits access to the table(s). So what is a user to do? Well there is good news and bad news. The good news is that if the job you may use the RELEASE MLOAD command to release the locks and rollback the job. The bad news is that if you have been loading multiple millions of rows, the rollback may take a lot of time. For this reason, most customers would rather just go ahead and RESTART. Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one of the following three conditions was true: In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you could not use RELEASE MLOAD. This is sometimes referred to as the ―point of no return.‖ In DELETE mode, the point of no return was when Teradata received the DELETE statement. If the job halted in the Apply Phase, you will have to RESTART the job. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 121

With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the RELEASE MLOAD command. It can NOW be used in the APPLY Phase, if:

You are running a Teradata V2R3 or later version You use the correct syntax: RELEASE MLOAD IN APPLY

The load script has NOT been modified in any way The target tables either: Must be empty, or Must have no Fallback, no NUSIs, no Permanent Journals You should be very cautious using the RELEASE command. It could potentially leave your table half updated. Therefore, it is handy for a test environment, but please don‘t get too reliant on it for production runs. They should be allowed to finish to guarantee data integrity. MultiLoad and INMODs

INMODs, or Input Modules, may be called by MultiLoad in either mainframe or LAN environments, providing the appropriate programming languages are used. INMODs are user written routines whose purpose is to read data from one or more sources and then convey it to a load utility, here MultiLoad, for loading into Teradata. They allow MultiLoad to focus solely on loading data by doing data validation or data conversion before the data is ever touched by MultiLoad. INMODs replace the normal MVS DDNAME or LAN file name with the following statement: .IMPORT INMOD=

You will find a more detailed discussion on how to write INMODs for MultiLoad in ―Teradata Utilities: Breaking The Barriers‖. How MultiLoad Compares with FastLoad Function

FastLoad

MultiLoad

Error Tables must be defined

Yes

Optional.


Page 122

2 Error Tables have to exist for each target table and will automatically be assigned. Work Tables must be defined

No

Optional. 1 Error Table has to exist for each target table and will automatically be assigned.

Logtable must be defined

No

Yes

Allows Referential Integrity

No

No

Allows Unique Secondary Indexes No

No

Allows Non-Unique Secondary Indexes

No

Yes

Allows Triggers

No

No

Loads a maximum of n number of One tables

Five

DML Statements Supported

INSERT

INSERT, UPDATE, DELETE, and ―UPSERT―

DDL Statements Supported

CREATE and DROP DROP TABLE TABLE

Transfers data in 64K blocks

Yes

Yes

Number of Phases

Two

Five

Is RESTARTable

Yes

Yes, in all 5 phases (auto CHECKPOINT)

Stores UPI Violation Rows

Yes

Yes

Allows use of Aggregated,

No

Yes


Page 123

Arithmetic calculations or Conditional Exponentiation Allows Data Conversion

Yes, 1 per column

Yes

NULLIF function

Yes

Yes

T-Pump: An Introduction to TPump

The chemistry of relationships is very interesting. Frederick Buechner once stated, ―My assumption is that the story of any one of us is in some measure the story of us all.‖ In this chapter, you will find that TPump has similarities with the rest of the family of Teradata utilities. But this newer utility has been designed with fewer limitations and many distinguishing abilities that the other load utilities do not have. Do you remember the first Swiss ArmyTM knife you ever owned? Aside from its original intent as a compact survival tool, this knife has thrilled generations with its multiple capabilities. TPump is the Swiss Army TM knife of the Teradata load utilities. Just as this knife was designed for small tasks, TPump was developed to handle batch loads with low volumes. And, just as the Swiss ArmyTM knife easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit when you have a large, busy system with few resources to spare. Let‘s look in mo re detail at the many facets of this amazing load tool. Why It Is Called “TPump”

TPump is the shortened name for the load utility Teradata Parallel Data Pump. To understand this, you must know how the load utilities move the data. Both FastLoad and MultiLoad assemble massive volumes of data rows into 64K blocks and then moves those blocks. Picture in your mind the way that huge ice blocks used to be floated down long rivers to large cities prior to the advent of refrigeration. There they were cut up and distributed to the people. TPump does NOT move data in the large blocks. Instead, it loads data one row at a time , using row hash locks. Because it locks at this level, Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 124

and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table. Envision TPump as the water pump on a well. Pumping in a very slow, gentle manner results in a steady trickle of water that could be pumped into a cup. But strong and steady pumping results in a powerful stream of water that would require a larger container. TPump is a data pump which, like the water pump, may allow either a trickle-feed of data to flow into the warehouse or a strong and steady stream. In essence, you may ―throttle‖ the flow of data based upon your system and business user requirements. Remember, TPump is THE PUMP! TPump Has Many Unbelievable Abilities Just in Time: Transactional systems, such those implemented for ATM machines or Point-of-Sale terminals, are known for their tremendous speed in executing transactions. But how soon can you get the information pertaining to that transaction into the data warehouse? Can you afford to wait until a nightly batch load? If not, then TPump may be the utility that you are looking for! TPump allows the user to accomplish near real-time updates from source systems into the Teradata data warehouse. Throttle-switch Capability: What about the throttle capability that was mentioned above? With TPump you may stipulate how many updates may occur per minute. This is also called the statement rate. In fact, you may change the statement rate during the job, ―throttling up‖ the rate with a higher number, or ―throttling down‖ the number of updates with a lower one. An example: Having this capability, you might want to throttle up the rate during the period from 12:00noon to 1:30 PM when most of the users have gone to lunch. You could then lower the rate when they return and begin running their business queries. This way, you need not have such clearly defined load windows, as the other utilities require. You can have TPump running in the background all the time, and just control its flow rate. DML Functions: Like MultiLoad, TPump does DML functions, including INSERT, UPDATE and DELETE. These can be run solo, or in combination with one another. Note that it also supports UPSERTs like MultiLoad. But here is one place that TPump differs vastly from the other utilities: FastLoad can only load one table and MultiLoad can load up to five tables. But, when it pulls data from a single source, TPump can load more than 60 tables at a Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 125

time! And the number of concurrent instances in such situations is unlimited. That‘s right, not 15, but unlimited for Teradata ! Well OK, maybe by your computer. I cannot imagine my laptop running 20 TPump jobs, but Teradata does not care. How could you use this ability? Well, imagine partitioning a huge table horizontally into multiple smaller tables and then performing various DML functions on all of them in parallel. Keep in mind that TPump places no limit on the number of sessions that may be established. Now, think of ways you might use this ability in your data warehouse environment. The possibilities are endless. More benefits: Just when you think you have pulled out all of the options on a Swiss ArmyTM knife, there always seems to be just one more blade or tool you had not noticed. Similar to the knife, TPump always seems to have another advantage in its list of capabilities. Here are several that relate to TPump requirements for target tables. TPump allows both Unique and NonUnique Secondary Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad, which allows just NUSIs. Like MultiLoad, TPump allows the target tables to either be empty or to be populated with data rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential Integrity is allowed and need not be dropped. As to the existence of Triggers, TPump says, ―No problem!‖ Support Environment compatibility: The Support Environment (SE) works in tandem with TPump to enable the operator to have even more control in the TPump load environment. The SE coordinates TPump activities, assists in managing the acquisition of files, and aids in the processing of conditions for loads. The Support Environment aids in the execution of DML and DDL that occur in Teradata, outside of the load utility. Stopping without Repercussions: Finally, this utility can be stopped at any time and all of locks may be dropped with no ill consequences. Is this too good to be true? Are there no limits to this load utility? TPump does not like to steal any thunder from the other load utilities, but it just might become one of the most valuable survival tools for businesses in today‘s data warehouse environment. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 126

TPump Has Some Limits

TPump has rightfully earned its place as a superstar in the family of Teradata load utilities. But this does not mean that it has no limits. It has a few that we will list here for you: Rule #1: No concatenation of input data files is allowed. TPump is not designed to support this. Rule #2: TPump will not process aggregates, arithmetic functions or exponentiation. If you need data conversions or math, you might consider using an INMOD to prepare the data prior to loading it. Rule #3: The use of the SELECT function is not allowed. You may not use SELECT in your SQL statements. Rule #4: No more than four IMPORT commands may be used in a single load task. This means that a most, four files can be directly read in a single run. Rule #5: Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy. This must be specified when you create the table. Any dates using the default yy format for the year are taken to mean 20 th century years. Rule #6: On some network attached systems, the maximum file size when using TPump is 2GB. This is true for a computer running under a 32 bit operating system. Rule #7: TPump performance will be diminished if Access Logging is used. The reason for this is that TPump uses normal SQL to accomplish its tasks. Besides the extra overhead incurred, if you use Access Logging for successful table updates, then Teradata will make an entry in the Access Log table for each operation. This can cause the potential for row hash conflicts between the Access Log and the target tables.

A Simple TPump Script — A Look at the Basics

Setting up a Logtable and Logging onto Teradata Begin load process, add Parameters, naming the error table Defining the INPUT flat file Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 127

Defining the DML activities to occur Naming the IMPORT file and defining its FORMAT Telling TPump to use a particular LAYOUT Telling the system to start loading data rows Finishing loading and logging off of Teradata The following script assumes the existence of a Student_Names table in the SQL01 database. You may use pre-existing target tables when running TPump or TPump may create the tables for you. In most instances you will use existing tables. The CREATE TABLE statement for this table is listed for your convenience.

Much of the TPump command structure should look quite familiar to you. It is quite similar to MultiLoad. In this example, the Student_Names table is being loaded with new data from the university‘s registrar. It will be used as an associative table for linking various tables in the data warehouse. /* This script inserts rows into a table Sets Up a Logtable and then logs called student_names from a single file */ on with .RUN. .LOGTABLE WORK_DB.LOG_PUMP; The logon.txt file contains: .logon .RUN FILE C:\mydir\logon.txt; TDATA/SQL01,SQL01;. DATABASE SQL01;

Also specifies the database to find the necessary tables.

.BEGIN LOAD ERRLIMIT 5 CHECKPOINT 1 SESSIONS 64 TENACITY 2 PACK 40 RATE 1000

Begins the Load Process; Specifies optional parameters.


Page 128

ERRORTABLE SQL01.ERR_PUMP;

ERRORTABLE names the error table for this run. Names the LAYOUT of the INPUT record; Notice the dots before the .FIELD and .FILLER commands and the semi-colons after each FIELD definition. Also, the more_junk field moves the field pointer to the start of the First_name data. Notice the comment in the script. Names the DML Label Tells TPump to INSERT a row into the target table and defines the row format; Comma separators are placed in front of the following column or value for easier debugging. Lists, in order , the VALUES to be INSERTed. Colons precede VALUEs. Names the IMPORT file; Names the LAYOUT to be called from above; tells TPump which DML Label to APPLY.

.END LOAD; .LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-4 Step One: Setting up a Logtable and Logging onto Teradata — First, you define the Logtable using the .LOGTABLE command. We have named it LOG_PUMP in the WORK_DB database. The Logtable is automatically created for you. It may be placed in any database by qualifying the table Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 129

name with the name of the database by using syntax like this: . Next, the connection is made to Teradata. Notice that the commands in TPump, like those in MultiLoad, require a dot in front of the command key word. Step Two: Begin load process, add Parameters, naming the Error Table — Here, the script reveals the parameters requested by the user to assist in managing the load for smooth operation. It also names the one error table, calling it SQL01.ERR_PUMP. Now let ‘s look at each parameter:

ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set the limit that is tolerable for the load. CHECKPOINT 1 tells TPump to pause and evaluate the progress of the load in increments of one minute. If the factor is between 1 and 60, it refers to minutes. If it is over 60, then it refers to the number of rows at which the checkpointing should occur. SESSIONS 64 tells TPump to establish 64 sessions with Teradata. TENACITY 2 says that if there is any problem establishing sessions, then to keep on trying for a period of two hours. PACK 40 tells TPump to ―pack‖ 40 data rows and load them at one time.

RATE 1000 means that 1,000 data rows will be sent per minute. Step Three: Defining the INPUT flat file structure — TPump, like MultiLoad, needs to know the structure the INPUT flat file record. You use the .LAYOUT command to name the layout . Following that, you list the columns and data types of the INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field‘s length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER with the correct number of bytes as character to position to the cursor to the next field, or the ―*‖ can be replaced by a number that equals the lengths of all previous fields added together plus 1 extra byte. When you use this technique, the .FILLER is not needed. In our example, this says to begin Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 130

with Student_ID, continue on to load Last_Name, and finish when First_Name is loaded. Step Four: Defining the DML activities to occur — At this point, the .DML LABEL names and def ines the SQL that is to execute. It also names the columns receiving data and defines the sequence in which the VALUES are to be arranged. In our example, TPump is to INSERT a row into the SQL01.Student_NAMES. The data values coming in from the record are named in the VALUES with a colon prior to the name. This provides the PE with information on what substitution is to take place in the SQL. Each LABEL used must also be referenced in an APPLY clause of the .IMPORT clause. Step Five: Naming the INPUT file and defining its FORMAT — Using the .IMPORT INFILE command, we have identified the INPUT data file as ―CDW_Export .txt‖. The file was cre ated using the TEXT format. Step Six: Associate the data with the description — Next, we told the IMPORT command to use the LAYOUT called, ―FILELAYOUT.‖ Step Seven: Telling TPump to start loading — Finally, we told TPump to APPLY the DML LABEL called INSREC — that is, to INSERT the data rows into the target table. Step Seven: Finishing loading and logging off of Teradata — The .END LOAD command tells TPump to finish the load process. Finally, TPump logs off of the Teradata system. 2A0C022B00000

TPump Script with Error Treatment Options

/* Setup the TPUMP Logtables, Logon Statements Sets up a Logtable and and Database Default */ then logs on to Teradata. .LOGTABLE SQL01.LOG_PUMP; .LOGON CDW/SQL01,SQL01; DATABASE SQL01;

Specifies the database containing the table.

/* Begin Load and Define TPUMP Parameters and Error Tables */ Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 131

.BEGIN LOAD ERRLIMIT 5 CHECKPOINT 1 SESSIONS 1 TENACITY 2 PACK 40 RATE 1000 ERRORTABLE SQL01.ERR_PUMP;

BEGINS THE LOAD PROCESS SPECIFIES MULTIPLE PARAMETERS TO AID IN PROCESS CONTROL NAMES THE ERRROR TABLE; TPump HAS ONLY ONE ERROR TABLE.

.LAYOUT FILELAYOUT; .FIELD Student_ID * VARCHAR (11); .FIELD Last_Name * VARCHAR (20); .FIELD First_Name * VARCHAR (14); .FIELD Class_Code * VARCHAR (2); .FIELD Grade_Pt * VARCHAR (8);

Names the LAYOUT of the INPUT file.

.DML LABEL INSREC IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS;

Names the DML Label; SPECIFIES 3 ERROR TREATMENT OPTIONS with the ; after the last option.

INSERTINTO Student_Profile4 ( Student_ID ,Last_Name ,First_Name ,Class_Code ,Grade_Pt ) VALUES ( :Student_ID ,:Last_Name

Defines the structure of the INPUT file; here, all Variable CHARACTER data and the file has a comma delimiter. See .IMPORT below for file type and the declaration of the delimiter.

Tells TPump to INSERT a row into the target table and defines the row format. Note that we place comma separators in front of the following column


Page 132

,:First_Name ,:Class_Code ,:Grade_Pt );

or value for easier debugging. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

.IMPORT INFILE CDW_Export.txt FORMAT VARTEXT ‗,‘ LAYOUT FILELAYOUT APPLY INSREC;

Names the IMPORT file; Names the LAYOUT to be called from above; Tells TPump which DML Label to APPLY. Notice the FORMAT with a comma in the quotes to define the delimiter between fields in the input record.

.END LOAD; .LOGOFF;

Tells TPump to stop loading and Logs Off all sessions.

A TPump UPSERT Sample Script Begins the load process Specifies multiple parameters to aid in load management Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 133


Page 134

A TPump UPSERT Sample Script Sets Up a Logtable and then logs on to Teradata. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 135

Begins the load process Specifies multiple parameters to aid in load management Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE Defines the LAYOUT for the 1st INPUT file; also has the indicators for NULL data. st

Names the 1 DML Label and specifies 2 Error Treatment options. Tells TPump to INSERT a row into the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon always precedes values. Names the Import File as UPSERT-FILE.DAT. The file name is under Windows so the ―-― is fine. The file type is FASTLOAD. .END LOAD; .LOGOFF;

Tells TPump to stop loading and logs off all


Page 136

sessions.

NOTE: The above UPSERT uses the same syntax as MultiLoad. This continues to work. However, there might soon be another way to accomplish this task. NCR has built an UPSERT and we have tested the following statement, without success:

We are not sure if this will be a future technique for coding a TPump UPSERT, or if it is handled internally. For now, use the original coding technique. Monitoring TPump

TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the status of TPump jobs as they run and to change (remember ―throttle up‖ and ―throttle down?‖) the statement rate on the fly. Key to this monitor is the ―SysAdmin.TpumpStatusTbl‖ table in the Data Dictionary Directory. If your Database Administrator creates this table, TPump will update it on a minute-by-minute basis when it is running. You may update the table to change the statement rate for an IMPORT. If you want TPump to run unmonitored, then the table is not needed. You can start a monitor program under UNIX with the following command:

Below is a chart that shows the Views and Macros used to access the ―SysAdmin.TpumpStatusTbl‖ table. Queries may be written against the Views. The macros may be executed. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 137

Views and Macros to access the table SysAdmin.TpumpStatusTbl

View

SysAdmin.TPumpStatus

View

SysAdmin.TPumpStatusX

Macro

Sysadmin.TPumpUpdateSelect

Macro

TPumpMacro.UserUpdateSelect

Handling Errors in TPump Using the Error Table One Error Table

Unlike FastLoad and MultiLoad, TPump uses only ONE Error Table per target table, not two. If you name the table, TPump will create it automatically. Entries are made to these tables whenever errors occur during the load process. Like MultiLoad, TPump offers the option to either MARK errors (include them in the error table) or IGNORE errors (pay no attention to them whatsoever). These options are listed in the .DML LABEL sections of the script and apply ONLY to the DML functions in that LABEL. The general default is to MARK. If you specify nothing, TPump will assume the default. When doing an UPSERT, this default does not apply. The error table does the following: Identifies errors Provides some detail about the errors Stores a portion the actual offending row for debugging When compared to the error tables in MultiLoad, the TPump error table is most similar to the MultiLoad Acquisition error table. Like that table, it stores information about errors that take place while it is trying to acquire data. It is the errors that occur when the data is being moved, such as data translation problems that TPump will want to report on. It will also want to report any difficulties compiling valid Primary Indexes. Remember, TPump has less tolerance for errors than FastLoad or MultiLoad. COLUMNS IN THE TPUMP ERROR TABLE

ImportSeq

Sequence number that identifies the IMPORT command where the error occurred


Page 138

DMLSeq

Sequence number for the DML statement involved with the error

SMTSeq

Sequence number of the DML statement being carried out when the error was discovered

ApplySeq

Sequence number that tells which APPLY clause was running when the error occurred

SourceSeq

The number of the data row in the client file that was being built when the error took place

DataSeq

Identifies the INPUT data source where the error row came from

ErrorCode

System code that identifies the error

ErrorMsg

Generic description of the error

ErrorField

umber of the column in the target table where the error happened; is left blank if the offending column cannot be identified; This is different from MultiLoad, which supplies the column name.

HostData

The data row that contains the error, limited to the first 63,728 bytes related to the error

Common Error Codes and What They Mean

TPump users often encounter three error codes that pertain to: Missing data rows Duplicate data rows Extra data rows Become familiar with these error codes and what they mean. This could save you time getting to the root of some common errors you could see in your future! #1: Error 2816: Failed to insert duplicate row into TPump Target Table. Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 139

Nothing is wrong when you see this error. In fact, it can be a very good thing. It means that TPump is notifying you that it discovered a DUPLICATE row. This error jumps to life when one of the following options has been stipulated in the .DML LABEL: MARK DUPLICATE INSERT ROWS MARK DUPLICATE UPDATE ROWS Note that the original row will be inserted into the target table, but the duplicate row will not. #2: Error 2817: Activity count greater than ONE for TPump UPDATE/DELETE.

Sometimes you want to know if there were too may ―successes.‖ This is the case when there are EXTRA rows when TPump is attempting an UPDATE or DELETE.

TPump will log an error whenever it sees an activity count greater than zero for any such extra rows if you have specified either of these options in a .DML LABEL: MARK EXTRA UPDATE ROWS MARK EXTRA DELETE ROW At the same time, the associated UPDATE or DELETE will be performed. #3: Error 2818: Activity count zero for TPump UPDATE or DELETE.

Sometimes, you want to know if a data row that was supposed to be updated or deleted wasn‘t! That is when you want to know that the activity count was zero, indicating that the UPDATE or DELETE did not occur. To see this error, you must have used one of the following parameters: MARK MISSING UPDATE ROWS MARK MISSING DELETE ROWS .BEGIN LOAD Parameters UNIQUE to TPump

MACRODB

This parameter identifies a database that will contain any macros utilized by TPump. Remember, TPump does not run the SQL statements by itself. It places


Page 140

them into Macros and executes those Macros for efficiency. NOMONITOR

Use this parameter when you wish to keep TPump from checking either statement rates or update status information for the TPump Monitor application.

PACK (n)

Use this to state the number of statements TPump will ―pack‖ into a multiple-statement request. Multistatement requests improve efficiency in either a network or channel environment because it uses fewer sends and receives between the application and Teradata.

RATE

This refers to the Statement Rate. It shows the initial maximum number of statements that will be sent per minute. A zero or no number at all means that the rate is unlimited. If the Statement Rate specified is less than the PACK number, then TPump will send requests that are smaller than the PACK number.

ROBUST ON/OFF ROBUST defines how TPump will conduct a RESTART. ROBUST ON means that one row is written to the Logtable for every SQL transaction. The downside of running TPump in ROBUST mode is that it incurs additional, and possibly unneeded, overhead. ON is the default. If you specify ROBUST OFF, you are telling TPump to utilize ―simple‖ RESTART logic: Just start from the last successful CHECKPOINT. Be aware that if some statements are reprocessed, such as those processed after the last CHECKPOINT, then you may end up with extra rows in your error tables. Why? Because some of the statements in the original run may have already have found errors, in which case they would have recorded those errors in an error table.

TPump and MultiLoad Comparison Chart


Page 141

Function

MultiLoad

TPump

Error Tables must be defined

Optional, 2 per target table

Optional, 1 per target table

Work Tables must be defined

Optional, 1 per target No table

Logtable must be defined

Yes

Yes

Allows Referential Integrity

No

Yes

Allows Unique Secondary Indexes

No

Yes

Allows Non-Unique Secondary Yes Indexes

Yes

Allows Triggers

Yes

No

Loads a maximum of n number Five of tables

60

Maximum Concurrent Load Instances

15

Unlimited

Locks at this level

Table

Row Hash

DML Statements Supported

INSERT, UPDATE, INSERT, UPDATE, DELETE, ―UPSERT― DELETE, ―UPSERT―

How DML Statements are Performed

Runs actual DML commands

Compiles DML into MACROS and executes

DDL Statements Supported

All

All

Transfers data in 64K blocks

Yes

No, moves data at row level

RESTARTable

Yes

Yes

Stores UPI Violation Rows

Yes, with MARK option

Yes, with MARK option

Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation

No

No

Allows Data Conversion

Yes

Yes


Page 142

Performance Improvement Improvement

As data volumes increase

By using multistatement requests

Table Access During Load

Uses WRITE lock on tables in Application Phase

Allows simultaneous READ and WRITE access due to Row Hash Locking

Effects of Stopping the Load

Consequences

No repercussions

Resource Consumption

Hogs available resources

Allows consumption management via Parameters

Some impor tant commands: commands:

ABORT

Abort any and all active running requests and transactions, but do not exit BTEQ.

DEFAULTS

Reset all BTEQ Format command command options to their defaults. This will utilize the default configurations.

LOGOFF

End the current session or sessions, but do not exit BTEQ.

LOGON

Starts a BTEQ Session. Every user, application, or utility must LOGON to Teradata to establish a session.

QUIT

End the current session or sessions and exit BTEQ.

SESSIONS

Specifies the number of sessions to use with the next LOGON command.

ERROROUT

Write error messages to a specific output file.

EXPORT

Open a file with a specific format to transfer information directly from the Teradata database.

FORMAT

Enable/inhibit the page-oriented format command options.


Page 143

IMPORT

Open a file with a specific format to import information into Teradata.

INDICDATA

One of multiple data mode options for data selected from Teradata. The modes are INDICDATA, FIELD, or RECORD MODE.

QUIET

Limit BTEQ output displays to all error messages and request processing statistics.

REPEAT

Submit the next request a certain amount of times

RUN

Execute Teradata SQL requests and BTEQ commands directly from a specified run file.

ABORT

Abort any active transactions and requests.

ERRORLEVEL

Assign severity levels to particular error numbers.

EXIT


GOTO

Skip all intervening commands and resume after branching forward to the specified label.

HANG

Pause BTEQ processing for a specific amount of time.

IF…THEN

Test a stated condition, and then resume processing based on the test results.

LABEL

The GOTO command will always GO directly TO a particular line of code based on a label.

MAXERROR

Specifies a maximum allowable error severity level.

QUIT


REPEAT

Submit the next request a certain amount of times.

QUIET

Limit BTEQ output displays to all error messages and request processing statistics.

RECORDMODE

One of multiple data mode options for data selected from Teradata. (INDICDATA, FIELD, or RECORD).


Page 144

SEPARATOR

Specifies a character string or specific width of blank characters separating columns of a report.

SUPPRESS

Replace each and every consecutively consecutively repeated value with completely-blank character strings.

ACCEPT

Allows the value of utility variables to be accepted directly from a file or from environmental environmental variables.

LOGON

LOGON command or string used to connect sessions established through the FastExport utility.

LOGTABLE

FastExport utilizes this to specify a restart log table. The purpose is for FastExport checkpoint information.

RUN FILE

Used to point to a file that FastExport is to use as standard input. This will Invoke the specified external file as the current source of utility and Teradata SQL commands. commands.

SET

Assigns a data type and value to a variable.

FIELD

Constitutes a field in the input record section that provides data values for the SELECT statement.

FILLER

Specifies a field in the input record that will not be sent to Teradata for processing. It is part of the input record to provide data values for the SELECT statement.

LAYOUT

Specifies the data layout for a file. It contains a sequence of FIELD and FILLER commands. This is used to describe the import file that can optionally provide data values for the SELECT.

BEGIN LOADING This identifies and locks the FastLoad target table for the duration of the load. It also identifies the two error tables to be used for the load. CHECKPONT and INDICATORS are subordinate s ubordinate commands commands in the BEGIN LOADING clause of the script. CHECKPOINT, which will be discussed below in detail, is not the default for FastLoad. It must be Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Ameerpet, Hyderabad. ph-8374187525

Page 145

specified in the script. INDICATORS is a keyword related to how FastLoad handles nulls in the input file. It identifies columns with nulls and uses a bitmap at the beginning of each row to show which fields contain a null instead of data. When the INDICATORS option is on, FastLoad looks at each bit to identify the null column. The INDICATORS option does not work with VARTEXT.

DEFINE

This names the Input file and describes the columns in that file and the data types for those columns.

DELETE

Deletes all the rows of a table. This will only work in the initial run of the script. Upon restart, it will fail because the table is locked.

DROP TABLE

Drops a table and its data. It is used in FastLoad to drop previous Target and error tables. At the same time, this is not a good thing to do within a FastLoad script since it cancels the ability to restart.

ERRLIMIT

Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I). This handy command can be a lifesaver when you are not sure how corrupt the data in the Input file is. The more corrupt it is, the greater the clean up effort required after the load finishes. ERRLIMIT provides you with a safety valve. You may specify a particular number of error rows beyond which FastLoad will immediately precede to the abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it. Remember, all the rows in the error table are not in the data table. That becomes your responsibility.

HELP

Designed for online use, the Help command provides a list of all possible FastLoad commands along with brief, but pertinent tips for using them.

HELP TABLE

Builds the table columns list for use in the FastLoad DEFINE statement when the data matches the Create Table statement exactly. In real life this does


Page 146

not happen very often.

INSERT

This is FastLoad’s favorite command! It inserts rows into the target table.

SLEEP

Working in conjunction with TENACITY, the SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if the number of requested sessions are not available. The default is 6 minutes. For example, suppose that Teradata sessions are already maxed-out when your job is set to run. If TENACITY were set at 4 and SLEEP at 10, then FastLoad would attempt to logon every 10 minutes for up to 4 hours. If there were no success by that time, all efforts to logon would cease.

TENACITY

Sometimes there are too many sessions already established with Teradata for a FastLoad to obtain the number of sessions it requested to perform its task or all of the loader slots are currently used. TENACITY specifies the amount of time, in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for FastLoad is ―no tenacity‖, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.

.BEGIN [IMPORT] MLOAD .BEGIN DELETE MLOAD

Task This command communicates directly with Teradata to specify if the MultiLoad mode is going to be IMPORT or DELETE. Note that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is required . We recommend using the word IMPORT to make the coding consistent and easier for others to read. Any parameters for the load, such as error limits or checkpoints will be included under the .BEGIN command, too. It is important to know which commands or parameters are optional since, if you do not include them,


Page 147

MultiLoad may supply defaults that may impact your load. .DML LABEL

Task The DML LABEL defines treatment options and labels for the application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the APPLY clause.

.END MLOAD

Task This instructs MultiLoad to finish the APPLY operations with the changes to the designated databases and tables.

.FIELD

Task This defines a column of the data source record that will be sent to the Teradata database via SQL. When writing the script, you must include a FIELD for each data field you need in SQL. This command is used with the LAYOUT command.

Bteq scripts: Simple script: .RUN FILE = mylogon.txt (127.0.0.1/database name then password) DATABASE tmp; DELETE FROM Employee_Table; .IF ERRORCODE = 0 THEN .GOTO INSEMPS /* ERRORCODE is a reserved word that contains the outcome status for every SQL statement executed in BTEQ. A zero (0) indicates that statement worked. */ Create Table Employee_Table (Employee_No Integer, Last_name char(20), First_name char(20), Salary Decimal(8,2), Dept_No SmallInt) Unique Primary Index (Employee_No); .LABEL INSEMPS INSERT INTO Employee_Table (1232578, 'Chambers', 'Mandee', 48850.00, 100); INSERT INTO Employee_Table (1256349, 'Harrison' ,'Herbert', 54500.00, 400); .QUIT

Bteqexport script: exporting a file from database to a parameter file .run file = mylogon.txt database tmp; .export indicdata file= sample1ex.txt sel * from employee_table; .export reset .logoff exit;

Bteq import: importing a parameter file into a database, Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 148

Program: .Run file = mylogon.txt database tmp(name); .import indicdata(mode) file = sample1ex.txt .quiet on .repeat * using eno (integer),f_name(char(20)),l_name(char(20)),sal(decimal(8,2)),deptno (smallint) insert into employee_table(employee_no,first_name,last_name,salary,dept_no) values(:eno,:f_name,:l_name,:sal,:deptno); .quit

Fast export scripts: Data: ct t1(x1 int,y1 char(10), z1 decimal(9,4)); ;ins t1(1,'Netezza' , 600.0000) ;ins t1(2,'Netezza' , 600.0000) ;ins t1(3,'teradata', 500.0000) ;ins t1(4,'Netezza' , 600.0000) ;ins t1(5,'DB2' , 500.0000) ;

Fast export using set command: .LOGTABLE tmp.RestartLog1_fxp; .logon 127.0.0.1/dbc,dbc ; database tmp; .SET YY TO 'Netezza'; .SET ZZ TO 600.0000; .BEGIN EXPORT SESSIONS 4 ; .EXPORT OUTFILE FXP_DEF.OUT; SELECT x1,y1,z1 FROM T1 WHERE y1 = '&YY' AND z1 = &ZZ ORDER BY 1 ; .END EXPORT ; .LOGOFF ;

Fast export using acceptcommand: .LOGTABLE tmp.RestartLog1_fxp; .logon 127.0.0.1/dbc,dbc ; database tmp; Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 149

.ACCEPT YY, ZZ FROM FILE parmfile.txt; .BEGIN EXPORT SESSIONS 4 ; .EXPORT OUTFILE FXP_DEF_ACCEPT.out; SELECT x1,y1,z1 FROM T1 WHERE y1 = '&YY' AND z1 = &ZZ ORDER BY 1 ; .END EXPORT ; .LOGOFF ;

Fast export using layout command: .LOGTABLE tmp.RestartLog1_fxp; .logon 127.0.0.1/dbc,dbc ; database tmp; .BEGIN EXPORT SESSIONS 4 ; .LAYOUT Record_Layout; .FIELD YY 1 CHAR(8); .FIELD ZZ * CHAR(8); .IMPORT INFILE 'fexplaydatafile.txt' LAYOUT Record_Layout FORMAT TEXT; .EXPORT OUTFILE FXP_DEF_LAYOUT.txt; SELECT x1,y1,z1 FROM T1 WHERE y1 = :YY AND z1 = :ZZ ORDER BY 1 ; .END EXPORT ; .LOGOFF ;

Fast load scripts: sessions 8;tenacity 4;sleep 3; logon 127.0.0.1/dbc,dbc; errlimit 1000; begin loading tmp.emp_table errorfiles tmp.emp_err1, tmp.emp_err2; define empno (INTEGER),ename (VARCHAR(10)),sal (INTEGER),job (CHAR(10)), loc (CHAR(10)) file=myfexpload.txt; insert into tmp.emp_table values(:empno,:ename,:sal,:job,:loc); end loading; logoff;

fload optimized scripts: LOGON 127.0.0.1/dbc,dbc; Visualpath, #306, Niligiri Block, Aditya Enclave, Ameerpet, Hyderabad. ph-8374187525

Page 150

BEGIN LOADING TMP.T1 ERRORFILES TMP.T1_1, TMP.T1_2; DEFINE FILE=FXP_rec_text.out; HELP TABLE TMP.T1; INSERT INTO TMP.T1.*; END LOADING; LOGOFF;

Multiload scripts using vartxt mode: logtable tmp.t1_log; .logon 127.0.0.1/dbc,dbc; .begin import mload tables tmp.t1 worktables tmp.t1_wrk errortables tmp.t1_er1 tmp.t1_er2 ; .layout internal; .field x1 * varchar(10); .field y1 * varchar(20); .field z1 * varchar(10); .dml label tdmload; insert tmp.t1(x1,y1,z1) values (:x1, :y1, :z1); .import infile md.txt format var text ',' layout internal apply tdmload; .end mload; .logoff;

Multiload scripts using txt mode: .logtable tmp.t1_log; .logon 127.0.0.1/dbc,dbc; .begin import mload tables tmp.t1 worktables tmp.t1_wrk errortables tmp.t1_er1 tmp.t1_er2 ; .layout internal; .field x1 1 Integer; .field y1 13 varchar(20); .field z1 26 decimal(9,4); .dml label tdmload; insert tmp.t1(x1,y1,z1) values (:x1, :y1, :z1); .import infile md.txt format text layout internal apply tdmload; .end mload; .logoff;


Page 151

Teradata Material2

Recommend Documents