Chapter 1: Introduction
An Introduction to the Teradata Utilities
"It's not the data load that breaks us down, it's the way you carry it." Tom Coffing Teradata has been doing data transfers to and from the largest data warehouses in the world for close to two decades. While other databases have allowed the loads to break them down, Teradata has continued to set the standards and break new barriers. The brilliance behind the Teradata load utilities is in their power and flexibility. With five great utilities Teradata allows you to pick the utility for the task at hand. This book is dedicated to explaining these utilities in a complete and easy manner. This book has had contributions from over 10 Certified Teradata Masters with experience at over 125 Teradata sites worldwide. Let our experience be your guide. The intent of this book is twofold. The first is to help you write and use the various utilities. A large part of this is taken up with showing the commands and their functionality. In addition, it will show examples using the various utility commands and SQL in conjunction with each other that you will come to appreciate. The second intention is to help you know which utility to use under a variety of conditions. You will learn that some of the utilities use very large blocks to transfer the data either to or from the Teradata Relational Database Management System (RDBMS). From this perspective, they provide a high degree of efficiency using a communications path of either the mainframe channel or network. The other approach to transferring data rows either to or from the Teradata RDBMS is a single row at a time. The following sections provide a high level introduction to the capabilities and considerations for both approaches. You can use this information to help decide which utilities are appropriate for your specific need.
Chapter 2: BTEQ
"Civilization advances by extending the number of important operations which we can perform without thinking about them." - Alfred Whitehead
Before you can use BTEQ, you must have user access rights to the client system and privileges to the Teradata DBS. Normal system access privileges include a user ID and a password. Some systems may also require additional user identification codes depending on company standards and operational procedures. Depending on the configuration of your Teradata DBS, you may need to include an account identifier (acctid) and/or a Teradata Director Program Identifier (TDPID).
Figure 2-1 BTEQ execution. At the command prompt, type the .LOGON command with the TDPID and username, then enter the password at the second prompt:

.LOGON cdw/sql01
Password: XXXXX

Enter your BTEQ/SQL Request or BTEQ Command.

BTEQ responds and is now waiting for a command. Submitting an SQL statement returns the result set along with information about the answer set:

SELECT * FROM Employee_Table
WHERE Dept_No = 400;

*** Query Completed. 2 rows found. 5 Columns returned.
*** Total elapsed time was 1 second.

Employee_No  Last_Name   First_Name  Salary     Dept_No
    1256349  Harrison    Herbert      54500.00      400
WITH BY Statement
"Time is the best teacher, but unfortunately, it kills all of its students." Robin Williams Investing time in Teradata can be a killer move for your career. We can use the WITH BY statement in BTEQ, whereas we cannot use it with Nexus or SQL Assistant. The WITH BY statement works like a correlated subquery in the fact that you can us aggregates based on a distinct column value. BTEQ has the ability to use WITH BY statements:
"I've learned that you can't have everything and do everything at the same time." Oprah Winfrey The great thing about the WITH statement is that you can do everything to a specific group while having everything done to a column as a whole. We can get a grand total or an overall average with the WITH statement, just leave out BY. Here's a good example: Using WITH on a whole column:
"What is defeat? Nothing but education; nothing but the first step to something better." Wendell Phillips The final query in our last transaction is what caused our updates to fail. This was not the sweet taste of victory, but instead the smell of de Feet! Actually, it really was an education leading to something better. When using BT/ET in your transaction, you're telling Teradata that when it
comes to committing, we either want all or none. Since our last query in the transaction failed the Transient Journal rolled back all the queries in our entire transaction. Make sure that your syntax is correct when using the method of BT and ET because a mistake causes a massive rollback. The last query in our set did not work:
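The shape of the failing transaction looks like this (a sketch; the UPDATE statements and the misspelled column that breaks the last one are illustrative):

BT;
UPDATE Employee_Table SET Salary = 50000.00 WHERE Employee_No = 1232578;
UPDATE Employee_Table SET Salary = 55000.00 WHERE Employee_No = 1256349;
UPDATE Employee_Table SET Salary = 60000.00 WHERE Employee_No = 1333454;
UPDATE Employee_Table SET Salry  = 65000.00 WHERE Employee_No = 1121334;  /* misspelled column: this query fails */
ET;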
Our updates didn't work! That's because we bundled all four queries into one transaction. Since our last query failed, the tables were rolled back to their original state before the transaction took place.
Placing the semi-colon at the beginning of the next line (followed by another statement) will bundle those statements together as one transaction, as sketched below. Notice that our Employee_Table was not updated, just like in the first example.
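A sketch of the technique (the statements are illustrative; the leading semi-colon ties each new statement to the previous one, so the whole request is one transaction):

UPDATE Employee_Table SET Salary = 50000.00 WHERE Employee_No = 1232578
;UPDATE Employee_Table SET Salry  = 55000.00 WHERE Employee_No = 1256349;

Because the second statement's misspelled column fails, both updates roll back together.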
"Be not afraid of growing slowly, be afraid only of standing still." -Chinese Proverb
Remember the first rule of ANSI mode: all transactions must be committed by the user, who must actually use the word 'COMMIT'. Also, in ANSI mode, after any DDL statement (CREATE, DROP, ALTER, DATABASE) we have to use the COMMIT command immediately. This tells Teradata to commit to what has been done. Our query below will attempt to find anyone with a last name of 'larkins'. It will fail even though we have Mike Larkins in our table, because ANSI mode is case sensitive and we did not capitalize the 'L' in 'Larkins'. Let's run a few queries in ANSI mode:
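A minimal sketch of the pattern (the logon string and names follow the chapter's other examples; BTEQ must be told to use ANSI semantics before the logon):

.SET SESSION TRANSACTION ANSI
.LOGON cdw/sql01,whynot
DATABASE SQL_Class;
COMMIT;
UPDATE Employee_Table
  SET Dept_No = 300
WHERE Last_Name = 'Larkins';
COMMIT;
SELECT * FROM Employee_Table
WHERE Last_Name = 'larkins';

*** Query completed. No rows found.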
Notice that we have to COMMIT after any DDL or UPDATE before the transaction is committed. We even have to COMMIT after setting our DATABASE or we will get an error. No rows were returned, even though we know there is a Mike Larkins in the table. That's because ANSI mode is case sensitive. Change 'larkins' to 'Larkins' and the row is found.
Rollback
"Insanity: doing the same thing over and over again and expecting different results." Albert Einstein The Rollback keyword is the SQL mulligan of Teradata. Rollback will erase any changes made to a table. This can be very useful if something didn't work. However, you cannot rollback once you've used the commit keyword. Not keeping rollback in your arsenal would be insane.
"All truths are easy to understand once they are discovered; the point is to discover them." Galileo Galilei Discovering the advantages in using ANSI will only make SQL easier to write. It might take a little bit more typing, but a little work now can save you lots of time later. The Employee_Table was updated!
In ANSI mode, only failed transactions are rolled back when it comes to multi-statement transactions.
Once you're in DOS, type in the following: 'BTEQ < c:\temp\BTEQ_First_Script.txt', then hit enter. BTEQ will automatically open in DOS, and then it will access the file from the location you listed.
We can use BTEQ to export our results to another text document. Exporting data also works very well when you're trying to document your query along with the results. We can export our results in batch as well.
Notice that the BTEQ command is immediately followed by the '<BTEQ_First_Script.txt' to tell BTEQ which file contains the commands to execute. Then, the '>BTEQ_First_Export.txt' names the file where the output messages are written. Since putting password information into a script is scary for security reasons, inserting the password directly into a script that is to be processed in batch mode may not be a good idea. It is generally recommended, and a common practice, to store the logon and password in a separate file that can be secured. That way, it is not in the script for anyone to see. For example, the contents of a file called "mylogon.txt" might be: '.LOGON cdw/sql00,whynot'. Then, the script should contain the following command instead of a .LOGON: .RUN FILE=c:\temp\mylogon.txt. Note that with standard DOS redirection only one output file applies; a second '>' redirection simply overrides the first, so route the output to a single file and copy it afterwards if a second copy is needed.
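Putting the pieces together, a minimal sketch (the paths, names, and logon string are illustrative):

Contents of c:\temp\mylogon.txt:

.LOGON cdw/sql00,whynot

Contents of c:\temp\BTEQ_First_Script.txt:

.RUN FILE=c:\temp\mylogon.txt
SELECT * FROM Employee_Table;
.QUIT

Run from the DOS prompt:

BTEQ < c:\temp\BTEQ_First_Script.txt > c:\temp\BTEQ_First_Export.txt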
The following script creates the Employee_Table and then inserts two rows:

CREATE TABLE Employee_Table
 ( Employee_No  INTEGER
 , Last_Name    CHAR(20)
 , First_Name   CHAR(12)
 , Salary       DECIMAL(8,2)
 , Dept_No      SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);
.LABEL INSEMPS
INSERT INTO Employee_Table (1232578, 'Chambers', 'Mandee', 48850.00, 100);
INSERT INTO Employee_Table (1256349, 'Harrison', 'Herbert', 54500.00, 400);
.QUIT

Note: both label references have to be identical or it will not work. Once the table has been created, Teradata will then insert the two new rows into the empty table.
"And I thought French was tough; it's like they have a different word for everything." - Steve Martin
We now have a flat file that contains all information found in the Employee_Table. We will be able to use this flat file for future exercises.
The issue is that there is no standard character defined to represent either a numeric or a character NULL. So, systems typically use a zero for a numeric NULL and a space or blank for a character NULL. If this data is simply loaded into another RDBMS, it is no longer a NULL, but a zero or a space. To remedy this situation, INDICDATA puts a bitmap at the front of every record written to disk. This bitmap contains one bit per field/column. When a Teradata column contains a NULL, the bit for that field is turned on by setting it to "1". Likewise, if the data is not NULL, the bit remains a zero. The loading utility then reads these bits as indicators of NULL data and identifies the column(s) as NULL when data is loaded back into the table, where appropriate.

Since both DATA and INDICDATA store each column on disk in native format with known lengths and characteristics, they are the fastest methods of transferring data. However, it becomes imperative that you be consistent: data exported as DATA must be imported as DATA, and the same is true for INDICDATA.

This internal processing is automatic and potentially important. On a network-attached system, being consistent is your only responsibility, because the record length is automatically maintained. On a mainframe system, however, you must account for the indicator bits when defining the LRECL in the Job Control Language (JCL); otherwise, your length is too short and the job will end with an error. To determine the correct length, remember that one indicator bit is needed per field selected, but computers allocate data in bytes, not bits, so INDICDATA allocates the bits in whole bytes. Therefore, referencing one to eight columns in the SELECT adds one byte to the record length, nine to sixteen columns adds two bytes (even though, say, only nine bits are needed), and so on. When exporting to a mainframe, the JCL (LRECL) must account for these additional bytes.

DIF Mode: Known as Data Interchange Format, which allows users to export data from Teradata to be used directly in spreadsheet applications like Excel, FoxPro and Lotus. The optional LIMIT tells BTEQ to stop returning rows after a specific number (n) of rows. This might be handy in a test environment to stop BTEQ before the end of transferring rows to the file.
The following example uses a Record (DATA) Mode format. The output of the exported data will be a flat file. The Employee_Table contains:

Employee_No  Last_Name   First_Name  Salary     Dept_No
    2000000  Jones       Squiggy      32800.50        ?
    1256349  Harrison    Herbert      54500.00      400
    1333454  Smith       John         48000.00      200
    1121334  Strickling  Cletus       54500.00      400

Figure 2-6

The script logs on to Teradata, issues an EXPORT statement in record (DATA) mode so that the EMPS.TXT file is created as a flat file, selects the rows, and finishes the execution with .QUIT.
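A minimal sketch of such a record-mode export script (the logon string and path follow the chapter's other examples):

.LOGON cdw/sql01,whynot
.EXPORT DATA FILE = C:\EMPS.TXT
SELECT * FROM Employee_Table;
.EXPORT RESET
.QUIT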
SELECT * FROM Department_Table;
.EXPORT RESET
.LABEL Done
.QUIT

Figure 2-7

The .EXPORT RESET reverses the previous export command, and the script falls through to the Done label. After this script has completed, the following report will be generated on disk:

Employee_No  Last_Name   First_Name  Salary     Dept_No
    2000000  Jones       Squiggy      32800.50        ?
    1256349  Harrison    Herbert      54500.00      400
    1333454  Smith       John         48000.00      200
    1121334  Strickling  Cletus       54500.00      400
    1324657  Coffing     Billy        41888.88      200
    2341218  Reilly      William      36000.00      400
    1232578  Chambers    Mandee       56177.50      100
    1000234  Smythe      Richard      64300.00       10
    2312225  Larkins     Loraine      40200.00      300
I remember when my mom and dad purchased my first Lego set. I was so excited about building my first space station that I ripped the box open and proceeded to follow the instructions to complete the station. However, when I was done, I was not satisfied with the design and decided to make changes. So I built another space ship and constructed another launching station. BTEQ export works in the same manner: as a user gains experience with it, the utility becomes easier to work with. With that being said, the following is a more robust example of utilizing the Field (Report) option. This example will export data in Field (Report) Mode format, so the output of the exported data will appear like the standard output of a SQL SELECT statement. In addition, aliases and a title have been added to the script.

.LOGON CDW/sql01,whynot;
.SET WIDTH 90
.SET FORMAT ON
.SET HEADING 'Employee Profiles'
.EXPORT REPORT FILE = C:\EMP_REPORT.TXT
SELECT Employee_No AS "Employee Number"
      ,Last_name   AS "Last Name"
      ,First_name  AS "First Name"
      ,Salary
      ,Dept_No     AS "Department Number"
FROM Employee_Table;
.EXPORT RESET
.QUIT

Figure 2-8

The .LOGON logs on to Teradata, and the .SET commands set the format parameters for the final report. The Export statement is in field (REPORT) mode, so the EMP_REPORT.TXT file will be created as a report. The SELECT specifies the columns being selected; notice that the columns have aliases. The .EXPORT RESET reverses the effects of the previous export command. After the script has completed, the following report will be generated on disk:

Employee Profiles

Employee Number  Last Name   First Name  Salary     Department Number
2000000          Jones       Squiggy      32800.50          ?
1256349          Harrison    Herbert      54500.00        400
1333454          Smith       John         48000.00        200
1121334          Strickling  Cletus       54500.00        400
1324657          Coffing     Billy        41888.88        200
2341218          Reilly      William      36000.00        400
1232578          Chambers    Mandee       56177.50        100
1000234          Smythe      Richard      64300.00         10
2312225          Larkins     Loraine      40200.00        300
From the above example, a number of BTEQ commands were added to the export script. Below is a review of those commands. The WIDTH command specifies the width of screen displays and printed reports, based on characters per line. The FORMAT command enables or inhibits the page-oriented format option. The HEADING command specifies a header that will appear at the top of every page of a report.
The SKIP option is used when you wish to bypass the first records in a file. For example, a mainframe tape may have header records that should not be processed. Other times, maybe the job started and loaded a few rows into a table with a UPI defined; loading them again would cause an error. So, you can skip over them using this option. The following example will use a Record (DATA) Mode format. The input of the imported data will populate the Employee_Table.

.SESSIONS 4
.LOGON CDW/sql01,whynot;
.IMPORT DATA FILE = C:\EMPS.TXT, SKIP = 2
.QUIET ON
.REPEAT *
USING Emp_No   (INTEGER),
      L_name   (CHAR(20)),
      F_name   (VARCHAR(12)),
      Salary   (DECIMAL(8,2)),
      Dept_No  (SMALLINT)
INSERT INTO Employee_Table
   (Employee_No, Last_name, First_name, Salary, Dept_No)
VALUES (:Emp_No,
        :L_name,
        :F_name,
        :Salary,
        :Dept_No);
.QUIT

Figure 2-9

In this script, .SESSIONS specifies the number of sessions to establish with Teradata, and the .LOGON logs on to Teradata. The .IMPORT specifies DATA mode, names the file to read (EMPS.TXT), and skips the first 2 records. .QUIET ON limits the messages output. .REPEAT * loops in this script until the end of the records in the file. The USING specifies the fields in the input file and names them, and the INSERT substitutes data from those fields into the SQL command, supplying the insert parameters for the Employee_Table. The script exits after all data is read and the rows are inserted.
From the above example, a number of BTEQ commands were added to the import script. Below is a review of those commands. .QUIET ON limits BTEQ output to reporting only errors and request processing statistics. (Note: be careful how you spell .QUIET, since dropping the E turns it into .QUIT, and it will.) .REPEAT * causes BTEQ to read records until EOF; given a number instead, such as .REPEAT 10, it performs the loop 10 times. The default is one record. The USING clause defines the input data fields and their associated data types coming from the host.
The following builds upon the IMPORT example above; however, this script adds a CREATE TABLE statement, and the imported data will populate the newly created Employee_Profile table.

.SESSIONS 2
.LOGON CDW/sql01,whynot;
DATABASE SQL_Class;
CREATE TABLE Employee_Profile
  ( Employee_No  INTEGER,
    Last_name    CHAR(20),
    First_name   VARCHAR(12),
    Salary       DECIMAL(8,2),
    Dept_No      SMALLINT )
UNIQUE PRIMARY INDEX (Employee_No);
.IMPORT INDICDATA FILE = C:\IND-EMPS.TXT
.QUIET ON
.REPEAT 120
USING Employee_No (INTEGER),
      Last_name   (CHAR(20)),
      First_name  (VARCHAR(12)),
      Salary      (DECIMAL(8,2)),
      Dept_No     (SMALLINT)
INSERT INTO Employee_Profile
   (Employee_No, Last_name, First_name, Salary, Dept_No)
VALUES (:Employee_No,
        :Last_name,
        :First_name,
        :Salary,
        :Dept_No);
.LOGOFF
.QUIT

Figure 2-10

In this script, .SESSIONS specifies the number of sessions to establish with Teradata, and the .LOGON logs on. DATABASE makes SQL_Class the default database, and the CREATE TABLE statement creates the Employee_Profile table. The import statement specifies INDICDATA mode, reading from a LAN file called IND-EMPS.TXT. .QUIET ON limits the output to reporting only errors and processing statistics, and .REPEAT 120 causes BTEQ to read the first 120 records from the file. The USING specifies the parameters of the input file, and the INSERT substitutes the values into the SQL command. Notice that some of the scripts have a .LOGOFF and .QUIT. The .LOGOFF is optional because when BTEQ quits, the session is terminated. A logoff makes it a friendly departure and also allows you to log on with a different username and password.
Variable columns: Variable length columns should be calculated as the maximum length plus two; the two extra bytes hold the binary length of the field. In reality you can save much space because trailing blanks are not kept. The logical record will assume the maximum and add two bytes as a length field per column:

VARCHAR(8)  = 10 bytes
VARCHAR(10) = 12 bytes
Indicator columns: As explained earlier, the indicators utilize a single bit for each field. If your record has 8 fields (which require 8 bits), then you add one extra byte to the total length of all the fields. If your record has 9-16 fields, then add two bytes.
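A worked example, using the Employee_Table layout from this chapter: Employee_No INTEGER = 4 bytes, Last_Name CHAR(20) = 20, First_Name VARCHAR(12) = 12 + 2 = 14, Salary DECIMAL(8,2) = 5, and Dept_No SMALLINT = 2, for 45 data bytes. The five fields need five indicator bits, which round up to one whole byte, so an INDICDATA record is 46 bytes and a mainframe LRECL would be defined as 46.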
BTEQ Return Codes

Return codes are two-digit values that BTEQ returns to the user after completing each job or task. The value of the return code indicates the completion status of the job or task as follows:

Return Code  Description
00           Job completed with no errors.
02           User alert to log on to the Teradata DBS.
04           Warning error.
08           User error.
12           Severe internal error.

You can override the standard error codes at the time you terminate BTEQ. This might be handy for debugging purposes. The error code or "return code" can be any number you specify, using one of the following:

Override Command
.QUIT 15
.EXIT 15
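For example, a script can test the outcome of the previous request and terminate with its own code (a sketch; the statement and the value 15 are illustrative):

UPDATE Employee_Table SET Salary = Salary * 1.03 WHERE Dept_No = 400;
.IF ERRORCODE <> 0 THEN .QUIT 15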
BTEQ Commands
The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly on the data inside the tables. Instead, the 60 different BTEQ commands are utilized in four areas:

Session Control Commands
File Control Commands
Sequence Control Commands
Format Control Commands
SESSIONS            Specifies the number of sessions to use with the next LOGON command.
SESSION SQLFLAG     Specifies a disposition of warnings issued in response to violations of ANSI syntax. The SQL will still run, but a warning message will be provided. The four settings are FULL, INTERMEDIATE, ENTRY, and NONE.
SESSION TRANSACTION Specifies whether transaction boundaries are determined by Teradata SQL or ANSI SQL semantics.
SHOWCONTROLS        Displays all of the BTEQ control command options currently configured.
SHOWVERSIONS        Displays the BTEQ software release versions.
TDP                 Used to specify the correct Teradata server for logons for a particular session.

Figure 2-11
TSO                 Execute an MVS TSO command from inside the BTEQ environment.

Figure 2-12
REPEAT              Submit the next request a certain number of times.

Figure 2-13
NULL         Specifies a character or string of characters to represent null values returned from Teradata.
OMIT         Omits specific columns from a report.
PAGEBREAK    Ejects a page whenever a specified column changes values.
PAGELENGTH   Specifies the page length of printed reports, based on lines per page.
QUIET        Limits BTEQ output displays to error messages and request processing statistics.
RECORDMODE   One of multiple data mode options for data selected from Teradata (INDICDATA, FIELD, or RECORD).
RETCANCEL    Cancels a request when the specified value of the RETLIMIT command option is exceeded.
RETLIMIT     Specifies the maximum number of rows to be displayed or written from a Teradata SQL request.
RETRY        Retries requests that fail under specific error conditions.
RTITLE       Specifies a header appearing at the top of all pages of a report.
SEPARATOR    Specifies a character string or specific width of blank characters separating columns of a report.
SHOWCONTROLS Displays all of the BTEQ control command options currently configured.
SIDETITLES   Places titles to the left or side of the report instead of on top.
SKIPLINE     Inserts blank lines in a report when the value of a specified column changes.
SUPPRESS     Replaces each consecutively repeated value with completely blank character strings.
TITLEDASHES  Displays dash characters before each report line summarized by a WITH clause.
UNDERLINE    Displays a row of dash characters when the specified column changes values.
WIDTH        Specifies the width of screen displays and printed reports, based on characters per line.
Chapter 3: FastLoad
"Where there is no patrol car, there is no speed limit." - Al Capone
What makes FastLoad perform so well when it is loading millions or even billions of rows? FastLoad assembles data into 64K blocks (64,000 bytes) to load it and can use multiple sessions simultaneously, taking further advantage of Teradata's parallel processing. This is different from BTEQ and TPump, which load data at the row level. It has been said, "If you have it, flaunt it!" FastLoad does not like to brag, but it takes full advantage of Teradata's parallel architecture. In fact, FastLoad will create a Teradata session for each AMP (Access Module Processor, the software processor in Teradata responsible for reading and writing data to the disks) in order to maximize parallel processing. This advantage is passed along to the FastLoad user in terms of awesome performance. Teradata is the only data warehouse that loads data, processes data and backs up data in parallel.
BEGIN LOADING: Identifies and locks the target table, names the two error tables for the load, and optionally sets parameters such as CHECKPOINT.

CREATE TABLE: Defines the completely empty table that will receive the data rows.

END LOADING: Indicates that all data has been transmitted and tells FastLoad to proceed to Phase II. As mentioned earlier, it can be used as a way to partition data loads to the same table by omitting it from the script, because the table remains empty until after Phase II. Without an END LOADING, FastLoad pauses rather than finishes, so additional loads to the same table can follow.

ERRLIMIT: Specifies the maximum number of rejected rows allowed in error table 1 (Phase I). This handy command can be a lifesaver when you are not sure how corrupt the data in the input file is. The more corrupt it is, the greater the cleanup effort required after the load finishes. ERRLIMIT provides you with a safety valve: you may specify a particular number of error rows beyond which FastLoad will abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it. Remember, the rows in the error table are not in the data table; dealing with them becomes your responsibility.

HELP: Designed for online use, the HELP command provides a list of all possible FastLoad commands along with brief but pertinent tips for using them.

HELP TABLE: Builds the table column list for use in the FastLoad DEFINE statement when the data matches the CREATE TABLE statement exactly. In real life this does not happen very often.

INSERT: This is FastLoad's favorite command! It inserts rows into the target table.

LOGON / LOGOFF / QUIT: No, this is not the WAX ON / WAX OFF from the movie The Karate Kid! LOGON simply begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.

NOTIFY: Just like it sounds, the NOTIFY command is used to inform the job that follows that some event has occurred. It calls a user exit or predetermined activity when such events occur. NOTIFY is often used for detailed reporting on the FastLoad job's success.

RECORD: Specifies the beginning record number (or, with THRU, the ending record number) of the input data source to be read by FastLoad. Syntactically, this command is placed before the INSERT keyword. Why would it be used? It enables FastLoad to bypass input records that are not needed, such as tape headers, or to position a manual restart. When doing a partitioned data load, RECORD is used to override the checkpoint.

SET RECORD: Used only in the LAN environment, this command states in what format the data from the input file is coming: FastLoad, Unformatted, Binary, Text, or Variable Text. The default is the Teradata RDBMS standard, FastLoad.

SESSIONS: Specifies the number of FastLoad sessions to establish with Teradata. It is written in the script just before the logon. The default is one session per available AMP. The purpose of multiple sessions is to enhance throughput when loading large volumes of data. Too few sessions will stifle throughput; too many will preclude availability of system resources to other users. You will need to find the proper balance for your configuration.

SLEEP: Working in conjunction with TENACITY, the SLEEP command specifies the amount of time in minutes to wait before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default is 6 minutes. For example, suppose that Teradata sessions are already maxed out when your job is set to run. If TENACITY were set at 4 and SLEEP at 10, then FastLoad would attempt to log on every 10 minutes for up to 4 hours. If there were no success by that time, all efforts to log on would cease.

TENACITY: Sometimes there are too many sessions already established with Teradata for a FastLoad to obtain the number of sessions it requested, or all of the loader slots are currently used. TENACITY specifies the amount of time, in hours, to keep retrying to obtain a loader slot or to establish all requested sessions for logon. The default for FastLoad is "no tenacity", meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting TENACITY to 4, meaning that the system will continue trying to log on for the number of sessions requested for up to four hours.
Figure 4-1

Two Error Tables: Each FastLoad requires two error tables. These will only be populated should errors occur during the load process. They are required by the FastLoad utility, which will automatically create them for you; all you must do is name them. The first error table is for any translation errors or constraint violations. For example, a row with a column containing a wrong data type would be reported to the first error table. The second error table is for errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence for every UPI. The other occurrences will be stored in this table. However, if the entire row is a duplicate, FastLoad counts it but does not store the row. These tables may be analyzed later for troubleshooting should errors occur during the load. For specifics on how you can troubleshoot, see the section below titled "What Happens When FastLoad Finishes."
Maximum of 15 Loads
The Teradata RDBMS will only run a maximum of fifteen FastLoads, MultiLoads, or FastExports at the same time. This maximum is determined by a value stored in the DBS Control record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5 concurrent jobs. Since these utilities all use large blocking of rows, Teradata hits a saturation point at which it protects the available system resources by queuing up the extra load jobs. For example, if the maximum number of jobs is currently running on the system and you attempt to run one more, that job will not be started. You should view this limit as a safety control. Here is a tip for remembering how the load limit applies: if the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of fifteen of them running at any one time.
Phase 1: Acquisition
The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does not take the time to hash each row of data based on the Primary Index. That will be done later. Instead, it does the following: when the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. (The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request.) It then opens a Teradata session from the FastLoad client directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command, as shown below. Simultaneously, all but one of the client sessions begin loading raw data in 64K blocks for transfer to an AMP.

The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block. The result is that data rows arrive on AMPs other than the ones where they would live had they been hashed. So how do the rows get to the correct AMPs where they will permanently reside? Following the receipt of every data block, each AMP hashes its rows based on the Primary Index and redistributes them to the proper AMP. At this point, the rows are written to a worktable on the AMP but remain unsorted until Phase 1 is complete.

Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping industry today. How do the key players in this industry handle a parcel? When the shipping company receives a parcel, that parcel is not immediately sent to its final destination. Instead, for the sake of speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that hub, it is sent to the destination city. FastLoad's Phase 1 uses the AMPs in much the same way that the shipper uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP. This just gets them to a "hub" somewhere in Teradata country. Second, each AMP forwards them to their true destination. This is like the shipping parcel being sent from a hub city to its destination city!
Phase 2: Application
Following the scenario described above, the shipping vendor must do more than get a parcel to the destination city. Once the packages arrive at the destination city, they must then be sorted by street and zip code, placed onto local trucks and be driven to their final, local destinations. Similarly, FastLoad's Phase 2 is mission critical for getting every row of data to its final address (i.e., where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it writes the rows into the table space on disks where they will permanently reside. Rows of a table are stored on the disks in data blocks. The AMP uses the block size as defined when the target table was created. If the table is Fallback protected, then the Fallback will be loaded after
the Primary table has finished loading. This enables the Primary table to become accessible as soon as possible. FastLoad is so ingenious, no wonder it is the darling of the Teradata load utilities!
FastLoad Commands
The key FastLoad commands and their definitions are listed in the table earlier in this chapter (Figure 4-1). They are used to provide flexibility in control of the load process. Consider that table your personal ready reference guide! You will notice that there are only a few SQL commands that may be used with this utility (CREATE TABLE, DROP TABLE, DELETE and INSERT). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.
FastLoad Sample
"Mistakes are a part of being human. Appreciate your mistakes for what they are: precious life lessons that can only be learned the hard way. Unless it's a fatal mistake, which, at least, others can learn from." Al Franken Fastload is a utility we can use to populate empty tables. Make no mistake about how useful Fastload can be or how fatal errors can occur. The next 2 slides illustrate the essentials needed when constructing your fastload script. The first will highlight the important areas about the FastLoad script, and the second slide is a blank copy of the script that you can use to create your own FastLoad script. Use the flat file we created in the BTEQ chapter to help run the script.
Simply copy the following text into notepad, then save it with a name and location that you can easily remember (we saved ours as c:\temp\Fastload_First_Script.txt).
This script is going to create a table called Employee_Table02. After the table is created, it's going to take the information from our flat file and insert it into the new table. Afterwards, the Employee_Table and Employee_Table02 should look identical.
The load utilities often scare people because there are many things that appear complicated. In actuality, the load scripts are very simple. Think of FastLoad as:

Logging onto Teradata
Defining the Teradata table that you want to load (target table)
Defining the INPUT data file
Telling the system to start loading
/* Setup the FastLoad Parameters */
SESSIONS 100;             /* or the number of sessions supportable */
TENACITY 4;               /* the default is no tenacity, meaning no retry */
SLEEP 10;                 /* the default is 6, meaning retry in 6 minutes */
LOGON CW/SQL01,SQL01;
SHOW VERSIONS;            /* shows the utility's release number */
/* Set the record type to comma-delimited for FastLoad */
RECORD 2;                 /* start with the second record */
SET RECORD VARTEXT ",";
/* Define the text file layout and the input file */
DEFINE Employee_No (VARCHAR(10))
     , Last_name   (VARCHAR(20))
     , First_name  (VARCHAR(12))
     , Salary      (VARCHAR(6))
     , Dept_No     (VARCHAR(5))
FILE = EMPS.TXT;
SHOW;                     /* optional: show the layout of the input */
/* Begin the load and insert process into the Employee_Profile table */
BEGIN LOADING SQL01.Employee_Profile
   ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2
   CHECKPOINT 100000;
INSERT INTO SQL01.Employee_Profile VALUES
( :Employee_No
, :Last_name
, :First_name
, :Salary
, :Dept_No );
END LOADING;
LOGOFF;

A few notes on this script. RECORD 2 starts with the second record, and SET RECORD specifies that the record layout is VARTEXT with a comma delimiter. Notice that all fields are defined as VARCHAR; when using VARTEXT, the fields do not contain a length field as they do in the text, FastLoad, or unformatted formats. BEGIN LOADING specifies the table to load and lock, names the error tables, and sets the number of rows (CHECKPOINT) at which to pause and record progress in the restart log before loading further. The INSERT defines the statement to use for loading the rows. END LOADING continues the loading process with Phase 2, and LOGOFF logs off of Teradata.
Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are similar to those in BTEQ; FastLoad commands were designed from the underlying commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot ["."] in front of them and therefore need a semicolon. At this point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend this? Because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may have to be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed to FastLoad. We have used VARTEXT in our example, with a comma delimiter. The other options are FASTLOAD, TEXT, UNFORMATTED or VARTEXT. You need to know this about your input file ahead of time.

Step Four: Next comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used as the input FILE, or source file, for the load.

Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what you want loaded. In the BEGIN LOADING statement, the script must name the target table and the two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error tables in this script? FastLoad will automatically create them for you once you name them in the script. In this instance, they are named "Emp_Err1" and "Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first, and Phase 2 uses "Emp_Err2". The names are arbitrary, of course; you may call them whatever you like. At the same time, they must be unique within a database, so using a combination of your userid and target table name helps ensure this uniqueness between multiple FastLoad jobs occurring in the same database.

In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter, [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task with regard to the load. In the old days, children were always told to focus on the three "R's" in grade school ("reading, 'riting, and 'rithmatic"). There are two very different, yet equally important, R's to consider whenever you run FastLoad: RERUN and RESTART. RERUN means that the job is capable of running all the processing again from the beginning of the load. RESTART means that the job is capable of running the processing again from the point where it left off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will learn more about CHECKPOINT in the section on Restarting FastLoad.

Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.
In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the start of the job, it prevents loading rows as they arrive from different time zones. However, to accomplish this processing, simply omit the END LOADING on the load job. Then, you can run the same FastLoad multiple times and continue loading the worktables until the last file is received. Then run the last FastLoad job with an END LOADING, and you have partitioned your load jobs into smaller segments instead of one huge job. This makes FastLoad even faster! Of course, to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or CREATE commands within the script. Additionally, every script is exactly the same with the exception of the last one, which contains the END LOADING causing FastLoad to proceed to Phase 2. That's a pretty clever way to do a partitioned type of data load; a sketch follows Step Seven below.

Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the last utility command in your script. At this point the table lock is released and, if there are no rows in the error tables, they are dropped automatically. However, if a single row is in one of them, you are responsible for checking it, taking the appropriate action, and dropping the table manually.
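Here is a minimal sketch of that partitioned technique (the file, table, and logon names are illustrative; every run uses the identical script except for the input file, and only the final run includes the last two lines):

SESSIONS 8;
LOGON CW/SQL01,SQL01;
SET RECORD VARTEXT ",";
DEFINE Employee_No (VARCHAR(10))
     , Last_name   (VARCHAR(20))
     , First_name  (VARCHAR(12))
     , Salary      (VARCHAR(10))
     , Dept_No     (VARCHAR(6))
FILE = EMPS_PART1.TXT;        /* each run points at its own input file */
BEGIN LOADING SQL01.Employee_Profile
   ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2;
INSERT INTO SQL01.Employee_Profile VALUES
( :Employee_No, :Last_name, :First_name, :Salary, :Dept_No );
/* Intermediate runs stop here: with no END LOADING, FastLoad pauses */
/* after Phase 1 and the target table remains empty. The final run   */
/* adds the next line to trigger Phase 2:                             */
END LOADING;
LOGOFF;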
Checkpoints
"Once the game is over, the king and the pawn go back in the same box." - Italian Proverb Fastload has the ability to save checkpoints during the loading process. Checkpoints are what enable utilities to pick up from where they left off if the loading process was interrupted in any way. Choosing a correct checkpoint can be easily calculated: Determining a Checkpoint Add up the approximate byte count of 1 row. The row below adds up to: Employee_No: Dept_No: Last_Name: First_Name: Salary: Total: Integer Smallint Char(20) VarChar(12) Decimal(8,2) = = = = = = 4 bytes 2 bytes 20 bytes 14 bytes 5 bytes 45 bytes
Now take the total number of bytes per row (45 bytes in our case) and divide 64,000 by that number. (64,000 / 45 = 1422.2) The number you come up with is the number of rows that will be bundled together in each data block set.
Setting the checkpoint to 1000 would be pointless because the computer would take a checkpoint every data block! A 1,000,000 checkpoint would work well here, sending approximately 703 data blocks between checkpoints.
FastLoad allows the following data conversions. Here is a chart that displays them:

IN FASTLOAD YOU MAY CONVERT
CHARACTER DATA        TO  NUMERIC DATA
NUMERIC DATA          TO  CHARACTER DATA
CHARACTER DATA        TO  DATE
DATE                  TO  CHARACTER DATA
DECIMALS              TO  INTEGERS
INTEGERS              TO  DECIMALS
VARIABLE LENGTH DATA  TO  FIXED LENGTH DATA

Figure 4-5

When we said that converting data is easy, we meant that it is easy for the user. It is actually quite resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is important, keep the number of columns being converted to a minimum!
CREATE TABLE SQL01.Department
  ( Dept_No           INTEGER
  , Dept_Name         CHAR(20)
  , Dept_Start_Date   DATE
  , Dept_Finish_Date  DATE )
UNIQUE PRIMARY INDEX ( Dept_No );

DEFINE Department_No   (CHAR(4))
     , Department_Name (CHAR(20))
     , SDate           (CHAR(10))
     , FDate           (CHAR(10))
FILE = Dept_Flat.txt;

BEGIN LOADING SQL01.Department
   ERRORFILES SQL01.Dept_Err1, SQL01.Dept_Err2
   CHECKPOINT 15000;

INSERT INTO SQL01.Department VALUES
( :Department_No
, :Department_Name
, :SDate
, :FDate (DATE, FORMAT 'mm/dd/yyyy') );

END LOADING;

Figure 4-5

A few notes on this script. The table's date columns are DATE data type and will be converted from CHAR(10). The CHAR(4) Department_No converts to INTEGER. The character dates arrive in different styles in the file: SDate (CHAR(10)) comes in as YYYY-MM-DD, while FDate (CHAR(10)) comes in as MM/DD/YYYY. The DEFINE describes the flat file layout and names the input file. BEGIN LOADING names the target table and error tables (don't let the word ERRORFILES fool you; they are tables) and will checkpoint every 15000 rows. The INSERT does the automatic conversion: character to INTEGER, the ANSI-style character date to DATE, and the other character date to DATE by describing the input format used in the file. Without the FORMAT, that row would go into the error table.
Why might you have to RESTART a FastLoad job, anyway? Perhaps you might experience a system reset or some glitch that stops the job halfway through. Maybe the mainframe went down. Well, it is not really a big deal, because FastLoad is so lightning-fast that you could probably just RERUN the job for small data loads. However, when you are loading a billion rows, this is not a good idea because it wastes time. So the most common way to deal with these situations is simply to RESTART the job. But what if the normal load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded? In that case, you might want to make sure that the job is totally restartable. Let's see how this is done.
First, you ensure that the target table and error tables, if they existed previously, are blown away. If there had been no errors in the error tables, they would be automatically dropped. If these tables did not exist, you have not lost anything. Next, if needed, you create the empty table structure needed to receive a FastLoad.
Figure 4-9

The first line displays the total number of records read from the input file. Were all of them loaded? Not quite. The second line tells us that there were fifty rows with constraint violations, so they were not loaded; correspondingly, fifty entries were made in the first error table. Line 3 shows that there were zero entries in the second error table, indicating that there were no duplicate Unique Primary Index violations. Line 4 shows that 999950 rows were successfully loaded into the empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the duplicates would only have been counted; they are not stored in the error tables anywhere. When FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the number of records read in line 1.

Note on duplicate rows: whenever FastLoad experiences a restart, there will normally be duplicate rows that are counted. This is because an error seldom occurs at a checkpoint (a quiet or quiescent point) when nothing is happening within FastLoad. Therefore, some number of rows will be sent to the AMPs again, because the restart begins at the next record after the value stored in the checkpoint. Hence, when a restart occurs, the first row after the checkpoint and some of the consecutive rows are sent a second time. These will be caught as duplicate rows after the sort. This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table: it assumes they are duplicates created by a restart.
CHECKPOINT option defines the points in a load job where the FastLoad utility pauses to record that Teradata has processed a specified number of rows. When the parameter "CHECKPOINT [n]" is included in the BEGIN LOADING clause the system will stop loading momentarily at increments of [n] rows. At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly. Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This log contains a row for all currently running FastLoad jobs with the last successfully reached checkpoint for each job. Should an error occur that requires the load to restart, FastLoad will merely go back to the last successfully reported checkpoint prior to the error. It will then restart from the record immediately following that checkpoint and start building the next block of data to load. If such an error occurs in Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row. If this is not desirable, the RECORD statement can be used to force a restart at the next record after the failure.
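For instance, if a Phase 1 failure occurred while running with CHECKPOINT 0 and you know roughly where the input file stopped, a RECORD statement placed before the INSERT forces the restart point (a sketch; the record number is illustrative):

RECORD 250000;   /* resume reading the input file at record 250,000 instead of record 1 */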
INMODs can be written in the appropriate programming languages. However, INMODs replace the normal mainframe DDNAME or LAN-defined FILE name with the following statement: DEFINE INMOD=<INMODname>. For a more in-depth discussion of INMODs, see the chapter of this book titled "INMOD Processing".
Chapter 4: MultiLoad
"In the end we'll remember not the sound of our enemies, but the silence of our friends." - Martin Luther King Jr.
The other factor that makes a DELETE mode operation so good is that it examines an entire block of rows at a time. Once all the eligible rows have been removed, the block is written one time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from the next block without a checkpoint. This is a smart way to continue. Remember, when using the Transient Journal (TJ), all deleted rows are put back into the table from the TJ as a rollback, and a rollback can take longer to finish than the delete. MultiLoad does not do a rollback; it does a restart.
In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months, monthly data is rotated in and out. At the end of every month, the oldest month of data is removed and the new month is added. The cycle is "add a month, delete a month, add a month, delete a month." In our illustration, that means that January data must be deleted to make room for May's data. Here is a question for you: What if there was another way to accomplish this same goal without consuming all of these extra resources? To illustrate, let's consider the following scenario: Suppose you have TableA that contains 12 billion rows. You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final step would be to INSERT the new rows for May using MultiLoad IMPORT.
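A minimal sketch of the DELETE-task side of that cycle, using the .BEGIN DELETE MLOAD form covered later in this chapter (the log table, logon string, and column name are illustrative):

.LOGTABLE SQL01.TableA_Log;
.LOGON CDW/sql01,whynot;
.BEGIN DELETE MLOAD TABLES TableA;
DELETE FROM TableA WHERE Sale_Date < '2002-02-01';
.END MLOAD;
.LOGOFF;

A MultiLoad IMPORT job would then insert the fresh rows for May.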
Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access Module Processor) fails, and the system administrators say that the AMP is "down" or "offline." When using FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can continue running when an AMP fails, if the table is fallback protected. At the same time, you can use the AMPCHECK option to make it work like FastLoad if you want.
Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require the use of two error tables per target table. MultiLoad will automatically create these tables. Rows are inserted into these tables only when errors occur during the load process. The first error table is the acquisition Error Table (ET). It contains all translation and constraint errors that may occur while the data is being acquired from the source(s). The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you might see a UPI error that shows a second employee number "99." In this case, if the name for employee "99" is Kara Morgan, you will be glad that the row did not load, since Kara Morgan is already in the Employee table. However, if the name showed up as David Jackson, then you know that further investigation is needed, because employee numbers must be unique.

Each error table does the following:

Identifies errors
Provides some detail about the errors
Stores the actual offending row for debugging

You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In either case, MultiLoad will not accept error table names that are the same as target table names. It does not matter what you name them, but it is recommended that you standardize the naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled "Troubleshooting MultiLoad Errors."

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed step.

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only have one worktable, since that mode works on only one target table. The purpose of worktables is to hold two things:

1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs

The worktables are created in a database using PERM space. They can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once; however, there is no such thing as a free lunch, and the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.
This format is the same as Binary, plus a marker (X '0A' or X '0D') that specifies the end of the record. Each record has a random number of bytes and is followed by an end of the record marker. The format for these input records is defined in the LAYOUT statement of the MultiLoad script using the components FIELD, FILLER and TABLE.
This is variable length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter characters in a row will result in a null value between them.
Figure 5-1
What about the extra two sessions? Well, the first one is a control session to handle the SQL and logging. The second is a back up or alternate for logging. You may have to use some trial and error to find what works best on your system configuration. If you specify too few sessions it may impair performance and increase the time it takes to complete load jobs. On the other hand, too many sessions will reduce the resources available for other important database activities. Third, the required support tables are created. They are the following:
ERRORTABLES  MultiLoad requires two error tables per target table. The first error table contains constraint violations, while the second error table stores Unique Primary Index violations.
WORKTABLES   Work Tables hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs.
LOGTABLE     The LOGTABLE keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART.
Figure 5-2

The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent a user from requesting an exclusive lock. Although these locks will still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2.
At this point, Teradata does not care which AMP receives the data block. The blocks are simply sent, one after the other, to the next AMP in line. For its part, each AMP begins to deal with the blocks it has been dealt. It is like a game of cards: you take the cards you have received and then play the game, keeping some and giving some away. Similarly, the AMPs will keep some data rows from the blocks and give some away. Each AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table just yet; the receiving AMP must first do some preparation. Don't you have to get ready before company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs into the worktables, where it assembles them with the SQL to be applied. Why? Because once the rows are reblocked, they can be sorted into the proper order for storage in the target table. Now the utility places a load lock on each target table in preparation for the Application Phase. Of course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
Sequence number that identifies the IMPORT command where the error occurred
Sequence number for the DML statement involved with the error
Sequence number of the DML statement being carried out when the error was discovered
Sequence number that tells which APPLY clause was running when the error occurred
The number of the data row in the client file that was being built when the error took place
Figure 5-3 Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hashsequence sorted block from Phase 3 and each block of the base table is read only once to reduce I/O operations to gain speed. Then, all matching rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why the match tags are so important. Changes are made based upon corresponding data and DML (SQL) based on the match tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very beginning if a RESTART is needed.
What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the proper error table.
Type Support
What does the MLOAD Command do? This command communicates directly with Teradata to specify if the MultiLoad mode is going to be IMPORT or DELETE. Note that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is required. We recommend using the word IMPORT to make the coding consistent and easier for others to read. Any parameters for the load, such as error limits or checkpoints will be included under the .BEGIN command, too. It is important to know which commands or parameters are optional ince, if you do not include them, MultiLoad may supply defaults that may impact your load. The DML LABEL defines treatment options and labels for the application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the APPLY clause. This instructs MultiLoad to finish the APPLY operations with the changes to the designated databases and tables.
.END MLOAD
Task
Type Task
What does the MLOAD Command do? This defines a column of the data source record that will be sent to the Teradata database via SQL. When writing the script, you must include a FIELD for each data field you need in SQL. This command is used with the LAYOUT command. Do not assume that MultiLoad has somehow uncovered much of what you used in your term papers at the university! FILLER defines a field that is accounted for as part of the data source's row format, but is not sent to the Teradata DBS. It is used with the LAYOUT command. LAYOUT defines the format of the INPUT DATA record so Teradata knows what to expect. If one record is not large enough, you can concatenate multiple data records by using the LAYOUT parameter CONTINUEIF to tell which value to perform for the concatenation. Another option is INDICATORS, which is used to represent nulls by using the bitmap (1 bit per field) at the front of the data record. This specifies the username or LOGON string that will establish sessions for MultiLoad with Teradata. This support command names the name of the Restart Log that will be used for storing CHECKPOINT data pertaining to a load. The LOGTABLE is then used to tell MultiLoad where to RESTART, should that be necessary. It is recommended that this command be placed before the .LOGON command. This command terminates any sessions established by the LOGON command. This command defines the INPUT DATA FILE, file type, file usage, the LAYOUT to use and where to APPLY the data to SQL. Optionally, you can SET utility variables. An example would be {.SET DBName TO 'CDW_Test'}. This interrupts the operation of MultiLoad in order to issue commands to the local operating system. This is a command that may be used with the .LAYOUT command. It identifies a table whose columns (both their order and data types) are to be used as the field names and data descriptions of the data source records.
.FILLER
Task
.LAYOUT
Task
.LOGON .LOGTABLE
Support Support
.LOGOFF .IMPORT
Support Task
Figure 5-4
Here is a list of components or parameters that may be used in the .BEGIN IMPORT command. Note: The parameters do not require the usual dot prior to the command since they are actually sub-commands.
Open table as spreadsheet
WHAT IT DOES NONE specifies that MLOAD starts even with one down AMP per cluster if all tables are Fallback. APPLY (DEFAULT) specifies MLOAD will not start or finish Phase 4 with a down AMP. ALL specifies not to proceed if any AMPs are down, just like FastLoad.
AXSMOD
Optional
Short for Access Module, this command specifies input protocol like OLE-DB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script. You have two options: CHECKPOINT refers to the number of minutes, or frequency, at which you wish a CHECKPOINT to occur if the number is 60 or less. If the number is greater than 60, it designates the number of rows at which you want the CHECKPOINT to occur. This command is NOT valid in DELETE mode. You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job. Names the two error tables, two per target table. Note there is no comma separator. If you opt to use NOTIFY for a any event during a load, you may designate the priority of that notification: LOW for level events, MEDIUM for important events, HIGH for events at operational decision points, and OFF to eliminate any notification at all for a given phase.
CHECKPOINT
Optional
Optional
Optional
Optional
Optional
This refers to the number of SESSIONS that should be established with Teradata. For MultiLoad, the optimal number of sessions is the number of AMPs in the system, plus two more. You can also use MAX or MIN, which automatically use the maximum or minimum
PARAMETER
REQUIRED OR NOT
WHAT IT DOES number of sessions to complete the job. If you pecify nothing, it will default to MAX.
Tells MultiLoad how frequently, in minutes, to try logging on to the system. Names up to 5 target tables. Tells MultiLoad how many hours to try logging on when its initial effort to do so is rebuffed. Names the worktable(s), one per target table.
Figure 5-5
Remember, we'll still use the BTEQ utility to create our flat file.
"If you don't know where you're going, any road will take you there." - Lewis Carrol Creating our Multiload script
Executing Multiload
"Ambition is a dream with a V8 Engine." - Elvis Presley You will feel like the King after executing your first Multiload script. Multiload is the Elvis Presley of data warehousing because nobody knows how make more records then Multiload. If you have the ambition to learn, this book will give you what it takes to steer through these utilities. We initialize the Multiload utility like we do with BTEQ, except that the keyword with Multiload Is mload. Remember that this Multiload is going to double the salaries of our employees. Let's execute our Multiload script
you can either use the .FILLER (like above) to position to the cursor to the next field, or the "*" on the Dept_No field could have been replaced with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then, the .FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE can be used to automatically define all the .FIELDS for you. The LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is specified in the LAYOUT. Step Four: Defining the DML activities to occur The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept table. The VALUES come from the data in each FIELD because it is preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: Every label must be referenced in an APPLY clause of the .IMPORT clause. Step Five: Naming the INPUT file and its format type This step is vital! Using the .IMPORT command, we have identified the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the DML LABEL called INSERTS that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT MLOAD command. If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file. Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues until it finds the semi-colon to define the end of the command. This is how it determines one operation from another. Therefore, it is very important or it would have attempted to process the END LOADING as part of the IMPORT it wouldn't work. Step Six: Finishing loading and logging off of Teradata This is the closing ceremonies for the load. MultiLoad to wrap things up, closes the curtains, and logs off of the Teradata system. Important note: Since the script above in Figure 5-6 does not DROP any tables, it is completely capable of being restarted if an error occurs. Compare this to the next script in Figure 5-7. Do you think it is restartable? If you said no, pat yourself on the back.
Open table as spreadsheet
WHAT IT DOES Names the Target table. Names the worktable one per target table. Names the two error tables, two per target table and there is no comma separator between them. Tells MultiLoad how many hours to try establishing sessions when its initial effort to do so is rebuffed.
Optional
Figure 5-6 /* Simple Mload script */ Sets Up a Logtable and Logs on to Teradata
.LOGTABLE SQL01.CDW_Log; .LOGON TDATA/SQL01,SQL0; .BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1 WORKTABLES SQL01.CDW_WT ERRORTABLES SQL01.CDW_ET SQL01.CDW_UV; .LAYOUT FILEIN; .FIELD Employee_No * CHAR(11); .FIELD Last_Name * CHAR(20); .FILLER Junk_stuff * CHAR(100); .FIELD Dept_No CHAR(6); * Names the DML Label Tells MultiLoad to INSERT a row into the target table and defines the row format. Names the LAYOUT of the INPUT record and defines its structure; Notice the dots before the FIELD and FILLER and the semi-colons after each definition. Begins the Load Process by naming the Target Table, Work table and error tables; Notice NO comma between the error tables
.DML LABEL INSERTS; INSERT INTO SQL01.Employee_Dept1 (Employee_No ,Last_Name ,Dept_No ) VALUES (:Employee_No ,:Last_Name ,:Dept_No ); .IMPORT INFILE CDW_Join_Export.txt FORMAT TEXT LAYOUT FILEIN APPLY INSERTS; .END MLOAD; .LOGOFF; Open table as spreadsheet
Names the Import File and its Format type; Cites the LAYOUT file to use tells Mload to APPLY the INSERTs.
Figure 5-7
Any words between /* */ are /* +++++++++++++++++++++++++++++++++++++*/ comments only and are not processed by Teradata. /* MultiLoad SCRIPT */ /*This script is designed to change the */ /*EMPLOYEE_DEPT1 table using the data found */ /* in IMPORT INFILE CDW_Join_Export.txt */ /* Version 1.1 */ /* Created by Coffing Data Warehousing */ /* +++++++++++++++++++++++++++++++++++++*/ .LOGTABLE SQL01.CDW_Log; .RUN FILE LOGON.TXT; /*Drop Error Tables caution, this script cannot be restarted because these tables would be needed */ DROP TABLE SQL01.CDW_ET; DROP TABLE SQL01.CDW_UV; /* Begin Import and Define Work and Error Tables */ .BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1 WORKTABLES SQL01.CDW_WT ERRORTABLES SQL01.CDW_ET SQL01.CDW_UV; /* Define Layout of Input File */ .LAYOUT FILEIN; .FIELD Employee_No * CHAR(11); Names the LAYOUT of the INPUT file. Defines the structure of the INPUT file. Notice the dots before the FIELD Begins the Load Process by telling us first the names of the target table, Work table and error tables; note NO comma between the names of the error tables Secures the logon by storing userid and password in a separate file, then reads it. Drops Existing error tables and cancels the ability for the script to restart DON'T ATTEMPT THIS AT HOME! Also, SQL does not use a dot (.) Names and describes the purpose of the script; names the author
.FIELD First_Name * CHAR(14); .FIELD Last_Name * CHAR(20); .FIELD Dept_No * CHAR(6); .FIELD Dept_Name * CHAR(20); /* Begin INSERT Process on Table */ .DML LABEL INSERTS; INSERT INTO SQL01.Employee_Dept1 ( Employee_No ,First_Name ,Last_Name ,Dept_No ,Dept_Name ) VALUES ( :Employee_No ,:First_Name ,:Last_Name ,:Dept_No ,:Dept_Name ); /* Specify IMPORT File and Apply Parameters */ .IMPORT INFILE CDW_Join_Export.txt FORMAT TEXT LAYOUT FILEIN APPLY INSERTS; .END MLOAD; .LOGOFF; Open table as spreadsheet Figure 5-8
Names the DML Label Tells MultiLoad to INSERT a row into the target table and defines the row format. Note that we place comma separators in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed.
Names the Import File and States its Format type; Names the Layout file to use And tells MultiLoad to APPLY the INSERTs.
ERROR TREATMENT OPTIONS FOR .DML LABEL DO INSERT FOR [MISSING UPDATE] ROWS ; Figure 5-9 In IMPORT mode, you may specify as many as five distinct error-treatment options for one .DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not specify IGNORE, then MultiLoad will MARK, or record all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the following syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table. By ignoring those errors, you gain three benefits: 1. You do not need to see all the errors. 2. The error table is not filled up needlessly. 3. MultiLoad runs much faster since it is not conducting a duplicate row check. When doing an UPSERT, there are two rules to remember: The default is IGNORE MISSING UPDATE ROWS. Mark is the default for all operations. When doing an UPSERT, you anticipate that some rows are missing, otherwise, why do an UPSERT. So, this keeps these rows out of your error table. The DO INSERT FOR MISSING UPDATE ROWS is mandatory. This tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the update didn't find it. The table that follows shows you, in more detail, how flexible your options are:
Open table as spreadsheet ERROR TREATMENT OPTIONS IN DETAIL
.DML LABEL OPTION MARK DUPLICATE INSERT ROWS IGNORE DUPLICATE INSERT ROWS MARK DUPLICATE UPDATE ROWS IGNORE DUPLICATE UPDATE ROWS MARK MISSING UPDATE ROWS IGNORE MISSING UPDATE ROWS MARK MISSING DELETE ROWS IGNORE MISSING DELETE ROWS
WHAT IT DOES This option logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates. This tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them. This logs the existence of every duplicate UPDATE row. This eliminates the listing of duplicate update row errors. This option ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE. This tells MultiLoad NOT to list UPDATE rows as an error. This is a good option when doing an UPSERT since UPSERT will INSERT a new row. This option makes a note in the ET_Error Table that a row to be deleted is missing. This option says, "Do not tell me that a row to be deleted is missing.
.DML LABEL OPTION DO INSERT for MISSING UPDATE ROWS Figure 5-10
WHAT IT DOES This is required to accomplish an UPSERT. It tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.
Any words between /* */ are /* +++++++++++++++++++++++++++++++++++++*/ COMMENTS ONLY and are not processed by Teradata. /* MultiLoad SCRIPT */ /*This script is designed to change the */ /*EMPLOYEE_DEPT table using the data from */ /* the IMPORT INFILE CDW_Join_Export.txt */ /* Version 1.1 */ /* Created by Coffing Data Warehousing*/ /* +++++++++++++++++++++++++++++++++++++ */ /* Setup the MulitLoad Logtables, Logon Statements*/ .LOGTABLE SQL01.CDW_Log; .LOGON TDATA/SQL01,SQL01; DATABASE SQL01; /*Drop Error Tables */ DROP TABLE WORKDB.CDW_ET; DROP TABLE WORKDB.CDW_UV; /* Begin Import and Define Work and Error Tables */ .BEGIN IMPORT MLOAD TABLES Employee_Dept WORKTABLES Begins the Load Process by telling us first the names of the Target Table, Work table and error tables are in a work database. Note there is no comma between the names of the error tables (pair). Drops Existing error tables in the work database. Sets up a Logtable and then logs on to Teradata. Specifies the database in which to find the target table. Names and describes the purpose of the script; names the author
WORKDB.CDW_WT ERRORTABLES WORKDB.CDW_ET WORKDB.CDW_UV; /* Define Layout of Input File */ .LAYOUT FILEIN; .FIELD Employee_No * CHAR(11); .FIELD First_Name * CHAR(14); .FIELD Last_Name * CHAR(20); .FIELD Dept_No * CHAR(6); .FIELD Dept_Name * CHAR(20); /* Begin INSERT Process on Table */ .DML LABEL INSERTS IGNORE DUPLICATE INSERT ROWS; INSERT INTO SQL01.Employee_Dept ( Employee_No ,First_Name ,Last_Name ,Dept_No ,Dept_Name) VALUES ( :Employee_No ,:First_Name, ,:Last_Name, ,:Dept_No, ,:Dept_Name); /* Specify IMPORT File and Apply Parameters */ .IMPORT INFILE CDW_Join_Export.txt FORMAT TEXT LAYOUT FILEIN APPLY INSERTS; .END MLOAD; .LOGOFF; Open table as spreadsheet Figure 5-11 Ends MultiLoad and logs off of Teradata Names the Import File and States its Format type; names the Layout file to use and tells MultiLoad to APPLY the INSERTs. Names the DML Label Tells MultiLoad NOT TO LIST duplicate INSERT rows in the error table; notice the option is placed AFTER the LABEL identification and immediately BEFORE the DML function. Lists, in order, the VALUES to be INSERTed. Names the LAYOUT of the INPUT file. Defines the structure of the INPUT file. Notice the dots before the FIELD command and the semi-colons after each FIELD definition.
/* !/bin/ksh*
*/
*/ are /*MultiLoad IMPORT SCRIPT with two INPUT files Any words between /* comments only and are not processed */ by Teradata. */ /*This script INSERTs new rows into the */ /* Employee_table and UPDATEs the Dept_Name */ /*in the Department_table. /* Version 1.1 */ */ */
/* +++++++++++++++++++++++++++++++++++++*/ .LOGTABLE SQL01.EMPDEPT_LOG; .RUN FILE c:\mydir\logon.txt; Sets up a Logtable and logs on with .RUN. The logon.txt file contains: .logon TDATA/SQL01,SQL01; DROP TABLE SQL01.EMP_WT; DROP TABLE SQL01.DEPT_WT; DROP TABLE SQL01.EMP_ET; DROP TABLE SQL01.EMP_UV; DROP TABLE SQL01.DEPT_ET; DROP TABLE SQL01.DEPT_UV; /* the following defines 2 tables for loading */ .BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table, SQL01.Department_Table WORKTABLES SQL01.EMP_WT, SQL01.DEPT_WT ERRORTABLES SQL01.EMP_ET SQL01.EMP_UV, SQL01.DEPT_ET SQL01.DEPT_UV; /* these next 2 LAYOUTs define 2 different records */ .LAYOUT FILEIN1; Names and Defines the LAYOUT of the st 1 INPUT file Identifies the 2 target tables with a comma between them. Names the worktable and error tables for each target table; Note there are NO commas between the pair of names, but there is a comma between this pair and the next pair. Drops the worktables and error tables, in case they existed from a prior load; NOTE: Do NOT include IF you want to RESTART using CHECKPOINT.
* DECIMAL (10,2);
.FIELD Dept_Num * INTEGER; .LAYOUT FILEIN2; .FIELD DeptNo * CHAR(6); .FIELD DeptName * CHAR(20); .DML LABEL EMP_INS IGNORE DUPLICATE INSERT ROWS; INSERT INTO SQL01.Employee_Table VALUES (:Emp_No ,:FName ,:LName ,:Sal ,:Dept_Num); .DML LABEL DEPT_UPD; UPDATE Department_Table SET Dept_Name = :DeptName WHERE Dept_No = :DeptNo; .IMPORT INFILE Emp_Data LAYOUT FILEIN1 APPLY EMP_INS; .IMPORT INFILE Dept_Data LAYOUT FILEIN2 APPLY DEPT_UPD; .END MLOAD; .LOGOFF; Open table as spreadsheet Figure 5-12 Ends MultiLoad and logs off of Teradata. Names the 2nd DML Label; Tells MultiLoad to UPDATE when it finds Deptno (record) equal to the Dept_No in the Department_table and change the Dept_name column with the DeptName from the INPUT file. Names the TWO Import Files Names the TWO Layouts that define the structure of the INPUT DATA files and tells MultiLoad to APPLY the INSERTs to target table 1 and the UPDATEs to target table 2. Names the 1st DML Label; Tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them. INSERT a row into the table, but does NOT name the columns. So all VALUES are passed IN THE ORDER they are defined in the Employee table. Names and Defines the LAYOUT of the 2nd INPUT file
this code and know which layout to use for using different names in the same layout. To do this you will need to REDEFINE the INPUT. You do this by redefining a field's position in the .FIELD or .FILLER section of the LAYOUT. Unlike the asterisk (*), which means that a field simply follows the previous one, redefining will cite a number that tells MultiLoad to take a certain portion of the INPUT file and jump to the redefined position to back toward the beginning of the record.
Any words between /* */ are /* +++++++++++++++++++++++++++++++++++++*/ comments only and are not processed /* MultiLoad IMPORT SCRIPT with multiple target by Teradata. */ /*tables and DML labels */ /*This script INSERTs new rows into the */ /* Employee_table and UPDATEs the Dept_Name */ /*in the Department_table /* Version 1.1 */ */
/* Created by Coffing Data Warehousing */ /* +++++++++++++++++++++++++++++++++++++*/ .LOGTABLE SQL01.EmpDept_Log; .LOGON TDATA/SQL01,SQL01; /* 2 target tables, 2 work tables, 2 error tables per target table, defined in pairs BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table, SQL01.Department_Table WORKTABLES SQL01.EMP_WT, SQL01.DEPT_WT ERRORTABLES SQL01.EMP_ET SQL01.EMP_UV, SQL01.DEPT_ET SQL01 .DEPT_UV; */ Sets Up a Logtable and Logs on to Teradata; Optionally, specifies the database to work in. Identifies the 2 target tables; Names the worktable and error tables for each target tables; Note there is no comma between the names of the error tables but there is a comma between the pair of error tables.
.LAYOUT FILEIN; .FILLER Trans .FIELD Emp_No .FIELD LName .FIELD FName .FIELD Sal .FIELD DeptNo * CHAR (1); * INTEGER; * CHAR(20); * VARCHAR(20); * DECIMAL (10,2); 2 INTEGER;
Names and defines the LAYOUT of the INPUT record. The FILLER is for a field that tells what type of record has been read. Here that field contains an "E" or a "D". The "E" tells MLOAD use the Employee data and the "D" is for department data. The definition for Dept_Num tells MLOAD to jump backward to byte 2. Where as the * for Emp_Num defaulted to byte 2. So, Emp_No and Dept_Num both start at byte 2, but in different types of records. When Trans (byte position 1) contains a "D", the APPLY uses the dept names and for an "E" the APPLY uses the employee data. Names the 1st DML Label; Tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them. Tells MultiLoad to INSERT a row into the 1st target table but optionally does NOT define the target table row format. All the VALUES are passed to the columns of the Employee table IN THE ORDER of that table's row format.
.DML LABEL EMPIN IGNORE DUPLICATE INSERT ROWS; INSERT INTO SQL01.Employee_Table VALUES ( :Emp_No ,:FName ,:LName ,:Sal ,:Dept_Num ); .DML LABEL DEPTIN; UPDATE Department_Table SET Dept_Name = :DeptName WHERE Dept_No = :DeptNo;
Names the 2
nd
DML Label;
nd
Tells MultiLoad to UPDATE the 2 target table but optionally does NOT define that table's row format. When the VALUE of the DeptNo equals that of the Dept_No column of the Department, then update the Dept_Name column with the DeptName from the INPUT file. Ends MultiLoad and logs off of Teradata.
.IMPORT INFILE UPLOAD.dat LAYOUT FILEIN APPLY EMPIN WHERE Trans = 'E' APPLY DEPTIN WHERE Trans = 'D' ; .END MLOAD; .LOGOFF; Open table as spreadsheet Figure 5-13
LOGOFF; Open table as spreadsheet Figure 5-14 How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here are a few that we saw: At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command. You need not specify it in the .END MLOAD command. You will readily notice that this mode has no .DML LABEL command. Since it is focused on just one absolute function, no APPLY clause is required so you see no .DML LABEL. Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad command, so it has no dot prefix. Since default names are available for worktables (WT_<target_tablename>) and error tables (ET_<target_tablename> and UV_<target_tablename>), they need not be specifically named, but be sure to define the Logtable. Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a MultiLoad IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing the Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job fails for any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad DELETE task is never based upon the Primary Index. Because we are not importing any data rows, there is neither a need for worktables nor an Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That statement will be applied to every table row. If the condition is met, then the row is deleted. Using the match tags, each target block is read once and the appropriate rows are deleted.
This illustration demonstrates how passing the values of a data row rather than a hard coded value may be used to help meet the conditions stated in the WHERE clause. When you are passing values, you must add some additional commands that were not used in the DELETE example with hard coded values. .LOGTABLE RemoveLog; .LOGON TDATA/SQL01,SQL01; .BEGIN DELETE MLOAD TABLES Order_Table; .LAYOUT OldMonth Names the LAYOUT and defines the column whose value will be passed as a single row to MultiLoad. In this case, all of the .FIELD OrdDate * DATE; order dates in the Order_Table will be tested against this OrdDate value. The condition in the WHERE clause is that the data rows with orders placed prior to the date value (:OrdDate) passed from the LAYOUT OldMonth will be DELETEd from the Order_Table. Note that this time there is no dot in front of LAYOUT in this clause since it is only being referenced. Ends loading and logs off of Teradata. Begins the DELETE task and names only one table, but still uses TABLES option. Identifies the Logtable and logs onto Teradata with a valid logon string.
DELETE FROM Order_Table WHERE Order_Date < :OrdDate; .IMPORT INFILE LAYOUT OldMonth ; .END MLOAD;
/* +++++++++++++++++++++++++++++++++++++++++++++++++ */ /* MultiLoad UPSERT SCRIPT /*This script Updates the Student_Profile Table /* if the row to be updated does not exist. /* Version 1.1 */ */ */ */
/* with new data and Inserts a new row into the table */
*/
/* ++++++++++++++++++++++++++++++++++++++++++++++++++*/ /* Setup Logtable, Logon Statements*/ .LOGTABLE SQL01.CDW_Log; .LOGON CDW/SQL01,SQL01; /* Begin Import and Define Work and Error Tables */ .BEGIN IMPORT MLOAD TABLES SQL01.Student_Profile WORKTABLES SQL01.SWA_WT ERRORTABLES SQL01.SWA_ET SQL01.SWA_UV; /* Define Layout of Input File */ .LAYOUT FILEIN; .FIELD Student_ID * INTEGER; .FIELD Last_Name * CHAR (20); .FIELD First_Name * VARCHAR (12); .FIELD Class_Code * CHAR (2); .FIELD Grade_Pt * DECIMAL(5,2); Names the LAYOUT of the INPUT file; An ALL CHARACTER based flat file. Defines the structure of the INPUT file; Notice the dots before the FIELD command and the semi-colons after each FIELD definition; Names the DML Label Begins the Load Process by telling us first the names of the target table, work table and error tables. Sets Up a Logtable and then logs on to Teradata.
/* Begin INSERT and UPDATE Process on Table */ .DML LABEL UPSERTER DO INSERT FOR MISSING UPDATE ROWS; /* Without the above DO, one of these is guaranteed to fail on this same table. If the UPDATE fails because rows is missing, it corrects by doing the INSERT */ UPDATE SQL01.Student_Profile SET Last_Name = :Last_Name ,First_Name = :First_Name ,Class_Code = :Class_Code ,Grade_Pt = :Grade_Pt WHERE Student_ID = :Student_ID; INSERT INTO SQL01.Student_Profile
Tells MultiLoad to INSERT a row if there is not one to be UPDATED, i.e., UPSERT. Defines the UPDATE. Qualifies the UPDATE. Defines the INSERT. We recommend placing comma separators in front of the following column or value for easier debugging.
VALUES (:Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt); .IMPORT INFILE CDW_IMPORT.DAT LAYOUT FILEIN APPLY UPSERTER; .END MLOAD; .LOGOFF; Open table as spreadsheet Figure 5-16 Names the Import File and it names the Layout file to use and tells MultiLoad to APPLY the UPSERTs. Ends MultiLoad and logs off of Teradata
****08:06:38 UTY0818 Statistics for table Employee_Table INSERTS: UPDATES: DELETES: 25000 25000 0
****08:06:41 UTY0818 Statistics for table Department_Table INSERTS: UPDATES: DELETES: Figure 5-17 0 0 20000
if desired. Earlier on, we noted that MultiLoad generates two error tables, the Acquisition Error and the Application error table. You may select from these tables to discover the problem and research the issues. For the most part, the Acquisition error table logs errors that occur during that processing phase. The Application error table lists Unique Primary Index violations, field overflow errors on non-PI columns, and constraint errors that occur in the APPLY phase. MultiLoad error tables not only list the errors they encounter, they also have the capability to STORE those errors. Do you remember the MARK and IGNORE parameters? This is where they come into play. MARK will ensure that the error rows, along with some details about the errors are stored in the error table. IGNORE does neither; it is as if the error never occurred.
Open table as spreadsheet THREE COLUMNS SPECIFIC TO THE ACQUISITION ERROR
System code that identifies the error. Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified. The data row that contains the error.
TABLE Uniqueness DBCErrorCode DBCErrorField Contains a certain value that disallows duplicate row errors in this table; can be ignored, if desired. System code that identifies the error. Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified. NOTE: A copy of the target table column immediately follows this column.
Figure 5-20
RESTARTing Multiload
Who hasn't experienced a failure at some time when attempting a load? Don't take it personally! Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine how it works: First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where it left off. Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing!
Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot. Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run until complete. Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted. The results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data block is successfully written to its target table. HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT as 60 or less, minutes are assumed. If you specify the checkpoint at 61 or above, the number of records is assumed.
You should be very cautious using the RELEASE command. It could potentially leave your table half updated. Therefore, it is handy for a test environment, but please don't become too reliant on it for production runs. They should be allowed to finish to guarantee data integrity.
You will find a more detailed discussion on how to write INMODs for MultiLoad in the chapter of this book titled, "INMOD Processing".
FastLoad Yes
MultiLoad Optional. 2 Error Tables have to exist for each target table and will automatically be assigned. Optional. 1 Work Table has to exist for each target table and will automatically be assigned. Yes No No Yes No Five INSERT, UPDATE, DELETE, and "UPSERT" DROP TABLE Yes Five Yes, in all 5 phases (auto CHECKPOINT) Yes Yes
No
Logtable must be defined Allows Referential Integrity Allows Unique Secondary Indexes Allows Non-Unique Secondary Indexes Allows Triggers Loads a maximum of n number of tables DML Statements Supported DDL Statements Supported Transfers data in 64K blocks Number of Phases Is RESTARTable Stores UPI Violation Rows Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation Allows Data Conversion NULLIF function Figure 5-21
No No No No No One INSERT CREATE and DROP TABLE Yes Two Yes Yes No
Yes Yes
Chapter 5: TPump
"Diplomacy is the art of saying "Nice Doggie" until you can find a rock." Will Rogers
Overview
The chemistry of relationships is very interesting. Frederick Buechner once stated, "My assumption is that the story of any one of us is in some measure the story of us all." In this chapter, you will find that TPump has similarities with the rest of the family of Teradata utilities. But this newer utility has been designed with fewer limitations and many distinguishing abilities that the other load utilities do not have. Do you remember the first Swiss Army knife you ever owned? Aside from its original intent as a compact survival tool, this knife has thrilled generations with its multiple capabilities. TPump is the Swiss Army knife of the Teradata load utilities. Just as this knife was designed for small tasks, TPump was developed to handle batch loads with low volumes. And, just as the Swiss Army knife easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit when you have a large, busy system with few resources to spare. Let's look in more detail at the many facets of this amazing load tool.
clearly defined load windows, as the other utilities require. You can have TPump running in the background all the time, and just control its flow rate. DML Functions: Like MultiLoad, TPump does DML functions, including INSERT, UPDATE and DELETE. These can be run solo, or in combination with one another. Note that it also supports UPSERTs like MultiLoad. But here is one place that TPump differs vastly from the other utilities: FastLoad can only load one table and MultiLoad can load five tables. But, when it pulls data from a single source, TPump can load more than 60 tables at a time! And the number of concurrent instances in such situations is unlimited. That's right, not 15, but unlimited for Teradata! Well OK, maybe by your computer. I cannot imagine my laptop running 20 TPumps, but Teradata does not care. How could you use this ability? Well, imagine partitioning a huge table horizontally into multiple smaller tables and then performing various DML functions on all of them in parallel. Keep in mind that TPump places no limit on the number of jobs that may be established. Now, think of ways you might use this ability in your data warehouse environment. The possibilities are endless. More benefits: Just when you think you have pulled out all of the options on a Swiss Army knife, there always seems to be just one more blade or tool you had not noticed. Similar to the knife, TPump always seems to have another advantage in its list of capabilities. Here are several that relate to TPump requirements for target tables. TPump allows both Unique and Non-Unique Secondary Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad, which allows just NUSIs. Like MultiLoad, TPump allows the target tables to either be empty or to be populated with data rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential Integrity is allowed and need not be dropped. As to the existence of Triggers, TPump says, "No problem!" Support Environment compatibility: The Support Environment (SE) works in tandem with TPump to enable the operator to have even more control in the TPump load environment. The SE coordinates TPump activities, assists in managing the acquisition of files, and aids in the processing of conditions for loads. The Support Environment aids in the execution of DML and DDL that occur in Teradata, outside of the load utility. Stopping without Repercussions: Finally, this utility can be stopped at any time and all of locks may be dropped with no ill consequences. Is this too good to be true? Are there no limits to this load utility? TPump does not like to steal any thunder from the other load utilities, but it just might become one of the most valuable survival tools for businesses in today's data warehouse environment.
Rule #5: Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy. This must be specified when you create the table. Any dates using the default yy format for the year are taken to mean 20th century years. Rule #6: On some network attached systems, the maximum file size when using TPump is 2GB. This is true for a computer running under a 32-bit operating system. Rule #7: TPump performance will be diminished if Access Logging is used. The reason for this is that TPump uses normal SQL to accomplish its tasks. Besides the extra overhead incurred, if you use Access Logging for successful table updates, then Teradata will make an entry in the Access Log table for each operation. This can cause the potential for row hash conflicts between the Access Log and the target tables.
This is variable length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter characters in a row denote a null value between them. Open table as spreadsheet
Figure 6-1
WHAT IT DOES
PARAMETER ERRLIMIT errcount [errpercent] You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job. The key point here is that you should set the ERRLIMIT to a number greater than the PACK number. The reason for this is that sometimes, if the PACK factor is a smaller number than the ERRLIMIT, the job will terminate, telling you that you have gone over the ERRLIMIT. When this happens, there will be no entries in the error tables. In TPump, the CHECKPOINT refers to the number of minutes, or frequency, at which you wish a checkpoint to occur. This is unlike Mulitload which allows either minutes or the number of rows. This refers to the number of SESSIONS that should be established with Teradata. TPump places no limit on the number of SESSIONS you may have. For TPump, the optimal number of sessions is dependent on your needs and your host computer (like a laptop). Tells TPump how many hours to try logging on when less than the requested number of sessions is available. Tells TPump how frequently, in minutes, to try establishing additional sessions on the system.
CHECKPOINT (n)
SESSIONS (n)
TENACITY SLEEP
Figure 6-2
NOMONITOR
PACK (n)
RATE
ROBUST ON/OFF
If you specify ROBUST OFF, you are telling TPump to utilize "simple" RESTART logic: Just start from the last successful CHECKPOINT. Be aware that if some statements are reprocessed, such as those processed after the last CHECKPOINT, then you may end up with extra rows in your error tables. Why? Because some of the statements in the original run may have found errors, in which case they would have recorded those errors in an error table. SERIALIZE OFF/ON You only use the SERIALIZE parameter when you are going to specify a PRIMARY KEY in the .FIELD command. For example, ".FIELD Salaryrate * DECIMAL KEY." If you specify SERIALIZE TPump will ensure that all operations on a row will occur serially. If you code "SERIALIZE", but do not specify ON or OFF, the default is ON. Otherwise, the default is OFF unless doing an UPSERT. Open table as spreadsheet
Figure 6-3
TPUMP Example
"Don't use a big word where a diminutive one will suffice." - Unknown Don't use a big utility where TPump will suffice. TPump is great when you just want to trickle information into a table at all times. Think of it as a water hose filling up a bucket. Instead of filling the bucket up a glass of water a time (Fastload), we can just trickle the information in using a hose (TPUMP). The great thing about Tpump is that like a pump we can trickle in data or we can fire hose it in. If users are not on the system then we want to crank up the fire hose. If users are on the system and many of them are accessing a table we should trickle in the rows. For our TPUMP exercise, let's create an empty table:
Tpump is irreplaceable because no other utility works like it. Tpump can also use flat files to populate a table. While the script is somewhat different compared to other utilities, TPUMPs structure isn't completely foreign. Let's create our flat file to populate our empty table
Much of the TPump command structure should look quite familiar to you. It is quite similar to MultiLoad. In this example, the Student_Names table is being loaded with new data from the university's registrar. It will be used as an associative table for linking various tables in the data warehouse. /* This script inserts rows into Sets Up a Logtable and then logs on with .RUN. a table called student_names from a single file */ .LOGTABLE WORK_DB.LOG_PUMP; .RUN FILE C:\mydir\logon.txt; DATABASE SQL01; .BEGIN LOAD ERRLIMIT 5 CHECKPOINT 1 SESSIONS 64 Also specifies the database to find the necessary tables. Begins the Load Process; Specifies optional parameters. Names the error table for this run. The logon.txt file contains: .logon TDATA/SQL01,SQL01;.
TENACITY 2 PACK 40 RATE 1000 ERRORTABLE SQL01.ERR_PUMP; .LAYOUT FILELAYOUT; .FIELD Student_ID * INTEGER; .FIELD Last_Name * CHAR(20); .FILLER More_Junk * CHAR(20); .FIELD First_Name * CHAR(14); /* start comment - this could also be coded as: .FIELD Student_ID * INTEGER; .FIELD Last_Name * CHAR(20); .FIELD First_Name 45 CHAR(14); end of the comment */ .DML LABEL INSREC; INSERT INTO SQL01.Student_Names ( Student_ID ,Last_Name ,First_Name ) VALUES (:Student_ID ,:Last_Name ,:First_Name ); .IMPORT INFILE CDW_import.txt FORMAT TEXT LAYOUT FILELAYOUT APPLY INSREC; Names the IMPORT file; Names the LAYOUT to be called from above; tells TPump which DML Label to APPLY. Names the DML Label Tells TPump to INSERT a row into the target table and defines the row format; Comma separators are placed in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed. Colons precede VALUEs. Names the LAYOUT of the INPUT record; Notice the dots before the .FIELD and .FILLER commands and the semi-colons after each FIELD definition. Also, the more_junk field moves the field pointer to the start of the First_name data. Notice the comment in the script.
Step One: Setting up a Logtable and Logging onto Teradata First, you define the Logtable using the .LOGTABLE command. We have named it LOG_PUMP in the WORK_DB database. The Logtable is automatically created for you. It may be placed in any database by qualifying the table name with the name of the database by using syntax like this: <databasename>.<tablename> Next, the connection is made to Teradata. Notice that the commands in TPump, like those in MultiLoad, require a dot in front of the command key word. Step Two: Begin load process, add parameters, naming the Error Table Here, the script reveals the parameters requested by the user to assist in managing the load for smooth operation. It also names the one error table, calling it SQL01.ERR_PUMP. Now let's look at each parameter: ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set the limit that is tolerable for the load. CHECKPOINT 1 tells TPump to pause and evaluate the progress of the load in increments of one minute. SESSIONS 64 tells TPump to establish 64 sessions with Teradata. TENACITY 2 says that if there is any problem establishing sessions, then to keep on trying for a period of two hours. PACK 40 tells TPump to "pack" 40 data rows and load them at one time. RATE 1000 means that 1,000 data rows will be sent per minute. Step Three: Defining the INPUT flat file structure TPump, like MultiLoad, needs to know the structure the INPUT flat file record. You use the .LAYOUT command to name the layout. Following that, you list the columns and data types of the INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER with the correct number of bytes as character to position to the cursor to the next field, or the "*" can be replaced by a number that equals the lengths of all previous fields added together plus 1 extra byte. When you use this technique, the .FILLER is not needed. In our example, this says to begin with Student_ID, continue on to load Last_Name, and finish when First_Name is loaded. Step Four: Defining the DML activities to occur At this point, the .DML LABEL names and defines the SQL that is to execute. It also names the columns receiving data and defines the sequence in which the VALUES are to be arranged. In our example, TPump is to INSERT a row into the SQL01.Student_NAMES. The data values coming in from the record are named in the VALUES with a colon prior to the name. This provides the PE with information on what substitution is to take place in the SQL. Each LABEL used must also be referenced in an APPLY clause of the .IMPORT clause. Step Five: Naming the INPUT file and defining its FORMAT Using the .IMPORT INFILE command, we have identified the INPUT data file as "CDW_import.txt". The file was created using the TEXT format.
Step Six: Associate the data with the description Next, we told the IMPORT command to use the LAYOUT called, "FILELAYOUT." Step Seven: Telling TPump to start loading Finally, we told TPump to APPLY the DML LABEL called INSREC that is, to INSERT the data rows into the target table. Step Seven: Finishing loading and logging off of Teradata The .END LOAD command tells TPump to finish the load process. Finally, TPump logs off of the Teradata system.
/* +++++++++++++++++++++++++++++++++++++ */ /* Setup the TPUMP Logtables, Logon Statements and Database Default */ .LOGTABLE SQL01.LOG_PUMP; .LOGON CDW/SQL01,SQL01; DATABASE SQL01; /* Begin Load and Define TPUMP Parameters and Error Tables */ .BEGIN LOAD ERRLIMIT 5 CHECKPOINT 1 SESSIONS 1 TENACITY 2 PACK 40 RATE 1000 ERRORTABLE SQL01.ERR_PUMP; .LAYOUT FILELAYOUT; .FIELD Student_ID * VARCHAR (11); Names the LAYOUT of the INPUT file. Defines the structure of the INPUT file; BEGINS THE LOAD PROCESS SPECIFIES MULTIPLE PARAMETERS TO AID IN PROCESS CONTROL NAMES THE ERRROR TABLE; TPump HAS ONLY ONE ERROR TABLE. Sets up a Logtable and then logs on to Teradata. Specifies the database containing the table.
.FIELD Last_Name * VARCHAR (20); .FIELD First_Name * VARCHAR (14); .FIELD Class_Code * VARCHAR (2); .FIELD Grade_Pt .DML LABEL INSREC IGNORE DUPLICATE ROWS IGNORE MISSING ROWS IGNORE EXTRA ROWS; INSERT INTO Student_Profile4 ( Student_ID ,Last_Name ,First_Name ,Class_Code ,Grade_Pt ) VALUES ( :Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt ); .IMPORT INFILE Cdw_import.txt FORMAT LAYOUT APPLY .END LOAD; .LOGOFF; Open table as spreadsheet Figure 6-5 VARTEXT "," FILELAYOUT INSREC; * VARCHAR (8);
here, all Variable CHARACTER data and the file has a comma delimiter. See .IMPORT below for file type and the declaration of the delimiter.
Names the DML Label; SPECIFIES 3 ERROR TREATMENT OPTIONS with the ; after the last option. Tells TPump to INSERT a row into the target table and defines the row format. Note that we place comma separators in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.
Names the IMPORT file; Names the LAYOUT to be called from above; Tells TPump which DML Label to APPLY. Notice the FORMAT with a comma in the quotes to define the delimiter between fields in the input record. Tells TPump to stop loading and Logs Off all sessions.
/* ++++++++++++++++++++++++++++++++++ */ /* TPUMP SCRIPT using 2 Input Files CDW /* It loads STUDT_CONTACT Target Table CDW
*/ /*This script loads SQL01. Student_Profile3 */ /* Version 1.1 */ */ /* Created by Coffing Data Warehousing
/* ++++++++++++++++++++++++++++++++++++++++ */ .LOGTABLE SQL01.LOG_TPMP; .LOGON CDW/SQL01,SQL01; DATABASE SQL01; .BEGIN LOAD ERRLIMIT 5 CHECKPOINT 1 SESSIONS 1 TENACITY 2 PACK 40 RATE 1000 ERRORTABLE WORK_DB.ERR_TPMP ; .LAYOUT REC_LAYOUT1 INDICATORS; .FIELD Student_ID * INTEGER; .FIELD Last_name * CHAR(20); .FIELD First_name * VARCHAR(14); .FIELD Class_code * CHAR(2); .FIELD Grade_Pt * DECIMAL(8,2); Defines the LAYOUT for the 2 INPUT file with a different arrangement of fields
nd
Sets Up a Logtable and then logs on to Teradata. Specifies the database to work in (optional). Begins the load process Specifies multiple parameters to aid in load management Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE
Defines the LAYOUT for the 1st INPUT file also has the indicators for NULL data.
.LAYOUT REC_LAYOUT2; .FILLER Rec_Type * CHAR(1); .FIELD Last_name * CHAR(20); .FIELD First_name * VARCHAR(14); .FIELD Student_ID * INTEGER; .FIELD Class_code * CHAR(2); .FIELD Grade_Pt * DECIMAL(8,2);
.DML LABEL INSREC1 IGNORE DUPLICATE ROWS IGNORE EXTRA ROWS; INSERT INTO Student_Profile_OLD ( Student_ID ,Last_Name ,First_Name
Names the 1 DML Label and specifies 2 Error Treatment options. Tells TPump to INSERT a row into the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.
st
,Class_Code ,Grade_Pt ) VALUES ( :Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,: Grade_Pt ); .DML LABEL INSREC2 IGNORE DUPLICATE ROWS; INSERT INTO Student_Profile_NEW ( Student_ID ,Last_Name ,First_Name ,Class_Code ,Grade_Pt ) VALUES (:Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt ); .IMPORT INFILE FILE-REC1.DAT FORMAT FASTLOAD LAYOUT REC_LAYOUT1 APPLY INSREC1; .IMPORT INFILE FILE-REC2.DAT FORMAT TEXT LAYOUT REC_LAYOUT2 APPLY INSREC2 ; .END LOAD; .LOGOFF; Open table as spreadsheet Figure 6-7 Tells TPump to stop loading and logs off all sessions. Names the TWO Import Files as FILE-REC1.DAT and FILEREC2.DAT. The file name is under Windows so the "-"is fine. Names the TWO Layouts that define the structure of the INPUT DATA files; Names the TWO INPUT data files Names the 2 DML Label and specifies 1 Error Treatment options. Tells TPump to INSERT a row into the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.
nd
/* this is an UPSERT TPump script */ .LOGTABLE SQL01.CDW_LOG; .LOGON CDW/SQL01,SQL01; .BEGIN LOAD ERRLIMIT 5 CHECKPOINT 10 SESSIONS 10 TENACITY 2 PACK 10 RATE 10 ERRORTABLE SQL01.SWA_ET; .LAYOUT INREC INDICATORS; .FIELD StudentID INTEGER; .FIELD Last_name CHAR(20); .FIELD First_name VARCHAR(14); .FIELD Class_code CHAR(2); .FIELD Grade_Pt DECIMAL(8,2); * * * * *
Begins the load process Specifies multiple parameters to aid in load management Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE
Defines the LAYOUT for the 1st INPUT file; also has the indicators for NULL data.
.DML LABEL UPSERTER DO INSERT FOR MISSING UPDATE ROWS; UPDATE Student_Profile SET Last_Name = :Last_Name ,First_Name = :First_Name ,Class_Code = :Class_Code ,Grade_Pt = :Grade_Pt WHERE Student_ID = :StudentID ; INSERT INTO Student_Profile VALUES ( :StudentID ,:Last_Name ,:First_Name
Names the 1st DML Label and specifies 2 Error Treatment options. Tells TPump to INSERT a row into the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.
,:Class_Code ,:Grade_Pt ); .IMPORT INFILE UPSERTFILE.DAT FORMAT FASTLOAD LAYOUT INREC APPLY UPSERTER ; .END LOAD; .LOGOFF; Open table as spreadsheet Figure 6-8 NOTE: The above UPSERT uses the same syntax as MultiLoad. This continues to work. However, there might soon be another way to accomplish this task. NCR has built an UPSERT and we have tested the following statement, without success: UPDATE SQL01.Student_Profile SET Last_Name =:Last_Name ,First_Name = :First_Name ,Class_Code = :Class_Code ,Grade_Pt = :Grade_Pt WHERE Student_ID = :Student_ID; ELSE INSERT INTO SQL01.Student_Profile VALUES (:Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt); We are not sure if this will be a future technique for coding a TPump UPSERT, or if it is handled internally. For now, use the original coding technique. Tells TPump to stop loading and logs off all sessions. Names the Import File as UPSERT-FILE.DAT. The file name is under Windows so the "-"is fine. The file type is FASTLOAD.
Monitoring TPump
TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the status of TPump jobs as they run and to change (remember "throttle up" and "throttle down?") the statement rate on the fly. Key to this monitor is the "SysAdmin.TpumpStatusTbl" table in the Data Dictionary Directory. If your Database Administrator creates this table, TPump will update it on a minute-by-minute basis when it is running. You may update the table to change the statement rate for an IMPORT. If you want TPump to run unmonitored, then the table is not needed. You can start a monitor program under UNIX with the following command: tpumpmon [-h] [TDPID/] <UserName>,<Password> [,<AccountID>]
Below is a chart that shows the Views and Macros used to access the "SysAdmin.TpumpStatusTbl" table. Queries may be written against the Views. The macros may be executed.
Open table as spreadsheet Views and Macros to access the table
SysAdmin.TpumpStatusTbl View View Macro Macro Figure 6-9 SysAdmin.TPumpStatus SysAdmin.TPumpStatusX Sysadmin.TPumpUpdateSelect TPumpMacro.UserUpdateSelect
Handling Errors in TPump Using the Error Table One Error Table
Unlike FastLoad and MultiLoad, TPump uses only ONE Error Table per target table, not two. If you name the table, TPump will create it automatically. Entries are made to these tables whenever errors occur during the load process. Like MultiLoad, TPump offers the option to either MARK errors (include them in the error table) or IGNORE errors (pay no attention to them whatsoever). These options are listed in the .DML LABEL sections of the script and apply ONLY to the DML functions in that LABEL. The general default is to MARK. If you specify nothing, TPump will assume the default. When doing an UPSERT, this default does not apply. The error table does the following: Identifies errors Provides some detail about the errors Stores a portion the actual offending row for debugging When compared to the error tables in MultiLoad, the TPump error table is most similar to the MultiLoad Acquisition error table. Like that table, it stores information about errors that take place while it is trying to acquire data. It is the errors that occur when the data is being moved, such as data translation problems that TPump will want to report on. It will also want to report any difficulties compiling valid Primary Indexes. Remember, TPump has less tolerance for errors than FastLoad or Multiload.
COLUMNS IN THE TPUMP ERROR TABLE
Sequence number that identifies the IMPORT command where the error occurred
Sequence number for the DML statement involved with the error
Sequence number of the DML statement being carried out when the error was discovered
Sequence number that tells which APPLY clause was running when the error occurred
The number of the data row in the client file that was being built when the error took place
Identifies the INPUT data source where the error row came from
System code that identifies the error
Generic description of the error
Number of the column in the target table where the error happened; left blank if the offending column cannot be identified (this differs from MultiLoad, which supplies the column name)
HostData — the data row that contains the error, limited to the first 63,728 bytes related to the error
Figure 6-10
RESTARTing TPump
Like the other utilities, a TPump script is fully restartable as long as the log table and error tables are not dropped. As mentioned earlier, you have a choice of setting ROBUST either ON (the default) or OFF. There is more overhead using ROBUST ON, but it provides a higher degree of data integrity at the cost of some performance.
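The choice is made on the .BEGIN LOAD command. A minimal sketch (session count, pack factor, and names are illustrative):

.BEGIN LOAD
   SESSIONS 8
   PACK 20
   ERRORTABLE SQL01.Student_Err
   ROBUST OFF;

ROBUST OFF does less bookkeeping in the log table between checkpoints, which is why it is faster but gives the weaker restart guarantee.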
Function | MultiLoad | TPump
Error Tables must be defined | Optional, 2 per target table | Optional, 1 per target table
Work Tables must be defined | Optional, 1 per target table | No
Logtable must be defined | Yes | Yes
Allows Referential Integrity | No | Yes
Allows Unique Secondary Indexes | No | Yes
Allows Non-Unique Secondary Indexes | Yes | Yes
Allows Triggers | No | Yes
Loads a maximum of n number of tables | Five | 60
Maximum Concurrent Load Instances | 15 | Unlimited
Locks at this level | Table | Row Hash
DML Statements Supported | INSERT, UPDATE, DELETE, "UPSERT" | INSERT, UPDATE, DELETE, "UPSERT"
How DML Statements are Performed | Runs actual DML commands | Compiles DML into MACROS and executes
DDL Statements Supported | All | All
Transfers data in 64K blocks | Yes | No, moves data at row level
RESTARTable | Yes | Yes
Stores UPI Violation Rows | Yes, with MARK option | Yes, with MARK option
Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation | No | No
Allows Data Conversion | Yes | Yes
Performance Improvement | — | By using multi-statement requests
Table Access During Load | Locked during the MultiLoad Application Phase | Allows simultaneous READ and WRITE access due to Row Hash Locking
The source chart also credits TPump with "No repercussions" (a job can be stopped without ill effect) and "Allows consumption management via Parameters."
Chapter 3: FastExport
"An invasion of armies can be resisted, but not an idea whose time has come." - Victor Hugo
If the output data is sorted, FastExport may need to redistribute the selected data twice across the AMP processors in order to build the blocks in the correct sequence. Remember, a lot of rows fit into a 64K block, and both the rows and the blocks must be sequenced. While all of this redistribution is occurring, BTEQ continues to send rows, so FastExport falls behind in the processing. However, once FastExport starts sending the rows back a block at a time, it quickly overtakes and passes BTEQ's row-at-a-time processing. The other advantage is that if BTEQ terminates abnormally, all of your rows (which are in SPOOL) are discarded, and you must rerun the BTEQ script from the beginning. However, if FastExport terminates abnormally, all the selected rows are in worktables, and it can continue sending them from where it left off in a very smart and very fast manner. Also, if there is a requirement to manipulate the data before storing it on the computer's hard drive, an OUTMOD routine can be written to modify the result set after it is sent back to the client on either the mainframe or LAN. Just like the BASF commercial states, "We don't make the products you buy, we make the products you buy better." FastExport is designed on the same premise: it does not make the SQL SELECT statement faster, but it takes the SQL SELECT statement and processes the request with lightning-fast parallel processing!
FastExport Fundamentals
#1: FastExport EXPORTS data from Teradata. The reason it is called FastExport is that it takes data off of Teradata (it exports data). FastExport does not import data into Teradata. Additionally, like BTEQ, it can output multiple files in a single run.
#2: FastExport only supports the SELECT statement. The only DML statement that FastExport understands is SELECT. You SELECT the data you want exported, and FastExport takes care of the rest.
#3: Choose FastExport over BTEQ when exporting more than half a million rows. When a large amount of data is being exported, FastExport is recommended over BTEQ Export. The only drawback is the total number of FastLoads, FastExports, and MultiLoads that can run at the same time, which is limited to 15; BTEQ Export does not have this restriction. Of course, FastExport will work with less data, but the speed may not be much faster than BTEQ.
#4: FastExport supports multiple SELECT statements and multiple tables in a single run. You can have multiple SELECT statements with FastExport, and each SELECT can join information from up to 64 tables. (A short sketch follows this list.)
#5: FastExport supports conditional logic, conditional expressions, arithmetic calculations, and data conversions. FastExport is flexible and supports the above conditions, calculations, and conversions.
#6: FastExport does NOT support error files or error limits. FastExport does not record particular error types in a table; the utility will terminate after a certain number of errors have been encountered.
#7: FastExport supports the user-written routines INMODs and OUTMODs. FastExport allows you to write INMOD and OUTMOD routines so you can select, validate, and preprocess the exported data.
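As promised in #4 above, here is a minimal sketch of one export run with more than one SELECT. The log table and output file names are illustrative; the Student_Table comes from this book's examples:

.LOGTABLE SQL01.Multi_Log;
.LOGON CDW/sql01,whynot;
.BEGIN EXPORT SESSIONS 8;
.EXPORT OUTFILE students_all.txt MODE RECORD FORMAT TEXT;
SELECT Last_name (CHAR(20)), Class_code (CHAR(2))
FROM SQL_CLASS.Student_Table
WHERE Class_code IN ('FR','SO');
SELECT Last_name (CHAR(20)), Class_code (CHAR(2))
FROM SQL_CLASS.Student_Table
WHERE Class_code IN ('JR','SR');
.END EXPORT;
.LOGOFF;

Both answer sets are written, one after the other, to the same output file.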
The FastExport utility is supported on either the mainframe or the LAN. The information below illustrates which operating systems are supported for each environment:
The LAN environment supports the following operating systems:
UNIX MP-RAS
Windows 2000
Windows 95/98/XP
Windows NT/2000
UNIX HP-UX
AIX
Solaris SPARC
Solaris Intel
The Mainframe (Channel-Attached) environment supports the following operating systems:
MVS
VM
Maximum of 15 Loads
The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or FastExport utility jobs. This maximum value is determined and configured in the DBS Control record, and it can be set from 0 to 15. When Teradata is initially installed, this value is set at 5. The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to transfer data. If more than 15 simultaneous jobs were supported, a saturation point could be reached on the availability of resources. In this case, Teradata does an excellent job of protecting system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that attempt to connect. For example, if the maximum number of load utilities on the Teradata system has been reached and another job attempts to run, that job will not start. This limitation should be viewed as a safety control feature. A tip for remembering how the load limit applies: if the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of fifteen of them running at any one time. BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting data; however, if too many load jobs are already running, BTEQ is an alternate choice for exporting data.
DISPLAY — Writes messages to the specified location.
ELSE — Used in conjunction with the IF statement. ELSE commands and statements execute when a preceding IF condition is false.
ENDIF — Used in conjunction with the IF or ELSE statements. Delimits the commands that were subject to the previous IF or ELSE conditions.
IF — Introduces a conditional expression. If it is true, then execution of the subsequent commands will happen.
LOGOFF — Disconnects all active FastExport sessions and terminates FastExport.
LOGON — LOGON command or string used to connect sessions established through the FastExport utility.
LOGTABLE — Specifies a restart log table, used for FastExport checkpoint information.
ROUTE MESSAGES — Routes FastExport messages to an alternate destination.
RUN FILE — Points to a file that FastExport is to use as standard input. Invokes the specified external file as the current source of utility and Teradata SQL commands.
SET — Assigns a data type and value to a variable.
SYSTEM — Suspends the FastExport utility temporarily and executes any valid local operating system command before returning.
Figure 3-1
Task Commands
BEGIN EXPORT — Begins the export task and sets the specifications for the number of sessions with Teradata.
END EXPORT — Ends the export task and initiates processing by Teradata.
EXPORT — Provides two things: the client destination and file format specifications for the export data retrieved from Teradata, and a generated MultiLoad script file that can be used later to reload the export data back into Teradata.
FIELD — Constitutes a field in the input record section that provides data values for the SELECT statement.
FILLER — Specifies a field in the input record that will not be sent to Teradata for processing. It is part of the input record to provide data values for the SELECT statement.
IMPORT — Defines the file that provides the USING data values for the SELECT.
LAYOUT — Specifies the data layout for a file. It contains a sequence of FIELD and FILLER commands. This is used to describe the import file that can, optionally, provide data values for the SELECT.
SQL Commands
ALTER TABLE — Change a column or table options of a table.
CHECKPOINT — Add a checkpoint entry in the journal table.
COLLECT STATISTICS — Collect statistics for one or more columns or indexes in a table.
COMMENT — Store or retrieve a comment string for a particular object.
CREATE DATABASE — Creates a new database.
CREATE TABLE — Creates a new table.
CREATE VIEW — Creates a new view.
CREATE MACRO — Creates a new macro.
DATABASE — Specify a default database for the session.
DELETE — Delete rows from a table.
DELETE DATABASE — Removes all tables, views, macros, and stored procedures from a database.
DROP DATABASE — Drops a database.
GIVE — Transfer ownership of a database or user to another user.
GRANT — Grant access privileges to an object.
MODIFY DATABASE — Change the options for a database.
RENAME — Change the name of a table, view, or macro.
REPLACE MACRO — Change a macro.
REPLACE VIEW — Change a view.
REVOKE — Revoke privileges to an object.
SET SESSION COLLATION — Override the collation specification during the current session.
UPDATE — Change a column value of an existing row or rows in a table.
Figure 3-3
,EMP.First_Name (CHAR(14))
,EMP.Last_Name (CHAR(20))
,DEPT.Dept_No (CHAR(6))
,DEPT.Dept_name (CHAR(20))
FROM SQL_CLASS.Employee_table AS EMP
INNER JOIN SQL_CLASS.Department_Table AS DEPT
ON EMP.Dept_No = DEPT.Dept_No;
/* Finish the Export Job and Write to File */
.END EXPORT;
.LOGOFF;
Figure 3-5
END THE JOB AND LOGOFF TERADATA;
Formats
FastExport has four possible formats in the UNIX or LAN environment. The FORMAT statement specifies the format for each record being exported:
FASTLOAD
BINARY
TEXT
UNFORMAT
The default FORMAT is FASTLOAD in a UNIX or LAN environment. FASTLOAD format is a two-byte integer, followed by the data, followed by an end-of-record marker. It is called FASTLOAD because the data is exported in a format ready for FastLoad. BINARY format is a two-byte integer followed by the data. TEXT format is an arbitrary number of bytes followed by an end-of-record marker. UNFORMAT format is exported exactly as received from CLIv2, without any client modifications.
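Switching formats is just a change to the FORMAT clause of the .EXPORT command. A minimal sketch (the file name is illustrative):

.EXPORT OUTFILE student_export.txt MODE RECORD FORMAT TEXT;

The SELECT that follows the .EXPORT is unaffected; only the record packaging in the output file changes.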
/*@(#)FASTEXPORT SCRIPT - SWA */
/*@(#)Version 1.1 */
/*@(#)Created by CoffingDW */
/*-------------------------------------------------- */
/* Setup the Fast Export Parameters */
.LOGTABLE SQL01.SWA_LOG;
.LOGON CDW/sql01,whynot;
.BEGIN EXPORT SESSIONS 12;
.EXPORT OUTFILE Cdw_import.txt FORMAT BINARY;
/* Get Data From the Student Table */
SELECT Student_ID (CHAR(11)),
       Last_name (CHAR(20)),
       First_name (CHAR(14)),
       Class_code (CHAR(2)),
       Grade_Pt (CHAR(8))
FROM SQL_CLASS.Student_Table;
/* Finish the Export Job and Write to File */
.END EXPORT;
.LOGOFF;
Figure callouts: CREATE LOGTABLE AND LOGON TO TERADATA; BEGIN EXPORT STATEMENT; NAME THE OUTPUT FILE AND SET THE FORMAT TO BINARY; GET THE DATA FROM THE STUDENT TABLE; END THE JOB.
What is an INMOD
When data is being loaded or incorporated into the Teradata Relational Database Management System (RDBMS), the processing of the data is performed by the utility. All of the NCR Teradata RDBMS utilities are able to read files that contain a variety of formatted and unformatted data, and they can read from disk and from tape. These files and devices must support a sequential access method. The utility is then responsible for incorporating the data into SQL for use by Teradata.
However, there are times when it is advantageous to use a different access technique or a special device. When special input processing is desired, an INMOD (an acronym for INput MODule) is a potential approach to solving the problem. An INMOD is written to perform the input of the data from a data source; it removes from the utility the responsibility of reading the input data. Many times an INMOD is written because the utility is not capable of performing the particular input processing; other times, it is written for convenience. The INMOD is a user-written routine that does the specialized access to the file system, device, or database. The INMOD does not replace the utility; it becomes a part of, and an extension of, the utility. The major difference is that instead of the utility receiving the data directly, it receives the data from the INMOD. An INMOD can be written to work with FastLoad, MultiLoad, TPump, and FastExport. As an example, an INMOD might be written to access the data directly from another RDBMS besides Teradata. It would be written to perform the following steps:
1. Connect to the RDBMS
2. Retrieve a row using a SELECT or DECLARE CURSOR
3. Pass the row to the utility
4. Loop back and do steps 2 & 3 until there is no more data
5. When there is no more data, disconnect from the RDBMS
The following diagram illustrates the logic flow when using an INMOD with the utility:
As seen in the above diagram, there is an extra step involved with the processing of an INMOD. On the other hand, it can eliminate the need to create an intermediate file by literally using another RDBMS as its data source. However, the user still scripts and executes the utility just as when using a file; that portion does not change. The following chart shows the appropriate programming languages that an INMOD can be written in for mainframe and network-attached systems:
Operating System — Programming Language
Mainframe — Assembler, COBOL, SAS/C or IBM PL/I
Network-attached (LAN) — C (although not supported, MicroFocus COBOL can be used)
Figure 7-1
Writing an INMOD
The writing of an INMOD is primarily concerned with processing an input data source. However, it cannot do the processing haphazardly. It must wait for the utility to tell it what and when to perform every operation. It has been previously stated that the INMOD returns data to the utility. At the same time, the utility needs to know that it is expecting to receive the data. Therefore, a high degree of handshake processing is necessary for the two components (INMOD and utility) to know what is expected. As well as passing the data, a status code is sent back and forth between the utility and the INMOD. As with all processing, we hope for a successful completion. Earlier in this book, it was shown that a zero status code indicates a successful completion. That same situation is true for communications between the utility and the INMOD. Therefore, a memory area must be allocated that is shared between the INMOD and the utility. The area contains the following elements:
1. The return or status code
2. The length of the data that follows
3. The data area
Assembler
          DSECT
RETCODE   DS F
RETLENGTH DS F
RETDATA   DS CL<data-length>
COBOL
01 PARM-REC.
   03 RETCODE   PIC S9(9) COMP.
   03 RETLENGTH PIC 9(9) COMP.
   03 RETDATA   PIC X(<data-length>).
PL/I
DCL 1 PARM-REC,
    10 RETCODE   FIXED BINARY(31,0),
    10 RETLENGTH FIXED BINARY(15,0),
    10 RETDATA   PIC X(<data-length>);
Figure 7-3
Return/status codes from FastLoad to the INMOD
Value — Indicates that . . .
0 — FastLoad is calling the INMOD for the first time. The INMOD should open/connect to the data source, read the first record and return it to FastLoad.
1 — FastLoad is calling for the next record. The INMOD should read the next record and return it to FastLoad.
2 — FastLoad and the INMOD failed and have been restarted. The INMOD should use the saved record count to reposition in the input data source to where it left off. Since checkpoint is optional in FastLoad, it must be requested in the script. This also means that for values 0 and 1, the INMOD must count each record and save the record count for use if needed. Do not return a record to FastLoad.
3 — FastLoad has written a checkpoint. The INMOD should guarantee that the record count has been written to disk. Do not return a record to FastLoad.
4 — The Teradata RDBMS failed. The INMOD should use the saved record count to reposition in the input data source to where it left off. Do not return a record to FastLoad.
5 — FastLoad has finished loading the data to Teradata. The INMOD should clean up and end.
Figure 7-4
Return/status codes for the INMOD to FastLoad
Value — Indicates that . . .
0 — The INMOD is returning data to the utility.
Not 0 — The utility is at end of file.
Figure 7-5
Entry point for FastLoad used in the DEFINE:
SAS/C — <dynamic-name-by-user>
Figure 7-6
NCR Corporation provides two examples for writing a FastLoad INMOD. The first is called BLKEXIT.C, which does not contain the checkpoint and restart logic; the other is BLKEXITR.C, which contains both checkpoint and restart logic.
Second Parameter definition for INMOD to MultiLoad, TPump and FastExport
Assembler
IPARM    DSECT
ISEQNUM  DS F
ILENGTH  DS H
IDATA    DS CL<data-length>
C
struct {
   long  iseqnum;
   short ilength;
   char  ibuffer[<data-length>];
};
COBOL
01 PARM-REC.
   03 ISEQNUM PIC 9(9) COMP.
   03 ILENGTH PIC 9(4) COMP.
   03 IDATA   PIC X(<data-length>).
Value — Indicates that . . .
0 — The utility is calling the INMOD for the first time. The INMOD should open/connect to the data source, read the first record and return it to the utility.
1 — The utility is calling for the next record. The INMOD should read the next record and return it to the utility.
2 — The utility and the INMOD failed and have been restarted. The INMOD should use the saved record count to reposition in the input data source to where it left off. Since checkpoint is optional in the utility, it must be requested in the script. This also means that for values 0 and 1, the INMOD must count each record and save the record count for use if needed. Do not return a record to the utility.
3 — The utility needs to write a checkpoint. The INMOD should guarantee that the record count has been written to disk and return it to the utility in the second parameter to be stored in the LOGTABLE. Do not return a record to the utility.
4 — The Teradata RDBMS failed. The INMOD should receive the record count from the utility in the second parameter for use in repositioning in the input data source to where it left off. Do not return a record to the utility.
5 — The utility has finished loading the data to Teradata. The INMOD should clean up and end.
6 — The INMOD should initialize and prepare to receive the first data record from the utility.
7 — The INMOD should receive the next data record from the utility.
Figure 7-9
The following diagram shows how to use the return codes of 6 and 7:
Return/status codes from the INMOD to the utility:
Value — Indicates that . . .
0 — The INMOD is returning data to the utility.
Not 0 — The utility is at end of file.
Entry point for MultiLoad, TPump and FastExport:
All languages — <dynamic-name-by-user>
Figure 7-11
Migrating an INMOD
As seen in figures 7-4 and 7-9, many of the return codes are the same. However, it should also be noted that FastLoad must remember the record count in case a restart is needed, whereas the other utilities send the record count to the INMOD. If the INMOD fails to accept the record count when it is sent, the job will abort or hang and never finish successfully. This means that if a FastLoad INMOD is used in one of the other utilities, it will work only as long as the utility never requests that a checkpoint take place. Remember that, unlike FastLoad, the newer utilities default to a checkpoint every 15 minutes. The only way to turn it off is to set the CHECKPOINT option of the .BEGIN to a number that is higher than the number of records being processed. Therefore, it is not the best practice to simply use a FastLoad INMOD as if it is interchangeable. It is better to modify the INMOD logic for the restart and checkpoint processing necessary to receive the record count and use it for the repositioning operation.
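If you must run an unmodified FastLoad INMOD under MultiLoad anyway, the sketch below shows the workaround described above: a CHECKPOINT value greater than 60 is interpreted as a record count, so picking a number larger than the input file effectively keeps the utility from ever requesting a checkpoint. The table name and numbers are illustrative:

.BEGIN MLOAD TABLES SQL01.Student_Profile1
   CHECKPOINT 500000000
   SESSIONS 4;

This is a workaround, not a fix; modifying the INMOD remains the better practice.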
As seen earlier in this book, there is a NOTIFY statement. If the standard values are acceptable, you should use them. However, if they are not, you may write your own NOTIFY routine. If you choose to do this, refer to the NCR Utilities manual for guidance on writing this processing. We just want you to know here that it is something you can do.
Sample INMOD
Below is an example of the PROCEDURE DIVISION commands that might be used for MultiLoad, TPump or FastExport.
PROCEDURE DIVISION USING PARM-1, PARM-2.
BEGIN.
MAIN.
    { specific user processing goes here, followed by: }
    IF RETCODE = 0 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 0 - INITIALIZE & READ"
        PERFORM 100-OPEN-FILES
        PERFORM 200-READ-INPUT
        GOBACK
    ELSE IF RETCODE = 1 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 1 - READ"
        PERFORM 200-READ-INPUT
        GOBACK
    ELSE IF RETCODE = 2 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 2 - RESTART"
        PERFORM 900-GET-REC-COUNT
        PERFORM 950-FAST-FORWARD-INPUT
        GOBACK
    ELSE IF RETCODE = 3 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 3 - CHECKPOINT"
        PERFORM 600-SAVE-REC-COUNT
        GOBACK
    ELSE IF RETCODE = 5 THEN
        DISPLAY "INMOD RECEIVED - RETURN CODE 5 - DONE"
        MOVE 0 TO RETLENGTH
        MOVE 0 TO RETCODE
        GOBACK
    ELSE
        DISPLAY "INMOD RECEIVED INVALID RETURN CODE"
        MOVE 0 TO RETLENGTH
        MOVE 16 TO RETCODE
        GOBACK.
100-OPEN-FILES.
    OPEN INPUT DATA-FILE.
    MOVE 0 TO RETCODE.
200-READ-INPUT.
    READ DATA-FILE INTO DATA-AREA1
        AT END GO TO END-DATA.
    ADD 1 TO NUMIN.
    MOVE 80 TO RETLENGTH.
    MOVE 0 TO RETCODE.
    ADD 1 TO NUMOUT.
END-DATA.
    CLOSE DATA-FILE.
    DISPLAY "NUMBER OF INPUT RECORDS = " NUMIN.
    DISPLAY "NUMBER OF OUTPUT RECORDS = " NUMOUT.
    MOVE 0 TO RETLENGTH.
    MOVE 0 TO RETCODE.
    GOBACK.
What is an OUTMOD
The FastExport utility is able to write a file that contains a variety of formatted and unformatted data. It can write the data to disk and to tape. This works because these files and devices all support a sequential access method. However, there are times when it is necessary or even advantageous to use some other technique or a special device. When special output processing is desired, an OUTMOD (an acronym for OUTput MODule) could be a potential solution. It is a user-written routine that does the specialized access to the file system, device, or database. The OUTMOD does not replace the utility; instead, it becomes like a part of the utility. An OUTMOD can only be written to work with FastExport. As an example, an OUTMOD might be written to move the data from Teradata directly into an RDBMS or test database. Therefore, it must be written to do the following steps:
1. Connect to the RDBMS
2. Receive a row from FastExport
3. Send the row to another database as an INSERT
4. Loop back and do steps 2 & 3 until there is no more data
5. When there is no more data, disconnect from the database
As seen in the above diagram, there is an extra step involved with the processing of an OUTMOD. On the other hand, it eliminates the need to create an intermediate file, since the data destination can be another RDBMS. However, the user still executes the utility; that portion does not change. The following chart shows the available languages for mainframe and network-attached systems:
Operating System — Programming Language
Mainframe — Assembler, COBOL, or SAS/C
Network-attached (LAN) — C (although not supported, MicroFocus COBOL can be used)
Figure 8-1
Writing an OUTMOD
The writing of an OUTMOD is primarily concerned with processing the output data destination. However, it cannot do the processing haphazardly. It must wait for the utility to tell it what and when to perform every operation. It has been previously stated that the OUTMOD receives data from the utility. At the same time, the OUTMOD needs to know when it should expect to receive the data. Therefore, a high degree of handshake processing is necessary for the two components (OUTMOD and FastExport) to know what is expected. As well as passing the data, a status code is sent back and forth between them. Just like all processing, we hope for a successful completion. Earlier in this book, it was shown that a zero status code indicates a successful completion. A memory area must be allocated that is shared between the OUTMOD and the utility. The area contains the following elements:
1. The return or status code
2. The sequence number of the SELECT within FastExport
3. The length of the data area in bytes
4. The response row from Teradata
5. The length of the output data record
6. The output data record
Chart of the various programming language definitions for the parameters:
Assembler
OUTCODE   DS F
OUTSEQNUM DS F
OUTRECLEN DS F
OUTRECORD DS <as-needed>
OUTLENGTH DS F
OUTDATA   DS CL<data-length>
C
<dynamic-name-by-user> (OutCode, SeqNum, OutRecLen, Outrecord, OutLength, OutData)
int    *OutCode;
int    *SeqNum;
int    *OutRecLen;
struct tranlog *Outrecord;
int    *OutLength;
char   *OutData;
COBOL
01 OUTCODE   PIC S9(5) COMP.
01 OUTSEQNUM PIC S9(5) COMP.
01 OUTRECLEN PIC S9(5) COMP.
01 OUTRECORD.
   03 OUTDATA PIC X(<data-length>).
01 OUTLENGTH PIC S9(5) COMP.
01 OUTDATA   PIC X(<data-length>).
Figure 8-3
Return/status codes from FastExport to the OUTMOD
Value — Indicates that . . .
1 — FastExport is calling the OUTMOD for the first time before sending the SELECT to Teradata. The OUTMOD should open/connect to the data destination and wait for the first record.
2 — FastExport is calling after the last record has been sent to the OUTMOD. It should close/disconnect from the data destination.
3 — FastExport is calling with the next output record. The OUTMOD should write it to the data destination.
4 — FastExport has written a checkpoint. The OUTMOD should guarantee that it can handle a restart if needed. Does not receive a record from FastExport.
5 — The Teradata RDBMS has restarted. The OUTMOD should reposition itself to receive and write the next record when it arrives.
6 — FastExport and the OUTMOD failed and have been restarted. The OUTMOD should use the saved record count to reposition in the output data destination to where it left off. Does not receive a record from FastExport.
Value — Indicates that . . .
0 — The OUTMOD successfully wrote the output data.
Not 0 — The OUTMOD failed to write the output data.
Sample OUTMOD
Below is an example of the PROCEDURE DIVISION commands that might be used in a FastExport OUTMOD.
LINKAGE SECTION.
01 OUTCODE   PIC S9(5) COMP.
01 OUTSEQNUM PIC S9(5) COMP.
01 OUTRECLEN PIC S9(5) COMP.
01 OUTRECORD.
   05 INDICATORS PIC 9.
   05 REGN       PIC XXX.
   05 PRODUCT    PIC X(8).
   05 QTY        PIC S9(8) COMP.
   05 PRICE      PIC S9(8) COMP.
01 OUTLENGTH PIC S9(5) COMP.
01 OUTDATA   PIC XXXX.
PROCEDURE DIVISION USING OUTCODE, OUTSEQNUM, OUTRECLEN,
                         OUTRECORD, OUTLENGTH, OUTDATA.
BEGIN.
MAIN.
    IF OUTCODE = 1 THEN
        OPEN OUTPUT SALES-DROPPED-FILE
        OPEN OUTPUT BAD-REGN-SALES-FILE
        GOBACK.
    IF OUTCODE = 2 THEN
        CLOSE SALES-DROPPED-FILE
        CLOSE BAD-REGN-SALES-FILE
        GOBACK.
    IF OUTCODE = 3 THEN
        PERFORM TYPE-3
        GOBACK.
    IF OUTCODE = 4 THEN
        GOBACK.
    IF OUTCODE = 5 THEN
        CLOSE SALES-DROPPED-FILE
        OPEN OUTPUT SALES-DROPPED-FILE
        CLOSE BAD-REGN-SALES-FILE
        OPEN OUTPUT BAD-REGN-SALES-FILE
        GOBACK.
    IF OUTCODE = 6 THEN
        OPEN OUTPUT SALES-DROPPED-FILE
        OPEN OUTPUT BAD-REGN-SALES-FILE
        GOBACK.
    DISPLAY "Invalid entry code = " OUTCODE.
    GOBACK.
TYPE-3.
    IF QTY IN OUTRECORD * PRICE IN OUTRECORD < 100 THEN
        MOVE 0 TO OUTRECLEN
        WRITE DROPPED-TRANLOG FROM OUTRECORD
    ELSE
        PERFORM TEST-NULL-REGN.
TEST-NULL-REGN.
    IF REGN IN OUTRECORD = SPACES
        MOVE 999 TO REGN IN OUTRECORD
        WRITE BAD-REGN-OUTRECORD FROM OUTRECORD.
Overview
As seen in many of the Teradata utilities, the Support Environment (SE) is a valuable asset. It is an inherent part of the utilities and acts as a front-end to FastExport, MultiLoad, and TPump. The purpose of the SE is to provide a feature-rich scripting tool. As the newer load and extract functionalities were being proposed for use with the Teradata RDBMS, it became obvious that certain capabilities were going to be needed by all the utilities. Rather than writing these capabilities over and over again into multiple programs, they were written once into a single module/environment called the SE. This environment/module is included with the newer utilities.
.ACCEPT — Read an input record that provides one or more parameter values for variables
.BEGIN — Invoke one of the utilities
.DATEFORM — Define the acceptable or desired format for a date in this execution as either (YY/MM/DD) or (YYYY-MM-DD)
.DISPLAY — Write an output message to a specified file
.END — Exit the utility
.ENDIF — Delimit the scope of a .IF command; allows multiple operations based on a conditional comparison
.ELSE — Optionally, perform an operation when a condition is not true
.IF — Compare variables and values to conditionally perform one or more operations
.LOGTABLE — Specify the restart log
.LOGON — Establish a Teradata session
.LOGOFF — Terminate a Teradata session
.ROUTE — Write output to a specified file
.RUN — Read and run commands stored in an external script file
.SET — Establish or change a value stored in a variable
.SYSTEM — Allows for the execution of a command at the computer's operating system level from within the script
Figure 9-1
The SE allows the writer of the script to perform housekeeping chores prior to calling the desired utility with a .BEGIN. At a minimum, these chores include the specification of the restart log table and logging onto Teradata. Yet, it brings to the party the ability to perform any Data Definition Language (DDL) and Data Control Language (DCL) command available to the user as defined in the Data Dictionary. In addition, all Data Manipulation Language (DML) commands except a SELECT are allowed within the SE.
Once a session is established, based on privileges, the user can perform any of the following:
DDL
DCL
Any DML (with the exception of SELECT)
Establish system variables
Accept parameter values from a file
Perform dynamic substitution of values, including object names
Beginning a Utility
Once the script has connected to Teradata and established all needed environmental conditions, it is time to run the desired utility. This is accomplished using the .BEGIN command. Beyond running the utility, it is used to define most of the options used within the execution of the utility. As an example, setting the number of sessions is requested here. See each of the individual utilities for the names, usage and any recommendations for the options specific to it.
The syntax for writing a .BEGIN command:
.BEGIN <utility-task> [ <utility-options> ] ;
The utility task is defined as one of the following:
FastExport — .BEGIN EXPORT
MultiLoad to load or modify rows — .BEGIN [ IMPORT ] MLOAD
MultiLoad to delete rows — .BEGIN DELETE MLOAD
TPump — .BEGIN LOAD
Figure 9-2
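Putting the housekeeping chores and the .BEGIN together, here is a minimal FastExport-flavored sketch (the table, log, and file names and session count are illustrative):

.LOGTABLE SQL01.Restart_Log;
.LOGON CDW/sql01,whynot;
DROP TABLE SQL01.Old_Extract_Stage;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE extract.dat MODE RECORD FORMAT TEXT;
SELECT Last_name (CHAR(20)) FROM SQL_CLASS.Student_Table;
.END EXPORT;
.LOGOFF;

The DROP TABLE runs in the SE before the utility starts, which is exactly the kind of DDL housekeeping described above.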
Ending a Utility
Once the utility finishes its task, it needs to be ended. To request the termination, use the .END command. The syntax for writing a .END command: .END <utility-task> ;
When the utility ends, control is returned to the SE. It can then check the return code (see Figure 9-4) status and verify that the utility finished the task successfully. Based on the status value in the return code, the SE can be used to determine what processing should occur next.
Optionally, the user may request a specific return code be sent to the host computer that was used to start the utility. The script might be executed from the job control language (JCL) on a mainframe, the shell script for a UNIX system, or bat file on DOS. This value can then be checked by that system to determine conditional processing as a result of the completion code specified.
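A minimal sketch of the idea, assuming the SE exposes the utility's status in a system variable such as &SYSRC (check Figure 9-4 and your utilities manual for the exact variable names on your release):

.END MLOAD;
.IF &SYSRC > 0 THEN;
.LOGOFF 8;
.ENDIF;
.LOGOFF;

The mainframe JCL, UNIX shell script, or DOS bat file that launched the job can then test for the completion code of 8 and branch accordingly.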
To not use the integer data, the above .ACCEPT would be written as:
.ACCEPT char_parm, dec_num_parm FILE parm-record IGNORE 39 THRU 42;
Note: if the system is a mainframe, the FILE is used to name the DD statement in the Job Control Language (JCL). For example, for the above .ACCEPT, the following JCL would be required:
//PARM-RECORD DD DSN=<pds-member-name>,DISP=(OLD,KEEP)
The syntax for writing a .IF command:
.IF { <variable-name> | <literal> } <comparison> { <literal> | <variable-name> }
   [ THEN ] <operation-to-perform> [ ,<operation-to-perform> ]
   [ ELSE <operation-to-perform> [ ,<operation-to-perform> ] ]
.ENDIF ;
The comparison symbols are normally one of the following:
Equal — =
Figure 9-3
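The .IF, .ELSE, and .ENDIF commands combine as shown in this minimal sketch (the variable and file names are illustrative):

.SET REGION TO 'EAST';
.IF '&REGION' = 'EAST' THEN;
.DISPLAY 'Running the EAST extract' TO FILE runlog.txt;
.ELSE;
.DISPLAY 'Running the default extract' TO FILE runlog.txt;
.ENDIF;

The quoting around &REGION makes the substituted value a character literal for the comparison.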
Routing Messages
The .ROUTE command is used to write messages to an output file. This is normally system information generated by the SE during the execution of a utility. The default file is SYSPRINT on a mainframe and STDOUT on other platforms.
The syntax for writing a .ROUTE command:
.ROUTE MESSAGES [ TO ] FILE <file-name> [ [WITH] ECHO { OFF | [ TO ] FILE <file-name> } ] ;
Note: If the system is a mainframe, the FILE is used to name the DD statement in the JCL. The JCL must also contain any names, space requirements, record and block size, or disposition information needed by the system.
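A minimal sketch (the file names are illustrative):

.ROUTE MESSAGES TO FILE fexp_run.msg WITH ECHO TO FILE fexp_echo.msg;

The first file receives the utility messages, and the ECHO copy can go to a second file or be turned OFF.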
The syntax for writing a .SET command: .SET <variable-name> [ TO ] <expression> ; Note: The expression can be a literal value based on the data type of the variable or a mathematical operation for numeric data. The math can use one or more variables and one or more literals.
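A minimal sketch of both forms (the variable names are illustrative):

.SET LOADDATE TO '2005-01-31';
.SET BATCHROWS TO 50000;
.SET TOTALROWS TO &BATCHROWS * 4;

The first two assign literals; the third shows the mathematical form, using one variable and one literal.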
&SYSUPDCNT[n]
&SYSDELCNT[n]
In these variables, [n] is a number to identify the table from the TABLES portion of the .BEGIN in MultiLoad.
Figure 9-4
//*-------------------------------------------------------------
//* JOB INFORMATION AND COMMENTS
//*-------------------------------------------------------------
//JOBLIB   DD DSN=C309.B0SNCR.NM.R60.APPLOAD,DISP=SHR
//         DD DSN=C309.B0SNCR.NM.R60.TRLOAD,DISP=SHR
//*-------------------------------------------------------------
//BTEQ1    EXEC PGM=BTQMAIN
//LOGON    DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(ILOGON),DISP=SHR
//IDBENV   DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(IDBENV),DISP=SHR
//SYSIN    DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(BTEQSCPT),DISP=SHR
//SYSPRINT DD SYSOUT=*
+JBS BIND TDP0.UP
/*----------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION -------------------------*/
/*----------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                */
/* SPECIAL OR UNUSUAL LOGIC:                                      */
/* PARM - NONE                                                    */
/* ABEND CODES:                                                   */
/*    XXXX -                                                      */
/*----------------------------------------------------------------*/
.SESSIONS 1
.RUN FILE=ILOGON;     /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE=IDBENV;     /*JCL IDBENV - DATABASE SQL_CLASS; */
.EXPORT DATA DDNAME=REPORT
SELECT EMPLOYEE_NO, LAST_NAME, FIRST_NAME, SALARY, DEPT_NO
FROM EMPLOYEE_TABLE;
.IF ERRORCODE > 0 THEN .GOTO Done
.EXPORT RESET
.LABEL Done
.QUIT
BTEQ MAINFRAME IMPORT EXAMPLE
//*-----------------------------------------------------------------
//BTEQ1    EXEC PGM=BTQMAIN
//LOGON    DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(ILOGON),DISP=SHR
//IDBENV   DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(IDBENV),DISP=SHR
//SYSIN    DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(BTEQSCPT),DISP=SHR
//SYSPRINT DD SYSOUT=*
/*--------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION -----------------------------*/
/*--------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                    */
/* SPECIAL OR UNUSUAL LOGIC:                                          */
/* PARM - NONE                                                        */
/* ABEND CODES:                                                       */
/*    XXXX -                                                          */
/*--------------------------------------------------------------------*/
.SESSIONS 1
.RUN FILE=ILOGON;     /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE=IDBENV;     /*JCL IDBENV - DATABASE SQL08; */
.IMPORT DATA DDNAME=REPORT
.QUIET ON
.REPEAT *
USING EMPLOYEE_NO (INTEGER),
      LAST_NAME (CHAR(20)),
      FIRST_NAME (VARCHAR(12)),
      SALARY (DECIMAL(8,2)),
      DEPT_NO (SMALLINT)
INSERT INTO EMPLOYEE_TABLE VALUES
(:EMPLOYEE_NO, :LAST_NAME, :FIRST_NAME, :SALARY, :DEPT_NO);
.QUIT
//OUTDATA  DD DSN=B09XXZ.OUTPUT_DATASET_NAME,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(1,1),RLSE),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=0)
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSTERM  DD SYSOUT=*
//SYSDEBUG DD DUMMY
/*--------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION -----------------------------*/
/*--------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                    */
/* SPECIAL OR UNUSUAL LOGIC:                                          */
/* PARM - NONE                                                        */
/* ABEND CODES:                                                       */
/*    XXXX -                                                          */
/*--------------------------------------------------------------------*/
/*--------------- PROGRAM MODIFICATION -------------------------------*/
/* MAINTENANCE LOG - ADD LATEST CHANGE TO THE TOP                     */
/* MOD-DATE    AUTHOR    MOD DESCRIPTION                              */
/*--------------------------------------------------------------------*/
.LOGTABLE SQL08.SQL08_RESTART_LOG;
.RUN FILE ILOGON;     /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE IDBENV;     /*JCL IDBENV - DATABASE SQL_CLASS; */
.BEGIN EXPORT SESSIONS 1;
.EXPORT OUTFILE OUTDATA MODE RECORD FORMAT TEXT;
SELECT STUDENT_ID (CHAR(11)),
       LAST_NAME (CHAR(20)),
       FIRST_NAME (CHAR(14)),
       CLASS_CODE (CHAR(2)),
       GRADE_PT (CHAR(7))
//*--------------------------------------------------------------
//* FAST LOAD SCRIPT FILE
//*--------------------------------------------------------------
//SYSIN    DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(FLODSCPT),DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSTERM  DD SYSOUT=*
/*------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION ---------------------------*/
/*------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                  */
/* SPECIAL OR UNUSUAL LOGIC:                                        */
/* PARM - NONE                                                      */
/* ABEND CODES:                                                     */
/*    XXXX -                                                        */
/*------------------------------------------------------------------*/
/*--------------- PROGRAM MODIFICATION -----------------------------*/
/* MAINTENANCE LOG - ADD LATEST CHANGE TO THE TOP                   */
/* MOD-DATE    AUTHOR    MOD DESCRIPTION                            */
/*------------------------------------------------------------------*/
SESSIONS 1;
LOGON TDP0/SQL08,SQL08;
DROP TABLE SQL08.ERROR_ET;
DROP TABLE SQL08.ERROR_UV;
DELETE FROM SQL08.EMPLOYEE_PROFILE;
DEFINE EMPLOYEE_NO (INTEGER),
BEGIN LOADING SQL08.EMPLOYEE_PROFILE
   ERRORFILES SQL08.ERROR_ET, SQL08.ERROR_UV
   CHECKPOINT 5;
INSERT INTO SQL08.EMPLOYEE_PROFILE VALUES
(:EMPLOYEE_NO, :LAST_NAME, :FIRST_NAME, :SALARY, :DEPT_NO);
END LOADING;
LOGOFF;
//* SPECIFY THE MULTILOAD INPUT DATA FILE (LOAD FILE)
//INPTFILE DD DSN=XXXXXX.YYYYYYY.INPUT.FILENAME,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSTERM  DD SYSOUT=*
//SYSDEBUG DD DUMMY
//* SPECIFY THE MULTILOAD SCRIPT TO EXECUTE
//SYSIN    DD DSN=B09XXZ.APPLUTIL.CLASS.JCL(MLODSCPT),DISP=SHR
/*-------------------------------------------------------------------*/
/*------------------ PROGRAM DESCRIPTION ----------------------------*/
/*-------------------------------------------------------------------*/
/* PURPOSE & FLOW:                                                   */
/* SPECIAL OR UNUSUAL LOGIC:                                         */
/* PARM - NONE                                                       */
/* ABEND CODES:                                                      */
/*    XXXX -                                                         */
/*--------------- PROGRAM MODIFICATION ------------------------------*/
/* MAINTENANCE LOG - ADD LATEST CHANGE TO THE TOP                    */
/* MOD-DATE    AUTHOR    MOD DESCRIPTION                             */
/*-------------------------------------------------------------------*/
.LOGTABLE SQL08.UTIL_RESART_LOG;
.RUN FILE ILOGON;     /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */
.RUN FILE IDBENV;     /*JCL IDBENV - DATABASE SQL08; */
.BEGIN MLOAD TABLES Student_Profile1
   ERRLIMIT 1
   SESSIONS 1;
.LAYOUT INPUT_FILE;
.FIELD STUDENT_ID 1 CHAR(11);
.FIELD LAST_NAME * CHAR(20);
.FIELD FIRST_NAME * CHAR(14);
.FIELD CLASS_CODE * CHAR(2);
.FIELD GRADE_PT * CHAR(7);
.FIELD FILLER * CHAR(26);
.DML LABEL INPUT_INSERT;
INSERT INTO Student_Profile1 VALUES
( :STUDENT_ID (INTEGER),
  :LAST_NAME (CHAR(20)),
  :FIRST_NAME (VARCHAR(12)),
  :CLASS_CODE (CHAR(2)),
  :GRADE_PT (DECIMAL(5,2)) );
.IMPORT INFILE INPTFILE
   LAYOUT INPUT_FILE
   APPLY INPUT_INSERT;
.END MLOAD;
.LOGOFF;