
Teradata Utilities: MultiLoad

Reprinted for KV Satish Kumar, IBM kvskumar@in.ibm.com Reprinted with permission as a subscription benefit of Books24x7, http://www.books24x7.com/

Table of Contents

Chapter 4: Multiload
  Why it is Called "Multi" Load
    Two MultiLoad Modes: IMPORT and DELETE
    Block and Tackle Approach
    MultiLoad Imposes Limits
  Error Tables, Work Tables and Log Tables
  Supported Input Formats
  MultiLoad Has Five IMPORT Phases
    Phase 1: Preliminary Phase
    Phase 2: DML Transaction Phase
    Phase 3: Acquisition Phase
    Phase 4: Application Phase
    Phase 5: Clean Up Phase
  MultiLoad Commands
    Two Types of Commands
  Parameters for .BEGIN IMPORT MLOAD
  Parameters for .BEGIN DELETE MLOAD
  A Simple Multiload IMPORT Script
  Building our Multiload Script
  Executing Multiload
  Another Simple MultiLoad IMPORT Script
  MultiLoad IMPORT Script
  Error Treatment Options for the .DML LABEL Command
  An IMPORT Script with Error Treatment Options
  An IMPORT Script that Uses Two Input Data Files
  Redefining the INPUT
  A Script that Uses Redefining the Input
  DELETE MLOAD Script Using a Hard Coded Value
  A DELETE MLOAD Script Using a Variable
  An UPSERT Sample Script
  What Happens when MultiLoad Finishes
    MultiLoad Statistics
  Troubleshooting Multiload Errors
  RESTARTing Multiload
  RELEASE MLOAD: When You DON'T Want to Restart MultiLoad
  MultiLoad and INMODs
  How Multiload Compares with FastLoad

Chapter 4: Multiload
"In the end we'll remember not the sound of our enemies, but the silence of our friends." - Martin Luther King Jr.

Why it is Called "Multi" Load


If we were going to be stranded on an island with a Teradata Data Warehouse and we could only take along one Teradata load utility, clearly, MultiLoad would be our choice. MultiLoad has the capability to load multiple tables at one time from either a LAN or Channel environment. This is in stark contrast to its fleet-footed cousin, FastLoad, which can only load one table at a time. And it gets better yet! This feature-rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE and UPSERT, on up to five (5) empty or populated target tables at a time. These DML functions may be run either solo or in combinations, against one or more tables.

For these reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment. As the volume of data being loaded or updated in a single block increases, the performance of MultiLoad improves. MultiLoad shines when it can impact more than one row in every data block. In other words, MultiLoad looks at massive amounts of data and says, "Bring it on!"

Leo Tolstoy once said, "All happy families resemble each other." Like happy families, the Teradata load utilities resemble each other, although they may have some differences. You are going to be pleased to find that you do not have to learn all new commands and concepts for each load utility. MultiLoad has many similarities to FastLoad. It has even more commands in common with TPump. The similarities will be evident as you work with them. Where there are some quirky differences, we will point them out for you.

Two MultiLoad Modes: IMPORT and DELETE


MultiLoad provides two types of operations via modes: IMPORT and DELETE. In MultiLoad IMPORT mode, you have the freedom to "mix and match" up to twenty (20) INSERTs, UPDATEs or DELETEs on up to five target tables. The execution of the DML statements is not mandatory for all rows in a table. Instead, their execution hinges upon the conditions contained in the APPLY clause of the script. Once again, MultiLoad demonstrates its user-friendly flexibility. For UPDATEs or DELETEs to be successful in IMPORT mode, they must reference the Primary Index in the WHERE clause.

The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table. The reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTed if an error causes it to terminate prior to finishing. When performing in DELETE mode, the DELETE SQL statement cannot reference the Primary Index in the WHERE clause. This is due to the fact that a primary index access goes to a specific AMP, whereas this is a global operation.

The other factor that makes a DELETE mode operation so good is that it examines an entire block of rows at a time. Once all the eligible rows have been removed, the block is written one time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from the next block without a checkpoint. This is a smart way to continue. Remember, when using the TJ all deleted rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish than the delete. MultiLoad does not do a rollback; it does a restart.
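The block-at-a-time delete with a checkpoint after each block can be pictured with a small Python sketch. This is only a toy model under stated assumptions: the table is a list of in-memory "blocks", the checkpoint is a plain block index, and the names `delete_in_blocks`, `fail_at` and the sample rows are hypothetical. It is not how MultiLoad is implemented internally.

```python
def delete_in_blocks(blocks, predicate, checkpoint=0, fail_at=None):
    """Delete matching rows one block at a time, resuming from `checkpoint`.

    blocks: list of lists of rows (each inner list stands for one data block).
    predicate: returns True for rows that should be deleted.
    checkpoint: index of the first block that has not been processed yet.
    fail_at: optional block index at which to simulate a job failure.
    Returns the checkpoint reached (len(blocks) when the job has finished).
    """
    for i in range(checkpoint, len(blocks)):
        if fail_at is not None and i == fail_at:
            return i  # the job stops here; blocks before i are already final
        # Examine the whole block, remove eligible rows, write it back once.
        blocks[i] = [row for row in blocks[i] if not predicate(row)]
    return len(blocks)


def is_old(row):
    """Hypothetical delete condition: sale_date earlier than 2002-02-01."""
    return row[0] < "2002-02-01"


# Hypothetical table: rows are (sale_date, amount), stored in three blocks.
table = [
    [("2002-01-05", 10), ("2002-02-11", 20)],
    [("2002-01-20", 30), ("2002-03-02", 40)],
    [("2002-01-31", 50), ("2002-04-15", 60)],
]

stopped_at = delete_in_blocks(table, is_old, fail_at=2)          # fails at block 2
finished_at = delete_in_blocks(table, is_old, checkpoint=stopped_at)  # restart
```

The restart picks up at the first unprocessed block and never puts deleted rows back, which is the contrast with a Transient Journal rollback described above.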

Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition

In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months, monthly data is rotated in and out. At the end of every month, the oldest month of data is removed and the new month is added. The cycle is "add a month, delete a month, add a month, delete a month." In our illustration, that means that January data must be deleted to make room for May's data.

Here is a question for you: What if there was another way to accomplish this same goal without consuming all of these extra resources? To illustrate, let's consider the following scenario: Suppose you have TableA that contains 12 billion rows. You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final step would be to INSERT the new rows for May using MultiLoad IMPORT.

Block and Tackle Approach


MultiLoad never loses sight of the fact that it is designed for functionality, speed, and the ability to restart. It tackles the proverbial I/O bottleneck problem like FastLoad by assembling data rows into 64K blocks and writing them to disk on the AMPs. This is much faster than writing data one row at a time like BTEQ. Fallback table rows are written after the base table has been loaded. This allows users to access the base table immediately upon completion of the MultiLoad while fallback rows are being loaded in the background. The benefit is reduced time to access the data.

Amazingly, MultiLoad has full RESTART capability in all of its five phases of operation. Once again, this demonstrates its tremendous flexibility as a load utility. Is it pure magic? No, but it almost seems so. MultiLoad makes effective use of two error tables to save different types of errors and a LOGTABLE that stores built-in checkpoint information for restarting. This is why MultiLoad does not use the Transient Journal, thus averting time-consuming rollbacks when a job halts prematurely.

Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access Module Processor) fails and the system administrators say that the AMP is "down" or "offline." When using FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can continue running when an AMP fails, if the table is fallback protected. At the same time, you can use the AMPCHECK option to make it work like FastLoad if you want.

MultiLoad Imposes Limits


Rule #1: Unique Secondary Indexes are not supported on a Target Table. Like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). But unlike FastLoad, it does support the use of Non-Unique Secondary Indexes (NUSIs) because the index subtable row is on the same AMP as the data row. MultiLoad uses every AMP independently and in parallel. If two AMPs must communicate, they are not independent. Therefore, a NUSI (same AMP) is fine, but a USI (different AMP) is not.

Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are defined with Referential Integrity (RI). Like a USI, this requires the AMPs to communicate with each other. So, RI constraints must be dropped from the target table prior to using MultiLoad.

Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables based upon what happens in a target table. Again, this is a multi-AMP operation, and to a different table. To keep MultiLoad running smoothly, disable all Triggers prior to using it.

Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this because it could impact a restart if the files were concatenated in a different sequence or data was deleted between runs.

Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If you need data conversions or math, you might be better off using an INMOD to prepare the data prior to loading it.

Error Tables, Work Tables and Log Tables


Besides target table(s), MultiLoad requires the use of four special tables in order to function. They consist of two error tables (per target table), one worktable (per target table), and one log table. In essence, the Error Tables will be used to store any conversion, constraint or uniqueness violations during a load. Work Tables are used to receive and sort data and SQL on each AMP prior to storing them permanently to disk. A Log Table (also called "Logtable") is used to store successful checkpoints during load processing in case a RESTART is needed.

HINT: Sometimes a company wants all of these load support tables to be housed in a particular database. When these tables are to be stored in any database other than the user's own default database, then you must give them a qualified name (<databasename>.<tablename>) in the script or use the DATABASE command to change the current database.

Where will you find these tables in the load script? The Logtable is generally identified immediately prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD statement. Do not underestimate the value of these tables. They are vital to the operation of MultiLoad. Without them a MultiLoad job cannot run. Now that you have had the "executive summary", let's look at each type of table individually.

Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require the use of two error tables per target table. MultiLoad will automatically create these tables. Rows are inserted into these tables only when errors occur during the load process. The first error table is the acquisition Error Table (ET). It contains all translation and constraint errors that may occur while the data is being acquired from the source(s).

The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you might see a UPI error that shows a second employee number "99." In this case, if the name for employee "99" is Kara Morgan, you will be glad that the row did not load since Kara Morgan is already in the Employee table. However, if the name showed up as David Jackson, then you know that further investigation is needed, because employee numbers must be unique.
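To make the employee "99" example concrete, here is a minimal Python sketch of the idea that only the first occurrence of a Unique Primary Index value is loaded and later occurrences are diverted to a UV table. It is a simplification: the function name `load_with_upi` and the dictionary rows are hypothetical, and the real utility also records error codes and sequence information with each rejected row.

```python
def load_with_upi(rows, key):
    """Split input rows into loaded rows and uniqueness violations.

    rows: list of dicts representing input records.
    key: name of the column acting as the Unique Primary Index.
    Returns (loaded_rows, uv_rows); the first occurrence of each key value
    is loaded and every later occurrence goes to the UV list.
    """
    target = {}
    uv_table = []
    for row in rows:
        value = row[key]
        if value in target:
            uv_table.append(row)   # duplicate UPI value: store it for review
        else:
            target[value] = row    # first occurrence: load it
    return list(target.values()), uv_table


rows = [
    {"emp_no": 99, "name": "Kara Morgan"},
    {"emp_no": 99, "name": "Kara Morgan"},    # harmless repeat of the same employee
    {"emp_no": 99, "name": "David Jackson"},  # conflicting row that needs investigation
]
loaded, uv = load_with_upi(rows, "emp_no")
```

Both rejected rows land in the same place; it is the content of the stored row that tells you whether it was a harmless repeat or a real conflict.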

Each error table does the following:
Identifies errors
Provides some detail about the errors
Stores the actual offending row for debugging

You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In either case, MultiLoad will not accept error table names that are the same as target table names. It does not matter what you name them. It is recommended that you standardize on the naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled, "Troubleshooting MultiLoad Errors."

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed step.

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only have one worktable since that mode only works on one target table. The purpose of worktables is to hold two things:
1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs

The worktables are created in a database using PERM space. They can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once. So, this is very important. However, there is no such thing as a free lunch; the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.
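A back-of-the-envelope calculation shows why the replication costs space and why FILLER helps. The sketch below is a rough estimate only: the function name and all of the numbers are hypothetical, and it ignores per-row overhead such as match tags and block headers.

```python
def worktable_bytes(rows, bytes_sent_per_row, statements_per_row):
    """Rough worktable size: each row is stored once per SQL statement."""
    return rows * bytes_sent_per_row * statements_per_row


# Hypothetical load: one million records, three SQL statements per record.
full = worktable_bytes(1_000_000, 200, 3)     # every field is sent to the AMPs
trimmed = worktable_bytes(1_000_000, 120, 3)  # 80 bytes per record defined as FILLER
saved = full - trimmed
```

With these assumed figures the estimate drops from 600,000,000 bytes to 360,000,000 bytes, simply because the unneeded bytes are never sent.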

Supported Input Formats


Data input files come in a variety of formats but MultiLoad is flexible enough to handle many of them. MultiLoad supports the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT and VARTEXT.

BINARY: Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest unit of storage for Teradata.

FASTLOAD: This format is the same as BINARY, plus a marker (X'0A' or X'0D') that specifies the end of the record.

TEXT: Each record has an arbitrary number of bytes and is followed by an end-of-record marker.

UNFORMAT: The format for these input records is defined in the LAYOUT statement of the MultiLoad script using the components FIELD, FILLER and TABLE.

VARTEXT: This is a variable-length text record format in which fields are separated by a delimiter such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter characters in a row will result in a null value between them.

Figure 5-1
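The BINARY and VARTEXT descriptions above can be illustrated with a short Python sketch. Treat it as an illustration under assumptions rather than a description of the utility's own reader: the byte order and signedness of the 2-byte length are not stated in the text, so the sketch assumes an unsigned little-endian value and exposes the byte order as a parameter. The function names are hypothetical, and the VARTEXT helper also turns leading or trailing empty fields into nulls, which goes slightly beyond what the text states.

```python
import struct


def read_binary_records(data, byteorder="<"):
    """Split a byte string into records, each prefixed by a 2-byte length n.

    byteorder: "<" for little-endian, ">" for big-endian (an assumption;
    the right choice depends on the platform that wrote the file).
    """
    records = []
    pos = 0
    while pos < len(data):
        (n,) = struct.unpack_from(byteorder + "H", data, pos)  # read length n
        pos += 2
        records.append(data[pos:pos + n])                      # read n data bytes
        pos += n
    return records


def split_vartext(line, delimiter=","):
    """Split a delimited text record; an empty field becomes None (null)."""
    return [field if field != "" else None for field in line.split(delimiter)]


sample = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 2) + b"hi"
records = read_binary_records(sample)
fields = split_vartext("99,,Kara Morgan")
```

Here `records` holds the two payloads without their length prefixes, and the two commas in a row produce a null in the middle of `fields`.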

MultiLoad Has Five IMPORT Phases


MultiLoad IMPORT has five phases, but don't be fazed by this! Here is the short list:
Phase 1: Preliminary Phase
Phase 2: DML Transaction Phase
Phase 3: Acquisition Phase
Phase 4: Application Phase
Phase 5: Cleanup Phase

Let's take a look at each phase and see what it contributes to the overall load process of this magnificent utility. Should you memorize every detail about each phase? Probably not. But it is important to know the essence of each phase because sometimes a load fails. When it does, you need to know in which phase it broke down since the method for fixing the error to RESTART may vary depending on the phase. And if you can picture what MultiLoad actually does in each phase, you will likely write better scripts that run more efficiently.

Phase 1: Preliminary Phase


The ancient oriental proverb says, "Measure one thousand times; Cut once." MultiLoad uses Phase 1 to conduct several preliminary set-up activities whose goal is to provide a smooth and successful climate for running your load. The first task is to be sure that the SQL syntax and MultiLoad commands are valid. After all, why try to run a script when the system will just find out during the load process that the statements are not useable? MultiLoad knows that it is much better to identify any syntax errors, right up front. All the preliminary steps are automated. No user intervention is required in this phase.

Second, all MultiLoad sessions with Teradata need to be established. The default is the number of available AMPs. Teradata will quickly establish this number as a factor of 16 for the basis regarding the number of sessions to create. The general rule of thumb for the number of sessions to use for smaller systems is the following: use the number of AMPs plus two more. For larger systems with hundreds of AMP processors, the SESSIONS option is available to lower the default. Remember,
these sessions are running on your poor little computer as well as on Teradata. Each session loads the data to Teradata across the network or channel. Every AMP plays an essential role in the MultiLoad process. They receive the data blocks, hash each row and send the rows to the correct AMP. When the rows come to an AMP, it stores them in worktable blocks on disk. But, lest we get ahead of ourselves, suffice it to say that there is ample reason for multiple sessions to be established.

What about the extra two sessions? Well, the first one is a control session to handle the SQL and logging. The second is a backup or alternate for logging. You may have to use some trial and error to find what works best on your system configuration. If you specify too few sessions it may impair performance and increase the time it takes to complete load jobs. On the other hand, too many sessions will reduce the resources available for other important database activities.

Third, the required support tables are created. They are the following:

Type of Table: Table Details

ERRORTABLES: MultiLoad requires two error tables per target table. The first error table contains constraint violations, while the second error table stores Unique Primary Index violations.

WORKTABLES: Work Tables hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs.

LOGTABLE: The LOGTABLE keeps a record of the results from each phase of the load so that MultiLoad knows the proper point from which to RESTART.

Figure 5-2

The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent the opportunity for a user to request an exclusive lock. Although these locks will still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2.

Phase 2: DML Transaction Phase


In Phase 2, all of the SQL Data Manipulation Language (DML) statements are sent ahead to Teradata. MultiLoad allows the use of multiple DML functions. Teradata's Parsing Engine (PE) parses the DML and generates a step-by-step plan to execute the request. This execution plan is then communicated to each AMP and stored in the appropriate worktable for each target table. In other words, each AMP is going to work off the same page.

Later, during the Acquisition phase the actual input data will also be stored in the worktable so that it may be applied in Phase 4, the Application Phase. Next, a match tag is assigned to each DML request that will match it with the appropriate rows of input data. The match tags will not actually be used until the data has already been acquired and is about to be applied to the worktable. This is somewhat like a student who receives a letter from the university in the summer that lists his courses, professors' names, and classroom locations for the upcoming semester. The letter is a "match tag" for the student to his school schedule, although it will not be used for several months. This matching tag for SQL and data is the reason that the data is replicated for each SQL statement using the same data record.

Phase 3: Acquisition Phase


With the proper set-up complete and the PE's plan stored on each AMP, MultiLoad is now ready to receive the INPUT data. This is where it gets interesting! MultiLoad now acquires the data in large, unsorted 64K blocks from the host and sends it to the AMPs.

At this point, Teradata does not care about which AMP receives the data block. The blocks are simply sent, one after the other, to the next AMP in line. For their part, each AMP begins to deal with the blocks that they have been dealt. It is like a game of cards - you take the cards that you have received and then play the game. You want to keep some and give some away. Similarly, the AMPs will keep some data rows from the blocks and give some away.

The AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table, just yet. The receiving AMP must first do some preparation before that happens. Don't you have to get ready before company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs into the worktables where it assembles them into the SQL. Why? Because once the rows are reblocked, they can be sorted into the proper order for storage in the target table.

Now the utility places a load lock on each target table in preparation for the Application Phase. Of course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
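The "deal the blocks, then pass rows to their owners" flow of the Acquisition Phase can be sketched in a few lines of Python. This is a toy model with loud assumptions: CRC32 merely stands in for Teradata's row-hash function, the AMPs are plain lists, there is no BYNET, and the names `row_hash`, `owner_amp` and `acquire` are hypothetical.

```python
import zlib


def row_hash(primary_index_value):
    """Stand-in hash (CRC32); Teradata uses its own hashing algorithm."""
    return zlib.crc32(str(primary_index_value).encode("utf-8"))


def owner_amp(primary_index_value, amp_count):
    """AMP that owns a row, derived from the hash of its primary index."""
    return row_hash(primary_index_value) % amp_count


def acquire(blocks, amp_count):
    """Deal blocks to AMPs in turn, then move each row to its owning AMP.

    blocks: list of blocks; each block is a list of rows whose first
            element is the primary index value.
    Returns (worktables, moved): one hash-sorted worktable per AMP and the
    number of rows that had to be passed to a different AMP.
    """
    worktables = [[] for _ in range(amp_count)]
    moved = 0
    for block_no, block in enumerate(blocks):
        receiving_amp = block_no % amp_count   # the next AMP in line gets the block
        for row in block:
            owner = owner_amp(row[0], amp_count)
            if owner != receiving_amp:
                moved += 1                     # row is "given away" to its owner
            worktables[owner].append(row)
    for worktable in worktables:
        worktable.sort(key=lambda row: row_hash(row[0]))  # hash order for Phase 4
    return worktables, moved


sample_blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e")]]
worktables, moved = acquire(sample_blocks, amp_count=4)
```

Whichever AMP happens to receive a block, every row ends up in the worktable of the AMP that owns its hash, already sorted for the Application Phase.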

Phase 4: Application Phase


The purpose of this phase is to write, or APPLY, the specified changes to both the target tables and NUSI subtables. Once the data is on the AMPs, it is married up to the SQL for execution. To accomplish this substitution of data into SQL, when sending the data, the host has already attached some sequence information and five (5) match tags to each data row. Those match tags are used to join the data with the proper SQL statement based on the SQL statement within a DML label. In addition to associating each row with the correct DML statement, match tags also guarantee that no row will be updated more than once, even when a RESTART occurs. The following five columns are the matching tags:

MATCHING TAGS

ImportSeq: Sequence number that identifies the IMPORT command where the error occurred

DMLSeq: Sequence number for the DML statement involved with the error

SMTSeq: Sequence number of the DML statement being carried out when the error was discovered

ApplySeq: Sequence number that tells which APPLY clause was running when the error occurred

SourceSeq: The number of the data row in the client file that was being built when the error took place

Figure 5-3

Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-sequence sorted block from Phase 3 and each block of the base table is read only once to reduce I/O operations to gain speed. Then, all matching rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why the match tags are so important. Changes are made based upon corresponding data and DML (SQL) based on the match tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very beginning if a RESTART is needed.

What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the proper error table.
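The two jobs of the match tags, pairing each data row with its DML and making sure nothing is applied twice after a RESTART, can be sketched as follows. This is a simplified illustration, not the utility's real mechanism: it uses only two tag values per row instead of five, works row by row instead of block by block, and all names (`apply_worktable`, `insert_row`, `add_to_balance`, the sample data) are hypothetical.

```python
def apply_worktable(target, work_rows, dml_by_tag, applied):
    """Apply each work row through the DML its tag points to, at most once.

    target: dict mapping primary index value to a row dict (the target table).
    work_rows: list of (dml_seq, source_seq, data) tuples from the worktable.
    dml_by_tag: dict mapping dml_seq to a function(target, data).
    applied: set of (dml_seq, source_seq) pairs already applied; it plays the
             role of the restart information and is updated in place.
    """
    for dml_seq, source_seq, data in work_rows:
        tag = (dml_seq, source_seq)
        if tag in applied:
            continue                        # already done before the restart
        dml_by_tag[dml_seq](target, data)   # run the DML matched by the tag
        applied.add(tag)


def insert_row(target, data):
    """Toy INSERT: store a copy of the row under its id."""
    target[data["id"]] = dict(data)


def add_to_balance(target, data):
    """Toy UPDATE: add an amount to an existing row's balance."""
    target[data["id"]]["balance"] += data["amount"]


target = {}
applied = set()
dml = {1: insert_row, 2: add_to_balance}
work = [
    (1, 1, {"id": 7, "balance": 100}),
    (2, 2, {"id": 7, "amount": 25}),
]
apply_worktable(target, work, dml, applied)
apply_worktable(target, work, dml, applied)  # simulated restart: nothing reapplied
```

Running the apply step a second time leaves the balance untouched, which is the point: a non-idempotent UPDATE stays safe across a restart because the tags record what has already been done.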

Phase 5: Clean Up Phase


Those of you reading these paragraphs who have young children or teenagers will certainly appreciate this final phase! MultiLoad actually cleans up after itself. The utility looks at the final Error
Code (&SYSRC). MultiLoad believes the adage, "All is well that ends well." If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has certainly ended well). This being the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the system count variables are set. After this, each MultiLoad session is logged off. So what happens if the final error code is not zero? Stay tuned. Restarting MultiLoad is a topic that will be covered later in this chapter.

MultiLoad Commands
Two Types of Commands
You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad tasks are commands that are used by the MultiLoad utility for specific individual steps as it processes a load. Support functions are those commands that involve the Teradata utility Support Environment (covered in Chapter 9), are used to set parameters, or are helpful for monitoring a load. The chart below lists the key commands, their type, and what they do.

MLOAD Command (Type): What does the MLOAD Command do?

.BEGIN [IMPORT] MLOAD, .BEGIN DELETE MLOAD (Support): This command communicates directly with Teradata to specify if the MultiLoad mode is going to be IMPORT or DELETE. Note that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is required. We recommend using the word IMPORT to make the coding consistent and easier for others to read. Any parameters for the load, such as error limits or checkpoints, will be included under the .BEGIN command, too. It is important to know which commands or parameters are optional since, if you do not include them, MultiLoad may supply defaults that may impact your load.

.DML LABEL (Task): The DML LABEL defines treatment options and labels for the application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the APPLY clause.

.END MLOAD (Task): This instructs MultiLoad to finish the APPLY operations with the changes to the designated databases and tables.

.FIELD (Task): This defines a column of the data source record that will be sent to the Teradata database via SQL. When writing the script, you must include a FIELD for each data field you need in SQL. This command is used with the LAYOUT command.

.FILLER (Task): Do not assume that MultiLoad has somehow uncovered much of what you used in your term papers at the university! FILLER defines a field that is accounted for as part of the data source's row format, but is not sent to the Teradata DBS. It is used with the LAYOUT command.

.LAYOUT (Task): LAYOUT defines the format of the INPUT DATA record so Teradata knows what to expect. If one record is not large enough, you can concatenate multiple data records by using the LAYOUT parameter CONTINUEIF to tell which value to perform for the concatenation. Another option is INDICATORS, which is used to represent nulls by using the bitmap (1 bit per field) at the front of the data record.

.LOGON (Support): This specifies the username or LOGON string that will establish sessions for MultiLoad with Teradata.

.LOGTABLE (Support): This support command names the Restart Log that will be used for storing CHECKPOINT data pertaining to a load. The LOGTABLE is then used to tell MultiLoad where to RESTART, should that be necessary. It is recommended that this command be placed before the .LOGON command.

.LOGOFF (Support): This command terminates any sessions established by the LOGON command.

.IMPORT (Task): This command defines the INPUT DATA FILE, file type, file usage, the LAYOUT to use and where to APPLY the data to SQL.

.SET (Support): Optionally, you can SET utility variables. An example would be {.SET DBName TO 'CDW_Test'}.

.SYSTEM (Support): This interrupts the operation of MultiLoad in order to issue commands to the local operating system.

.TABLE (Task): This is a command that may be used with the .LAYOUT command. It identifies a table whose columns (both their order and data types) are to be used as the field names and data descriptions of the data source records.
Figure 5-4
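
To make the INDICATORS idea from the chart a little more concrete, here is a small Python sketch of reading a null bitmap from the front of a record. It is only an illustration and not MultiLoad code. The function name null_flags is hypothetical, and the sketch assumes that the leftmost bit of the first byte belongs to the first field and that a bit set to 1 means the field is NULL.

```python
def null_flags(indicator_bytes: bytes, field_count: int) -> list:
    """Return one True/False per field; True means the field is flagged NULL.

    Assumption for this sketch: bits are read left to right, one bit per
    field, and a set bit marks a NULL.
    """
    flags = []
    for index in range(field_count):
        byte = indicator_bytes[index // 8]      # which indicator byte
        bit = (byte >> (7 - index % 8)) & 1     # which bit inside that byte
        flags.append(bool(bit))
    return flags


# Three fields, second one flagged NULL (bit pattern 0100 0000).
print(null_flags(bytes([0b01000000]), 3))
```

With three fields, only the first three bits of the single indicator byte are consulted; the remaining bits are padding in this sketch.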

Parameters for .BEGIN IMPORT MLOAD


Here is a list of components or parameters that may be used in the .BEGIN IMPORT command. Note: The parameters do not require the usual dot prior to the command since they are actually sub-commands.


PARAMETER (REQUIRED OR NOT)
    WHAT IT DOES

AMPCHECK {NONE|APPLY|ALL} (Optional)
    NONE specifies that MLOAD starts even with one down AMP per cluster if all tables are Fallback. APPLY (DEFAULT) specifies MLOAD will not start or finish Phase 4 with a down AMP. ALL specifies not to proceed if any AMPs are down, just like FastLoad.

AXSMOD (Optional)
    Short for Access Module, this command specifies input protocol like OLE-DB or reading a tape from REEL Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE command in the script.

CHECKPOINT (Optional)
    You have two options: CHECKPOINT refers to the number of minutes, or frequency, at which you wish a CHECKPOINT to occur if the number is 60 or less. If the number is greater than 60, it designates the number of rows at which you want the CHECKPOINT to occur. This command is NOT valid in DELETE mode.

ERRLIMIT errcount [errpercent] (Optional)
    You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job.

ERRORTABLES ET_ERR UV_ERR (Optional)
    Names the two error tables, two per target table. Note there is no comma separator.

NOTIFY {LOW|MEDIUM|HIGH|OFF} (Optional)
    If you opt to use NOTIFY for any event during a load, you may designate the priority of that notification: LOW for level events, MEDIUM for important events, HIGH for events at operational decision points, and OFF to eliminate any notification at all for a given phase.

SESSIONS <MAX> <MIN> (Optional)
    This refers to the number of SESSIONS that should be established with Teradata. For MultiLoad, the optimal number of sessions is the number of AMPs in the system, plus two more. You can also use MAX or MIN, which automatically use the maximum or minimum number of sessions to complete the job. If you specify nothing, it will default to MAX.

SLEEP (Optional)
    Tells MultiLoad how frequently, in minutes, to try logging on to the system.

TABLES Tablename1, Tablename2, Tablename5 (Required)
    Names up to 5 target tables.

TENACITY (Optional)
    Tells MultiLoad how many hours to try logging on when its initial effort to do so is rebuffed.

WORKTABLES Tablename1, Tablename2, Tablename5 (Optional)
    Names the worktable(s), one per target table.

Figure 5-5
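
The CHECKPOINT and SESSIONS rules in the table above are easy to misread, so here is a short Python sketch that restates them. It is an illustration of the rules as the table words them, not part of MultiLoad. The function names interpret_checkpoint and recommended_sessions are hypothetical, and values outside the rule stated in the table are not handled.

```python
def interpret_checkpoint(value: int) -> str:
    """Describe a CHECKPOINT value using the rule from the table:
    60 or less is a number of minutes, more than 60 is a number of rows."""
    if value <= 60:
        return f"checkpoint every {value} minutes"
    return f"checkpoint every {value} rows"


def recommended_sessions(amp_count: int) -> int:
    """Session count the text calls optimal: one per AMP, plus two more."""
    return amp_count + 2


print(interpret_checkpoint(15))
print(interpret_checkpoint(100000))
print(recommended_sessions(10))
```

Note the boundary: in this reading, CHECKPOINT 60 is still a time interval, while CHECKPOINT 61 is already a row count.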

Parameters for .BEGIN DELETE MLOAD


Here is a list of components or parameters that may be used in the BEGIN DELETE command. Note: The parameters do not require the usual dot prior to the command since parameters are actually sub-commands.

A Simple Multiload IMPORT Script


"We must use time as a tool, not as a crutch." - John F. Kennedy

Ask not what your Multiload can do for you. Ask what you can do for your Multiload. Multiload is a great tool when you're short on time. Multiload can update, insert, delete or upsert on Teradata tables that are already populated. It can even do all four in one script. Our flat file will contain Employee_numbers and Salaries * 2. We are giving a big raise.

We're going to create a flat file to use with Multiload. Remember, we'll still use the BTEQ utility to create our flat file.

[Slide: Let's create a flat file for our Multiload]

[Slide: Let's Execute it]

Building our Multiload Script


"I can accept failure, but I can't accept not trying." - Michael Jordan

Getting these scripts down is a very hard process, so don't be discouraged if you make a couple of mistakes. The next two slides will show you a blank copy of the basic Multiload script, as well as a marked slide illustrating the important parts of the script:

"If you don't know where you're going, any road will take you there." - Lewis Carroll

[Slide: Creating our Multiload script]

Executing Multiload

"Ambition is a dream with a V8 Engine." - Elvis Presley

You will feel like the King after executing your first Multiload script. Multiload is the Elvis Presley of data warehousing because nobody knows how to make more records than Multiload. If you have the ambition to learn, this book will give you what it takes to steer through these utilities. We initialize the Multiload utility like we do with BTEQ, except that the keyword with Multiload is mload. Remember that this Multiload is going to double the salaries of our employees.

[Slide: Let's execute our Multiload script]

Here is a before and after image of our Employee_table02:


Another Simple MultiLoad IMPORT Script


"Those who dance are considered insane by those who cannot hear the music." - George Carlin

MultiLoad can be somewhat intimidating to the new user because there are many commands and phases. In reality, the load scripts are understandable when you think through what the IMPORT mode does:

    Setting up a Logtable
    Logging onto Teradata
    Identifying the Target, Work and Error tables
    Defining the INPUT flat file
    Defining the DML activities to occur
    Naming the IMPORT file
    Telling MultiLoad to use a particular LAYOUT
    Telling the system to start loading
    Finishing loading and logging off of Teradata

This first script example is designed to show MultiLoad IMPORT in its simplest form. It depicts the loading of a three-column Employee table. The actual script is in the left column and our comments are on the right. Below the script is a step-by-step description of how this script works.

Step One: Setting up a Logtable and Logging onto Teradata

MultiLoad requires you to specify a log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once you name the Logtable, it will be automatically created for you. The Logtable may be placed in the same database as the target table, or it may be placed in another database. Immediately after this you log onto Teradata using the .LOGON command. The order of these two commands is interchangeable, but it is recommended to define the Logtable first and then to log on second. If you reverse the order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot in front of the command key word.

Step Two: Identifying the Target, Work and Error tables

In this step of the script you must tell Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then you will preface the names of these tables with the sub-commands TABLES, WORKTABLES and ERRORTABLES. All you must do is name the tables and specify what database they are in. Work tables and error tables are created automatically for you. Keep in mind that you get to name and locate these tables. If you do not do this, Teradata might supply some defaults of its own! At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not specifically been named, the script would still execute and build these tables. They would have been built in the default database for the user. The name of the worktable would be WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and UV_EMPLOYEE_DEPT1, respectively.

Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One customer calls this database CORP_WORK. This is where all of the logtables and worktables are normally created. You can use a DATABASE command to point all table creations to it, or qualify the names of these tables individually.

Step Three: Defining the INPUT flat file record structure

MultiLoad is going to need to know the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER (like above) to position the cursor to the next field, or the "*" on the Dept_No field could have been replaced with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then, the .FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE can be used to automatically define all the .FIELDS for you. The LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is specified in the LAYOUT.

Step Four: Defining the DML activities to occur

The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept1 table. The VALUES come from the data in each FIELD because it is preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: Every label must be referenced in an APPLY clause of the .IMPORT command.

Step Five: Naming the INPUT file and its format type

This step is vital! Using the .IMPORT command, we have identified the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the DML LABEL called INSERTS, that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT command. If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file.

Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues until it finds the semi-colon to define the end of the command. This is how it determines one operation from another. Therefore, the semi-colon is very important: without it, MultiLoad would have attempted to process the .END MLOAD as part of the IMPORT, and it wouldn't work.

Step Six: Finishing loading and logging off of Teradata

This is the closing ceremonies for the load. MultiLoad wraps things up, closes the curtains, and logs off of the Teradata system.

Important note: Since the script above in Figure 5-6 does not DROP any tables, it is completely capable of being restarted if an error occurs. Compare this to the next script in Figure 5-7. Do you think it is restartable? If you said no, pat yourself on the back.

PARAMETER (REQUIRED OR NOT)
    WHAT IT DOES

TABLES Tablename1 (Required)
    Names the Target table.

WORKTABLES Tablename1 (Optional)
    Names the worktable, one per target table.

ERRORTABLES ET_ERR UV_ERR (Optional)
    Names the two error tables, two per target table; there is no comma separator between them.

TENACITY (Optional)
    Tells MultiLoad how many hours to try establishing sessions when its initial effort to do so is rebuffed.

Figure 5-6
/* Simple Mload script */
.LOGTABLE SQL01.CDW_Log;                      Sets up a Logtable and logs on
.LOGON TDATA/SQL01,SQL01;                     to Teradata.

.BEGIN IMPORT MLOAD                           Begins the load process by naming
   TABLES SQL01.Employee_Dept1                the target table, work table and
   WORKTABLES SQL01.CDW_WT                    error tables; notice NO comma
   ERRORTABLES SQL01.CDW_ET                   between the error tables.
               SQL01.CDW_UV;

.LAYOUT FILEIN;                               Names the LAYOUT of the INPUT
.FIELD Employee_No  * CHAR(11);               record and defines its structure;
.FIELD Last_Name    * CHAR(20);               notice the dots before the FIELD
.FILLER Junk_stuff  * CHAR(100);              and FILLER and the semi-colons
.FIELD Dept_No      * CHAR(6);                after each definition.

.DML LABEL INSERTS;                           Names the DML Label.
INSERT INTO SQL01.Employee_Dept1              Tells MultiLoad to INSERT a row
   (Employee_No                               into the target table and defines
   ,Last_Name                                 the row format.
   ,Dept_No )
VALUES                                        Lists, in order, the VALUES (each
   (:Employee_No                              one preceded by a colon) to be
   ,:Last_Name                                INSERTed.
   ,:Dept_No );

.IMPORT INFILE CDW_Join_Export.txt            Names the import file and its
   FORMAT TEXT                                format type; cites the LAYOUT to
   LAYOUT FILEIN                              use and tells Mload to APPLY the
   APPLY INSERTS;                             INSERTs.

.END MLOAD;                                   Ends MultiLoad and logs off all
.LOGOFF;                                      MultiLoad sessions.

Figure 5-7
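
The byte arithmetic behind the asterisk in Step Three (the 132 for Dept_No) can be checked with a few lines of Python. This is a sketch of the positioning rule as the text describes it, not MultiLoad code. The function name field_starts and the tuple format are hypothetical, and only fixed-length CHAR fields are modeled.

```python
def field_starts(layout):
    """Return {name: 1-based start byte} for a list of (name, start, length).

    start is either "*" (begin right after the previous field) or an
    explicit 1-based byte position, mirroring .FIELD and .FILLER usage.
    """
    starts = {}
    next_byte = 1
    for name, start, length in layout:
        position = next_byte if start == "*" else int(start)
        starts[name] = position
        next_byte = position + length
    return starts


# Layout FILEIN from the simple script; lengths are the CHAR(n) sizes.
filein = [
    ("Employee_No", "*", 11),
    ("Last_Name", "*", 20),
    ("Junk_stuff", "*", 100),   # the .FILLER
    ("Dept_No", "*", 6),
]

# The same layout without the .FILLER, citing byte 132 explicitly.
filein_no_filler = [
    ("Employee_No", "*", 11),
    ("Last_Name", "*", 20),
    ("Dept_No", "132", 6),
]

print(field_starts(filein))
print(field_starts(filein_no_filler))
```

Both layouts place Dept_No at the same byte, which is why the .FILLER becomes unnecessary once the position is stated explicitly.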

MultiLoad IMPORT Script


Let's take a look at a MultiLoad IMPORT script that comes from real life. This sample script will look much more like what you might encounter at your workplace. It is more detailed. The notes to the right are brief and to the point. They will help you grasp the essence of what is happening in the script.
/* !/bin/ksh* */                                   Load runs from a shell script.

/* +++++++++++++++++++++++++++++++++++++*/         Any words between /* */ are
/* MultiLoad SCRIPT */                             comments only and are not
/*This script is designed to change the */         processed by Teradata.
/*EMPLOYEE_DEPT1 table using the data found */
/* in IMPORT INFILE CDW_Join_Export.txt */         Names and describes the purpose
/* Version 1.1 */                                  of the script; names the author.
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++*/

.LOGTABLE SQL01.CDW_Log;                           Secures the logon by storing
.RUN FILE LOGON.TXT;                               userid and password in a
                                                   separate file, then reads it.

/*Drop Error Tables caution, this script */        Drops existing error tables and
/*cannot be restarted because these tables */      cancels the ability for the
/*would be needed */                               script to restart. DON'T ATTEMPT
DROP TABLE SQL01.CDW_ET;                           THIS AT HOME! Also, SQL does
DROP TABLE SQL01.CDW_UV;                           not use a dot (.)

/* Begin Import and Define Work and Error */       Begins the load process by
/* Tables */                                       telling us first the names of
.BEGIN IMPORT MLOAD                                the target table, work table
   TABLES SQL01.Employee_Dept1                     and error tables; note NO comma
   WORKTABLES SQL01.CDW_WT                         between the names of the error
   ERRORTABLES SQL01.CDW_ET                        tables.
               SQL01.CDW_UV;

/* Define Layout of Input File */                  Names the LAYOUT of the INPUT
.LAYOUT FILEIN;                                    file.
.FIELD Employee_No  * CHAR(11);                    Defines the structure of the
.FIELD First_Name   * CHAR(14);                    INPUT file. Notice the dots
.FIELD Last_Name    * CHAR(20);                    before the FIELD command and
.FIELD Dept_No      * CHAR(6);                     the semi-colons after each
.FIELD Dept_Name    * CHAR(20);                    FIELD definition.

/* Begin INSERT Process on Table */                Names the DML Label.
.DML LABEL INSERTS;
INSERT INTO SQL01.Employee_Dept1                   Tells MultiLoad to INSERT a row
   ( Employee_No                                   into the target table and
   ,First_Name                                     defines the row format. Note
   ,Last_Name                                      that we place comma separators
   ,Dept_No                                        in front of the following
   ,Dept_Name )                                    column or value for easier
VALUES                                             debugging.
   ( :Employee_No                                  Lists, in order, the VALUES to
   ,:First_Name                                    be INSERTed.
   ,:Last_Name
   ,:Dept_No
   ,:Dept_Name );

/* Specify IMPORT File and Apply Parameters */     Names the import file and
.IMPORT INFILE CDW_Join_Export.txt                 states its format type; names
   FORMAT TEXT                                     the LAYOUT to use and tells
   LAYOUT FILEIN                                   MultiLoad to APPLY the INSERTs.
   APPLY INSERTS;

.END MLOAD;                                        Ends MultiLoad and logs off of
.LOGOFF;                                           Teradata.

Figure 5-8

Error Treatment Options for the .DML LABEL Command


MultiLoad allows you to tailor how it deals with different types of errors that it encounters during the load process, to fit your needs. Here is a summary of the options available to you:


ERROR TREATMENT OPTIONS FOR .DML LABEL


.DML LABEL {labelname}
   {MARK | IGNORE} DUPLICATE [INSERT | UPDATE] ROWS
   {MARK | IGNORE} MISSING [UPDATE | DELETE] ROWS
   DO INSERT FOR [MISSING UPDATE] ROWS ;

Figure 5-9

In IMPORT mode, you may specify as many as five distinct error-treatment options for one .DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table. By ignoring those errors, you gain three benefits:

1. You do not need to see all the errors.
2. The error table is not filled up needlessly.
3. MultiLoad runs much faster since it is not conducting a duplicate row check.

When doing an UPSERT, there are two rules to remember:

The default is IGNORE MISSING UPDATE ROWS, even though MARK is the default for all other operations. When doing an UPSERT, you anticipate that some rows are missing; otherwise, why do an UPSERT? So, this keeps these rows out of your error table.

The DO INSERT FOR MISSING UPDATE ROWS is mandatory. This tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the update didn't find it.


The table that follows shows you, in more detail, how flexible your options are:

ERROR TREATMENT OPTIONS IN DETAIL

.DML LABEL OPTION
    WHAT IT DOES

MARK DUPLICATE INSERT ROWS
    This option logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates.

IGNORE DUPLICATE INSERT ROWS
    This tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.

MARK DUPLICATE UPDATE ROWS
    This logs the existence of every duplicate UPDATE row.

IGNORE DUPLICATE UPDATE ROWS
    This eliminates the listing of duplicate update row errors.

MARK MISSING UPDATE ROWS
    This option ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE.

IGNORE MISSING UPDATE ROWS
    This tells MultiLoad NOT to list UPDATE rows as an error. This is a good option when doing an UPSERT since UPSERT will INSERT a new row.

MARK MISSING DELETE ROWS
    This option makes a note in the ET_Error Table that a row to be deleted is missing.

IGNORE MISSING DELETE ROWS
    This option says, "Do not tell me that a row to be deleted is missing."

DO INSERT FOR MISSING UPDATE ROWS
    This is required to accomplish an UPSERT. It tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.

Figure 5-10
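
If the UPSERT options feel abstract, the following Python toy model may help. It imitates the behavior described in the table using a plain dictionary in place of a target table. It is a sketch under stated assumptions, not how MultiLoad is implemented: the function name apply_upsert and the mark_missing_update flag are hypothetical, and the returned list merely stands in for entries that MARK MISSING UPDATE ROWS would record.

```python
def apply_upsert(target, rows, mark_missing_update=False):
    """Apply (key, value) rows to target, a dict keyed by index value.

    Existing keys are updated. Missing keys are inserted, which is what
    DO INSERT FOR MISSING UPDATE ROWS asks for. When mark_missing_update
    is True, keys that had to be inserted are also recorded (loosely
    imitating MARK); when False they are not (imitating IGNORE).
    """
    marked = []
    for key, value in rows:
        if key not in target and mark_missing_update:
            marked.append(key)
        target[key] = value
    return marked


departments = {100: "Sales"}
incoming = [(100, "Marketing"), (200, "Research")]

marked = apply_upsert(departments, incoming, mark_missing_update=True)
print(departments)
print(marked)
```

Department 100 is updated in place, while department 200 did not exist and is inserted. With the flag left at False, the result in the dictionary is the same but nothing is recorded.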

An IMPORT Script with Error Treatment Options


The command .DML LABEL names any DML options (INSERT, UPDATE OR DELETE) that immediately follow it in the script. Each label must be given a name. In IMPORT mode, the label will be referenced for use in the APPLY Phase when certain conditions are met. The following script provides an example of just one such possibility:
/* !/bin/ksh* */                                   Load runs from a shell script.

/* +++++++++++++++++++++++++++++++++++++*/         Any words between /* */ are
/* MultiLoad SCRIPT */                             COMMENTS ONLY and are not
/*This script is designed to change the */         processed by Teradata.
/*EMPLOYEE_DEPT table using the data from */
/* the IMPORT INFILE CDW_Join_Export.txt */        Names and describes the purpose
/* Version 1.1 */                                  of the script; names the author.
/* Created by Coffing Data Warehousing*/
/* +++++++++++++++++++++++++++++++++++++ */

/* Setup the MultiLoad Logtables, Logon */         Sets up a Logtable and then
/* Statements*/                                    logs on to Teradata.
.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL01;
DATABASE SQL01;                                    Specifies the database in which
                                                   to find the target table.

/*Drop Error Tables */                             Drops existing error tables in
DROP TABLE WORKDB.CDW_ET;                          the work database.
DROP TABLE WORKDB.CDW_UV;

/* Begin Import and Define Work and Error */       Begins the load process by
/* Tables */                                       telling us first the names of
.BEGIN IMPORT MLOAD                                the target table; the work
   TABLES Employee_Dept                            table and error tables are in
   WORKTABLES WORKDB.CDW_WT                        a work database. Note there is
   ERRORTABLES WORKDB.CDW_ET                       no comma between the names of
               WORKDB.CDW_UV;                      the error tables (pair).

/* Define Layout of Input File */                  Names the LAYOUT of the INPUT
.LAYOUT FILEIN;                                    file.
.FIELD Employee_No  * CHAR(11);                    Defines the structure of the
.FIELD First_Name   * CHAR(14);                    INPUT file. Notice the dots
.FIELD Last_Name    * CHAR(20);                    before the FIELD command and
.FIELD Dept_No      * CHAR(6);                     the semi-colons after each
.FIELD Dept_Name    * CHAR(20);                    FIELD definition.

/* Begin INSERT Process on Table */                Names the DML Label. Tells
.DML LABEL INSERTS                                 MultiLoad NOT TO LIST duplicate
   IGNORE DUPLICATE INSERT ROWS;                   INSERT rows in the error table;
INSERT INTO SQL01.Employee_Dept                    notice the option is placed
   ( Employee_No                                   AFTER the LABEL identification
   ,First_Name                                     and immediately BEFORE the DML
   ,Last_Name                                      function.
   ,Dept_No
   ,Dept_Name )
VALUES                                             Lists, in order, the VALUES to
   ( :Employee_No                                  be INSERTed.
   ,:First_Name
   ,:Last_Name
   ,:Dept_No
   ,:Dept_Name );

/* Specify IMPORT File and Apply Parameters */     Names the import file and
.IMPORT INFILE CDW_Join_Export.txt                 states its format type; names
   FORMAT TEXT                                     the LAYOUT to use and tells
   LAYOUT FILEIN                                   MultiLoad to APPLY the INSERTs.
   APPLY INSERTS;

.END MLOAD;                                        Ends MultiLoad and logs off of
.LOGOFF;                                           Teradata.

Figure 5-11

An IMPORT Script that Uses Two Input Data Files


/* !/bin/ksh* */                                      Load runs from a shell script.
/*MultiLoad IMPORT SCRIPT with two INPUT files */     Any words between /* */ are
/*This script INSERTs new rows into the */            comments only and are not
/* Employee_table and UPDATEs the Dept_Name */        processed by Teradata.
/*in the Department_table. */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++*/

.LOGTABLE SQL01.EMPDEPT_LOG;                          Sets up a Logtable and logs on
.RUN FILE c:\mydir\logon.txt;                         with .RUN. The logon.txt file
                                                      contains:
                                                      .logon TDATA/SQL01,SQL01;

DROP TABLE SQL01.EMP_WT;                              Drops the worktables and error
DROP TABLE SQL01.DEPT_WT;                             tables, in case they existed
DROP TABLE SQL01.EMP_ET;                              from a prior load. NOTE: Do NOT
DROP TABLE SQL01.EMP_UV;                              include these IF you want to
DROP TABLE SQL01.DEPT_ET;                             RESTART using CHECKPOINT.
DROP TABLE SQL01.DEPT_UV;

/* the following defines 2 tables for loading */      Identifies the 2 target tables
.BEGIN IMPORT MLOAD                                   with a comma between them.
   TABLES SQL01.Employee_Table,                       Names the worktable and error
          SQL01.Department_Table                      tables for each target table.
   WORKTABLES SQL01.EMP_WT,                           Note there are NO commas
              SQL01.DEPT_WT                           between the pair of names, but
   ERRORTABLES SQL01.EMP_ET SQL01.EMP_UV,             there is a comma between this
               SQL01.DEPT_ET SQL01.DEPT_UV;           pair and the next pair.

/* these next 2 LAYOUTs define 2 different */         Names and defines the LAYOUT
/* records */                                         of the 1st INPUT file.
.LAYOUT FILEIN1;
.FIELD Emp_No    * INTEGER;
.FIELD LName     * CHAR(20);
.FIELD FName     * VARCHAR(20);
.FIELD Sal       * DECIMAL (10,2);
.FIELD Dept_Num  * INTEGER;

.LAYOUT FILEIN2;                                      Names and defines the LAYOUT
.FIELD DeptNo    * CHAR(6);                           of the 2nd INPUT file.
.FIELD DeptName  * CHAR(20);

.DML LABEL EMP_INS                                    Names the 1st DML Label; tells
   IGNORE DUPLICATE INSERT ROWS;                      MultiLoad to IGNORE duplicate
INSERT INTO SQL01.Employee_Table                      INSERT rows because you do not
VALUES                                                want to see them. INSERTs a row
   (:Emp_No                                           into the table, but does NOT
   ,:FName                                            name the columns. So all VALUES
   ,:LName                                            are passed IN THE ORDER they
   ,:Sal                                              are defined in the Employee
   ,:Dept_Num);                                       table.

.DML LABEL DEPT_UPD;                                  Names the 2nd DML Label; tells
UPDATE Department_Table                               MultiLoad to UPDATE when it
   SET Dept_Name = :DeptName                          finds DeptNo (record) equal to
   WHERE Dept_No = :DeptNo;                           the Dept_No in the
                                                      Department_table and change the
                                                      Dept_Name column with the
                                                      DeptName from the INPUT file.

.IMPORT INFILE Emp_Data                               Names the TWO import files and
   LAYOUT FILEIN1                                     the TWO layouts that define the
   APPLY EMP_INS;                                     structure of the INPUT DATA
.IMPORT INFILE Dept_Data                              files, and tells MultiLoad to
   LAYOUT FILEIN2                                     APPLY the INSERTs to target
   APPLY DEPT_UPD;                                    table 1 and the UPDATEs to
                                                      target table 2.

.END MLOAD;                                           Ends MultiLoad and logs off of
.LOGOFF;                                              Teradata.

Figure 5-12

Redefining the INPUT


Sometimes, instead of using two different INPUT DATA files, which require two separate LAYOUTs, you can combine them into one INPUT DATA file. And you can use that one file, with just one LAYOUT, to load more than one table! You see, a flat file may contain more than one type of data record. As long as each record has a unique code to identify it, MultiLoad can check this code and know which field names within the same layout apply to that record. To do this you will need to REDEFINE the INPUT. You do this by redefining a field's position in the .FIELD or .FILLER section of the LAYOUT. Unlike the asterisk (*), which means that a field simply follows the previous one, redefining cites a number that tells MultiLoad to jump back toward the beginning of the record and read a certain portion of the INPUT record again, starting at the redefined position.

A Script that Uses Redefining the Input


The following script uses the ability to define two record types in the same input data file. It uses a .FILLER to define the code since it is never used in the SQL, only to determine which SQL to run.


/* !/bin/ksh* */                                      Load runs from a shell script.

/* +++++++++++++++++++++++++++++++++++++*/            Any words between /* */ are
/* MultiLoad IMPORT SCRIPT with multiple target */    comments only and are not
/*tables and DML labels */                            processed by Teradata.
/*This script INSERTs new rows into the */
/* Employee_table and UPDATEs the Dept_Name */
/*in the Department_table */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++*/

.LOGTABLE SQL01.EmpDept_Log;                          Sets up a Logtable and logs on
.LOGON TDATA/SQL01,SQL01;                             to Teradata; optionally,
                                                      specifies the database to work
                                                      in.

/* 2 target tables, 2 work tables, 2 error */         Identifies the 2 target tables.
/* tables per target table, defined in pairs */       Names the worktable and error
.BEGIN IMPORT MLOAD                                   tables for each target table.
   TABLES SQL01.Employee_Table,                       Note there is no comma between
          SQL01.Department_Table                      the names of the error tables,
   WORKTABLES SQL01.EMP_WT,                           but there is a comma between
              SQL01.DEPT_WT                           the pairs of error tables.
   ERRORTABLES SQL01.EMP_ET SQL01.EMP_UV,
               SQL01.DEPT_ET SQL01.DEPT_UV;

.LAYOUT FILEIN;                                       Names and defines the LAYOUT of
.FILLER Trans    * CHAR (1);                          the INPUT record. The FILLER is
.FIELD Emp_No    * INTEGER;                           for a field that tells what
.FIELD Dept_Num  * INTEGER;                           type of record has been read.
.FIELD LName     * CHAR(20);                          Here that field contains an "E"
.FIELD FName     * VARCHAR(20);                       or a "D". The "E" tells MLOAD
.FIELD Sal       * DECIMAL (10,2);                    to use the Employee data and
.FIELD DeptNo    2 INTEGER;                           the "D" is for department data.
.FIELD DeptName  * CHAR(20);                          The definition for DeptNo tells
                                                      MLOAD to jump backward to byte
                                                      2, whereas the * for Emp_No
                                                      defaulted to byte 2. So, Emp_No
                                                      and DeptNo both start at byte
                                                      2, but in different types of
                                                      records. When Trans (byte
                                                      position 1) contains a "D", the
                                                      APPLY uses the department data,
                                                      and for an "E" the APPLY uses
                                                      the employee data.

.DML LABEL EMPIN                                      Names the 1st DML Label; tells
   IGNORE DUPLICATE INSERT ROWS;                      MultiLoad to IGNORE duplicate
INSERT INTO SQL01.Employee_Table                      INSERT rows because you do not
VALUES                                                want to see them. Tells
   ( :Emp_No                                          MultiLoad to INSERT a row into
   ,:FName                                            the 1st target table but
   ,:LName                                            optionally does NOT define the
   ,:Sal                                              target table row format. All
   ,:Dept_Num );                                      the VALUES are passed to the
                                                      columns of the Employee table
                                                      IN THE ORDER of that table's
                                                      row format.

.DML LABEL DEPTIN;                                    Names the 2nd DML Label; tells
UPDATE Department_Table                               MultiLoad to UPDATE the 2nd
   SET Dept_Name = :DeptName                          target table but optionally
   WHERE Dept_No = :DeptNo;                           does NOT define that table's
                                                      row format. When the VALUE of
                                                      the DeptNo equals that of the
                                                      Dept_No column of the
                                                      Department table, then update
                                                      the Dept_Name column with the
                                                      DeptName from the INPUT file.

.IMPORT INFILE UPLOAD.dat
   LAYOUT FILEIN
   APPLY EMPIN WHERE Trans = 'E'
   APPLY DEPTIN WHERE Trans = 'D' ;

.END MLOAD;                                           Ends MultiLoad and logs off of
.LOGOFF;                                              Teradata.

Figure 5-13
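
The redefinition trick, two field definitions that both start at byte 2, can be mimicked in Python to show what "the same bytes, read two ways" means. This is a simplified sketch and not MultiLoad code. The function name parse_record is hypothetical; the sketch assumes 4-byte little-endian integers and ASCII text, decodes only the first few fields of each record type, and simply returns None for a record code it does not recognize.

```python
import struct


def parse_record(record: bytes):
    """Interpret the bytes after the one-byte code according to that code."""
    trans = record[0:1].decode("ascii")          # byte 1: the record code
    if trans == "E":
        # Employee record: Emp_No and Dept_Num start at byte 2 (offset 1).
        emp_no, dept_num = struct.unpack_from("<ii", record, 1)
        return {"type": "employee", "Emp_No": emp_no, "Dept_Num": dept_num}
    if trans == "D":
        # Department record: DeptNo is redefined to start at byte 2 as well.
        (dept_no,) = struct.unpack_from("<i", record, 1)
        dept_name = record[5:25].decode("ascii").rstrip()
        return {"type": "department", "DeptNo": dept_no, "DeptName": dept_name}
    return None                                  # unknown code: skipped here


employee = b"E" + struct.pack("<ii", 1001, 20)
department = b"D" + struct.pack("<i", 20) + b"Sales".ljust(20)

print(parse_record(employee))
print(parse_record(department))
```

The single Trans byte decides which interpretation applies, in the same spirit as the two APPLY ... WHERE Trans = ... clauses of the script.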

DELETE MLOAD Script Using a Hard Coded Value


The next script demonstrates how to use the MultiLoad DELETE task. In this example, old orders are being removed from the Order_Table, based upon the order date. Any order placed prior to the hard coded date will be removed.


.LOGTABLE RemoveLog;                               Identifies the Logtable and logs
.LOGON TDATA/SQL01,SQL01;                          onto Teradata with a valid logon
                                                   string.

.BEGIN DELETE MLOAD                                Begins MultiLoad in DELETE mode
   TABLES Order_Table;                             and names the target table.

DELETE FROM Order_Table                            SQL DELETE statement does a
   WHERE Order_Date < '99/12/31';                  massive delete of order data for
                                                   orders placed prior to the hard
                                                   coded date in the WHERE clause.
                                                   Notice that this is not the
                                                   Primary Index. You CANNOT DELETE
                                                   in DELETE MLOAD mode based upon
                                                   the Primary Index.

.END MLOAD;                                        Ends loading and logs off of
.LOGOFF;                                           Teradata.

Figure 5-14

How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here are a few that we saw:

At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command. You need not specify it in the .END MLOAD command.

You will readily notice that this mode has no .DML LABEL command. Since it is focused on just one absolute function, no APPLY clause is required so you see no .DML LABEL.

Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad command, so it has no dot prefix.

Since default names are available for worktables (WT_<target_tablename>) and error tables (ET_<target_tablename> and UV_<target_tablename>), they need not be specifically named, but be sure to define the Logtable.

Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a MultiLoad IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing the Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job fails for any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad DELETE task is never based upon the Primary Index. Because we are not importing any data rows, there is neither a need for worktables nor an Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That statement will be applied to every table row. If the condition is met, then the row is deleted. Using the match tags, each target block is read once and the appropriate rows are deleted.

A DELETE MLOAD Script Using a Variable


This illustration demonstrates how passing the values of a data row rather than a hard coded value may be used to help meet the conditions stated in the WHERE clause. When you are passing values, you must add some additional commands that were not used in the DELETE example with hard coded values.

.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;

Identifies the Logtable and logs onto Teradata with a valid logon string.

.BEGIN DELETE MLOAD TABLES Order_Table;

Begins the DELETE task and names only one table, but still uses the TABLES option.

.LAYOUT OldMonth;
.FIELD OrdDate * DATE;

Names the LAYOUT and defines the column whose value will be passed as a single row to MultiLoad. In this case, all of the order dates in the Order_Table will be tested against this OrdDate value.

DELETE FROM Order_Table
WHERE Order_Date < :OrdDate;

The condition in the WHERE clause is that the data rows with orders placed prior to the date value (:OrdDate) passed from the LAYOUT OldMonth will be DELETEd from the Order_Table.

.IMPORT INFILE <input-file>
LAYOUT OldMonth ;

Note that this time there is no dot in front of LAYOUT in this .IMPORT INFILE clause since it is only being referenced.

.END MLOAD;
.LOGOFF;

Ends loading and logs off of Teradata.

Figure 5-15
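The idea behind the variable-driven DELETE can be sketched in a few lines of Python: a single input row supplies the value for :OrdDate, and one condition is then tested against every row of the target table. This is a toy model under stated assumptions, not MultiLoad's actual mechanism; the function `delete_task`, the `Order_No` column, and the sample dates are hypothetical.

```python
from datetime import date


def delete_task(table_rows, ord_date):
    """Test one condition (Order_Date < ord_date) against every row.

    Returns the rows that survive and the number of rows deleted.
    """
    kept = [row for row in table_rows if not (row["Order_Date"] < ord_date)]
    return kept, len(table_rows) - len(kept)


# Hypothetical target table contents (Order_No is an assumed column).
order_table = [
    {"Order_No": 1, "Order_Date": date(1999, 11, 30)},
    {"Order_No": 2, "Order_Date": date(2000, 1, 15)},
    {"Order_No": 3, "Order_Date": date(1999, 12, 31)},
]

# The input file holds a single row; its OrdDate field feeds the WHERE clause.
input_rows = [{"OrdDate": date(1999, 12, 31)}]

remaining, deleted = delete_task(order_table, input_rows[0]["OrdDate"])
print(deleted, [row["Order_No"] for row in remaining])
# prints 1 [2, 3]
```

Note that the order dated exactly on the cutoff survives, because the condition is a strict "less than".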

An UPSERT Sample Script


The following sample script is provided to demonstrate how to do an UPSERT, that is, to update a table and, if a row from the data source does not exist in the target table, to insert a new row. In this instance we are loading the Student_Profile table with new data for the next semester. The clause "DO INSERT FOR MISSING UPDATE ROWS" indicates an UPSERT. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement.

/* !/bin/ksh* */
/* +++++++++++++++++++++++++++++++++++++++++++++++++ */
/* MultiLoad UPSERT SCRIPT */
/* This script Updates the Student_Profile Table */
/* with new data and Inserts a new row into the table */
/* if the row to be updated does not exist. */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* +++++++++++++++++++++++++++++++++++++++++++++++++ */

Load runs from a shell script. Any words between /* and */ are comments only and are not processed by Teradata. Names and describes the purpose of the script; names the author.

/* Setup Logtable, Logon Statements */
.LOGTABLE SQL01.CDW_Log;
.LOGON CDW/SQL01,SQL01;

Sets up a Logtable and then logs on to Teradata.

/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES SQL01.Student_Profile
WORKTABLES SQL01.SWA_WT
ERRORTABLES SQL01.SWA_ET SQL01.SWA_UV;

Begins the Load Process by telling us first the names of the target table, work table and error tables.

/* Define Layout of Input File */

Names the LAYOUT of the INPUT file;

.LAYOUT FILEIN;
.FIELD Student_ID * INTEGER;
.FIELD Last_Name * CHAR (20);
.FIELD First_Name * VARCHAR (12);
.FIELD Class_Code * CHAR (2);
.FIELD Grade_Pt * DECIMAL(5,2);

An ALL CHARACTER based flat file. Defines the structure of the INPUT file; notice the dots before the FIELD command and the semi-colons after each FIELD definition.

/* Begin INSERT and UPDATE Process on Table */
.DML LABEL UPSERTER
DO INSERT FOR MISSING UPDATE ROWS;

Names the DML Label. Tells MultiLoad to INSERT a row if there is not one to be UPDATED, i.e., UPSERT.

/* Without the above DO, one of these is guaranteed to fail on this same table. If the UPDATE fails because a row is missing, it corrects by doing the INSERT */
UPDATE SQL01.Student_Profile
SET Last_Name = :Last_Name
,First_Name = :First_Name
,Class_Code = :Class_Code
,Grade_Pt = :Grade_Pt
WHERE Student_ID = :Student_ID;

Defines the UPDATE. Qualifies the UPDATE.

INSERT INTO SQL01.Student_Profile
VALUES (:Student_ID
,:Last_Name
,:First_Name
,:Class_Code
,:Grade_Pt);

Defines the INSERT. We recommend placing comma separators in front of the following column or value for easier debugging.

.IMPORT INFILE CDW_IMPORT.DAT
LAYOUT FILEIN
APPLY UPSERTER;

Names the Import File, names the Layout to use, and tells MultiLoad to APPLY the UPSERTs.

.END MLOAD;
.LOGOFF;

Ends MultiLoad and logs off of Teradata.

Figure 5-16
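The UPSERT logic described above (try the UPDATE first, and do the INSERT only when no row matches) can be sketched in plain Python. This is a minimal illustration that uses a dictionary keyed on Student_ID as a stand-in for the target table; the function `upsert` and all sample values are assumptions, not part of MultiLoad.

```python
def upsert(table, row, key="Student_ID"):
    """Update the row with a matching key, or insert it when none exists."""
    k = row[key]
    if k in table:
        table[k].update(row)  # the UPDATE found its row
        return "UPDATE"
    table[k] = dict(row)      # the UPDATE row was missing, so INSERT instead
    return "INSERT"


# Hypothetical current contents of the Student_Profile table.
student_profile = {
    100: {"Student_ID": 100, "Last_Name": "Smith", "First_Name": "Ann",
          "Class_Code": "FR", "Grade_Pt": 3.10},
}

# Hypothetical input rows for the next semester.
incoming = [
    {"Student_ID": 100, "Last_Name": "Smith", "First_Name": "Ann",
     "Class_Code": "SO", "Grade_Pt": 3.25},
    {"Student_ID": 200, "Last_Name": "Jones", "First_Name": "Lee",
     "Class_Code": "FR", "Grade_Pt": 0.00},
]

actions = [upsert(student_profile, row) for row in incoming]
print(actions)
# prints ['UPDATE', 'INSERT']
```

The order matters for the same reason it does in the script: the update is attempted first, and the insert is only the fallback.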

What Happens when MultiLoad Finishes


MultiLoad Statistics

****08:06:41 UTY1803 Import Processing Statistics
                                     Import 1    Total Thus Far
Candidate Records considered . . .      70000             70000
Apply conditions satisfied . . . .      70000             70000

****08:06:38 UTY0818 Statistics for table Employee_Table
INSERTS: 25000 UPDATES: 25000 DELETES: 0
****08:06:41 UTY0818 Statistics for table Department_Table
INSERTS: 0 UPDATES: 20000 DELETES: 0

Figure 5-17
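If you want to check these statistics from a job log automatically, a small parser is enough. The Python sketch below assumes that each UTY0818 header line is followed by one line of INSERTS, UPDATES and DELETES counts; that line layout is an assumption about the log, and the function name `parse_stats` is hypothetical.

```python
import re

# Sample text built from the per-table statistics in the figure above;
# the exact line breaks are an assumption.
LOG = """\
****08:06:38 UTY0818 Statistics for table Employee_Table
INSERTS: 25000 UPDATES: 25000 DELETES: 0
****08:06:41 UTY0818 Statistics for table Department_Table
INSERTS: 0 UPDATES: 20000 DELETES: 0
"""

TABLE_RE = re.compile(r"UTY0818 Statistics for table (\S+)")
COUNT_RE = re.compile(r"INSERTS:\s*(\d+)\s+UPDATES:\s*(\d+)\s+DELETES:\s*(\d+)")


def parse_stats(text):
    """Return {table_name: {"INSERTS": n, "UPDATES": n, "DELETES": n}}."""
    stats = {}
    current = None
    for line in text.splitlines():
        header = TABLE_RE.search(line)
        if header:
            current = header.group(1)
            continue
        counts = COUNT_RE.search(line)
        if counts and current is not None:
            inserts, updates, deletes = (int(n) for n in counts.groups())
            stats[current] = {
                "INSERTS": inserts, "UPDATES": updates, "DELETES": deletes,
            }
            current = None
    return stats


stats = parse_stats(LOG)
total = sum(sum(counts.values()) for counts in stats.values())
print(total)
# prints 70000
```

For this sample, the total of all table activity matches the 70000 records that satisfied the apply conditions.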

Troubleshooting Multiload Errors


The output statistics in the above example indicate that the load was entirely successful. But that is not always the case. Now we need to troubleshoot in order to identify the errors and correct them, if desired. Earlier on, we noted that MultiLoad generates two error tables, the Acquisition error table and the Application error table. You may select from these tables to discover the problem and research the issues.

For the most part, the Acquisition error table logs errors that occur during that processing phase. The Application error table lists Unique Primary Index violations, field overflow errors on non-PI columns, and constraint errors that occur in the APPLY phase. MultiLoad error tables not only list the errors they encounter, they also have the capability to STORE those errors. Do you remember the MARK and IGNORE parameters? This is where they come into play. MARK will ensure that the error rows, along with some details about the errors, are stored in the error table. IGNORE does neither; it is as if the error never occurred.

THREE COLUMNS SPECIFIC TO THE ACQUISITION ERROR TABLE

ErrorCode     System code that identifies the error.
ErrorField    Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified.
HostData      The data row that contains the error.

Figure 5-19

THREE COLUMNS SPECIFIC TO THE APPLICATION ERROR TABLE

Uniqueness       Contains a certain value that disallows duplicate row errors in this table; can be ignored, if desired.
DBCErrorCode     System code that identifies the error.
DBCErrorField    Name of the column in the target table where the error happened; is left blank if the offending column cannot be identified.

NOTE: A copy of the target table column immediately follows this column.

Figure 5-20

RESTARTing Multiload

Who hasn't experienced a failure at some time when attempting a load? Don't take it personally! Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine how it works:

First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where it left off.

Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing!

Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot.

Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run until complete.

Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted. The results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data block is successfully written to its target table.

HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT as 60 or less, minutes are assumed. If you specify the checkpoint at 61 or above, the number of records is assumed.
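
The CHECKPOINT hint is easy to misread, so here is a minimal Python sketch of the rule exactly as the hint states it: no value means 15 minutes, a value of 60 or less is read as minutes, and a value of 61 or above is read as a record count. The function name `interpret_checkpoint` is hypothetical, and the handling of values below 1 is a choice made for this sketch because the hint does not cover them.

```python
def interpret_checkpoint(n=None):
    """Return (unit, value) for a CHECKPOINT setting, following the hint above."""
    if n is None:
        return ("minutes", 15)   # default when CHECKPOINT is not specified
    if n < 1:
        # The hint does not describe values below 1, so this sketch rejects them.
        raise ValueError("CHECKPOINT values below 1 are not modeled here")
    if n <= 60:
        return ("minutes", n)    # 60 or less: minutes are assumed
    return ("records", n)        # 61 or above: number of records is assumed


for setting in (None, 30, 60, 61, 100000):
    print(setting, interpret_checkpoint(setting))
```

The boundary is the part worth remembering: 60 still means minutes, while 61 already means records.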

RELEASE MLOAD: When You DON'T Want to Restart MultiLoad


What if a failure occurs but you do not want to RESTART MultiLoad? Since MultiLoad has already updated the table headers, it assumes that it still "owns" them. Therefore, it limits access to the table(s). So what is a user to do? Well, there is good news and bad news. The good news is that you may use the RELEASE MLOAD command to release the locks and roll back the job. The bad news is that if you have been loading multiple millions of rows, the rollback may take a lot of time. For this reason, most customers would rather just go ahead and RESTART.

Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one of the following three conditions was true:

In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you could not use RELEASE MLOAD. This is sometimes referred to as the "point of no return."

In DELETE mode, the point of no return was when Teradata received the DELETE statement.

If the job halted in the Apply Phase, you will have to RESTART the job.

With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the RELEASE MLOAD command. It can NOW be used in the APPLY Phase, if:

You are running a Teradata V2R3 or later version

You use the correct syntax:
RELEASE MLOAD <target-table> IN APPLY

The load script has NOT been modified in any way

The target tables either:
Must be empty, or
Must have no Fallback, no NUSIs, no Permanent Journals

You should be very cautious using the RELEASE command. It could potentially leave your table half updated. Therefore, it is handy for a test environment, but please don't become too reliant on it for production runs. Production jobs should be allowed to finish to guarantee data integrity.
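
The conditions for using RELEASE MLOAD in the APPLY Phase can be written down as a simple checklist. The Python sketch below models them as a yes/no test; it is only an aid for reasoning about the rules listed above and does not query Teradata. The class `TargetTable`, the function `can_release_in_apply`, and their fields are hypothetical names, and the "correct syntax" condition is left out because it concerns the command text rather than the state of the job.

```python
from dataclasses import dataclass


@dataclass
class TargetTable:
    """Assumed description of one target table's relevant properties."""
    is_empty: bool
    has_fallback: bool = False
    has_nusi: bool = False
    has_permanent_journal: bool = False


def can_release_in_apply(v2r3_or_later, script_unmodified, tables):
    """True when every condition listed in the text is met."""
    def table_ok(table):
        # Either the table is empty, or it has none of the three features.
        return table.is_empty or not (
            table.has_fallback or table.has_nusi or table.has_permanent_journal
        )

    return (
        v2r3_or_later
        and script_unmodified
        and all(table_ok(table) for table in tables)
    )


tables = [
    TargetTable(is_empty=True, has_fallback=True),  # empty, so it qualifies
    TargetTable(is_empty=False),                    # populated, no restrictions
]
print(can_release_in_apply(True, True, tables))
# prints True
```

A single populated table with Fallback, a NUSI, or a Permanent Journal is enough to make the check fail.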

MultiLoad and INMODs


INMODs, or Input Modules, may be called by MultiLoad in either mainframe or LAN environments, providing the appropriate programming languages are used. INMODs are user written routines whose purpose is to read data from one or more sources and then convey it to a load utility, here MultiLoad, for loading into Teradata. They allow MultiLoad to focus solely on loading data by doing data validation or data conversion before the data is ever touched by MultiLoad. INMODs replace the normal MVS DDNAME or LAN file name with the following statement:
.IMPORT INMOD=<INMOD-name>

You will find a more detailed discussion on how to write INMODs for MultiLoad in the chapter of this book titled, "INMOD Processing".

How Multiload Compares with FastLoad

Function | FastLoad | MultiLoad
Error Tables must be defined | Yes | Optional. 2 Error Tables have to exist for each target table and will automatically be assigned.
Work Tables must be defined | No | Optional. 1 Work Table has to exist for each target table and will automatically be assigned.
Logtable must be defined | No | Yes
Allows Referential Integrity | No | No
Allows Unique Secondary Indexes | No | No
Allows Non-Unique Secondary Indexes | No | Yes
Allows Triggers | No | No
Loads a maximum of n number of tables | One | Five
DML Statements Supported | INSERT | INSERT, UPDATE, DELETE, and "UPSERT"
DDL Statements Supported | CREATE and DROP TABLE | DROP TABLE
Transfers data in 64K blocks | Yes | Yes
Number of Phases | Two | Five
Is RESTARTable | Yes | Yes, in all 5 phases (auto CHECKPOINT)
Stores UPI Violation Rows | Yes | Yes
Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation | No | Yes
Allows Data Conversion | Yes, 1 per column | Yes
NULLIF function | Yes | Yes

Figure 5-21
