Mod 1 - Welcome to the Teradata Database

Objectives

After completing this module, you should be able to:

 Describe the Teradata Database.
 Describe the advantages of the Teradata Database.
 Define the terms associated with relational databases.
 Describe the advantages of a relational database.


What is the Teradata Database?

The Teradata Database is a relational database management system (RDBMS) that drives a company's
data warehouse. The Teradata Database provides the
foundation to give a company the power to grow, to
compete in today's dynamic marketplace, and to evolve the
business by getting answers to a new generation of
questions. The Teradata Database's scalability allows the
system to grow as the business grows, from gigabytes to
terabytes and beyond. The Teradata Database's unique
technology has been proven at customer sites across
industries and around the world.

The Teradata Database is an open system, compliant with ANSI industry standards. It is currently available on the following industry-standard operating systems: UNIX MP-RAS (discontinued with Teradata 13.10), Microsoft Windows 2000, Microsoft Windows 2003, and Novell SUSE Linux. For this reason, Teradata is considered an open architecture.

The Teradata Database is a large database server that accommodates multiple client
applications making inquiries against it concurrently. Various client platforms access the
database through a TCP/IP connection or across an IBM mainframe channel
connection. The Teradata Database is accessed using SQL (Structured Query
Language), the industry standard access language for communicating with an RDBMS.
The ability to manage large amounts of data is accomplished using the concept of
parallelism, wherein many individual processors perform smaller tasks concurrently to
accomplish an operation against a huge repository of data. To date, only parallel
architectures can handle databases of this size.
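
For illustration, here is the kind of ANSI-standard SQL request a client application might submit. The Employee table and its columns are hypothetical examples, not objects that ship with the database:

    SELECT   last_name, first_name
    FROM     employee
    WHERE    department_number = 403
    ORDER BY last_name;

The database determines where the qualifying rows live and performs the work in parallel; the client simply submits the request and receives the answer set.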

How Is The Teradata Database Used?


Each Teradata Database implementation can model a company's business. The ability to keep up with rapid changes in today's business environment makes the Teradata Database an ideal foundation for many applications, including:

 Enterprise data warehousing
 Active data warehousing
 Customer relationship management
 Internet and E-Business
 Data marts

Just for Fun . . .

Based on what you know so far, what do you think are some Teradata Database
features that make it so successful in today's business environment? (Details on the
following are coming up next.)

A. Scalability.
B. Single data store.
C. High degree of parallelism.
D. Ability to model the business.
E. All of the above.

What Makes the Teradata Database Unique?

In this Web-Based Training, you will learn about many features that make the Teradata
Database, an RDBMS, right for business-critical applications. To start with, this section
covers these key features:

 Single data store
 Scalability
 Unconditional parallelism (parallel architecture)
 Ability to model the business
 Mature, parallel-aware Optimizer

Single Data Store


The Teradata Database acts as a single data store, with multiple client applications
making inquiries against it concurrently.

Instead of replicating a database for different purposes, with the Teradata Database you
store the data once and use it for many applications. The Teradata Database provides
the same connectivity for an entry-level system as it does for a massive enterprise data
warehouse.

Scalability

"Linear scalability" means that as you add components to the system, the performance
increase is linear. Adding components allows the system to accommodate increased
workload without decreased throughput. Linear scalability enables the system to grow to
support more users/data/queries/complexity of queries without experiencing
performance degradation. As the configuration grows, performance increase is linear,
slope of 1. The Teradata Database was the first commercial database system to scale
to and support a trillion bytes of data.
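
For example, under linear scalability, doubling the number of nodes in a system should roughly double the amount of work the system can complete in the same time, so a workload that doubles along with the configuration sees no change in response time.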

The chart below lists the meaning of the prefixes:

Prefix   Exponent   Meaning
kilo-    10^3       1,000 (thousand)
mega-    10^6       1,000,000 (million)
giga-    10^9       1,000,000,000 (billion)
tera-    10^12      1,000,000,000,000 (trillion)
peta-    10^15      1,000,000,000,000,000 (quadrillion)
exa-     10^18      1,000,000,000,000,000,000 (quintillion)

The Teradata Database can scale from 100 gigabytes to over 100 terabytes of data on a single system without losing any performance capability. The Teradata Database's scalability provides investment protection for customers' growth and application development. The Teradata Database is the only database that is predictably scalable in multiple dimensions, and this extends to data loading with the use of parallel loading utilities. The Teradata Database provides automatic data distribution, and no reorganizations of data are needed. The Teradata Database is scalable in multiple ways, including hardware, query complexity, and number of concurrent users.

Hardware

Growth is a fundamental goal of business. An MPP Teradata Database system easily accommodates that growth whenever it happens. The Teradata Database runs on highly optimized Teradata servers in the following configurations:

 SMP - Symmetric multiprocessing platforms manage gigabytes of data to support an entry-level data warehousing system.
 MPP - Massively parallel processing systems can manage hundreds of terabytes of data. You can start with a couple of nodes, and later expand the system as your business grows.

With the Teradata Database, you can increase the size of your system without replacing:

 Databases - When you expand your system, the data is automatically redistributed
through the reconfiguration process, without manual interventions such as sorting,
unloading and reloading, or partitioning.
 Platforms - The Teradata Database's modular structure allows you to add
components to your existing system.
 Data model - The physical and logical data models remain the same regardless of
data volume.
 Applications - Applications you develop for Teradata Database configurations will
continue to work as the system grows, protecting your investment in application
development.

Complexity

The Teradata Database is adept at handling complex data models that satisfy information needs throughout an enterprise. The Teradata Database efficiently processes increasingly sophisticated business questions as users realize the value of the answers they are getting. It can perform large aggregations during query run time and up to 64 joins in a single query.

Concurrent Users

As is proven in every Teradata Database benchmark, the Teradata Database can handle the most concurrent users, who are often running multiple, complex queries. The Teradata Database has the proven ability to handle from hundreds to thousands of users on the system simultaneously. Adding many concurrent users typically reduces system performance. However, adding more components can enable the system to accommodate the new users with equal or even better performance.


Unconditional Parallelism

The Teradata Database provides exceptional performance using parallelism to achieve a single answer faster than a non-parallel system. Parallelism uses multiple processors working together to accomplish a task quickly.

An example of parallelism can be seen at an amusement park, as guests stand in line for an attraction such as a roller coaster. As the line approaches the boarding platform, it typically splits into multiple, parallel lines so that groups of people can step into their seats simultaneously. The line moves faster than if the guests stepped onto the attraction one at a time. At the biggest amusement parks, the parallel loading of the rides is essential to successful operation.

Parallelism is evident throughout a Teradata Database, from the architecture to data loading to complex request processing. The Teradata Database processes requests in parallel without mandatory query tuning. The Teradata Database's parallelism does not depend on limited data quantity, column range constraints, or specialized data models: the Teradata Database has "unconditional parallelism."

Teradata supports ad-hoc queries using ANSI-standard SQL and includes SQL-ready database management information (log files). This allows Teradata to interface with third-party Business Intelligence (BI) tools and to accept queries submitted from other database systems.

Ability to Model the Business

A data warehouse built on a business model contains information from across the
enterprise. Individual departments can use their own assumptions and views of the data
for analysis, yet these varying perspectives have a common basis for a "single view of
the business."


With the Teradata Database's centrally located, logical architecture, companies can get
a cohesive view of their operations across functional areas to:

 Find out which divisions share customers.
 Track products throughout the supply chain, from initial manufacture, to inventory, to sale, to delivery, to maintenance, to customer satisfaction.
 Analyze relationships between results of different departments.
 Determine if a customer on the phone has used the company's website.
 Vary levels of service based on a customer's profitability.

You get consistent answers from the different viewpoints above using a single business
model, not functional models for different departments. In a functional model, data is
organized according to what is done with it. But what happens if users later want to do
some analysis that has never been done before? When a system is optimized for one
department's function, the other departments' needs (and future needs) may not be met.

A Teradata Database allows the data to represent a business model, with data
organized according to what it represents, not how it is accessed, so it is easy to
understand. The data model should be designed without regard to usage and be the
same regardless of data volume. With a Teradata Database as the enterprise data
warehouse, users can ask new questions of the data that were never anticipated,
throughout the business cycle and even through changes in the business environment.

A key Teradata Database strength is its ability to model the customer's business. The
Teradata Database supports business models that are truly normalized, avoiding the
costly star schema and snowflake implementations that many other database vendors
use. The Teradata Database can support star schema and other types of relational
modeling, but Third Normal Form is the method for relational modeling that we
recommend to customers. Our competitors typically implement star schema or snowflake
models either because they are implementing a set of known queries in a transaction
processing environment, or because their architecture limits them to that type of model.
Normalization is the process of reducing a complex data structure into a simple, stable
one. Generally this process involves removing redundant attributes, keys, and
relationships from the conceptual data model. The Teradata Database supports
normalized logical models because it is able to perform 64 table joins and large
aggregations during queries.

Mature, Parallel-Aware Optimizer

The Teradata Database Optimizer is the most robust in the industry, able to handle:

 Multiple complex queries
 Multiple joins per query
 Unlimited ad-hoc processing

The Optimizer is parallel-aware, meaning that it has knowledge of system components (how many nodes, vprocs, etc.). It determines the least expensive plan (time-wise) to process queries quickly and in parallel. The Optimizer is further explained in the next module.


What is a Relational Database?

A database is a collection of permanently stored data that is:

 Logically related (the data was created for a specific purpose).
 Shared (many users may access the data).
 Protected (access to the data is controlled).
 Managed (the data integrity and value are maintained).

The Teradata Database is a relational database. Relational databases are based on the
relational model, which is founded on mathematical Set Theory. The relational model
uses and extends many principles of Set Theory to provide a disciplined approach to
data management. Users and applications access data in an RDBMS using industry-
standard SQL statements. SQL is a set-oriented language for relational database
management.

A relational database is designed to:

 Represent a business and its business practices.
 Be extremely flexible in the way that data can be selected and used.
 Be easy to understand.
 Model the business, not the applications.
 Allow businesses to quickly respond to changing conditions.

In addition, a single copy of the data can serve multiple purposes.

Relational databases present data as a set of tables. A table is a two-dimensional representation of data that consists of rows and columns. According to the relational model, a valid table does not have to be populated with data rows; it just needs to be defined with at least one column.

Rows

Each row contains all the columns in the table. A row is one instance of all columns, and a single entity in the table; each table can contain only one row format. The order of rows is arbitrary and does not imply priority, hierarchy, or significance.

Each row represents an occurrence of an entity defined by the table. An entity is a person, place, thing, or event about which the table contains information. In this example, the entity is the employee and each row represents a single employee.


Columns

Each column contains "like data," such as only part names, only supplier names, or only employee numbers. In the example below, the Last_Name column contains last names only, and nothing else. The data in the columns is atomic, so a telephone number might be divided into three columns (the area code, the prefix, and the suffix) so that customer data can be analyzed by area code, and so on. Missing data values are represented by "nulls" (the absence of a value). Within a table, the column position is arbitrary.
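
As a small sketch of this idea, a hypothetical Customer table might declare the telephone number as three atomic columns rather than one:

    CREATE TABLE customer
      (customer_id   INTEGER NOT NULL,   -- one column of "like data" per attribute
       last_name     VARCHAR(30),
       area_code     CHAR(3),            -- atomic parts of the phone number
       phone_prefix  CHAR(3),
       phone_suffix  CHAR(4));

A query can then group or filter on area_code directly, with no substring manipulation.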

Answering Questions with a Relational Database

A relational database is a set of logically related tables. Tables are logically related to
each other by a common field, so information such as customer telephone numbers and
addresses can exist in one table, yet be accessible for multiple purposes.

Relational databases do not use access paths to locate data; data connections are
made by data values. Data connections are made by matching values in one column
with the values in a corresponding column in another table. In relational terminology, this
connection is referred to as a join.


The diagrams below show how the values in one table may be matched to values in another table. The tables show customer, order, and billing statement data, related by a common field, Customer ID. The common field of Customer ID lets you look up information such as a customer name for a particular statement number, even though the data exists in two different tables. This is done by performing a join between the tables using the common field, Customer ID. Here are a few other examples of questions that can be answered:

 "How many mats did customer Wood purchase?"


 "What is the statement number for O'Day's purchase of $45.30?"
 "For statement #344627, what state did the customer live in?"

To sum up, a relational database is a collection of tables. The data contained in the
tables can be associated using columns with matching data values.
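
A hedged sketch of such a join, using hypothetical tables modeled on the example above (a Customer table and a Statement table sharing the common field customer_id):

    SELECT c.customer_name,
           s.statement_number
    FROM   customer AS c
    INNER JOIN statement_table AS s
           ON c.customer_id = s.customer_id
    WHERE  s.statement_number = 344627;

The join matches values in the customer_id column of one table to values in the corresponding column of the other; no predefined access path is involved.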

Logical/Relational Modeling

The logical model should be independent of usage, so that a variety of front-end tools can be accommodated and the database can be created quickly.

The design of the data model is the same regardless of data volume.

An enterprise model is one that provides the ability to look across functional processes.

Normalization is the process of reducing a complex data structure into a simple, stable
one. Generally this process involves removing redundant attributes, keys, and
relationships from the conceptual data model. Normalization theory is constructed
around the concept of normal forms that define a system of constraints. If a relation
meets the constraints of a particular normal form, we say that relation is "in normal form."
The intent of normalizing a relational database is to put one fact in one place. By
decomposing your relations into normalized forms, you can eliminate the majority of
update anomalies that can occur when data is stored in de-normalized tables.

A slightly more detailed statement of this principle would be the definition of a relation (or
table) in a normalized relational database: A relation consists of a primary key, which
uniquely identifies any tuple, and zero or more additional attributes, each of which
represents a single-valued (atomic) property of the entity type identified by the primary
key. A tuple is an ordered set of values. The separator for each value is often a comma.
Common uses for the tuple as a data type are:

1. For passing a string of parameters from one program to another

2. Representing a set of value attributes in a relational database

3NF vs. Star Schema Model

As a model is refined, it passes through different states which can be referred to as normal forms. A normalized model includes:

 Entities
 Attributes
 Relationships

First normal form rules state that each and every attribute within an entity instance has
one and only one value. No repeating groups are allowed within entities.

Second normal form requires that the entity must conform to the first normal form rules.
Every non-key attribute within an entity is fully dependent upon the entire key (key
attributes) of the entity, not a subset of the key.

Third normal form requires that the entity conform to the first and second normal form rules. In addition, no non-key attribute within an entity may be functionally dependent upon another non-key attribute within the same entity.

While the Teradata Database can support any data model that can be processed via SQL, an advantage of a normalized data model is the ability to support previously unknown (ad-hoc) questions.
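
As a hedged sketch of what normalization buys (all names here are hypothetical): an order table that repeats item columns (item1, qty1, item2, qty2, ...) violates first normal form. Moving the repeating group into its own table, keyed by order number and line number, puts one fact in one place:

    CREATE TABLE order_header
      (order_number  INTEGER NOT NULL PRIMARY KEY,
       order_date    DATE);

    CREATE TABLE order_item
      (order_number  INTEGER NOT NULL,    -- relates back to order_header
       line_number   SMALLINT NOT NULL,
       item_id       INTEGER,
       quantity      INTEGER,
       PRIMARY KEY (order_number, line_number));

Each non-key attribute now depends on the whole key of its own table, which satisfies the normal form rules described above.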

Star Schema
The star schema (sometimes referenced as star join schema) is the simplest style of
data warehouse schema. The star schema consists of a few fact tables (possibly only
one, justifying the name) referencing any number of dimension tables. The star schema
is considered an important special case of the snowflake schema.

Some characteristics of a Star Schema model include:

 They tend to have fewer entities.
 They advocate a greater level of denormalization.


Primary Key

In the relational model, when you design a table, a Primary Key (PK) is designated as the unique identifier for each row. A Primary Key can be composed of one or more columns. In the example below, the Primary Key is the employee number.
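
A minimal sketch of how such a table might be declared, assuming a hypothetical Employee table whose Primary Key is the employee number:

    CREATE TABLE employee
      (employee_number   INTEGER NOT NULL,
       last_name         VARCHAR(30),
       first_name        VARCHAR(30),
       department_number INTEGER,
       PRIMARY KEY (employee_number));   -- uniquely identifies each row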

Primary Key Rules

Rules governing how Primary Keys must be defined and how they function are:

Rule 1: A Primary Key is required.
Rule 2: A Primary Key value must be unique.
Rule 3: The Primary Key value cannot be NULL.
Rule 4: The Primary Key value should not be changed.
Rule 5: The Primary Key column should not be changed.
Rule 6: A Primary Key may be any number of columns.

Rule 1: A Primary Key is Required

In the logical model, each table requires a Primary Key because that is how each row is
able to be uniquely identified. Each table must have one, and only one, Primary Key. In
any given row, the value of the Primary Key uniquely identifies the row. The Primary
Key may span more than one column, but even then, there is only one Primary Key.


Rule 2: Unique PK

Within the column(s) designated as the Primary Key, the values in each row must be
unique. No duplicate values are allowed. The Primary Key's purpose is to uniquely
identify a row. In a multi-column Primary Key, the combined value of the columns must
be unique, even if an individual column in the Primary Key has duplicate values.

Rule 3: PK Cannot Be NULL

Within the Primary Key column, each row must have a Primary Key value and cannot be
NULL (without a value). Because NULL is indeterminate, it cannot "identify" anything.


Rule 4: PK Value Should Not Change

Primary Key values should not be changed. If you changed a Primary Key, you would
lose all historical tracking of that row.

Rule 5: PK Column Should Not Change

Additionally, the column(s) designated as the Primary Key should not be changed. If you changed a Primary Key column, you would lose all the information relating that table to other tables.


Rule 6: No Column Limit

In the relational model, there is no limit to the number of columns that can be designated
as the Primary Key, so it may consist of one or more columns. In the example below, the
Primary Key consists of three columns: EMPLOYEE NUMBER, LAST NAME, and FIRST
NAME.

Foreign Key

A Foreign Key (FK) is an identifier that links related tables. A Foreign Key defines how
two tables are related to each other. Each Foreign Key references a matching Primary
Key in another table in the database. For example, in the table below, the Department
Number column that is a Foreign Key actually exists in another table as a Primary Key.
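
Continuing the hypothetical Employee example from earlier, a sketch of how such a relationship might be declared:

    CREATE TABLE department
      (department_number INTEGER NOT NULL PRIMARY KEY,
       department_name   VARCHAR(30));

    CREATE TABLE employee
      (employee_number   INTEGER NOT NULL PRIMARY KEY,
       department_number INTEGER,
       FOREIGN KEY (department_number)             -- FK references the PK
         REFERENCES department (department_number));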


Having tables related to each other gives users the flexibility to look at the data in
different ways, without the database administrator having to manage and maintain many
tables of duplicate data for different applications.

Foreign Key Rules

Rules governing how Foreign Keys must be defined and how they operate are:

Rule 1: Foreign Keys are optional.
Rule 2: A Foreign Key value may be non-unique.
Rule 3: The Foreign Key value may be NULL.
Rule 4: The Foreign Key value may be changed.
Rule 5: A Foreign Key may be any number of columns.
Rule 6: Each Foreign Key must exist as a Primary Key in a related table.

Rule 1: Optional FKs

Foreign Keys are optional; not all tables have them. Tables that do have them can have
multiple Foreign Keys because a table can relate to many other tables. In fact, a table
can have an unlimited number of foreign keys. In the example table below:

 The Department Number Foreign Key relates to the Department Number Primary Key in the Department table.
 The Job Code FK relates to the Job Code PK in the Job Code table.

Having tables related to each other makes a relational database flexible so that different
users can look up information they need, while simplifying the database administration
so the data doesn't have to be duplicated for each purpose or application.

Rule 2: Unique or Non-Unique FKs

Duplicate Foreign Key values are allowed. More than one employee could be assigned
to the same department.


Rule 3: FKs Can Be NULL

NULL (missing) Foreign Key values are allowed. For example, under special
circumstances, an employee might not be assigned to a department.

Rule 4: FK Value Can Change

Foreign Key values may be changed. For example, if Arnando Villegas moves from
Department 403 to Department 587, the Foreign Key value in his row would change.

Rule 5: FK Has No Column Limit

The Foreign Key may consist of one or more columns. A multi-column foreign key is
used to relate to a multi-column Primary Key in a related table. In the relational model,
there is no limit to the number of columns that can be designated as a Foreign Key.


Rule 6: FK Must Be PK in Related Table

Each Foreign Key must exist as a Primary Key in a related table. A department number
that does not exist in the Department Table would be invalid as a Foreign Key value in
the Employee Table.

This rule can apply even if the Foreign Key is NULL, or missing. Remember, a missing
value is defined as a non-value; there is no value present. So the rule could be better
stated: if a value exists in the Foreign Key column, it must match a Primary Key value in
the related table.

Just for Fun . . .


To check your understanding of Primary Keys and Foreign Keys, complete this sentence. According to the relational model, a single table can have: (Choose two.)

A. Multiple primary keys.
B. Multiple foreign keys.
C. No primary keys.
D. No foreign keys.

Exercise 1.1

Choose the best term to complete each sentence:

A ______ contains "like data."
A ______ can contain only one row format.
A ______ is one instance of all columns in a table.

To review these topics, see Rows or Columns.

Exercise 1.2

Which statement is true?

A. A database is a two-dimensional array of rows and columns.
B. A Primary Key must contain one, and only one, column.
C. Foreign Keys have no relationship to existing Primary Key selections.
D. Teradata is an ideal foundation for customer relationship management, e-commerce, and active data warehousing applications.

To review these topics, see How is the Teradata Database Used?, What is a Relational Database?, Primary Key, or Foreign Key.

Exercise 1.3

Create a relationship between the two tables by identifying:

 The Foreign Key column in the Product table
 The Primary Key column in the Vendor table

To review these topics, see Foreign Key or Primary Key.

Exercise 1.4

Identify the name of the customer who placed order 7324.

To review this topic, see Primary Key or Foreign Key.

Exercise 1.5

How many calendars were shipped on 4/15? (These same tables were used in the previous exercise.)

A. 10
B. 2
C. 40
D. 30

To review this topic, see Primary Key or Foreign Key.

Exercise 1.6

Which one is NOT a unique feature of the Teradata Database?

A. Ability to model the business, with data organized according to what it represents.
B. Provides a mature, parallel-aware Optimizer that chooses the least expensive plan for the SQL request.
C. Provides linear scalability, so there is no performance degradation as you grow the system.
D. Gives each department in the enterprise a self-contained, functional data store for their own assumptions and analysis.
E. Provides automatic and even data distribution for faster query processing via its unconditional parallel architecture.

To review these topics, see Single Data Store, Scalability, Unconditional Parallelism, Ability to Model the Business, and Mature, Parallel-Aware Optimizer.

Exercise 1.7

True or False: The logical model should be independent of usage.

A. True
B. False

To review this topic, see Logical/Relational Modeling.

Mod 2 - Teradata Database and Data Warehouse Architecture

Objectives

After completing this module, you should be able to:

 Identify the different types of enterprise data processing.
 Define a data warehouse, active data warehouse, and a data mart.
 List and define the different types of data marts.
 Explain the advantages of detail data over summary data.
 Describe the overall Teradata Database parallel architecture.
 List and describe major Teradata Database hardware and software components and their functions.
 Explain how the architecture helps to maintain high availability and reliability for Teradata Database users.


Evolution to Active Data Warehousing

Data Warehouse Usage Evolution

There is an information evolution happening in the data warehouse environment today. Changing business requirements have placed demands on data warehousing technology to do more things faster. Data warehouses have moved from back-room strategic decision support systems to operational, business-critical components of the enterprise. As your company evolves in its use of the data warehouse, what you need from the data warehouse evolves too.


Stage 1 Reporting: The initial stage typically focuses on reporting from a single view of
the business to drive decision-making across functional and/or product boundaries.
Questions are usually known in advance, such as a weekly sales report.

Stage 2 Analyzing: Focuses on why something happened, such as why sales went down
or discovering patterns in customer buying habits. Users perform ad-hoc analysis, slicing
and dicing the data at a detail level, and questions are not known in advance.

Stage 3 Predicting: Analysts utilize the system to leverage information to predict what
will happen next in the business to proactively manage the organization's strategy. This
stage requires data mining tools and building predictive models using historical detail. As
an example, users can model customer demographics for target marketing.

Stage 4 Operationalizing: Providing access to information for immediate decision-making in the field enters the realm of active data warehousing. Stages 1 to 3 focus on strategic decision-making within an organization. Stage 4 focuses on tactical decision support. Tactical decision support is not focused on developing corporate strategy, but rather on supporting the people in the field who execute it.

Examples:

 Inventory management with just-in-time replenishment.
 Scheduling and routing for package delivery.
 Altering a campaign based on current results.

Stage 5 Active Data Warehousing: The larger the role an ADW plays in the operational
aspects of decision support, the more incentive the business has to automate the decision
processes. You can automate decision-making when a customer interacts with a web site.
Interactive customer relationship management (CRM) on a web site or at an ATM is about
making decisions to optimize the customer relationship through individualized product
offers, pricing, content delivery and so on. As technology evolves, more and more
decisions become executed with event-driven triggers to initiate fully automated decision
processes.

Example: determine the best offer for a specific customer based on a real-time event,
such as a significant ATM deposit.


Active Enterprise Intelligence

Active Enterprise Intelligence is the seamless integration of the ADW into the customer’s
existing business and technical architectures.

Active Enterprise Intelligence (AEI) is a business strategy for providing strategic and
operational intelligence to back office and front line users from a single enterprise data
warehouse.

The Active Enterprise Intelligence environment:

 Active - Is responsive, agile, and capable of driving better, faster decisions that
drive intelligent, and often immediate, actions.
 Enterprise - Provides a single view of the business, across appropriate business
functions, and enables new operational users, processes, and applications.
 Intelligence - Supports traditional strategic users and new operational users of the
Enterprise Data Warehouse. Most importantly, it enables the linkage and alignment
of operational systems, business processes and people with corporate goals so
companies may execute on their strategies.

The technology that enables that business value is the Teradata Active Data Warehouse
(ADW). The Teradata ADW is a combination of products, features, services, and
business partnerships that support the Active Enterprise Intelligence business strategy.
ADW is an extension of our existing Enterprise Data Warehouse (EDW).


Active Data Warehouse

Data warehouses are beginning to take on mission-critical roles supporting CRM, one-
to-one marketing, and minute-to-minute decision-making. Data warehousing
requirements have evolved to demand a decision capability that is not just oriented
toward corporate staff and upper management, but actionable on a day-to-day basis.
Decisions such as when to replenish Barbie dolls at a particular retail outlet may not be
strategic at the level of customer segmentation or long-term pricing strategies, but when
executed properly, they make a big difference to the bottom line. We refer to this
capability as "tactical" decision support.

Tactical decisions are the drivers for day-to-day management of the business.
Businesses today want more than just strategic insight from their data warehouse
implementations - they want better execution in running the business through more
effective use of information for the decisions that get made thousands of times per day.

The origin of the active data warehouse is the timely, integrated store of detail data
available for analytic business decision-making. It is only from that source that the
additional traits needed by the active data warehouse can evolve. These new "active"
traits are supplemental to data warehouse functionality. For example, the work mix in the
database still includes complex decision support queries, but expands to take on short,
tactical queries, background data feeds, and possibly event-driven updates all at the
same time. Data volumes and user concurrency levels may explode upward beyond
expectation. Restraints may need to be placed on the longer, analytical queries in order
to guarantee tactical work throughput. While accessing the detail data directly remains
an important opportunity for analytical work, tactical work may thrive on shortcuts and
summaries, such as operational data store (ODS) level information. And for both
strategic and tactical decisions to be useful to the business, today's data, this hour's
data, even this minute's data has to be available.

The Teradata Database is positioned exceptionally well for stepping up to the challenges
related to high availability, large multi-user workloads, and handling complex queries
that are required for an active data warehouse implementation. The Teradata Database
technology supports evolving business requirements by providing high performance
and scalability for:

 Mixed workloads (both tactical and strategic queries) for mission critical
applications
 Large amounts of detail data
 Concurrent users

The Teradata Database provides 7x24 availability and reliability, as well as continuous
updating of information so data is always fresh and accurate.

Evolution of Data Processing

Traditionally, data processing has been divided into two categories: on-line transaction
processing (OLTP) and decision support systems (DSS). For either, requests are
handled as transactions. A transaction is a logical unit of work, such as a request to
update an account.


An RDBMS is used in the following main processing environments:

 DSS
 OLTP
 OLAP
 Data Mining

Decision Support Systems (DSS)

In a decision support environment, users submit requests to analyze historical detail data stored in the tables. The results are used to establish strategies, reveal trends, and make projections. A database used as a decision support system (DSS) usually receives fewer, very complex, ad-hoc queries that may involve numerous tables. Decision support systems include batch reports, which roll up numbers to give the business the big picture, and they have evolved over time. Instead of routine, pre-written scripts, users now require the ability to perform ad hoc queries (i.e., perform queries as the need arises), analysis, and predictive what-if type queries that are often complex and unpredictable in their processing. These types of questions are essential for long-range, strategic planning. DSS systems often process huge volumes of detail data.

On-line Transaction Processing (OLTP)

Unlike the DSS environment, an on-line transaction processing (OLTP) environment typically has users accessing current data to update, insert, and delete rows in the data tables. OLTP is typified by a small number of rows from a few of many possible tables being accessed in a matter of seconds or less. Very little I/O processing is required to complete the transaction. This type of transaction takes place when we take out money at an ATM. Once our card is validated, a debit transaction takes place against our current balance to reflect the amount of cash withdrawn. This type of transaction also takes place when we deposit money into a checking account and the balance gets updated. We expect these transactions to be performed quickly; they must occur in real time.
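
A sketch of such a short transaction, using a hypothetical Account table (BEGIN/END TRANSACTION is the long form of Teradata's BT/ET syntax):

    BEGIN TRANSACTION;

    UPDATE account
    SET    balance = balance - 200.00    -- reflect a $200 ATM withdrawal
    WHERE  account_number = 1234567;

    END TRANSACTION;

Only one row in one table is touched, so the transaction completes in well under a second.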

On-line Analytical Processing (OLAP)

OLAP is a modern form of analytic processing within a DSS environment. OLAP tools from companies like MicroStrategy and Cognos provide an easy-to-use Graphical User Interface to allow "slice and dice" analysis along multiple dimensions (for example, products, locations, sales teams, inventories, etc.). With OLAP, the user may be looking for historical trends, sales rankings, or seasonal inventory fluctuations for the entire corporation. Usually, this involves a lot of detail data to be retrieved, processed, and analyzed. Therefore, response time can be in seconds or minutes.

Data Mining
Data Mining (predictive modeling) involves analyzing moderate to large amounts of detailed historical data to detect behavioral patterns (for example, buying, attrition, or
fraud patterns), that are then used to predict future behavior. There are two phases to
data mining. Phase 1: An “analytic model” is built from historical data incorporating the
detected behavior patterns (takes minutes to hours). Phase 2: The model is then applied
against current detail data of customers (that is, customers are scored), to predict likely
outcomes (takes seconds or less). Scores can indicate a customer's likelihood of
purchasing a product, switching to a competitor, or being fraudulent.

Advantages of Using Detail Data

Until recently, most business decisions were based on summary data. The problem is that summarized data is not as useful as detail data and cannot answer some questions with accuracy. With summarized data, peaks and valleys are leveled; a peak that falls at the end of a reporting period, for example, is cut in half across the two periods.

Here's another example. Think of your monthly bank statement that records checking
account activity. If it only told you the total amount of deposits and withdrawals, would
you be able to tell if a certain check had cleared? To answer that question you need a
list of every check received by your bank. You need detail data.

Decision support -- answering business questions -- is the real purpose of databases. To answer business questions, decision-makers must have four things:

 The right data
 Enough detail data
 Proper data structure
 Enough computer power to access and produce reports on the data

Consider your own business and how it uses data. Is that data detailed or summarized? If it's summarized, are there questions it cannot answer?

Check Your Understanding

Which type of data processing supports answering this type of question, "How many
women's dresses did our store sell in December of last year?"

A. OLTP
B. Data Mining
C. OLAP
D. DSS

Row vs. Set Processing

Both cursor and set processing define a set (or sets) of rows of data to process; but while a cursor processes the rows sequentially, set processing operates on the whole set at once. Both can be invoked with a single command.

Row-by-Row Processing

In row-by-row processing, one row is fetched at a time, all calculations are done on it, and then it is updated or inserted. Then the next row is fetched and processed, and so on. When there are many rows to process, this makes for a slow program.

A benefit of row processing is that there is less lock contention.

Set Processing

A lot of data processing is set processing, which is what relational databases do best.
Instead of processing row-by-row sequentially, you can process relational data set-by-set, without a cursor. For example, to sum all payment rows with 100 or less balances, a
single SQL statement completely processes all rows that meet the condition as a set.
With sufficient rows to process, this can be 10 to 30 or more times faster than row-at-a-
time processing.
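
A sketch of that example as a single set-oriented statement (table and column names are hypothetical):

    -- One statement processes every qualifying row as a set,
    -- with no cursor and no row-at-a-time loop.
    SELECT SUM(payment_amount)
    FROM   payment
    WHERE  balance <= 100;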

Some good uses of SET processing include:

 An update with all AMPs involved
 Single-session processing which takes advantage of parallel processing
 Efficient updates of large amounts of data

Response Time vs. Throughput

When determining how fast something is, there are two kinds of measures. You can
measure how long it takes to do something or you can measure how much gets done per
unit time. The former is referred to as response time, access time, transmission time, or
execution time depending on the context. The latter is referred to as throughput.

Response Time

This speed measure is specified by an elapsed time from the initiation of some activity
until its completion. The phrase response time is often used in operating systems
contexts.

Throughput

A throughput measure is an amount of something per unit time. For operating systems
throughput is often measured as tasks or transactions per unit time. For storage systems
or networks throughput is measured as bytes or bits per unit time. For processors, the
number of instructions executed per unit time is an important component of performance.

What Does this Mean to Teradata?

Throughput                                     Response Time

measures the quantity of queries completed     measures the average duration of queries
during a time interval

a measure of the amount of work processed      a measure of process completion

how many queries were processed                how long that processing takes

the number of queries executed in an hour      the elapsed time per query
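
For example, a system that completes 1,200 queries in an hour has a throughput of 20 queries per minute; the response time of any individual query might still be three seconds or three minutes, depending on its complexity.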

In order to improve both response time and throughput on a Teradata system, you can:

 Increase CPU power (i.e., add nodes)
 Implement workload management to control resources
 Decrease the number of concurrent users

The Data Warehouse

A data warehouse is a central, enterprise-wide database that contains information extracted from the operational systems. A data warehouse has a centrally located logical architecture which minimizes data synchronization and provides a single view of the business. Data warehouses have become more common in corporations where enterprise-wide detail data may be used in on-line analytical processing to make strategic and tactical business decisions. Warehouses often carry many years' worth of detail data so that historical trends may be analyzed using the full power of the data.

Many data warehouses get their data directly from operational systems so that the data
is timely and accurate. While data warehouses may begin somewhat small in scope and
purpose, they often grow quite large as their utility becomes more fully exploited by the
enterprise.

Data warehousing is a process, not a product. It is a technique to properly assemble and manage data from various sources to answer business questions not previously possible or known.


Data Marts

A data mart is a special-purpose subset of enterprise data used by a particular department, function, or application. Data marts may have both summary and detail data for a particular use, rather than for general use. Usually the data has been pre-aggregated or transformed in some way to better handle the particular type of requests of a specific user community.

Independent Data Marts

Independent data marts are created directly from operational systems, just as a data warehouse is. In the data mart, the data is usually transformed as part of the load process. Data might be aggregated, dimensionalized, or summarized historically, as the requirements of the data mart dictate.

Logical Data Marts

Logical data marts are not separate physical structures or data loads from a data warehouse; rather, they are an existing part of the data warehouse. Because in theory the data warehouse contains the detail data of the entire enterprise, a logical view of the warehouse might provide the specific information for a given user community, much as a physical data mart would. Without the proper technology, a logical data mart can be a slow and frustrating experience for end users. With the proper technology, it removes the need for massive data loading and transforming, making a single data store available for all user needs.

Dependent Data Marts

Dependent data marts are created from the detail data in the data warehouse. While having many of the advantages of the logical data mart, this approach still requires the movement and transformation of data, but it may provide a better vehicle for performance-critical user queries.

Data Models - Enterprise vs. Application

To build an EDW, an enterprise data model should be leveraged. An enterprise data model serves as a neutral, normalized model that addresses all business areas rather than a specific function or group, whereas an application model is built for a specific business area. The application data model looks at only one aspect of the business, whereas an enterprise logical data model integrates all aspects of the business.

In addition, an enterprise data model is more extensible than an application data model. It is intended to encompass the entire enterprise.

Data Mart Pros and Cons

Independent Data Marts

Independent data marts are usually the easiest and fastest to implement, and their payback value can be almost immediate. Some corporations start with several data marts before deciding to build a true data warehouse. This approach has several inherent problems:

 While data marts have obvious value, they are not a true enterprise-wide solution
and can become very costly over time as more and more are added.
 A major problem with proliferating data marts is that, depending on where you look
for answers, there is often inconsistency.
 They may not provide the historical depth of a true data warehouse.
 Because data marts are designed to handle specific types of queries from a specific
type of user, they are often not good at ad hoc, or "what if" queries like a data
warehouse is.

Logical Data Marts

Logical data marts overcome most of the limitations of independent data marts. They provide a single view of the business. There is no historical limit to the data, and "what if" querying is entirely feasible. The major drawback to logical data marts is the lack of physical control over the data. Because data in the warehouse is not pre-aggregated or dimensionalized, performance against the logical mart may not be as good as against an independent mart. However, use of parallelism in the logical mart can overcome some of the limitations of the non-transformed data.

Dependent Data Marts

Dependent data marts provide all the advantages of a logical mart and also allow for physical control of the data as it is extracted from the data warehouse. Because dependent marts use the warehouse as their foundation, they are generally considered a better solution than independent marts, but they take longer and are more expensive to implement.


A Teradata Database System

A Teradata Database system contains one or more nodes. A node is a term for a
processing unit under the control of a single operating system. The node is where the
processing occurs for the Teradata Database. There are two types of Teradata
Database systems:

 Symmetric multiprocessing (SMP) - An SMP Teradata Database has a single node that contains multiple CPUs sharing a memory pool.
 Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger, MPP implementation of a Teradata Database. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.

To manage a Teradata Database system, you use:

 SMP system: System Console (keyboard and monitor) attached directly to the SMP node
 MPP system: Administration Workstation (AWS)

To access a Teradata Database system, a user typically logs on through one of multiple
client platforms (channel-attached mainframes or network-attached workstations). Client
access is discussed in the next module.

Node Components

A node is the basic building block of a Teradata Database system, and contains a large
number of hardware and software components. A conceptual diagram of a node and its
major components is shown below. Hardware components are shown on the left side of
the node and software components are shown on the right side.


Shared Nothing Architecture

The Teradata Database virtual processors, or vprocs (which are the PEs and AMPs), share the components of the nodes (memory and CPU). The defining feature of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk), and this space is not shared with other AMPs. Each AMP uses its system resources independently of the other AMPs, so they can all work in parallel for high overall system performance.


Check Your Understanding

Which of the following statements is true?

A. PDE is an application that runs on the Teradata Database software.
B. AMPs manage system disks on the node.
C. The host channel adapter card connects to "bus and tag" cables through a Teradata Gateway.
D. An Ethernet card is a hardware component used in the connection between a network-attached client and the node.

Teradata Virtual Storage

What is Teradata Virtual Storage?

Teradata Virtual Storage, introduced with Teradata 13.00, is a change to the way in which Teradata accesses storage. Its purpose is to manage a multi-temperature warehouse. Teradata Virtual Storage pools all of the cylinders within a clique's disk space and allocates cylinders from this storage pool to individual AMPs. You can add storage to the clique storage pool rather than to every AMP, which allows sharing of storage devices among AMPs. It allows you to store data that is accessed more frequently ("hot data") on faster devices and data that is accessed less frequently ("cold data") on slower devices, and it can automatically migrate the data based on access frequency.

Teradata Virtual Storage is designed to allow the Teradata Database to make use of
new storage technologies such as adding fast Solid State Disks (SSDs) to an existing
system with a different disk technology/speed/capacity. Teradata Virtual Storage
enables the mixing of drive sizes, speeds, and technologies so you can "mix" storage
devices. Since storage is pooled and shared by the AMPs, adding drives does not
require adding AMPs.

Teradata Virtual Storage is responsible for:

 Pooling clique storage and allocating cylinders from the storage pool to individual
AMPs
 Tracking where data is stored on the physical media
 Maintaining statistics on the frequency of data access and on the performance of
physical storage media
 Migrating frequently used data (“hot data”) to fast disks and data used less
frequently (“cold data”) to slower disks.

Benefits and Key Concepts


Teradata Virtual Storage provides the following benefits:

 Storage Optimization, Data Migration, and Data Evacuation

Teradata Virtual Storage maintains statistics on frequency of data access (“data temperature”) and on the performance (“grade”) of physical media. This allows the Teradata Virtual Storage product to intelligently place more frequently accessed data on faster physical storage. As data access patterns change, Teradata Virtual Storage can move (“migrate”) storage cylinders to faster or slower physical media within each clique. This can improve system performance over time.

Teradata Virtual Storage can migrate data away from a physical storage
device in order to prepare for removal or replacement of the device. This
process is called “evacuation.” Complete data evacuation requires a system
restart, but Teradata Virtual Storage supports a “soft evacuation” feature that
allows much of the data to be moved while the system remains online. This
can minimize system down time when evacuations are necessary.

 Lower Barriers to System Growth

Device management features of Teradata Virtual Storage provide the ability to pool storage within each clique. Each storage device (pdisk) can be shared, if necessary, by all AMPs in the clique. If the number of storage devices is not a multiple of the number of AMPs in the clique, the extra storage will be shared. Consequently, storage can be added to the system in smaller increments, as needs and opportunities arise.

This comparison illustrates the conceptual differences with and without Teradata Virtual Storage.

Pre-Teradata Virtual Storage:

 Cylinders were addressed by drive # and cylinder #.

After Teradata Virtual Storage:

 AMPs do not know the physical location of a cylinder, and it can change.
 All of the cylinders in a clique are effectively in a pool that is managed by the TVS vproc.
 Cylinders are assigned a unique cylinder ID (virtual ID) across all of the pdisks.

http://www.teradatau.courses.teradata.com/learning/BLADE_MS/legacy/18109_IntrotoTer... 11/3/2010
Page 37 of 137

With Teradata Virtual Storage you can easily add storage to an existing
system.

Before Teradata Virtual Storage:

 Existing systems have an integral number of drives per AMP.
 Adding storage requires an additional drive per AMP, which means a 50% or
100% increase in capacity.

With Teradata Virtual Storage:

 You can add any number of drives.
 Added drives are shared by all AMPs.
 These new drives may have different capacities and/or performance than the
drives that already reside in the system.

Using the BYNET

The BYNET (pronounced, "bye-net") is a high-speed interconnect (network) that enables
multiple nodes in the system to communicate. The BYNET handles the internal
communication of the Teradata Database. All communication between PEs and AMPs is
done via the BYNET.

When the PE dispatches the steps for the AMPs to perform, they are dispatched onto
the BYNET. The messages are routed to the appropriate AMP(s) where results sets and
status information are generated. This response information is also routed back to the
requesting PE via the BYNET. Depending on the nature of the dispatch request, the
communication between nodes may be to all nodes (Broadcast message) or to one
specific node (point-to-point message) in the system.

BYNET Unique Features

The BYNET has several unique features:

 Scalable: As you add more nodes to the system, the overall network bandwidth
scales linearly. This linear scalability means you can increase system size without
performance penalty -- and sometimes even increase performance.

 High performance: An MPP system typically has two BYNET networks (BYNET 0
and BYNET 1). Because both networks in a system are active, the system benefits
from having full use of the aggregate bandwidth of both the networks.

 Fault tolerant: Each network has multiple connection paths. If the BYNET detects


an unusable path in either network, it will automatically reconfigure that network so
all messages avoid the unusable path. Additionally, in the rare case that BYNET 0
cannot be reconfigured, hardware on BYNET 0 is disabled and messages are re-
routed to BYNET 1.

 Load balanced: Traffic is automatically and dynamically distributed between both
BYNETs.

BYNET Hardware and Software

The BYNET hardware and software handle the communication between the vprocs and
the nodes.

 Hardware: The nodes of an MPP system are connected with the BYNET
hardware, consisting of BYNET boards and cables.

 Software: The BYNET driver (software) is installed on every node. This BYNET
driver is an interface between the PDE software and the BYNET hardware.

SMP systems do not contain BYNET hardware. The PDE and BYNET software
emulate BYNET activity in a single-node environment.

For more information on communication between the vprocs and nodes, click here.
(Note: You do not need to know this information for the certification exam.)

Just for Fun . . .

1. When a message is delivered to a node using BYNET hardware and software, PDE
software on the node has the ability to route the message to which three? (Choose
three.)

A. A single vproc on a node
B. A group of vprocs on a node


C. All vprocs on a node
D. All vprocs on all nodes

Feedback:

Cliques

A clique (pronounced, "kleek") is a group of nodes that share access to the same disk
arrays. Each multi-node system has at least one clique. The cabling determines which
nodes are in which cliques -- the nodes of a clique are connected to the disk array
controllers of the same disk arrays.

Cliques Provide Resiliency

In the event of a node failure, cliques provide for data access through vproc migration.
When a node resets, the following happens to the AMPs:

1. When the node fails, the Teradata Database restarts across all remaining nodes in
the system.
2. The vprocs (AMPs) from the failed node migrate to the operational nodes in its
clique.
3. The PE vprocs will migrate as follows: LAN attached PEs will migrate to other
nodes in the clique. Channel attached PEs will not migrate. While that node
remains down, that channel connection is not available.
4. Disks managed by the AMP remain available and processing continues while the
failed node is being repaired.

Cliques in a System

Vprocs are distributed across all nodes in the system. Multiple cliques in the system
should have the same number of nodes.

The diagram below shows three cliques. The nodes in each clique are cabled to the
same disk arrays. The overall system is connected by the BYNET. If one node goes


down in a clique the vprocs will migrate to the other nodes in the clique, so data remains
available. However, system performance decreases due to the loss of a node. System
performance degradation is proportional to clique size.

Hot Standby Node

A Hot Standby Node (HSN) is a member of a clique that is not initially configured to
execute any Teradata vprocs. If a node in the clique fails, the AMPs from the failed
node move to the hot standby node, and the performance degradation is 0%.

When the failed node is recovered/repaired and restarted, it becomes the new hot
standby node. A second restart of Teradata is not needed.

Characteristics of a hot standby node are:

 A node that is a member of a clique.


 Does not normally participate in the trusted parallel application (TPA).
 Can be brought into the configuration when a node fails in the clique.
 Helps with unplanned outages.
 Eliminates the need for a restart to bring a failed node back into service.

Hot Standby Nodes are positioned as a performance continuity feature.


1. Performance degradation is 0% as AMPs are moved to the Hot Standby Node.
2. When node 1 is recovered it becomes the new Hot Standby Node.

Software Components

A Teradata Database node requires three distinct pieces of software: the operating
system, Parallel Database Extensions (PDE), and the Teradata Database software
itself. Each is described below.

For each node in the system, you need both of the following licenses:

 Operating system license (UNIX, Microsoft Windows, or Linux)
 Teradata Database software license


Operating System

The Teradata Database can run on the following operating systems:

 UNIX MP-RAS (Not supported beyond Teradata 13.)


 Microsoft Windows 2000
 SuSE Linux

Parallel Database Extensions (PDE)

The Parallel Database Extensions (PDE) software layer was added to the operating
system to support the parallel software environment. The PDE controls the virtual
processor (vproc) resources.

Trusted Parallel Application (TPA)

A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs).
The Teradata Database is classified as a TPA. The four components of the Teradata
Database TPA are:


 AMP
 PE
 Channel Driver
 Teradata Gateway

Teradata Database Software: PE

A Parsing Engine (PE) is a virtual processor (vproc) that manages the dialogue between
a client application and the Teradata Database, once a valid session has been
established. Each PE can support a maximum of 120 sessions. The PE handles an
incoming request in the following manner:

1. The Session Control component verifies the request for session authorization
(user names and passwords), and either allows or disallows the request.

2. The Parser does the following:


 Interprets the SQL statement received from the application.
 Verifies SQL requests for the proper syntax and evaluates them semantically.
 Consults the Data Dictionary to ensure that all objects exist and that the user
has authority to access them.

3. The Optimizer is cost-based and develops the least expensive plan (in terms of
time) to return the requested response set. Processing alternatives are evaluated
and the fastest alternative is chosen. This alternative is converted into executable
steps, to be performed by the AMPs, which are then passed to the Dispatcher.

The Optimizer is "parallel aware," meaning that it has knowledge of the system
components (how many nodes, vprocs, etc.), which enables it to determine the
fastest way to process the query. In order to maximize throughput and minimize
resource contention, the Optimizer must know about system configuration,
available units of parallelism (AMPs and PEs), and data demographics. The
Teradata Database Optimizer is robust and intelligent, and enables the Teradata
Database to handle multiple complex, ad-hoc queries efficiently.

4. The Dispatcher controls the sequence in which the steps are executed and
passes the steps received from the optimizer onto the BYNET for execution by the
AMPs.

5. After the AMPs process the steps, the PE receives their responses over the
BYNET.


6. The Dispatcher builds a response message and sends the message back to the
user.
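
To see the plan the Optimizer chooses for a request, you can prefix any SQL statement
with the EXPLAIN modifier; the Teradata Database then returns the plan, in English,
instead of executing the request. A minimal sketch (the Sales table and its columns are
hypothetical):

    EXPLAIN
    SELECT   Store_ID
           , SUM(Amount)
    FROM     Sales
    GROUP BY Store_ID;

The output lists the AMP steps the Dispatcher would send over the BYNET, including
locking, row retrieval, and aggregation steps, along with estimated row counts and
timings.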


Teradata Database Software: AMP

The AMP is a vproc in the Teradata Database's shared-nothing architecture that is
responsible for managing a portion of the database. Each AMP will manage some
portion of each table on the system. AMPs do the physical work associated with
generating an answer set (output) including sorting, aggregating, formatting, and
converting. The AMPs retrieve and perform all database management functions on the
required rows from a table.

An AMP accesses data from its single associated vdisk, which is made up of multiple
ranks of disks. An AMP responds to Parser/Optimizer steps transmitted across the
BYNET by selecting data from or storing data to its disks. For some requests, the AMPs
may redistribute a copy of the data to other AMPs.

The Database Manager subsystem resides on each AMP. This subsystem will:

 Lock databases and tables.


 Create, modify, or delete definitions of tables.
 Insert, delete, or modify rows within the tables.
 Retrieve information from definitions and tables.
 Return responses to the Dispatcher.

Earlier in this course, we discussed the logical organization of data into tables. The
Database Manager subsystem provides a bridge between that logical organization and
the physical organization of the data on disks. The Database Manager performs a
space-management function that controls the use and allocation of space.
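
As a sketch of this bridge between the logical and physical levels: when a table is
created, its rows are distributed across all AMPs by hashing the table's primary index,
so each AMP manages its share of the table on its own vdisk. The table and column
names below are hypothetical:

    CREATE TABLE Sales
      ( Store_ID   INTEGER
      , Sale_Date  DATE
      , Amount     DECIMAL(10,2) )
    PRIMARY INDEX (Store_ID);   -- rows are hashed on Store_ID and spread across the AMPs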


Teradata Database Software: Channel Driver

Channel Driver software is the means of communication between an application and the
PEs assigned to channel-attached clients. There is one Channel Driver per node.

Communication flows from the channel-attached client, to the host channel adapter in
the node, to the Channel Driver software, to the PE, and back to the client.


Teradata Database Software: Teradata Gateway

Teradata Gateway software is the means of communication between an application and
the PEs assigned to network-attached clients. There is one Teradata Gateway per node.

Communication flows from the network-attached client, to the Ethernet card in the node,
to the Teradata Gateway software, to the PE, and back to the client.

Teradata Purpose-Built Family Platform

Each platform is purpose built to meet different analytical requirements. They all
leverage the Teradata Database. Customers may easily migrate applications from one
platform to another without having to change data models, ETL, or underlying structures.

Teradata Extreme Data Appliance 1550

The Teradata Extreme Data Appliance 1550 provides for deep strategic intelligence from
extremely large amounts of detailed data. It supports very high-volume, non-enterprise
data/analysis requirements for a small number of power users in specific workgroups or
projects that are outside of the enterprise data warehouse (EDW).

This appliance is based on the field-proven Teradata Active Enterprise Data Warehouse
5550 processing nodes and provides the same scalability and data warehouse


capabilities as any other Teradata platform.

Teradata Active Enterprise Data Warehouse - 5550 H and 5555 C/H

These models are targeted to the full-scale large data warehouse. They offer expansion
capabilities up to 1024 TPA and non-TPA nodes. The power of the Teradata Database combined
with the throughput, power and performance of both the Intel® Xeon™ quad-core processors and
BYNET V3 technologies offers unsurpassed performance and capacity within the scalable data
warehouse.

Teradata Data Mart Appliance 2500/2550/2555

The Teradata Data Mart Appliance 2500 is a server that is optimized specifically for high
DSS performance. The Teradata Data Mart Appliance 2550 and 2555 have similar
characteristics to the 2500, but are approximately 40% - 45% faster on a per node basis.
These systems are optimized for fast scans and heavy “deep dive” analytics.
Characteristics of the Teradata Data Mart Appliance 2500/2550/2555 include:

 Delivered ready to run


 Integrated system fully staged and tested
 Includes a robust set of tools and utilities
 Rapid time to value with system live within hours
 Competitive price point
 Capacity on demand available if needed
 Easy migration to an EDW/ADW

Exercise 2.1

Select the answers from the options given in the drop-down boxes that correctly complete the
sentences.

__________ causes vprocs to migrate to other nodes.
__________ carries the communication between nodes in a system.
__________ is a group of nodes with access to the same disk arrays.
A copy of __________ is installed on each node in the system.

Feedback:


To review these topics, click Cliques Provide Resiliency, Using the BYNET, or Cliques.

Exercise 2.2

Which three statements about the Teradata Database are true? (Choose three.)

A. Runs on a foundation called a TPA.
B. PDE is a software layer that allows TPAs to run in a parallel software environment.
C. There are two types of virtual processors: AMPs and PEs.
D. Runs on UNIX MP-RAS (discontinued after Teradata 13), Windows 2000, and Linux.
Feedback:

To review these topics, click Software Components, Parallel Database Extensions (PDE), A
Teradata Database System, or Operating System.

Exercise 2.3

Four of these components are contained in the TPA software. Click each of your choices and
check the Feedback box below each time to see if you are correct.

Feedback:

To review this topic, click Trusted Parallel Application (TPA).

Exercise 2.4

Select AMP, BYNET, or PE in the pull-down menu as the component responsible for the following
tasks:

Carries messages between nodes.


Sorts, aggregates, and formats data in the processing of requests.
Accesses data on its assigned vdisk.


Chooses the least expensive plan for creating a response set.


Transports responses from the AMPs back to the PEs, facilitating AMP/PE
communication.
Distributes incoming data or retrieves rows being requested to generate an answer set.
Can manage up to 120 sessions.

Feedback:

To review these topics, click Node Components, Communication Between Nodes, Communication
Between Vprocs, Teradata Database Software: PE, and Teradata Database Software: AMP.

Exercise 2.5

From the drop-down box below, select the answer that correctly completes the sentence.

In processing a request, the __________ determines the most efficient plan for
processing the requested response.

Feedback:

To review this topic, click Teradata Database Software: PE.

Exercise 2.6

Select OLAP, OLTP, DSS or Data Mining (DM) in the pull-down menu as the appropriate type of
data processing for the following requests:

Withdraw cash from ATM.


Show the top ten selling items for 1997 across all stores.
How many blue jeans were sold across all of our Eastern stores in the month of March in
child sizes?

Feedback:

To review these topics, click Evolution of Data Processing.

Exercise 2.7

From the drop-down box below, select the answer that correctly completes the sentence.

A(n) __________ may contain detail or summary data and is a special-purpose subset
of enterprise data for a particular function or application, rather than for general use.

Feedback:


To review this topic, click Data Marts.

Exercise 2.8

From the drop-down box below, select the answer that correctly completes the sentence.

A(n) __________ supports the coexistence of tactical and strategic queries.

Feedback:

To review this topic, click Active Data Warehouse.

Exercise 2.9

From the drop-down box below, select the answer that correctly completes the sentence.

__________ enable(s) the mixing of drive sizes, speeds, and technologies so you
can "mix" storage devices.

Feedback:

To review this topic, click Teradata Virtual Storage.

Exercise 2.10

Select Teradata Extreme Data Appliance (e.g., 1550), Teradata Active Enterprise Data Warehouse
(e.g., 5550), or Teradata Data Mart Appliance (e.g., 2550) in the pull-down menu as the appropriate
platform for each description:

A server that is optimized specifically for high DSS
performance such as fast scans and heavy “deep dive” analytics.
Scalable data warehouse targeted to the full-scale
large DW with expansion up to 1024 TPA and non-TPA nodes.
Provides for deep strategic intelligence from
extremely large amounts of detailed data and supports very high-volume, non-enterprise
data/analysis requirements for a small number of power users in specific workgroups or projects
that are outside of the enterprise data warehouse (EDW).

Feedback:

To review these topics, click Teradata Purpose-Built Family Platform.

Exercise 2.11


Match the performance term to its definition:

Measures how long it takes to do something.


Measures how much gets done per unit time.

Feedback:

To review these topics, click Response Time vs. Throughput.

Exercise 2.12

True or False: Both cursor row processing and set processing define set(s) of rows of
data to process and can be processed with a single command; but while a cursor
processes the rows sequentially, set processing operates on its entire set at once.

A. True
B. False

Feedback:

To review this topic, click Row vs. Set Processing

Mod 3 - Client Access

Objectives

After completing this module, you should be able to:

 Describe how the clients access the Teradata Database.


 Illustrate how the Teradata Database processes a request.
 Describe the Teradata client utilities and their use.

HOT TIP: This module contains links to important supplemental course information.
Please be sure to click on each hotword link to capture all of the training content.

Client Connections

Users can access data in the Teradata Database through an application on both
channel-attached and network-attached clients. Additionally, the node itself can act as a
client. Teradata client software is installed on each client (channel-attached, network-
attached, or node) and communicates with RDBMS software on the node. You may hear


either type of client referred to by the term "host," though this term is not typically used in
documentation or product literature.

The client may be a mainframe system, such as IBM or Amdahl, which is channel-
attached to the Teradata Database, or it may be a PC, UNIX, or Linux-based system that
is LAN-attached.

The client application submits an SQL request to the database, receives the response,
and submits the response to the user. This application could be a business intelligence
(BI) tool or a data integration (DI/ETL/ELT) tool, submitting queries to Teradata or
loading/updating tables in the database.

Channel-Attached Client

Channel-attached clients are IBM-compatible mainframe systems supported by the
Teradata Database. The following software components installed on the mainframe are
responsible for communications between client applications and the Channel Driver on a
Teradata Database node:

 Teradata Director Program (TDP) software to manage session traffic, installed on
the channel-attached client.
 Call-Level Interface (CLI), a library of routines that are the lowest-level interface to
the Teradata Database.

Communication with the Teradata Database System

Communication from client applications on the mainframe goes through the mainframe
channel, to the Host Channel Adapter on the node, to the Channel Driver software.


Network Attached Client

The Teradata Database supports network-attached clients connected to the node over a
LAN. The following software components installed on the network-attached client are
responsible for communication between client applications and the Teradata Gateway
on a Teradata Database node:

 Open Database Connectivity (ODBC) is an application programming standard
that defines common database access mechanisms to simplify the exchange of
data between a client and server. ODBC-compliant applications connect with a
database through the use of a driver that translates the application's ODBC
commands into database syntax.
 Call-Level Interface, Version2 (CLIv2) is a library of routines that enable an
application program to access data stored in the Teradata Database. When used
with network-attached clients, CLIv2 contains the following components:
 CLI (Call-Level Interface)
 MTDP (Micro Teradata Director Program)
 MOSI (Micro Operating System Interface)
 Java Database Connectivity (JDBC) is an Application Programming Interface
(API) that allows platform independent Java applications to access a DBMS using
Structured Query Language (SQL). JDBC enables the development of web-based
Teradata end user tools that can access Teradata through a web server. JDBC will
also provide support for access to other commercial databases.
 WinCLI is an additional, legacy API to Teradata from a network host.

Communication with the Teradata Database System

Communication from applications on the network-attached client goes over the LAN, to
the Ethernet card on the node, to the Teradata Gateway software.


On the database side, the Teradata Gateway software and the PE provide the
connection to the Teradata Database. The Teradata Database is configured with two
LAN connections for redundancy. This ensures high availability.

Node

The node is considered a network-attached client. If you install application software on a
node, it will be treated like an application on a network-attached client. In other words,
communications from applications on the node go through the Teradata Gateway. An
application on a node can be executed through:

 System Console that manages an SMP system.


 Remote login, such as over a network-attached client connection.

Just for Fun . . .

As a review, answer this question: Which two can you use to run an application that is
installed on a node? (Choose two.)

A. Mainframe terminal
B. Bus terminal
C. System console
D. Network-attached workstation

Feedback:

Request Processing


The steps for processing a request are somewhat different, depending on whether the
user is accessing the Teradata Database through a channel-attached or network-
attached client:

1. SQL request is sent from the client to the appropriate component on the node:
 Channel-attached client: request is sent to Channel Driver (through the TDP).
 Network-attached client: request is sent to Teradata Gateway (through CLIv2
or ODBC).
2. Request is passed to the PE(s).
3. PEs parse the request into AMP steps.
4. PE Dispatcher sends steps to the AMPs over the BYNET.
5. AMPs perform operations on data on the vdisks.
6. Response is sent back to PEs over the BYNET.
7. PE Dispatcher receives response.
8. Response is returned to the client (channel-attached or network-attached).

Mainframe Request Flow

Workstation Request Flow

Teradata Client Utilities

Teradata has a robust suite of client utilities that enable users and system administrators
to enjoy optimal response time and system manageability. Various client utilities are
available for tasks from loading data to managing the system.

Teradata utilities leverage the Teradata Database’s high performance capabilities and
are fully parallel and scalable. The same utilities run on smaller entry-level systems and
on the largest MPP implementations.


Teradata Database client utilities include the following, described in this section:

 Query Submitting Utilities
   BTEQ
   Teradata SQL Assistant
 Load and Unload Utilities
   FastLoad
   MultiLoad
   TPump
   FastExport
   Teradata Parallel Transporter (TPT)
 Administrative Utilities
   Teradata Manager
   Teradata Dynamic Workload Manager (TDWM)
   Priority Scheduler
   Database Query Log (DBQL)
   Teradata Workload Analyzer
   Performance Monitor (PMON)
   Teradata Active Systems Management (TASM)
   Teradata Analyst Pack
 Archive Utilities
   Archive Recovery Facility (ARC)
   NetVault (third party)
   NetBackup (third party)

Query Submitting Utilities

The Teradata Database provides tools that are front-end interfaces for submitting SQL
queries. Two mentioned in this section are BTEQ and Teradata SQL Assistant.

BTEQ

BTEQ (Basic Teradata Query) -- pronounced “BEE-teek” -- is a Teradata Database tool
used for submitting SQL queries on all platforms. BTEQ provides the following
functionality (a sample script follows the list):

 Standard report writing and formatting


 Basic import and export of small amounts of data to and from the Teradata
Database across all platforms. For tables with more than a few thousand rows, the
Teradata Database load utilities are recommended for greater efficiency.
 Ability to submit SQL requests in the following ways:
 Interactive
 Batch
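
A minimal BTEQ batch script sketch; the logon string, table, and file names below are
all hypothetical:

    .LOGON tdpid/dbadmin,secret

    /* Route the next answer set to a flat file */
    .EXPORT REPORT FILE = sales_report.txt

    SELECT   Store_ID
           , SUM(Amount) AS Total_Amount
    FROM     Sales
    GROUP BY Store_ID
    ORDER BY Store_ID;

    /* Stop exporting and end the session */
    .EXPORT RESET
    .LOGOFF
    .QUIT

Run interactively, the same statements can be typed at the BTEQ prompt; run in batch,
the script is simply redirected into BTEQ as standard input.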


Teradata SQL Assistant

Teradata SQL Assistant (formerly known as Queryman) is an information
discovery/query tool that runs on Microsoft Windows. Teradata SQL Assistant enables
you to access the Teradata Database as well as other ODBC-compliant databases.
Some of its features include:

 Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft
Access, and text files.
 History of submitted SQL syntax, to help you build scripts for data mining and
knowledge discovery.
 Help with SQL syntax.
 Import and export of small amounts of data to and from ODBC-compliant
databases. For tables with more than a few thousand rows, the Teradata Database
load utilities are recommended for greater efficiency.

Data Load and Unload Utilities


In a data warehouse environment, the database tables are populated from a variety of
sources, such as mainframe applications, operational data marts, or other distributed
systems throughout a company. These systems are the source of data such as daily
transaction files, orders, usage records, ERP (enterprise resource planning) information,
and Internet statistics. Teradata provides a suite of data load and unload utilities
optimized for use with the Teradata Database. They run on any of the supported client
platforms:

 Channel-attached client
 Network-attached client
 Node

Using Teradata Load and Unload Utilities

Teradata load and unload utilities are fully parallel. Because the utilities are scalable,
they accommodate the size of the system. Performance is not limited by the capacity of
the load and unload tools.

The utilities have full restart capability. This feature means that if a load or unload job
should be interrupted for some reason, it can be restarted again from the last
checkpoint, without having to start the job from the beginning.

The load and unload utilities are:

 FastLoad
 MultiLoad
 TPump
 FastExport
 Teradata Parallel Transporter (TPT)

The concurrency limit for utilities is now 60:

 Up to 30 concurrent FastLoad and MultiLoad jobs.


 Up to 60 concurrent FastExport jobs (assuming no FastLoad or MultiLoad jobs).

FastLoad

Use the FastLoad utility to load data into empty tables.

FastLoad loads to a single empty table at a time. FastLoad loads data into an empty
table in parallel, using multiple sessions to transfer blocks of data. FastLoad achieves
high performance by fully exploiting the resources of the system. After the data load is
complete, the table can be made available to users. A typical use is for mini-batch or
frequent batch where you load the data to an empty "staging" table, and then use an
SQL INSERT/SELECT command to move it to an existing table.
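
A minimal FastLoad script sketch for the staging pattern just described; all object, file,
and logon names are hypothetical:

    LOGON tdpid/loaduser,secret;

    SET RECORD VARTEXT ",";                       /* comma-delimited input file */

    DEFINE Store_ID (VARCHAR(10)),
           Amount   (VARCHAR(12))
    FILE = sales.csv;

    BEGIN LOADING Stage_Sales
          ERRORFILES Stage_Err1, Stage_Err2;      /* two required error tables */

    INSERT INTO Stage_Sales VALUES (:Store_ID, :Amount);

    END LOADING;
    LOGOFF;

After the load completes, the staging rows can be moved to the target table with
ordinary SQL, for example: INSERT INTO Sales SELECT * FROM Stage_Sales;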


MultiLoad

Use the MultiLoad utility to maintain tables by:

 Inserting rows into a populated or empty table


 Updating rows in a table
 Deleting multiple rows from a table

MultiLoad can load multiple input files concurrently and work on up to five tables at a
time, using multiple sessions. MultiLoad is optimized to apply multiple rows in block-
level operations. MultiLoad usually is run during a batch window, and places a lock on
the destination table(s) to prevent user queries from getting inconsistent results before
the data load or update is complete. Access locks may be used to query tables being
maintained with MultiLoad.

TPump

Use TPump to:

 Continuously load, update, or delete data in tables


 Update lower volumes of data using fewer system resources than other load
utilities
 Vary the resource consumption and speed of the data loading activity over time

TPump performs the same operations as MultiLoad. TPump updates a row at a time and
uses row hash locks, which eliminates the need for table locks and "batch windows"
typical with MultiLoad. Users can continue to run queries during TPump data loads. In
addition, TPump maintains up to 60 tables at a time.


TPump has a dynamic throttle that operators can set to specify the percentage of system
resources to be used for an operation. This enables operators to set when TPump
should run at full capacity during low system usage, or within limits when TPump may
affect other business users of the Teradata Database.

FastExport

Use the FastExport utility to export data from one or more tables or views on the
Teradata Database to a client-based application.

You can export data from any table or view on which you have the SELECT access
rights. The destination for the exported data can be a:

 Host file: A file on your channel-attached or network-attached client system


 User-written application: An Output Modification (OUTMOD) routine you write to
select, validate, and preprocess the exported data.

FastExport is a data extract utility. It transfers large amounts of data using block
transfers over multiple sessions and writes the data to a host file on the network-
attached or channel-attached client. Typically, FastExport is run during a batch window,
and the tables being exported are locked.
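
A minimal FastExport script sketch; the logon string, table, and file names are
hypothetical:

    .LOGTABLE Export_Log;                 /* restart log table       */
    .LOGON tdpid/expuser,secret;

    .BEGIN EXPORT SESSIONS 8;
    .EXPORT OUTFILE sales_extract.dat;    /* destination host file   */

    SELECT Store_ID, Sale_Date, Amount
    FROM   Sales;

    .END EXPORT;
    .LOGOFF;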

Teradata Parallel Transporter

Teradata Parallel Transporter is a load/update/export tool that enables data extraction,
transformation and loading processes common to all data warehouses.

Using built-in operators, Teradata Parallel Transporter combines the functionality of the
Teradata utilities (FastLoad, MultiLoad, FastExport, and TPump) in a single parallel
environment. Its extensible environment supports FastLoad INMODs, FastExport


OUTMODs, and Access Modules to provide access to all the data sources you use
today. There is a set of open APIs (Application Programmer Interface) to add third party
or custom data transformation to Teradata Parallel Transporter scripts. Using multiple,
parallel tasks, a single Teradata Parallel Transporter script can load data from disparate
sources into the Teradata Database in the same job.

Teradata Parallel Transporter provides a single, SQL-like scripting language, as well as
a GUI to make scripting faster and easier. You can do the extract, some transformation,
and loads all in one SQL-like scripting language.

A single Teradata Parallel Transporter job can load data from multiple disparate
sources into the Teradata Database.

Teradata Parallel Transporter Operators

The operators are components that "plug" into the Teradata Parallel Transporter
infrastructure and actually perform the functions.

 The FastLoad INMOD and FastExport OUTMOD operators support the current
FastLoad and FastExport INMOD/OUTMOD features.
 The Data Connector operator is an adapter for the Access Module or non-Teradata
files.
 The SQL Select and Insert operators submit the Teradata SELECT and INSERT
commands.
 The Load, Update, Export and Stream operators are similar to the current
FastLoad, MultiLoad, FastExport and TPump utilities, but built for the Teradata PT
parallel environment.

The INMOD and OUTMOD adapters, Data Connector operator, and the SQL
Select/Insert operators are included when you purchase the Infrastructure. The Load,
Update, Export and Stream operators are purchased separately.

To simplify these new concepts, let's compare the Teradata Parallel Transporter
Operators with the classic utilities that we just covered.

TPT Operator (Teradata Utility): Description

 LOAD (FastLoad): A consumer-type operator that uses the Teradata FastLoad
protocol. Supports error limits and checkpoint/restart. Both support Multi-Value
Compression and PPI.
 UPDATE (MultiLoad): Utilizes the Teradata MultiLoad protocol to enable job-based
table updates. This allows highly scalable and parallel inserts and updates to an
existing table.
 EXPORT (FastExport): A producer operator that emulates the FastExport utility.
 STREAM (TPump): Uses multiple sessions to perform DML transactions in near
real-time.
 DataConnector (N/A): Emulates the Data Connector API; reads external data files,
writes data to external data files, and reads an unspecified number of data files.
 ODBC (N/A): Reads data from an ODBC Provider.

Administrative Utilities

Administrative utilities use a graphical user interface (GUI) to monitor and manage
various aspects of a Teradata Database system.

The administrative utilities are:

 Workload Management:
 Teradata Manager
 Teradata Dynamic Workload Manager (TDWM)
 Priority Scheduler
 Database Query Log (DBQL)
 Teradata Workload Analyzer
 Performance Monitor
 Teradata Active Systems Management (TASM)
 Teradata Analyst Pack

Workload Management

Workload Management in Teradata is used to control system resource allocation to the
various workloads on the system. Some of the components that make up Teradata’s
Workload Management capability are:

Teradata Manager


Teradata Manager is a production and performance monitoring system that helps a DBA
or system manager monitor, control, and administer one or more Teradata Database
systems through a GUI. Running on LAN-attached clients, Teradata Manager has a
variety of tools and applications to gather, manipulate, and analyze information about
each Teradata Database being administered.

For examples of Teradata Manager functions, click here: Teradata Manager Examples

Teradata Dynamic Workload Manager (TDWM)


Teradata Dynamic Workload Manager (also known as Teradata DWM or TDWM) is a
query workload management tool that can restrict (run, suspend, schedule later,
or reject) queries based on current workload and set thresholds. TDWM provides a
graphical user interface (GUI) for creating rules that manage database access, increase
database efficiency, and enhance workload capacity. Via the rules created through
TDWM, queries can be rejected, throttled, or executed when they are submitted to the
Teradata Database.

For example, with TDWM a request can be scheduled to run periodically or during a
specified time period. Results can be retrieved any time after the request has been
submitted by TDWM and executed.

TDWM can restrict queries based on factors such as:

 Analysis control thresholds - TDWM can restrict requests that will exceed a
certain processing time, or whose expected result set size exceeds a specified
number of rows.
 Object control thresholds - TDWM can limit access to and use of static criteria
such as database objects and other items. Object controls can control workload
requests based on user IDs, tables, views, date, time, macros, databases, and
group IDs.
 Environmental factors -TDWM can manage requests based on dynamic
environment factors, including database system CPU and disk utilization, network
activity, and number of users.


Teradata Dynamic Workload Manager is a key supporting product component for
Teradata Active Systems Manager (TASM), a new concept as of Teradata V2R6.1,
described in another sub-topic below. The major functions performed by the DBA are to:

 Define Filters and Throttles.


 Define Workloads (new) and their operating periods, goals and Priority Scheduler
facility (PSF) mapping/weights.
 Define general TASM controls - TASM automates the allocation of resources to
workloads to assist the DBA or application developer with system performance
management.

TDWM allows the Database Administrator to provide operational control of, and to
effectively manage and regulate access to, the Teradata Database.

The database administrator can use the following capabilities of TDWM to manage work
submitted to the database in order to maximize system resource utilization:

 Query Management
 Scheduled Requests

With Query Management, database query requests are intercepted within the Teradata
Database and their components are compared against criteria that are defined by the
administrator; based on the outcome, each request is either run, suspended, scheduled
for later, or rejected.

With Scheduled Requests, clients can submit SQL requests to be executed at scheduled
off-peak times.

Priority Scheduler

Priority Scheduler is a resource management tool that controls how computer resources
(e.g., CPU) are allocated to different users in a Teradata system. This resource
management function is based on scheduler parameters that satisfy site-specific
requirements and on system parameters that depict the current activity level of the
Teradata Database system. You can provide Priority Scheduler parameters to directly
define a strategy for controlling resources.


Database Query Log (DBQL)


The Database Query Log (DBQL) logs query processing activity for later analysis. Query
counts and response times can be charted and SQL text and processing steps can be
compared to fine-tune applications for optimum performance.
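
Query logging is turned on and off with SQL. A sketch (the scope and options shown
are illustrative):

    BEGIN QUERY LOGGING WITH SQL, OBJECTS ON ALL;   -- log SQL text and objects for all users

    /* ... later, to stop logging: */
    END QUERY LOGGING ON ALL;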

Teradata Workload Analyzer


Teradata Workload Analyzer recommends candidate workloads for analysis. In addition,
it provides the following capabilities:

 Identifies classes of queries and candidate workloads for analysis and
recommends workload definitions and operating rules.
 Recommends workload allocation group mappings and Priority Scheduler facility
(PSF) weights.
 Provides the ability to migrate existing Priority Schedule Definitions (PD Sets) into
new workloads.
 Provides recommendations for appropriate workload Service Level Goals (SLGs).
 Establishes workload definitions from query history or directly.
 Can be used “iteratively” to analyze and understand how well existing workload
definitions are working and modify them if necessary.

Workload Analyzer creates a Workload Rule set (i.e., Workload Definitions and
recommended Service Level Goals) by using either:

1. Statistics from DBQL data


2. Migrated current Priority Scheduler settings

Teradata Workload Analyzer can also apply best practice standards to workload
definitions such as assistance in Service Level Goal (SLG) definition and priority
scheduler setting recommendations.

In addition, Teradata Workload Analyzer supports the conversion of existing Priority
Scheduler Definitions (PD Sets) into new workloads.

Performance Monitor

The Performance Monitor (formerly called PMON) collects near real-time system
configuration, resource usage, and session information from the Teradata Database,
either directly or through Teradata Manager, and formats and displays this information
as requested. Performance Monitor allows you to analyze current performance and both
current and historical session information, and to abort sessions that are causing
system problems.

Application flow control:

 Teradata Dynamic Workload Manager (Pre-Execution): Resource control prior to
execution. Controls what and how much work is allowed to begin execution.
 Priority Scheduler (Query Executes; Resource Management): Resource control
during execution. Manages the level of resources allocated to different priorities of
executing work.
 Performance Monitor (During Query Execution): Allows the DBA or user to
examine the active workload.
 Database Query Log (Application Query Post-Execution): Analyzes query
performance and behavior after completion.

Teradata Active Systems Management (TASM)

Teradata Active System Management is made up of several products/tools that assist
the DBA or application developer in defining and refining the rules that control the
allocation of resources to workloads running on a system. These rules include
filters, throttles, and “workload definitions”. Workload definitions are rules to control the
allocation of resources to workloads and are new with Teradata V2R6.1.

Tools are also provided to monitor workloads in real time and to produce historical
reports of resource utilization by workloads. By analyzing this information, the workload
definitions can be adjusted to improve the allocation of system resources.

TASM is primarily comprised of three products that are used to create and manage
“workload definitions”:

 Teradata Dynamic Workload Manager (TDWM) - (enhanced with TASM)


 Teradata Manager - which reviews historical workloads - (enhanced with TASM)
 Teradata Workload Analyzer (TWA) – which recommends candidate workloads for
analysis - (new with TASM)

Teradata Active Systems Management (TASM), allows you to perform the following:

 Limit user concurrency


 Monitor Service Level Goals (SLGs) on a system
 Optimize mixed workloads
 Reject queries based on table access
 Prioritize workloads
 Provide more consistent response times and influence response times
 React to hardware failures
 Block access on a table to a user
 Determine the workload on a system.

Teradata Analyst Pack

Teradata Analyst Pack is a suite of the following products.

Teradata Visual Explain


Teradata Visual Explain makes query plan analysis easier by providing the ability to


capture and graphically represent the steps of the plan and perform comparisons
of two or more plans. It is intended for application developers, database administrators
and database support personnel to better understand why the Teradata Database
Optimizer chooses a particular plan for a given SQL query. All information required for
query plan analysis such as database object definitions, data demographics and cost
and cardinality estimates is available through the Teradata Visual Explain interface. It is
helpful in identifying the performance implications of data skew and bad or missing
statistics. Visual Explain uses a Query Capture Database to store query plans which
can then be visualized or manipulated with other Teradata Analyst Pack tools.

Teradata System Emulation Tool (Teradata SET)


Teradata SET simplifies the task of emulating a target system by providing the ability
to export and import all information necessary to fake out the optimizer in a test
environment. This information can be used along with the Target Level Emulation
feature to generate query plans on the test system as if they were run on the target
system. This feature is useful for verifying queries and reproducing optimizer related
issues in a test environment.

Teradata SET allows the user to capture the following by database, query, or workload:

 System cost parameters


 Object definitions
 Random AMP samples
 Statistics
 Query execution plans
 Demographics

This tool does not export user data.

Teradata Index Wizard


Teradata Index Wizard automates the process of manual index design by
recommending secondary indexes for a particular workload. Teradata Index Wizard
provides a graphical user interface (GUI) that guides the user through analyzing a
database workload and provides recommendations for improving performance through
the use of indexes. Teradata Index Wizard provides support for Partitioned Primary
Indexes (PPI) recommendations. PPI is discussed in the Indexes module of this course.

Teradata Statistics Wizard


Teradata Statistics Wizard is a graphical tool that has been designed to automate the
collection and re-collection of statistics, resulting in better query plans and helping
the DBA to efficiently manage statistics.

The Statistics Wizard enables the DBA to:

 Specify a workload to be analyzed for recommendations to improve the
performance of the queries in that workload.
 Select databases, tables, indexes, or columns for analysis, collection, or re-
collection of statistics.
 Schedule the COLLECT STATISTICS activity.

As changes are made within a database, the Statistics Wizard identifies those changes
and recommends which tables should have statistics collected, based on age of data
and table growth, and which columns/indexes would benefit from having statistics
defined and collected for a specific workload. The DBA is then given the opportunity to
accept or reject the recommendations.
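
The statement the wizard generates and schedules is ordinary SQL. A sketch (the table
and column names are hypothetical):

    COLLECT STATISTICS ON Sales COLUMN (Store_ID);   -- define and collect on a column

    COLLECT STATISTICS ON Sales;                     -- re-collect all statistics already defined on the table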


Archival Utilities

Teradata provides the Archive Recovery utility (ARC) to perform backup and restore
operations on tables, databases, and other objects.

In addition, ARC interfaces to third party products to support backup and restore
capabilities in a network-attached environment.

There are several scenarios where restoring objects from external media may be
necessary:

 Restoring non-Fallback tables after a disk failure.


 Restoring tables that have been corrupted by batch processes that may have left
the data in an uncertain state.
 Restoring tables, views, or macros that have been accidentally dropped by the
user.
 Miscellaneous user errors resulting in damaged or lost database objects.
 Archive a single partition.

With the ARC utility you can copy a table and restore it to another Teradata Database. It
is scalable and parallel, and can run on a channel-attached client, network-attached
client, or a node.
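
A sketch of ARC command syntax for archiving a table and later restoring it; the logon,
database, table, and file names are hypothetical:

    LOGON tdpid/dbadmin,secret;

    /* Back up one table to the archive file */
    ARCHIVE DATA TABLES (HR.Payroll),
      RELEASE LOCK,
      FILE = ARCHFILE;

    /* ... later, to bring it back: */
    RESTORE DATA TABLES (HR.Payroll),
      RELEASE LOCK,
      FILE = ARCHFILE;

    LOGOFF;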

Archiving on Channel-Attached Clients

In a channel-attached (mainframe) client environment, ARC is used to back up and
restore data. It supports commands written in Job Control Language (JCL). ARC
archives and restores database objects, allowing recovery of data that may have been
damaged or lost.

ARC may run on the node or on the channel-attached client, and will back up data
directly across the channel into the mainframe-attached tape subsystem.


Archiving on Network-Attached Clients

In a network-attached client environment, ARC is used to back up data, along with one
of the following tape management products:

 NetVault (from BakBone Software Inc.)
 Veritas NetBackup (from Symantec Software)

These products provide modules for Teradata Database systems that run on network-
attached clients or a node (Microsoft Windows or UNIX MP-RAS). Data is backed up
through these interfaces into a tape storage subsystem using the ARC utility.

Exercise 3.1

Processing a Request: Drag an icon from the group on the right to its correct position in
the empty boxes on the left. Correctly placed icons will stay where you put them.


To review this topic, click Request Processing.

Exercise 3.2

Select the appropriate Teradata load or unload utility from the pull-down menus.

Enables constant loading (streaming) of data into a table to keep data fresh.
Data extract utility that exports data from a Teradata table and writes it to a host file.
Updates, inserts, or deletes empty or populated tables (block level operation).
Uses parallel processing to load an empty table.
Performs the same function as the UPDATE Teradata Parallel Transporter operator.
Performs the same function as the STREAM Teradata Parallel Transporter operator.

Feedback:

To review these topics, click FastLoad, MultiLoad, TPump, and FastExport.

Exercise 3.3

Move the software components required for a channel connection into the appropriate blue
squares. Correctly placed components will stay where you put them.


To review this topic, click Channel Attached Client.

Exercise 3.4

Which three statements are true? (Choose three.)

A. Teradata SQL Assistant and TDWM are the two utilities used for Teradata system
management.
B. TDWM can reject a query based on current workload and set thresholds.
C. BTEQ runs on all client platforms to access the Teradata Database.
D. Archive Recovery (ARC) is used to copy and restore a table to another Teradata Database.
E. NetVault and Veritas NetBackup are utilities used for network management.
Feedback:

To review these topics, click BTEQ, Teradata SQL Assistant, Teradata Manager, TDWM, Archiving
on Channel-Attached Clients, and Archiving on Network-Attached Clients.

Exercise 3.5

Select the correct type of connection (network-attached client or channel-attached client) from the
drop-down boxes below that corresponds to the listed software and hardware components.

Teradata Gateway
Teradata Director Program
Channel Driver
Ethernet Card
"mainframe host"

Feedback:



To review this topic, click Channel Attached Client or Network Attached Client.

Exercise 3.6

Select the correct Teradata Analyst Pack tool from the drop-down menus below.

Verifies queries and reproduces optimizer-related (query plan)
issues in a test environment.
Recommends one or more Secondary Indexes for a table.
Uses a Query Capture Database to store query plans.
Recommends and automates the Statistics Collection process.

Feedback:

To review this topic, click Teradata Analyst Pack.

Exercise 3.7

__________ is made up of several products/tools that assist the DBA or application
developer in defining and refining the rules (i.e., filters, throttles, and workload
definitions) that control the allocation of resources to workloads running on a system.

A. Teradata Workload Analyzer
B. Database Query Log
C. Teradata Active Systems Manager
D. Performance Monitor

Feedback:

To review this topic, click Administrative Utilities.

Exercise 3.8

True or False: Workload definitions are rules to control the allocation of resources to
workloads.

A. True
B. False

Feedback:

To review this topic, click Administrative Utilities.


Exercise 3.9

Select the correct term from the drop-down menus below.

__________ is an application programming standard that defines common database
access mechanisms to simplify the exchange of data between a client and server.
________-compliant applications connect with a database through the use of a driver
that translates the application's ________ commands into database syntax.

__________ is a library of routines that enable an application program to access data
stored in the Teradata Database.

__________ is an Application Programming Interface (API) that allows platform-
independent Java applications to access a DBMS using Structured Query Language
(SQL). It enables the development of web-based Teradata end user tools that can access
Teradata through a web server and also provides support for access to other commercial
databases.

__________ is an additional legacy API that allows access to Teradata from a network
host.

Feedback:

To review this topic, click Network Attached Client.

Mod 4 - Data Structure

Objectives

After completing this module, you should be able to:

 Distinguish between a Teradata Database and a Teradata User.


 List and define the Teradata Database objects.
 Define Perm Space, Temp Space, and Spool Space, and explain how each is
used.
 Describe the function of the Data Dictionary.
 List methods for authentication and security on Teradata.

HOT TIP: This module contains links to important supplemental course information.
Please be sure to click on each hotword link to capture all of the training content.

Creating Databases and Users

In the Teradata Database, Databases (including a special category of Databases called
Users) have attributes assigned to them:

 Access Rights: Privileges that allow a User to perform operations (such as CREATE, DROP, and SELECT) against database objects. A User must have the correct access rights to a database object in order to access it.
 Perm Space: The maximum amount of Permanent Space assigned and available to a User or Database to store tables. Unlike some other relational databases, the Teradata Database does not physically pre-allocate Perm Space for Databases and Users when they are defined. Only the Permanent Space limit is defined; the space is then consumed dynamically as needed. All Databases have a defined upper limit of Permanent Space.
 Spool Space: The amount of space assigned and available to a User or Database to gather answer sets. For example, when executing a conditional query, qualifying rows are temporarily stored using Spool Space. Depending on how the system is set up, a single query could temporarily use all available system space to store its result in spool. Permanent Space not being used for tables is available for Spool Space.
 Temp Space: The amount of space used for global temporary tables; these results remain available to the User until the session is terminated. Tables created in Temp Space will survive a restart. Permanent Space not being used for tables is available for Temp Space as well as Spool Space.
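
As a minimal sketch of how these limits are declared in Teradata SQL (the names, password, and byte values here are hypothetical, not from this course):

CREATE USER sample_user FROM sample_parent AS
   PASSWORD = sample_pw   -- hypothetical initial password
  ,PERM = 100E9           -- Perm Space limit, in bytes
  ,SPOOL = 50E9           -- Spool Space limit, in bytes
  ,TEMPORARY = 20E9;      -- Temp Space limit, in bytes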

A Logical Database Hierarchy

In a logical, hierarchical organization, Databases (including Users) are created subordinate to existing Databases or Users. The owning Database or User is called the parent. The subordinate Database or User is called the child. Permanent Space for the new Database or User comes from its immediate parent.

When the Teradata Database software is first installed, all Permanent Space is assigned
to Database DBC (also a User in Teradata Database terminology, because you can log
on to it with a userid and password). During installation, the following Databases are
created:

 Database Crashdumps (initially empty)
 User SystemFE (with its views and macros)
 User SysAdm (with its views and macros)

Because Database DBC is the immediate parent of these child Databases, Permanent
Space limits for the children are subtracted from Database DBC.


Creating a New Database

After the initial installation, you will create your database hierarchy. One way to set up
this hierarchy would be to create a Database Administrator User directly subordinate to
Database DBC. Most of the system Permanent Space would be assigned to the
Database Administrator User. This setup gives you the freedom to have multiple
administrators logging on to the Database Administrator User, and limit the number of
people logging on directly to Database DBC (which has more access rights than any
other User).

Next, all other Users and Databases would be created from the Database Administrator User, and their Permanent Space limits would be subtracted from the Database Administrator User's space limit. Your hierarchy would look like this:

 Database DBC at the highest level, the parent of all other Databases (including Users).
 User SysDBA (we called it SysDBA; you can assign it any name) with the majority of the system's Perm Space assigned to it.
 All Databases and Users in the system created from User SysDBA.
 Each table, view, macro, stored procedure, and trigger is owned by a Database (or User).
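
As a sketch in SQL (the password and the 500 GB figure are illustrative), the Database Administrator User in this hierarchy might be created like this:

CREATE USER SysDBA FROM DBC AS
   PASSWORD = sample_pw   -- hypothetical
  ,PERM = 500E9           -- most of the system's Perm Space
  ,SPOOL = 500E9;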

Data Layers

There are several “layers” built in to the EDW environment. These layers include:

 Staging – the primary purpose of the staging layer is to perform data transformation, either in the ETL or ELT process.


 Semantic – this layer is the "access" layer. Access is often provided via views and business intelligence (BI) tools, whether a Teradata application or a 3rd-party tool.
 Physical – the physical layer is where denormalizations that make access more efficient occur: pre-aggregations, summary tables, join indexes, etc. The purpose of this layer is to provide efficient, friendly access to end users.

Maximum Perm Space Allocations: An Example

Below is an example of how Permanent Space limits for Users and Databases come
from the immediate parent User or Database. In this case, the User SysDBA has 500 GB
of maximum Permanent Space assigned to it.

The User HR is created from SysDBA with 200 GB of maximum Permanent Space. The
200 GB for HR is subtracted from SysDBA, who now has 300 GB (500 GB minus 200
GB).

The User Payroll is created as a child of HR with 100 GB of Permanent Space. The 100
GB for Payroll is subtracted from HR, which now has 100 GB (200 GB minus 100 GB).

At a different level under SysDBA, Database Marketing is created as a child of SysDBA, with 100 GB of maximum Permanent Space. The 100 GB for Marketing comes from its parent, SysDBA, which now has 200 GB (300 GB minus 100 GB).
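
Expressed as Teradata SQL, the example might look like this (the passwords are hypothetical placeholders; the statements are a sketch, not a complete script):

CREATE USER HR FROM SysDBA AS PERM = 200E9, PASSWORD = hr_pw;      -- SysDBA: 500 - 200 = 300 GB left
CREATE USER Payroll FROM HR AS PERM = 100E9, PASSWORD = pay_pw;    -- HR: 200 - 100 = 100 GB left
CREATE DATABASE Marketing FROM SysDBA AS PERM = 100E9;             -- SysDBA: 300 - 100 = 200 GB left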

A Teradata Database

In Teradata Database systems, the words "database" and "user" have specific
definitions.

Database: The Teradata Definition

In Teradata, a "database" is a logical grouping of information contained in tables. A Teradata Database also plays a key role in space allocation and access control. A Teradata Database is a defined, logical repository that can contain objects, including:

 Database: A defined object that may contain other database objects.
 User: A database that has a user ID and password for logging on to the Teradata Database, and may contain other database objects.
 Table: A two-dimensional structure of columns and rows of data. (Requires Perm Space)
 View: A virtual "window" into subsets of one or more tables or other views. It is pre-defined using a single SELECT statement. (Uses no Perm Space)
 Macro: A definition composed of one or more Teradata SQL and report formatting commands. (Uses no Perm Space)
 Trigger: One or more Teradata SQL statements attached to a table and executed when specified conditions are met. (Uses no Perm Space)
 Stored Procedure: A combination of procedural and non-procedural statements run using a single CALL statement. (Requires Perm Space)
 User Defined Function: Allows authorized users to write external functions. Teradata allows users to create scalar functions to return single-value results, aggregate functions to return summary results, and table functions to return tables. UDFs may be used to protect sensitive data such as personally identifiable data.

Note: A Database with no Perm Space can contain views, macros, and triggers, but no
tables or stored procedures.

These Teradata Database objects are created, maintained, and deleted using SQL.
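
For instance, a view and a macro might be defined like this (the table and column names are hypothetical):

CREATE VIEW emp_v AS
  SELECT emp_id, last_name
  FROM HR.emp_table;          -- a virtual window into the table; uses no Perm Space

CREATE MACRO get_emps AS (
  SELECT emp_id, last_name
  FROM HR.emp_table; );       -- run later with: EXEC get_emps;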

User: A Special Kind of Database

A user may be a collection of tables, views, macros, triggers, and stored procedures. A user is a specific type of database, and has attributes in addition to the ones listed above:

 User ID
 Password

So, a user is the same as a database except that a user can actually log on to the
database. To log on to a Teradata Database, you need to specify a user (which is simply
a database with a password). You cannot log on to a database because it has no
password.

Note: In this course, we will use uppercase "U" for User and uppercase "D" for Database
when referring to these specific Teradata Database objects.

Spool Space

Maximum Spool Space

As mentioned previously in "Creating Databases and Users," Spool Space is work space
used to hold intermediate answer sets. Any Perm Space currently unassigned is
available as Spool Space.

Defining a Spool Space limit is not required when Users and Databases are created. If it
is not defined, the Spool Space limit for the User or Database is inherited from its parent.
Thus, if no Spool Space limit were defined for any Users or Databases, an erroneous
SQL request could create a "runaway transaction" that consumes all of the system's
resources. For this reason, defining Spool Space limits for a User or Database is highly
recommended.

The Spool Space limit for a Database or User is not subtracted from its immediate
parent, but the Database or User's maximum spool allocation can only be as large as its
immediate parent. For example:

 Database A has a Spool Space limit of 500 GB.
 Database B is created as a child of Database A. The maximum Spool Space that can be allocated to Database B is 500 GB.
 Database C is created as another child of Database A. The maximum Spool Space that can be allocated to Database C is also 500 GB.

Because Spool Space is work space, temporarily used and released by the system as
needed, the total maximum Spool Space allocated for all the Databases and Users on
the system can actually exceed the total system disk space. But this is not the amount of
Spool Space actually consumed.
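
A Spool Space limit can be set when the Database or User is created, or adjusted later, along these lines (a sketch; names and values are illustrative):

CREATE DATABASE B FROM A AS PERM = 0, SPOOL = 500E9;  -- the limit may be as large as parent A's
MODIFY USER sample_user AS SPOOL = 50E9;              -- tighten an existing User's limit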

Consuming Spool Space

The maximum Spool Space for a Database (or User) is merely an upper limit of the
Spool Space that the Database can use while processing a transaction. There are two
limits to Spool Space utilization:

 The maximum Spool Space assigned to a User or Database. If a transaction is going to exceed its assigned limit, it is aborted and an error message is given stating that the maximum Spool Space was exceeded.

 The physical limitation of disk space. For a specific transaction, the system can only use the amount of Spool Space actually available on the system at that particular time, whether a maximum spool limit has been defined or not. If a job is going to exceed the Spool Space available on the system, an error message is given stating that there is not enough space to process the job.

As the amount of Permanent Space used to store data varies over a long period of time,
so will the amount of space available for spool (work space).


Temporary Space

Temporary Space is Permanent Space currently not being used. Temporary Space is
used for global temporary tables, and these results remain available to the user until
their session is terminated. Tables created in Temp Space will survive a restart.
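
A global temporary table that consumes Temp Space might be defined as follows (a sketch with hypothetical names):

CREATE GLOBAL TEMPORARY TABLE session_totals
  (store_id INT
  ,sales DECIMAL(12,2))
PRIMARY INDEX (store_id)
ON COMMIT PRESERVE ROWS;   -- keep the rows between transactions for the life of the session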

Check Your Understanding

Which statement is true? (Check the best answer.)

A. The Spool Space used by a request is limited to the amount of Spool Space assigned to the originating user and the physical space available on the system at that point in time.
B. A request can use as much Spool Space as necessary as long as it does not exceed the system's total installed physical space limit.
C. A request can use as much Spool Space as necessary as long as it does not exceed the Spool Space limit of the originating user, regardless of the space available on the system.
D. The Spool Space used by a request is limited only by the maximum Perm Space of the originating user.



Data Dictionary

The Data Dictionary is a set of relational tables that contains information about the
RDBMS and database objects within it. It is like the metadata or "data about the data" for
a Teradata Database (except that it does not contain business rules, like true metadata
does). The Data Dictionary resides in Database DBC. Some of the items it tracks are:

 Disk space
 Access rights
 Ownership
 Data definitions

Disk Space

The Data Dictionary stores information about how much space is allocated for perm and
spool for each Database and User. The table below shows an example of Data
Dictionary information for space allocations. In this example, the Users Payroll and
Benefits have no Permanent Space allocated or consumed because they do not contain
tables.
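
These allocations can be queried directly from the Data Dictionary; for example, the following sketch sums the per-AMP figures in the DBC.DiskSpace view:

SELECT DatabaseName
     , SUM(MaxPerm)     AS max_perm
     , SUM(CurrentPerm) AS current_perm
FROM DBC.DiskSpace
GROUP BY 1
ORDER BY 1;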

Access

The Data Dictionary also stores information about which Users can access which
database objects.

System Administrators are often responsible for archiving the system. In the example
below, it is likely that the SysAdm User would have access to the tables in the Employee
and Crashdumps databases, as well as other objects. When you grant and revoke
access to any User for any database object, privileges are stored in the AccessRights
table in the Data Dictionary.
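
A sketch of how those stored privileges might be inspected through the DBC.AllRights Data Dictionary view:

SELECT UserName, DatabaseName, TableName, AccessRight
FROM DBC.AllRights
WHERE UserName = 'SysAdm';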


Owners

The Data Dictionary also stores information about which Databases and Users own each
database object.

Definitions

The Data Dictionary stores definitions of all database objects, their names, and their
place in the hierarchy.


For macros, the Data Dictionary also stores the actual SQL statements of the macro.
While stored procedures also contain statements (SQL and SPL statements), the
statements for each stored procedure are kept in a separate table and distributed among
the AMPs (like regular user data), rather than in the Data Dictionary.

Database Security

There are several mechanisms for implementing security on a Teradata Database. These mechanisms include authenticating access to the Teradata Database with the following:

 LDAP
 Single Sign-On
 Passwords

Authentication
After users have logged on to Teradata Database and have been authenticated, they are
authorized access to only those objects allowed by their database privileges.

Additional Security Mechanisms


In addition to authentication, there are several database objects or constructs that allow
for a more secure database environment. These include:

 Privileges, or Access Rights
 Views
 Macros
 Stored Procedures
 User Defined Functions (UDFs)
 Roles – a collection of Access Rights

A privilege (access right) is the right to access or manipulate an object within Teradata. Privileges control user activities such as creating, executing, inserting, viewing, modifying, deleting, or tracking database objects and data. Privileges may also include the ability to grant privileges to other users in the database.

In addition to access rights, the database hierarchy can be set up such that users
access tables or applications via the semantic layer, which could include Views, Macros,
Stored Procedures, and even UDFs.

Roles, which are a collection of access rights, can be granted to groups of users to
further protect the security of data and objects within Teradata.
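
A minimal sketch of a Role in use (all names are hypothetical):

CREATE ROLE hr_read;
GRANT SELECT ON HR TO hr_read;                      -- collect the access right in the role
GRANT hr_read TO sample_user;                       -- grant the role to a user
MODIFY USER sample_user AS DEFAULT ROLE = hr_read;  -- make it active at logon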

Exercise 4.1

When you log on to the Teradata Database, you must specify:

A. The path to the data.
B. A SELECT command.
C. A User and password.
D. An IP address.


To review this topic, click A Teradata Database.

Exercise 4.2

Database_Employee was created with 500 GB of Perm Space. If Database_Addresses (100 GB of Perm Space) and Database_Compensation (100 GB of Perm Space) both are created from Database_Employee, how much available Perm Space does Database_Employee have now?

A. 300 GB
B. 500 GB
C. 600 GB
D. 700 GB

See Calculation

To review this topic, click Space Allocations: An Example.

Exercise 4.3

Select the answers from the options given in the drop-down boxes that correctly complete the
sentences.

A view is a "virtual table" that does not exist as an actual table.


Permanent Space is pre-defined and allocated for a Database or User.
Users must have privileges to access any database object.
Perm Space limits apply to Databases, Users, tables, views, macros, triggers, and stored procedures.
Temp Space is used for global temporary tables.
Perm Space is assigned to a User or Database to gather answer sets.


To review these topics, click Creating Databases and Users, Creating a New Database.

Exercise 4.4

Select the choice from the drop-down box that corresponds to each statement:

Privileges granted to Users and Databases.
Work area consumed by the system as it processes requests.
Maximum space allocated to Databases and Users for data.


To review these topics, click Creating Databases and Users.

Exercise 4.5

A Teradata User is a special type of database:

A. Always
B. Sometimes
C. Never


To review this topic, click A Teradata Database.

Exercise 4.6

True or False: A User-Defined Function (UDF) allows authorized users to write external functions. Teradata allows users to create scalar functions to return single-value results, aggregate functions to return summary results, and table functions to return tables. UDFs may be used to protect sensitive data such as personally identifiable data.


A. True
B. False


To review this topic, click A Teradata Database.

Exercise 4.7

Which three of the following are Teradata Database security mechanisms for authenticating access to the Teradata Database? (Choose three.)

A. LDAP
B. Single Sign-On
C. User Defined Functions
D. Passwords


To review these topics, click Database Security.

Exercise 4.8

Match the data “layers” built into the Teradata EDW environment to their definitions.

The primary purpose for this layer is to perform data transformation, either in the ETL
or ELT process.

This layer is where denormalizations that will make access more efficient occur; pre-
aggregations, summary tables, join indexes, etc. The purpose of this layer is to provide efficient,
friendly access to end users.

This is the “access” layer. Access is often provided via views and business intelligence
(BI) tools; whether a Teradata application or a 3rd party tool.


To review these topics, click Data Layers.

Mod 5 - Data Protection

Objectives

After completing this module, you should be able to:


 Describe the types of data protection and fault tolerance used by the Teradata
Database.
 Discuss the types of RAID protection used on Teradata Database systems.
 Explain basic data storage concepts.
 Explain the concept of Fallback tables.
 List the types and levels of locking provided by the Teradata Database.
 Describe the function of recovery journals, transient journals, and permanent
journals.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Protecting Data

Several types of data protection are available with the Teradata Database. All the data
protection methods shown on this page are covered in further detail later in this module.

RAID

Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data
protection at the disk drive level. It uses groups of disk drives called "arrays" to ensure
that data is available in the event of a failed disk drive or other component. The word "redundant" implies that data, functions, and/or components have been duplicated in the array's architecture. The industry has agreed on six RAID configuration levels (RAID 0 through RAID 5). The classifications do not imply superiority of one mode over another, but differentiate how data is stored on the disk drives.

With the Teradata Database, the two RAID technologies that are supported are RAID 1
and RAID 5. On systems using EMC disk drives, RAID 5 is called RAID S.


Disk arrays contain the following major components:

 SCSI bus
 Physical disks
 Disk array controllers

For maximum availability and performance, the Teradata Database uses dual redundant
disk array controllers. Having two disk array controllers provides a level of protection in
case one controller fails, and provides parallelism for disk access.

Fallback

Fallback is a Teradata Database feature that protects data against AMP failure. As
shown later in this module, Fallback uses clusters of AMPs that provide for data
availability and consistency if an AMP is unavailable.

Locks

Locks can be placed on database objects to prevent multiple users from simultaneously
changing them. The four types of locks are:

 Exclusive
 Write
 Read
 Access


Journals

The Teradata Database has journals that are used for specific types of data or process
recovery:

 Recovery Journals
 Permanent Journals

RAID 1

RAID 1 is a data protection scheme that uses mirrored pairs of disks to protect data from
a single drive failure.

RAID 1: Effects on Your System

RAID 1 requires double the number of disks because every drive has an identical
mirrored copy. Recovery with RAID 1 is faster than with RAID 5. The highest level of
data protection is RAID 1 with Fallback.

RAID 1: How It Works


RAID 1 protects against a single disk failure using the following principles:

 Mirroring
 Reading

Mirroring: RAID 1 maintains a mirrored disk for each disk in the system.

Note: If you configure more than one pair of disks per AMP, the RDAC stripes the data
across both the regular and mirror disks.

Reading: Using both copies of the data, the system reads data blocks from the first
available disk. This does not so much protect data as provide a performance benefit.

RAID 1: How It Handles Failures

If a disk fails, the Teradata Database is unaffected and the following are each handled in
a different way:

 Reads
 Writes
 Replacements

Reads: When a drive is down, the system reads the data from the other drive. There
may be a minor performance penalty because the read will occur from one drive instead
of both.


Writes: When a drive is down, the system writes to the functional drive. No mirror image
exists at this time.

Replacements: After you replace the failed disk, the disk array controller automatically
reconstructs the data on the new disk from the mirror image. Normal system
performance is affected during the reconstruction of the failed disk.

RAID 5

RAID 5 is a data protection scheme that uses parity striping in a disk array to protect
data from the failure of a single drive.

Note: RAID S is the name for RAID 5 implemented on EMC disk drives.

RAID 5: Effects on Your System

The number of disks per rank varies from vendor to vendor. The number of disks in a
rank impacts space utilization:

 4 drives per rank requires a 33% increase in data space.
 5 drives per rank requires a 25% increase in data space.

RAID 5 also uses some overhead during a write operation, because it has to read the
data, then calculate and write the parity.


RAID 5: How It Works

RAID 5 uses a data parity scheme to provide data protection.

Rank: For the Teradata Database, RAID 5 uses the concept of a rank, which is a set of
disks working together. Note that the disks in a rank are not directly cabled to each
other.

Parity: In RAID 5, data is handled as follows:

 Data is striped across a rank of disks (spread across the disk drives) one segment
at a time, using a binary "exclusive-or" (XOR) algorithm.
 Parity is also striped across all disk drives, interleaved with the data. A "parity
byte" is an extra byte written to a drive in a rank. The process of writing data and
parity to the disk drives includes a read-modify-write operation for each new
segment:
1. Read existing data on the disk drives in the rank.
2. Read existing parity in that rank for the corresponding segment.
3. Calculate the parity: existing data + new data + existing parity = new parity.
4. Write new data.
5. Write new parity.
 If one of the disk drives in the rank becomes unavailable, the system uses the
parity byte to calculate the missing data from the down drive so the system can
remain operational. With a rank of 4 disks, if a disk fails, any missing data block
may be reconstructed using the other 3 disks.

In the example below, data bytes are written to disk drives 1, 2, and 3. The system
calculates the parity byte using the binary XOR algorithm and writes it to disk drive 4.
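
As a worked illustration with made-up values (not the bytes shown in the figure), suppose the three data bytes in a segment are:

Disk 1: 1100 0101
Disk 2: 0101 0011
Disk 3: 0011 1110
Parity (Disk 4): 1100 0101 XOR 0101 0011 XOR 0011 1110 = 1010 1000

If Disk 2 later fails, its byte is recovered by XORing the surviving bytes with the parity: 1100 0101 XOR 0011 1110 XOR 1010 1000 = 0101 0011, which is exactly the missing Disk 2 byte.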

RAID 5: How It Handles Failures


If a disk fails, the Teradata Database is unaffected and the following are each handled in
different ways:

 Reads
 Writes
 Replacements

Reads: Data is reconstructed on-the-fly as users request data using the binary XOR
algorithm.

Writes: When a drive is down, the system writes to the functional drives, but not to the
failed drive.

Replacements: After you replace the failed disk, the disk array controller automatically
reconstructs the data on the new disk, using known data values to calculate the missing
data. Normal system performance is affected during reconstruction of the failed disk.

Give It a Try

In the example below, Disk 2 has experienced a failure. To allow users to still access the
data while Disk 2 is down, the system must calculate the data on the missing disk drive
using the parity byte. What would be the missing byte for this segment?

A. 1111 0011
B. 0111 1011
C. 0010 0110
D. 0000 1100


Fallback

Fallback is a Teradata Database feature that protects data in the case of an AMP vproc
failure. Fallback guarantees the maximum availability of data. You can specify Fallback
protection at the table or database level. It is especially useful in applications that require
high availability.

Fallback protects your data by storing a second copy of each row of a table on a
different AMP in the same cluster. If an AMP fails, the system accesses the Fallback
rows to meet requests. Fallback provides AMP fault tolerance at the table level. With
Fallback tables, if one AMP fails, all data is still available. Users may continue to use
Fallback tables without any loss of access to data.

During table creation or after a table is created, you may specify whether or not the
system should keep a Fallback copy of the table. If Fallback is specified, it is automatic
and transparent.
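
In SQL, that choice looks something like this (the table definition is a sketch with hypothetical names):

CREATE TABLE emp_table, FALLBACK
  (emp_id INT
  ,last_name CHAR(20))
UNIQUE PRIMARY INDEX (emp_id);

ALTER TABLE emp_table, NO FALLBACK;   -- Fallback can also be removed (or added) later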

Fallback guarantees that the two copies of a row will always be on different AMPs. If
either AMP fails, the alternate row is still available on the other AMP.

Fallback: Effects on Your System

Fallback has the following effects on your system:

Space

In addition to the original database size, you need space for:

 Fallback-protected tables (100% additional storage space for each Fallback-protected table)
 RAID protection of Fallback-protected tables


Performance

There is a benefit to protecting your data, but there are costs associated with that benefit. With Fallback, you need twice the disk space for storage and twice the I/O for INSERTs, UPDATEs, and DELETEs of rows in Fallback-protected tables. The Fallback option does not require any extra I/O for SELECTs: the system reads from one copy or the other, and the Fallback I/O is performed in parallel with the primary I/O, so there is no performance hit.

Fallback benefits include:

 A level of protection beyond RAID disk array protection.
 Can be specified on a table-by-table basis to protect data requiring the highest availability.
 Permits access to data while an AMP is off-line.
 Automatically restores data that was changed during the AMP off-line period.

The highest level of data protection is Fallback with RAID 1.

Fallback: Software Tools

The following Teradata utilities are used to recover a failed AMP:

 Vproc Manager: Enables you to:
 Display and modify vproc states.
 Initiate Teradata Database restarts.
 Table Rebuild: Reconstructs tables on an AMP from data on other AMPs in the cluster.
 Recovery Manager: Lets you monitor recovery processing.

Fallback: How It Works


Fallback is accomplished by grouping AMPs into clusters. When a table is defined as Fallback-protected, the system stores a second copy of each row in the table on a "Fallback AMP" in the AMP cluster.

Below is a cluster of four AMPs. Each AMP has a combination of Primary and Fallback
data rows:

 Primary Data Row: A record in a database table that is used in normal system
operation.
 Fallback Data Row: The online backup copy of a Primary data row that is used in
the case of an AMP failure.

Write: Each Primary data row has a duplicate Fallback row on another AMP. The
Primary and Fallback data rows are written in parallel.

P=Primary F=Fallback

Read: When an AMP is down with a table that is defined as Fallback, Teradata will
access the Fallback copies of the rows.

More Clusters: The diagram below shows how Fallback data is distributed among
multiple clusters.

P=Primary F=Fallback


Fallback: How It Handles Failures

If two physical disks fail in the same RAID 5 rank or RAID 1 mirrored pair, the associated
AMP vproc fails. Fallback protects against the failure of a single AMP in a cluster.

If two AMPs in a cluster fail, the system halts and must be restarted manually, after the
AMP is recovered by replacing the failed disk(s).

Reads: When an AMP fails, the system reads all rows it needs from the remaining AMPs
in the cluster. If the system needs to find a Primary row from the failed AMP, it reads the
Fallback copy of that row, which is on another AMP.

Writes: A failed AMP is not available, so the system cannot access any of that AMP's
disk space. Copies of its unavailable primary rows are available as Fallback rows on the
other AMPs in the cluster, and are updated there.

Replacement: Repairing the failed AMP requires replacing the failed physical disks and
bringing the AMP online. Once the AMP is online, the system uses the Fallback data on
the other AMPs to automatically reconstruct data on the newly replaced disks.

Disk Allocation

The operating system, PDE, and the Teradata Database do not recognize the physical
disk hardware. Each software component recognizes and interacts with different
components of the data storage environment:

 Operating system: Recognizes a logical unit (LUN). The operating system recognizes the LUN as its "disk," and is not aware that it is actually writing to spaces on multiple disk drives. This technique enables the use of RAID technology to provide data availability without affecting the operating system.
 PDE: Translates LUNs into vdisks using slices (in UNIX) or partitions (in Microsoft Windows and Linux) in conjunction with the Teradata Parallel Upgrade Tool.
 Teradata Database: Recognizes a virtual disk (vdisk). Using vdisks instead of direct connections to physical disk drives enables the use of RAID technology with the Teradata Database.


Creating LUNs

Space on the physical disk drives is organized into LUNs. The RAID level determines
how the space is organized. For example, if you are using RAID 5, a LUN includes a
region of space from each of the physical disk drives in a rank.

Pdisks: User Data Space

After a LUN is created, it is divided into partitions.

 In UNIX systems, a LUN consists of one partition, which is further divided into
slices:
 Boot slice (a very small slice, taking up only 35 sectors)
 User slices for storing data. These user slices are called "pdisks" in the
Teradata Database.


 In Microsoft Windows systems, a LUN consists of multiple partitions, not slices. Thus, LUNs in Microsoft Windows do not have a boot slice. Instead, they contain a "Master Boot Record" that includes information such as the partition layout. The partitions store data and are called "pdisks" in the Teradata Database.
 Linux systems are similar to Microsoft Windows; both use a Master Boot Record and an MS-DOS style partition table.

In summary, pdisks are the user slices (UNIX) or partitions (Microsoft Windows and Linux) used for storage of the tables in a database. A LUN may have one or more pdisks.

Assigning Pdisks to AMPs

The pdisks (user slices or partitions, depending on the operating system) are assigned
to an AMP through the software. No cabling is involved.

The combined space on the pdisks is considered the AMP's vdisk. An AMP manages
only its own vdisk (disk space assigned to it), not the vdisk of any other AMP. All AMPs
then work in parallel, processing their portion of the data.

Vdisks and Ranks

Each AMP in the system is assigned one vdisk. Although numerous configurations are
possible, generally all pdisks from a rank (RAID 5) or mirrored pair (RAID 1) are
assigned to the same AMP for optimal performance.

However, an AMP recognizes only the vdisk. The AMP has no control over the physical
disks or ranks that compose the vdisk.


Reviewing the Terminology

To help review the terminology you just learned, choose the correct term from the pull-
down boxes next to each definition.

A logical unit that is composed of a region of space from each of the physical
disk drives in a rank. The operating system sees this logical unit as its "disk," and is not
aware that it is actually writing to spaces on multiple disk drives.

For a UNIX system, a portion of physical disk drive space that is used for
storing data. One of these from each disk drive in a rank composes a LUN.

For a Microsoft Windows system, a portion of physical disk drive space that
is used for storing data. One of these from each disk drive in a rank composes a LUN.

This is Teradata Database terminology for a user slice (UNIX) or partition (Microsoft Windows) that stores data. It is just another name for user slice or partition, but from the Teradata Database point of view. These are assigned to AMPs, which manage the data stored.

This is the collective name for all the logical disk space that an AMP
manages. Thus, it is composed of all the pdisks assigned to that AMP (as many as 64
pdisks).


Journals for Data Availability

The following journals are kept on the system to provide data availability in the event of a
component or process failure in the system:


 Recovery Journals
 Permanent Journals

Recovery Journals

The Teradata Database uses Recovery Journals to automatically maintain data integrity
in the case of:

 An interrupted transaction (Transient Journal)
 An AMP failure (Down-AMP Recovery Journal)

Recovery Journals are created, maintained, and purged by the system automatically, so
no DBA intervention is required. Recovery Journals are tables stored on disk arrays like
user data is, so they take up disk space on the system.

Transient Journal

A Transient Journal maintains data integrity when in-flight transactions are interrupted
(due to aborted transactions, system restarts, and so on). Data is returned to its
original state after transaction failure.

A Transient Journal is used during normal system operation to keep "before images" of
changed rows so the data can be restored to its previous state if the transaction is not
completed. This happens on each AMP as changes occur. When a transaction is
started, the system automatically stores a copy of all the rows affected by the
transaction in the Transient Journal until the transaction is committed (completed). Once
the transaction is complete, the "before images" are purged. In the event of a transaction
failure, the "before images" are reapplied to the affected tables and deleted from the
journal, and the "rollback" operation is completed.

Down-AMP Recovery Journal

The Down-AMP Recovery Journal allows continued system operation while an AMP is
down (for example, when two disk drives fail in a rank or mirrored pair). A Down-AMP
Recovery Journal is used with Fallback-protected tables to maintain a record of write
transactions (updates, creates, inserts, deletes, etc.) on the failed AMP while it is
unavailable.

The Down-AMP Recovery Journal starts automatically after the loss of an AMP in a cluster. Any changes to the data on the failed AMP are logged into the Down-AMP Recovery Journal by the other AMPs in the cluster. When the failed AMP is brought back online, the restart process includes applying the changes in the Down-AMP Recovery Journal to the recovered AMP. The journal is discarded once the process is complete, and the AMP is brought online, fully recovered.

Permanent Journals


Permanent Journals are an optional feature used to provide an additional level of data protection. You specify the use of Permanent Journals at the table level. A Permanent Journal provides full-table recovery to a specific point in time. It can also reduce the need for costly and time-consuming full-table backups.

Permanent Journals are tables stored on disk arrays like user data, so they take up additional disk space on the system. The Database Administrator maintains the Permanent Journal entries (deleting, archiving, and so on).

How Permanent Journals Work

A Database (object) can have one Permanent Journal.

When you create a table with Permanent Journaling, you must specify whether the
Permanent Journal will capture:

 Before images -- for rollback to "undo" a set of changes to a previous state.
 After images -- for rollforward to "redo" changes to a specific state.

You can also specify that the system keep both before images and after images. In
addition, you can choose that the system captures:

 Single images (the default) -- this means that the Permanent Journal table is not
Fallback protected.
 Dual images -- this means that the Permanent Journal table is Fallback protected.

The Permanent Journal captures images concurrently with standard table maintenance
and query activity. The additional disk space required may be calculated in advance to
ensure adequate resources. Periodically, the Database Administrator must dump the
Permanent Journal to external media, thus reducing the need for full-table backups since
only changes are backed up rather than the entire database.
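
A sketch of the DDL involved (the journal table and all names are hypothetical):

CREATE DATABASE sample_db FROM SysDBA AS
   PERM = 10E9
  ,DEFAULT JOURNAL TABLE = sample_db.jrnl;   -- the Database's Permanent Journal

CREATE TABLE audited_table, DUAL BEFORE JOURNAL, AFTER JOURNAL
  (txn_id INT
  ,amount DECIMAL(12,2))
UNIQUE PRIMARY INDEX (txn_id);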

Locks

Locking prevents multiple users who are trying to access or change the same data
simultaneously from violating data integrity. This concurrency control is implemented by
locking the target data.

Locks are automatically acquired during the processing of a request and released when
the request is terminated.

Levels of Locking

Locks may be applied at three levels:

 Database Locks: Apply to all tables and views in the database.
 Table Locks: Apply to all rows in the table or view.
 Row Hash Locks: Apply to a group of one or more rows in a table.

Types of Locks

The four types of locks are described below.

Exclusive

Exclusive locks are applied to databases or tables, never to rows. They are the most
restrictive type of lock. With an exclusive lock, no other user can access the database or
table. Exclusive locks are used when a Data Definition Language (DDL) command is
executed (i.e., CREATE TABLE). An exclusive lock on a database or table prevents
other users from obtaining any lock on the locked object.

Write

Write locks enable users to modify data while maintaining data consistency. While the
data has a write lock on it, other users can only obtain an access lock. During this time,
all other locks are held in a queue until the write lock is released.

Read

Read locks are used to ensure consistency during read operations. Several users may
hold concurrent read locks on the same data, during which time no data modification is
permitted. Read locks prevent other users from obtaining the following locks on the
locked data:

 Exclusive locks
 Write locks

Access

Access locks can be specified by users unconcerned about data consistency. The use of
an access lock allows for reading data while modifications are in process. Access locks
are designed for decision support on tables that are updated only by small, single-row
changes. Access locks are sometimes called "stale read" locks, because you may get "stale data" that has not been updated. Access locks prevent other users from obtaining
the following locks on the locked data:

 Exclusive locks
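
An access lock is typically requested with the LOCKING modifier ahead of the query, for example (table name hypothetical):

LOCKING ROW FOR ACCESS
SELECT last_name, dept
FROM emp_table;   -- the read proceeds even while rows are being updated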

What Type of Lock?

Match the type of lock to the descriptions:

Allows other users to see a stable version of the data, but not make any
modifications.
Allows other users to obtain an access lock only, not any other type of
lock.
This kind of lock cannot be applied to rows.


Exercise 5.1

True or False: If a single disk drive fails, the Teradata Database halts, then restarts.

A. True
B. False

To review this topic, click RAID 1: How It Handles Failures or RAID 5: How It Handles Failures.

Exercise 5.2

RAID 5 protects data from disk failures using:

A. DARDAC
B. Mirroring
C. Parity Striping
D. Partitioning



To review this topic, click RAID 5: How It Works.

Exercise 5.3

Match the type of journal to the appropriate phrase:

Stores before-images and after-images.
Protects data from a transaction that does not complete.
Starts logging changes for a Fallback table when an AMP goes down.


To review this topic, click Down-AMP Recovery Journal, Transient Journal, or Permanent Journals.

Exercise 5.4

Which three statements are true? (Choose three.)

A. Fallback protects data from the failure of one AMP per cluster.
B. A clique provides protection in the case of a node failure.
C. ARC protects disk arrays from electrostatic discharge.
D. Locks prevent multiple users from simultaneously changing the same data.


To review these topics, click Fallback, Cliques Provide Resiliency, Archival Utilities, or Locks.

Exercise 5.5

True or False: Restoration of Fallback-protected data starts automatically when a failed AMP is
brought online.

A. True
B. False

2. True or False: Fallback protection is specified at the row hash level.

A. True
B. False


To review these topics, click Fallback, Fallback: How It Handles Failures.

Exercise 5.6

From the drop-down boxes below, match the storage concepts to the descriptions:

The collection of pdisks used to store data. This space is assigned to an AMP.
A collection of areas across the disk drives in a rank. The operating system sees this as
its logical "disk."
A collection of AMPs that keeps Fallback copies of rows for each other in case one AMP
fails.
An area of a LUN (also known as a user slice in UNIX or partition in Microsoft Windows)
that stores user data.
A collection of disk drives used to provide data availability.


To review these topics, click Assigning Pdisks to AMPs, Creating LUNs, Fallback: How It Works,
Pdisks: User Data Space, or RAID 5: How It Works.

Mod 6 - Indexes

Objectives

After completing this module, you should be able to:

 List tasks Teradata Database Administrators never have to perform.
 Define primary and secondary indexes and their purposes.
 Distinguish between a primary index and a primary key.
 Distinguish between a UPI and a NUPI.
 Define a Partitioned Primary Index (PPI) and its purpose.
 Distinguish between a USI and a NUSI.
 Explain the makeup of the Row-ID and its role in row storage.
 Describe the sequence of events for locating a row.
 Explain the roles of the hashing algorithm and hash map in locating a row.
 Describe the operation of a full-table scan.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Indexes in the Teradata Database

Indexes are used to access rows from a table without having to search the whole table.
In the Teradata Database, an index is made up of one or more columns in a table. Once Teradata Database indexes are selected, they are maintained by the system. While
other vendors may require data partitioning or index maintenance, these tasks are
unnecessary with the Teradata Database.

In the Teradata Database, there are two types of indexes:

 Primary Indexes define the way the data is distributed.
 Primary Indexes and Secondary Indexes are used to locate the data rows more efficiently than scanning the whole table.

You specify which column or columns are used as the Primary Index when you create a
table. Secondary Index columns can be specified when you create a table or at any time
during the life of the table.

Data Distribution

When the Primary Index for a table is well chosen, the rows are evenly distributed across the AMPs for the best performance. The way to guarantee even distribution of data is by choosing a Primary Index whose columns contain unique values. The values do not have to be evenly spaced, or even "truly random"; they just have to be unique to be evenly distributed.

Each AMP is responsible for a subset of the rows in a table. If the data is evenly
distributed, the work is evenly divided among the AMPs so they can work in parallel and
complete their processing about the same time. Even data distribution is critical to
performance because it optimizes the parallel access to the data.

Unevenly distributed data, also called "skewed data," causes slower response time as
the system waits for the AMP(s) with the most data to finish their processing. The
slowest AMP becomes a bottleneck. If distribution is skewed, an all-AMP operation will
take longer than if all AMPs were evenly utilized.

http://www.teradatau.courses.teradata.com/learning/BLADE_MS/legacy/18109_IntrotoTer... 11/3/2010
Page 108 of 137

When data is loaded into the Teradata Database:

 The system automatically distributes the data across the AMPs based on row
content (the Primary Index values).
 The distribution is the same regardless of the data volume being loaded. In other
words, large tables are distributed the same way as small tables.

Data is not distributed in any particular order. The benefits of unordered data are that it requires no maintenance to preserve order and is independent of any query being submitted. The automatic, unordered distribution of data eliminates tasks for a Teradata Database Administrator that are necessary with some other relational database systems. The DBA does not waste time on labor-intensive data maintenance tasks.

Teradata Database Manageability

A key benefit of the Teradata Database is its manageability. The list of tasks that
Teradata Database Administrators do not have to do is long, and illustrates why the
Teradata Database system is so easy to manage and maintain compared to other
databases.

Things Teradata Database Administrators Never Have to Do

Teradata Database Administrators never have to do the following tasks:

 Reorganize data or index space.
 Pre-allocate table/index space.
 Physically partition disk space.
 While it is possible to have partitioned indexes in the Teradata Database, they are not required, and they are created logically.
 Pre-prepare data for loading (convert, sort, split, etc.).
 Unload/reload data spaces due to expansion. With the Teradata Database, the data can be redistributed on the larger configuration with no offloading and reloading required.
 Write or run programs to split input source files into partitions for loading.


With the Teradata Database, the workload for creating a table of 100 rows is the same as creating a table with 1,000,000,000 rows. Teradata Database Administrators know that if data doubles, the system can expand easily to accommodate it. The Teradata Database provides huge cost advantages, especially when it comes to staffing Database Administrators. Customers tell us that their DBA staff requirements for administering non-Teradata databases are three to four times higher.

How Other Databases Store Rows and Manage Data

Even data distribution is not easy for most databases to do. Many databases use range
distribution, which creates intensive maintenance tasks for the DBA. Others may use
indexes as a way to select a small amount of data to return the answer to a query. They
use them to avoid accessing the underlying tables if possible. The assumption is that the
index will be smaller than the tables so they will take less time to read. Because they
scan indexes and use only part of the data in the index to search for answers to a query,
they can carry extra data in the indexes, duplicating data in the tables. This way they do
not have to read the table at all in some cases. This is not as efficient as the Teradata
Database's method of data storage and access.

Other DBAs have to ask themselves questions like:

 How should I partition the data?
 How large should I make the partitions?
 Where do I have data contention?
 How are the users accessing the data?

Many other databases require the DBAs to manually partition the data. They might
place an entire table in a single partition. The disadvantage of this approach is it creates
a bottleneck for all queries against that data. It is not the most efficient way to either
store or access data rows.

With other databases, adding, updating and deleting data affects manual data
distribution schemes thereby reducing query performance and requiring reorganization.
A Teradata Database provides high performance because it distributes the data evenly
across the AMPs for parallel processing. No partitioning or data re-organizations are
needed. With the Teradata Database, your DBA can spend more time with users
developing strategic applications to beat your competition!

What Do You Think?

Which two statements are true about data distribution and Teradata Database indexes?
(Choose two.)

A. If a table has 103 rows and there are 4 AMPs in the system, each AMP will not have exactly the same number of rows from that table. However, if the Primary Index is chosen well, each AMP will still contain some rows from that table.
B. The rows of a table are stored on a single disk for best access performance.
C. Skewed data leads to poor performance in processing data access requests.


D. Teradata Database performance can be increased by maintaining the indexes and conducting periodic data partitioning and sorting.

Primary Index

A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP and a location on the AMP's disks. It is also used to access rows without having to search the entire table. A Primary Index operation is always a one-AMP operation. You specify the column(s) that comprise the Primary Index for a table when the table is created. For a given row, the Primary Index value is the combination of the data values in the Primary Index columns.

Choosing a Primary Index for a table is perhaps the most critical decision a database
designer makes, because this choice affects both data distribution and access.

Primary Index Rules

The following rules govern how Primary Indexes in a Teradata Database must be
defined as well as how they function:

Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value can be modified.
Rule 5: The Primary Index of a populated table cannot be modified.
Rule 6: A Primary Index has a limit of 64 columns.

Rule 1: One PI Per Table

Each table must have a Primary Index. The Primary Index is the way the system
determines where a row will be physically stored. While a Primary Index may be
composed of multiple columns, the table can have only one (single- or multiple-column)
Primary Index.


Rule 2: Unique or Non-Unique PI

There are two types of Primary Index:

 Unique Primary Index (UPI) - For a given row, the combination of the data values in the columns of a Unique Primary Index is not duplicated in other rows within the table, so the columns are unique. This uniqueness guarantees even data
distribution and direct access. For example, in the case where old employee
numbers are sometimes recycled, the combination of the Social Security Number
and Employee Number columns would be a UPI. With a UPI, there is no duplicate
row checking done during a load, which makes it a faster operation.

 Non-Unique Primary Index (NUPI) - For a given row, the combination of the data
values in the columns of a Non-Unique Primary Index can be duplicated in other
rows within the table. So, there can be more than one row with the same PI
value. A NUPI can cause skewed data, but in specific instances can still be a
good Primary Index choice. For example, either the Department Number column or
the Hire Date column might be a good choice for a NUPI if you will be accessing
the table most often via these columns.


Rule 3: PI Can Be NULL

If the Primary Index is unique, you could have one row with a null value. If you have
multiple rows with a null value, the Primary Index must be Non-Unique.

Rule 4: PI Value Can Be Modified

The Primary Index value can be modified. In the table below, if Loretta Ryan changes
departments, the Primary Index value for her row changes.

When you update the index value in a row, the Teradata Database re-hashes it and
redistributes the row to its new location based on its new index value.
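
For example (a sketch in which dept is the Primary Index of a hypothetical emp_table):

UPDATE emp_table
SET dept = 403          -- new Primary Index value
WHERE emp_id = 100;     -- the row is re-hashed and moved to the AMP owning the new hash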

Rule 5: PI Cannot Be Modified

The Primary Index of a populated table cannot be modified.

In the event that you need to change the Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table.

The ALTER TABLE statement allows you to change the PI of a table only if the table is empty.

Rule 6: PI Has 64-Column Limit


You can designate a Primary Index that is composed of 1 to 64 columns.

SQL Syntax for Creating a Primary Index

When a table is created, it must have a Primary Index specified. The Primary Index is
designated in the CREATE TABLE statement in SQL.

If you do not specify a Primary Index in the CREATE TABLE statement, the system
will use the Primary Key as the Primary Index. If a Primary Key has not been specified,
the system will choose the first unique column. If there are no unique columns, the
system will use the first column in the table and designate it as a Non-Unique Primary
Index.

Creating a Unique Primary Index

The SQL syntax to create a Unique Primary Index is:

CREATE TABLE sample_1
  (col_a INT
  ,col_b INT
  ,col_c INT)
UNIQUE PRIMARY INDEX (col_b);

Creating a Non-Unique Primary Index

The SQL syntax to create a Non-Unique Primary Index is:

CREATE TABLE sample_2
  (col_x INT
  ,col_y INT
  ,col_z INT)
PRIMARY INDEX (col_x);

Modifying the Primary Index of a Table

As mentioned in the Primary Index rules, you cannot modify the Primary Index of a
populated table. To change the Primary Index, you must drop the table, recreate it with
the new Primary Index, and reload the data (or empty the table and use ALTER TABLE as
described above).

Data Mechanics of Primary Indexes

This section describes how Primary Indexes are used in:

 Data distribution
 Data access


Distributing Rows to AMPs

The Teradata Database uses hashing to randomly distribute data across all AMPs for
balanced performance. For example, in a two-clique system, data is hashed across all
AMPs in the system for even data distribution, which results in evenly distributed
workloads. Each AMP holds a portion of the rows of each table. An AMP is responsible
for the storage, maintenance, and retrieval of the data under its control. The Teradata
Database's automatic hash distribution eliminates costly data maintenance tasks. An
advantage of the Teradata Database is that the Teradata File System manages
data and disk space automatically, which eliminates the need to rebuild indexes
when tables are updated or structures change.

Rows are distributed to AMPs during the following operations:

 Loading data into a table (one or more rows, using a data loading utility)
 Inserting or updating rows (one or more rows, using SQL)
 Changing the system configuration (redistribution of data, caused by
reconfigurations to add or delete AMPs)

When loading data or inserting rows, the data being affected by the load or insert is not
available to other users until the transaction is complete. During a reconfiguration, no
data is accessible to users until the system is operational in its new configuration.

Row Distribution Process

The process the system uses for inserting a row on an AMP is described below:

1. The system uses the Primary Index value in each row as input to the hashing
algorithm.
2. The output of the hashing algorithm is the row hash value (in this example, 646).
3. The system looks at the hash map, which identifies the specific AMP where the
row will be stored (in this example, AMP 3).
4. The row is stored on the target AMP.
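You can observe this process from SQL. Teradata provides the hash-related functions
HASHROW, HASHBUCKET, and HASHAMP, so a query like the following sketch (table and
column names are hypothetical) shows how many rows of a table land on each AMP:

SELECT HASHAMP(HASHBUCKET(HASHROW(employee_number))) AS amp_no
      ,COUNT(*) AS row_count
FROM   employee
GROUP BY 1
ORDER BY 1;

A roughly equal row_count per AMP indicates good distribution; a heavily skewed count
suggests a poor Primary Index choice.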

Duplicate Row Hash Values

It is possible for the hashing algorithm to end up with the same row hash value for two


different rows. There are two ways this could happen:

 Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate NUPI
values will produce the same row hash value.
 Hash synonym: Also called a hash collision, this occurs when the hashing
algorithm calculates an identical row hash value for two different Primary Index
values. Hash synonyms are rare; even when one occurs, a Unique Primary Index
still gives uniform data distribution.

To differentiate each row in a table, every row is assigned a unique Row ID. The Row
ID is the combination of the row hash value and a uniqueness value.

Row ID = Row Hash Value + Uniqueness Value

The uniqueness value is used to differentiate between rows whose Primary Index
values generate identical row hash values. In most cases, only the row hash value
portion of the Row ID is needed to locate the row.

When each row is inserted, the AMP adds the row ID, stored as a prefix of the row. The
first row inserted with a particular row hash value is assigned a uniqueness value of 1.
The uniqueness value is incremented by 1 for any additional rows inserted with the
same row hash value.

Duplicate Rows

A duplicate row is a row in a table whose column values are identical to another row in
the same table. In other words, the entire row is the same, not just the index. Although
duplicate rows are not allowed in the relational model (because every Primary Key must
be unique), the ANSI Standard does allow duplicate rows and the Teradata Database
supports that.

Because duplicate rows are allowed in the Teradata Database, how does it affect the
UPI, which, by definition, is unique? When you create a table, the following definitions
determine whether or not it can contain duplicate rows:

 MULTISET tables: May contain duplicate rows. The Teradata Database will not
check for duplicate rows.
 SET tables: The default. The Teradata Database checks for and does not permit
duplicate rows. If a SET table is created with a Unique Primary Index, the check for
duplicate rows is replaced by a check for duplicate index values.
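The table kind is specified in the CREATE TABLE statement. A minimal sketch, using
hypothetical table and column names:

CREATE SET TABLE employee_s
  (employee_number INT
  ,last_name VARCHAR(30))
UNIQUE PRIMARY INDEX (employee_number);

CREATE MULTISET TABLE call_log
  (customer_number INT
  ,call_time TIMESTAMP(0))
PRIMARY INDEX (customer_number);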


Accessing a Row With a Primary Index

When a user submits an SQL request against a table using a Primary Index, the request
becomes a one-AMP operation, which is the most direct and efficient way for the
system to find a row. The process is explained below.

Hashing Process

1. The primary index value goes into the hashing algorithm.


2. The output of the hashing algorithm is the row hash value.
3. The hash map points to the specific AMP where the row resides.
4. The PE sends the request directly to the identified AMP.
5. The AMP locates the row(s) on its vdisk.
6. The data is sent over the BYNET to the PE, and the PE sends the answer set on to
the client application.

Choosing a Unique or Non-Unique Primary Index

Criteria for choosing a Primary Index include:

 Uniqueness: A UPI is often a good choice because it:

 Guarantees even data distribution.
 Eliminates duplicate row checking during a load, which makes it a faster operation.

A NUPI with few duplicate values could provide good (if not perfectly uniform)
distribution, and might meet the other criteria better.

 Known Access Paths - Use in value access: Retrievals, updates, and deletes
that specify the Primary Index are much faster than those that do not. Because a
Primary Index is a known access path to the data, it is best to choose column(s)
that will be frequently used for access. For example, the following SQL statement
would directly access a row based on the equality WHERE clause:


SELECT * FROM employee WHERE employee_ID = 'ABC456789';

A NUPI may be a better choice if the access is based on another, mostly
unique column. For example, the table may be used by the Mail Room to
track package delivery. In that case, a column containing room numbers or
mail stops may not be unique if employees share offices, but may still be the
better choice for access.

 Join Performance - Use in join access: SQL requests that use a JOIN statement
perform the best when the join is done on a Primary Index. Consider Primary Key
and Foreign Key columns as potential candidates for Primary Indexes. For
example, if the Employee table and the Payroll table are related by the Employee
ID column, then the Employee ID column could be a good Primary Index choice for
one or both of the tables.

For join performance, a NUPI can be a better choice than a UPI.

 Non-volatile values: Look for columns where the values do not change frequently.
For example, in an Invoicing table, the outstanding balance column for all
customers probably has few duplicates, but probably changes too frequently to
make a good Primary Index. A customer ID, statement number, or other more
stable columns may be better choices.

When choosing a Primary Index, try to find the column(s) that best fit these criteria and
the business need.

What Do You Think?

Which three are key considerations in choosing a Primary Index? (Choose three.)

A. Column(s) containing unique (or nearly unique) values for uniform distribution.
B. Column(s) with values in sequential order for best load and access performance.
C. Column(s) frequently used in queries to access data or to join tables.
D. Column(s) with values that are stable (do not change frequently), to minimize redistribution of table rows.
E. Column(s) with many duplicate values for redundancy.

Feedback:

Partitioned Primary Index

The Teradata Database provides an indexing mechanism called Partitioned Primary
Index (PPI). PPI is used to:


 Improve performance for large tables when you submit queries that specify a
range constraint.
 Reduce the number of rows to be processed by using a technique called partition
elimination.
 Increase performance for incremental data loads, deletes, and data access when
working with large tables with range constraints.
 Instantly drop old data and rapidly add new data.
 Avoid full-table scans without the overhead of a Secondary Index.

How Does PPI Work?

Data distribution with PPI is still based on the Primary Index:

Primary Index Value -> Row Hash -> Determines which AMP gets the row

With PPI, the ORDER in which the rows are stored on the AMP is affected. Using the
traditional method, a Non-Partitioned Primary Index (NPPI), the rows are stored in row hash
order.

4 AMPs with Orders Table Defined with NPPI

Using PPI, the rows are stored first by partition and then by row hash. In our example,
there are four partitions. Within the partitions, the rows are stored in row hash order.

4 AMPs with Orders Table Defined with PPI on O_Date


With PPI, the Optimizer uses partition elimination to eliminate partitions that are not
included in the query. This reduces the number of partitions to be accessed and rows to
be processed. For example, in the table above, a query specifying the date 02/09 allows
the Optimizer to eliminate the other partitions so each AMP can access just the 02/09
partition to retrieve the rows.

The multilevel PPI feature improves response to business questions. Specifically, it
improves the performance of queries that can take advantage of partition elimination.

For example, an insurance claims table could be partitioned by claim date and then sub-
partitioned by state. The analysis performed for a specific state (such as Connecticut)
within a date range that is a small percentage of the many years of claims history in the
data warehouse (such as March 2006) would take advantage of partition elimination for
faster performance.

Similarly, a retailer may commonly run an analysis of retail sales for a particular district
(such as eastern Canada) for a specific timeframe (such as the first quarter of 2004) on
a table partitioned by date of sale and sub-partitioned by sales district.

Data Storage Using PPI

To store rows using PPI, specify partitioning in the CREATE TABLE statement. The
Primary Index value runs through the hashing algorithm as normal, and the system
locates the row by the Base Table ID, the partition number(s), the row hash, and the
Primary Index values.
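A minimal sketch of a PPI definition (table name, column names, and the date range are
hypothetical) partitions an Orders table by month:

CREATE TABLE orders
  (order_id INT
  ,o_date DATE
  ,order_amt DECIMAL(10,2))
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N(o_date BETWEEN DATE '2010-01-01'
                            AND     DATE '2010-12-31'
                            EACH INTERVAL '1' MONTH);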


Access Without a PPI

Let's say you have a table with Store information by Location and did not use a PPI. If
you query on Location 3 in this NPPI table, the entire table will be scanned to find the
rows for that location (a full-table scan).

QUERY: SELECT * FROM Store_NPPI WHERE Location_Number = 3;
PLAN:  ALL-AMPs - Full-Table Scan


Access with a PPI

In the same example for a PPI table, you would partition the table with as many
Locations as you have (or soon will have). Then if you query on Location 3,
each AMP uses partition elimination and only has to scan partition 3 for the
query. This query will run much faster than the full-table scan in the previous example.

QUERY: SELECT * FROM Store WHERE Location_Number = 3;
PLAN:  ALL-AMPs - Single Partition Scan

Multi-Level Partitioned Primary Index

Multi-level partitioning allows each partition of a PPI to be sub-partitioned. With MLPPI you can
use multiple partitioning expressions instead of only one for a table or a non-compressed join
index. Each partitioning level is defined independently using a RANGE_N or CASE_N expression.

With a multi-level PPI (MLPPI), you create multiple access paths to the rows in the base table that
the Optimizer can choose from. This improves response to business questions by improving
the performance of queries that take advantage of partition elimination.

For example, an insurance claims table could be partitioned by claim date and then sub-
partitioned by state. The analysis performed for a specific state (such as Connecticut) within a
date range that is a small percentage of the many years of claims history in the data warehouse
(such as March 2006) would take advantage of partition elimination for faster performance.

Note: an MLPPI table must have at least two partition levels defined.

Syntax:
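As a sketch of the syntax (table name, column names, date range, and states are all
hypothetical), the insurance claims example above could be defined as:

CREATE TABLE claims
  (claim_id INT
  ,claim_date DATE
  ,state_code CHAR(2))
PRIMARY INDEX (claim_id)
PARTITION BY (RANGE_N(claim_date BETWEEN DATE '2000-01-01'
                                 AND     DATE '2009-12-31'
                                 EACH INTERVAL '1' MONTH)
             ,CASE_N(state_code = 'CT'
                    ,state_code = 'NY'
                    ,NO CASE));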


Advantages and Disadvantages

Advantages of partitioned tables:

 They provide efficient searches by using partition elimination at the various
levels or combination of levels.
 They reduce the I/O for range constraint queries.
 They take advantage of dynamic partition elimination.
 They provide multiple access paths to the data, and an MLPPI provides even more
partition elimination and more partitioning expression choices (i.e., you can use last
name or some other value that is more readily available to query on).
 The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other
similar entities.
 Row hash locks are used for SELECT with equality conditions on the PI columns.
 Partitioned tables allow for fast deletes of data in a partition.
 They allow for range queries without having to use a secondary index.
 Specific partitions may be archived or deleted.
 They may be created on volatile tables, global temporary tables, base tables, and
non-compressed join indexes.
 They may replace a Value-Ordered NUSI for access.

Disadvantages of partitioned tables:

 Rows in a partitioned table are two bytes longer.
 Access via the Primary Index may take longer.
 Full-table joins to an NPPI table with the same PI may take longer.

What is a NoPI Table?

A NoPI Table is simply a table without a primary index. It is a Teradata 13.00 feature. As
rows are inserted into a NoPI table, rows are always appended at the end of the table
and never inserted in the middle of a hash sequence. Organizing/sorting rows based
on row hash is therefore avoided.

Prior to Teradata Database 13.00, Teradata tables required a primary index. The primary
index was primarily used to hash and distribute rows to the AMPs according to hash
ownership. The objective was to divide data as evenly as possible among the AMPs to
make use of Teradata's parallel processing. Each row stored in a table has a RowID
which includes the row hash that is generated by hashing the primary index value. This
is what allows the optimizer, for example, to choose an efficient single-AMP execution
plan for SQL requests that specify values for the columns of the primary index.

Starting with Teradata Database 13.00, a table can be defined without a primary index.
This feature is referred to as the NoPI Table feature. NoPI stands for No Primary Index.

Without a PI, the hash value as well as AMP ownership of a row is arbitrary. Within the
AMP, there are no row-ordering constraints and therefore rows can be appended to the
end of the table as if it were a spool table. Each row in a NoPI table has a hash bucket


value that is internally generated. A NoPI table is internally treated as a hashed table; it
is just that typically all the rows on one AMP will have the same hash bucket value.

Benefits:

 A NoPI table will reduce skew in intermediate ETL tables which have no natural PI.
 Loads (FastLoad and TPump array insert) into a NoPI staging table are faster.
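A minimal sketch of a NoPI staging table definition (table and column names are
hypothetical):

CREATE TABLE stage_sales
  (sale_id INT
  ,sale_amt DECIMAL(10,2))
NO PRIMARY INDEX;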

Secondary Index

A Secondary Index (SI) is an alternate data access path. It allows you to access the data
without having to do a full-table scan. Secondary indexes do not affect how rows are
distributed among the AMPs.

You can drop and recreate secondary indexes dynamically, as they are needed. Unlike
Primary Indexes, Secondary Indexes are stored in separate subtables, which require extra
disk space and maintenance (the maintenance is handled automatically by the system).
So, Secondary Indexes do require some system resources.
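Secondary Indexes are created and dropped with SQL. A sketch, assuming a hypothetical
Department table like the one discussed later in this module:

CREATE UNIQUE INDEX (department_name) ON department;          -- USI
CREATE INDEX mgr_idx (manager_employee_number) ON department; -- NUSI
DROP INDEX mgr_idx ON department;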

What Do You Think?

In what instances would it be a good idea to define a Secondary Index for a table? (This
information will be covered in this module, but here is a preview.)

A. The Primary Index exists for even data distribution and access, but a Secondary
Index is defined to efficiently generate reports based on a different set of columns.
B. The Product table is accessed by the retailer (who accesses data based on the
retailer's product code column), and by a vendor (who accesses the same data based on
the vendor's product code column).
C. The table already has a Unique Primary Index, but a second column must also have
unique values. The column is specified as a Unique Secondary Index (USI) to enforce
uniqueness on the second column.
D. All of the above.
Feedback:

Secondary Index Rules

The following rules govern how Secondary Indexes must be defined and how they
function:


Rule 1: Secondary Indexes are optional.
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be changed.
Rule 6: A Secondary Index has a limit of 64 columns.

Rule 1: Optional SI

While a Primary Index is required, a Secondary Index is optional. If one path to the data
is sufficient, no Secondary Index need be defined.

You can define 0 to 32 Secondary Indexes on a table for multiple data access paths.
Different groups of users may want to access the data in various ways. You can define a
Secondary Index for each heavily used access path.

Rule 2: Unique or Non-Unique SI

Like Primary Indexes, Secondary Indexes can be unique or non-unique.

 A Unique Secondary Index (USI) serves two possible purposes:

 Enforces uniqueness on a column or group of columns. The database will
check USIs to see if the values are unique. For example, if you have chosen
different columns for the Primary Key and Primary Index, you can make the
Primary Key a USI to enforce uniqueness on the Primary Key.

 Speeds up access to a row (data retrieval speed). Accessing a row with a
USI requires one or two AMPs, which is less direct than a UPI (one AMP)
access, but more efficient than a full-table scan.

 A Non-Unique Secondary Index (NUSI) is usually specified to prevent full-table
scans, in which every row of a table is read. The Optimizer determines whether a
full-table scan or NUSI access will be more efficient, then picks the best method.


Accessing a row with a NUSI requires all AMPs.

Rule 3: SI Can Be NULL

As with the Primary Index, the Secondary Index column may contain NULL values.

Rule 4: SI Value Can Be Modified

The values in the Secondary Index column may be modified as needed.

Rule 5: SI Can Be Changed

Secondary Indexes can be changed. Secondary Indexes can be created and dropped
dynamically as needed. When the index is dropped, the system physically drops the
subtable that contained it.

Rule 6: SI Has 64-Column Limit

You can designate a Secondary Index that is composed of 1 to 64 columns. For example,
with a two-column Secondary Index on Budget and Manager Employee Number, the user
would specify both values to use the index.


Other Secondary Indexes

Join Index

Join indexes have several uses:

 Define a pre-join table on frequently joined columns (with optional data
aggregation) without denormalizing the database.
 Create a full or partial replication of a base table with a primary index on a foreign
key column to facilitate joins of very large tables by hashing their rows to the same
AMP as the large table.
 Define a summary table without denormalizing the database.

You can define a join index on one or several tables. Single-table join index functionality
is an extension of the original intent of join indexes, hence the confusing adjective "join"
used to describe a single-table join index.

Sparse Index

Any join index, whether simple or aggregate, multi-table or single-table, can be sparse. A
sparse join index uses a constant expression in the WHERE clause of its definition to
narrowly filter its row population.
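A sketch of a sparse single-table join index (names and the date filter are hypothetical);
the WHERE clause restricts the index to recent rows only:

CREATE JOIN INDEX recent_orders_ji AS
SELECT order_id, o_date, order_amt
FROM   orders
WHERE  o_date >= DATE '2010-01-01';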

Hash Index

Hash indexes are used for the same purposes as single-table join indexes. Hash
indexes create a full or partial replication of a base table with a primary index on a
foreign key column to facilitate joins of very large tables by hashing them to the same
AMP.

You can only define a hash index on a single table. Hash indexes are not indexes in the
usual sense of the word. They are base tables that cannot be accessed directly by a
query.

Value-Ordered NUSI

Value-ordered NUSIs are very efficient for range constraints and conditions with an
inequality on the secondary index column set. Because the NUSI rows are sorted by
data value, it is possible to search only a portion of the index subtable for a given range
of key values. Thus, the major advantage of a value-ordered NUSI is in the performance
of range queries.
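A sketch of a value-ordered NUSI definition (names are hypothetical); ORDER BY VALUES
sorts the index subtable rows by data value rather than by row hash:

CREATE INDEX (o_date) ORDER BY VALUES (o_date) ON orders;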

Value-ordered NUSIs have the following limitations:

 The sort key is limited to a single numeric column.
 The sort key column cannot exceed four bytes.
 They count as two indexes against the total of 32 non-primary indexes you can
define on a base or join index table.

Join Indexes


A Join Index is an optional index which may be created by a user. Join indexes provide
additional processing efficiencies:

 Eliminate base table access
 Eliminate aggregate processing
 Reduce joins
 Eliminate redistributions

The three basic types of join indexes commonly used with Teradata are described below:

Single Table Join Index

 Distribute the rows of a single table on the hash value of a foreign key value.
 Facilitates the ability to join the foreign key table with the primary key table without
redistributing the data.
 Useful for resolving joins on large tables without having to redistribute the joined
rows across the AMPs.

Multi-Table Join Index

 Pre-join multiple tables; stores and maintains the result from joining two or more
tables.
 Facilitates join operations by possibly eliminating join processing or by
reducing/eliminating join data redistribution.

Aggregate Join Index

 Aggregate one or more columns of a single table or multiple tables into a summary
table.
 Facilitates aggregation queries by eliminating aggregation processing. The pre-
aggregated values are contained in the AJI instead of relying on base table
calculations.

A join index is a system-maintained index table that stores and maintains the joined rows
of two or more tables (multiple table join index) and, optionally, aggregates selected
columns, referred to as an aggregate join index.

Join indexes are defined in a way that allows join queries to be resolved without
accessing or joining their underlying base tables. A join index is useful for queries where
the index structure contains all the columns referenced by one or more joins, thereby
allowing the index to cover all or part of the query. For obvious reasons, such an index is
often referred to as a covering index. Join indexes are also useful for queries that
aggregate columns from tables with large cardinalities. These indexes play the role of
pre-join and summary tables without denormalizing the logical design of the database
and without incurring the update anomalies presented by denormalized tables.
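A sketch of a multi-table join index that pre-joins two hypothetical tables on their common
customer key, so queries covered by the index need not access or join the base tables:

CREATE JOIN INDEX cust_order_ji AS
SELECT c.customer_number, c.customer_name, o.order_id, o.order_amt
FROM   customer c, orders o
WHERE  c.customer_number = o.customer_number;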

Using Secondary Indexes


Consider a Department table in which users access data based on the Department Name column.
The values in that column are unique, so it has been made a USI for efficient access. In
addition, the company wants reports on how many departments each manager is
responsible for, so the Manager Employee Number can also be made a secondary
index. It has duplicate values, so it is a NUSI.

How Secondary Indexes Are Stored

Secondary indexes are stored in index subtables. The subtables for USIs and NUSIs are
distributed differently:

 USI: The Unique Secondary Indexes are hash distributed separately from the data
rows, based on their USI value. (As you remember, the base table rows are
distributed based on the Primary Index value). The subtable row may be stored on
the same AMP or a different AMP than the base table row, depending on the hash
value.

 NUSI: The Non-Unique Secondary Indexes are stored in subtables on the same
AMPs as their data rows. This reduces activity on the BYNET and essentially
makes NUSI queries an AMP-local operation - the processing for the subtable and
base table are done on the same AMP. However, in all NUSI access requests, all
AMPs are activated because the non-unique value may be found on multiple
AMPs.

Data Access Without a Primary Index

You can submit a request without specifying a Primary Index and still access the data.
The following access methods do not use a Primary Index:

 Unique Secondary Index (USI)


 Non-Unique Secondary Index (NUSI)
 Full-Table Scan


Accessing Data with a USI

When a user submits an SQL request using the table name and a Unique Secondary
Index, the request becomes a one- or two-AMP operation, as explained below.

USI Access

1. The SQL is submitted, specifying a USI (in this case, a customer number of 56).
2. The hashing algorithm calculates a row hash value (in this case, 602).
3. The hash map points to the AMP containing the subtable row corresponding to the
row hash value (in this case, AMP 2).
4. The subtable indicates where the base row resides (in this case, row 778 on AMP
4).
5. The message goes back over the BYNET to the AMP with the row and the AMP
accesses the data row (in this case, AMP 4).
6. The row is sent over the BYNET to the PE, and the PE sends the answer set on to
the client application.

As shown in the example above, accessing data with a USI is typically a two-AMP
operation. However, it is possible that the subtable row and base table row could end up
being stored on the same AMP, because both are hashed separately. If both were on
the same AMP, the USI request would be a one-AMP operation.

Accessing Data with a NUSI

When a user submits an SQL request using the table name and a Non-Unique
Secondary Index, the request becomes an all-AMP operation, as explained below.


NUSI Access

1. The SQL is submitted, specifying a NUSI (in this case, a last name of "Adams").
2. The hashing algorithm calculates a row hash value for the NUSI (in this case, 567).
3. All AMPs are activated to find the hash value of the NUSI in their index subtables.
The AMPs whose subtables contain that value become the participating AMPs in
this request (in this case, AMP1 and AMP2). The other AMPs discard the
message.
4. Each participating AMP locates the row IDs (row hash value plus uniqueness
value) of the base rows corresponding to the hash value (in this case, the base
rows corresponding to hash value 567 are 640, 222, and 115).
5. The participating AMPs access the base table rows, which are located on the same
AMP as the NUSI subtable (in this case, one row from AMP 1 and two rows from
AMP 2).
6. The qualifying rows are sent over the BYNET to the PE, and the PE sends the
answer set on to the client application (in this case, three qualifying rows are
returned).

Full-Table Scan – Accessing Data Without Indexes

In the Teradata Database, you can access data on any column, whether that column is
an index or not. You can ask any question, of any data, at any time.

If the request does not use a defined index, the Teradata Database does a full-table
scan. A full-table scan is another way to access data without using Primary or
Secondary Indexes. In evaluating an SQL request, the Optimizer examines all possible
access methods and chooses the one it believes to be the most efficient.

While Secondary Indexes generally provide a more direct access path, in some cases


the Optimizer will choose a full-table scan because it is more efficient. A request could
turn into a full-table scan when:

 An SQL request searches on a NUSI column with many duplicates. For example, if
a request using last names in a Customer database searched on the very
prevalent "Smith" in the United States, then the Optimizer may choose a full table
scan to efficiently find all the matching rows in the result set.

 An SQL request uses a non-equality WHERE clause on an index column. For
example, if a request searched an Employee database for all employees whose
annual salary is greater than $100,000, then a full-table scan would be used,
even if the Salary column is an index. In this example, a full-table scan can be
avoided by using an equality WHERE clause on a defined index column.

 An SQL request uses a range WHERE clause on an index column. For example, if
a request searched an Employee database for all employees hired between
January 2001 and June 2001, then a full-table scan would be used, even if the
Hire_Date column is an index.
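The following hypothetical requests illustrate the difference (assuming employee_id is
the UPI of the table and salary is an index column):

-- Non-equality on an index column: the Optimizer will likely choose a full-table scan.
SELECT * FROM employee WHERE salary > 100000;

-- Equality on the Unique Primary Index: direct one-AMP, one-row access.
SELECT * FROM employee WHERE employee_id = 123456;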

For all requests, you must specify a value for each column in the index or the Teradata
Database will do a full-table scan. A full-table scan is an all-AMP operation. Every data
block must be read and each data row is accessed only once. As long as the choice
of Primary Index has caused the table rows to distribute evenly across all of the AMPs,
the parallel processing of the AMPs working simultaneously can accomplish the full-table
scan quickly. However, if a Primary Index causes skewed data distribution, all AMP
operations will take longer.

While full-table scans are impractical and even disallowed on some commercial
database systems, the Teradata Database routinely permits ad-hoc queries with full-
table scans.

When choosing between a NUSI and a full-table scan, the optimizer considers selectivity:
if there is no selective secondary, hash, or join index, and most of the rows in the table
would qualify for the answer set if the NUSI were used, the optimizer will most likely
choose the full-table scan as the most efficient access method.

If statistics are stale or have not been collected on the NUSI column(s), the optimizer
may choose to do a full-table scan, as it does not have updated data demographics.

Summary of Keys and Indexes

Some fundamental differences between Keys and Indexes are shown below:

Keys                                       Indexes
A relational modeling convention used      A Teradata Database mechanism used
in a logical data model.                   in a physical database design.
Uniquely identify a row (Primary Key).     Used for row distribution (Primary Index).
Establish relationships between tables     Used for row access (Primary Index and
(Foreign Key).                             Secondary Index).


While most commercial database systems use the Primary Key as a way to retrieve
data, a Teradata Database system does not. In the Teradata Database, you use the
Primary Key only when designing a database, as a mechanism for maintaining
referential integrity according to relational theory. The Teradata Database itself does not
require keys in order to manage the data, and can function fully with no awareness of
Primary Keys.

The Teradata Database's parallel architecture uses Primary Indexes to distribute and
access the data rows. A Primary Index is always required when creating a Teradata
Database table.

A Primary Index may include the same columns as the Primary Key, but does not have
to. In some cases, you may want the Primary Key and Primary Index to be different. For
example, a credit card account number may be a good Primary Key, but customers may
prefer to use a different kind of identification to access their accounts.

Rules for Keys and Indexes

A summary of the rules for keys (in the relational model) and indexes (in the Teradata
Database) is shown below.

Rule  Primary Key               Foreign Key                Primary Index                Secondary Index
1     One PK                    Multiple FKs               One PI                       0 to 32 SIs
2     Unique values             Unique or non-unique       Unique or non-unique         Unique or non-unique
3     No NULLs                  NULLs allowed              NULLs allowed                NULLs allowed
4     Values should not change  Values may be changed      Values may be changed        Values may be changed
                                                           (redistributes row)
5     Column should not change  Column should not change   Column cannot be changed     Index may be changed
                                                           (drop and recreate table)    (drop and recreate index)
6     No column limit           No column limit            64-column limit              64-column limit
7     n/a                       FK must exist as PK        n/a                          n/a
                                in the related table

Defining Primary and Foreign Keys in the Teradata Database

Although Primary Indexes are required and Primary Keys are not, you do have the
option to define a Primary Key or Foreign Key for any table. When you define a Primary
Key in a Teradata Database table, the RDBMS will implement the specified column(s) as
an index. Because a Primary Key requires unique values, a defined Primary Key is
implemented as one of the following:

 Unique Primary Index (if the DBA did not specify a Primary Index in the
CREATE TABLE statement)

 Unique Secondary Index (if the PK was not chosen to be the PI)

When a Primary Key is defined in Teradata SQL and implemented as an index, the rules
that govern that type of index now apply to the Primary Key. For example, in relational
theory, there is no limit to the number of columns in a Primary Key. However, if you
specify a Primary Key in Teradata SQL, the 64-column limit for indexes now applies to
that Primary Key.
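A sketch (hypothetical names): because no PRIMARY INDEX clause is given, the Primary
Key below is implemented as a Unique Primary Index on department_number:

CREATE TABLE department
  (department_number INT NOT NULL PRIMARY KEY
  ,department_name VARCHAR(30));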

What Do You Think?

Which statement is true? (Choose the best answer.)

A. A Primary Index is used to distribute data, while a Primary Key is used to uniquely identify a row.
B. A Primary Key is used to access data, while a Primary Index is used to uniquely identify a row.
C. In a Teradata Database system, "Primary Key" means the same thing as "Primary Index."
D. A Primary Index is used to distribute data, while a Primary Key is converted to a hash map.

Feedback:

Exercise 6.1

Which one provides uniform data distribution through the hashing algorithm?

A. UPI
B. NUPI
C. Both UPI and NUPI
D. Neither UPI nor NUPI

Feedback:

To review this topic, click Rule 2: Unique or Non-Unique PI or Distributing Rows to AMPs.

Exercise 6.2

The output from the hashing algorithm is the:

A. hash map
B. uniqueness value
C. row ID
D. row hash

Feedback:

To review this topic, click Distributing Rows to AMPs.

Exercise 6.3

Complete each sentence with the appropriate answer:

Accessing a row with a Unique Secondary Index (USI) typically requires ____ AMP(s).
Accessing a row with a Non-Unique Secondary Index (NUSI) requires ____ AMP(s).
A full-table scan accesses ____ row(s).
Accessing a row with a Unique Primary Index (UPI) accesses ____ row(s) on one AMP.
Accessing a row with a Non-Unique Primary Index (NUPI) accesses multiple rows on ____ AMP(s).

Feedback:

To review these topics, click Accessing a Row With a Primary Index, Accessing Data with a USI,
Accessing Data with a NUSI, Full-Table Scan - Accessing Data Without Indexes.

Exercise 6.4

Which column should be selected as the Primary Index in the CUSTOMER table below? The table
contains information on 50,000 customers of this regional telecommunication services company.
Whenever a customer calls, the call center operator should be able to easily access and confirm
customer information. In addition, the company wants to track all service activities on a per-
household basis. Select the best Primary Index for the business use.


A. Column 4, because each address is clearly a household, which is what is being tracked.
B. Column 5, because it is nearly unique, easy to remember and input, and can be used for householding.
C. Column 2, because most of the customers with the same last name belong to a single household.
D. Columns 2 and 3 together, because the combination is nearly unique, and it is easy for the customer to remember.
E. Column 1, because it is the Primary Key and its unique values will cause table rows to be distributed evenly for best performance. Customers must give their Customer ID when calling for service.

Feedback:

To review this topic, click Choosing a UPI or NUPI.

Exercise 6.5

The row ID helps the system to locate a row in case of a(n):

A. even distribution of rows
B. Unique Primary Index
C. multi-AMP request
D. hash synonym

Feedback:

To review this topic, click Distributing Rows to AMPs or Accessing a Row With a Primary Index.

Exercise 6.6

Which task does a Teradata Database Administrator have to perform? (Choose one.)

A. Select Primary Indexes
B. Re-organize data
C. Pre-prepare data for loading
D. Pre-allocate table space

Feedback:

To review this topic, click Teradata Database Manageability.

Exercise 6.7

With a ______ you create multiple access paths to the rows in the base table that the
Optimizer can choose from, which improves response to business questions by improving
the performance of queries that take advantage of partition elimination?

A. Multi-Level Partitioned Primary Index (MLPPI)
B. NUPI
C. Partitioned Primary Index (PPI)
D. NoPI

Feedback:

To review this topic, click Multi-Level PPI, What is a NoPI Table?, Choosing a Unique or
Non-Unique Primary Index or Partitioned Primary Index.

Exercise 6.8

True or False: A NoPI Table is simply a table without a primary index. As rows are
inserted into a NoPI table, rows are always appended at the end of the table and never
inserted in the middle of a hash sequence. Organizing/sorting rows based on row hash
is therefore avoided.

A. True
B. False

Feedback:

To review this topic, click What is a NoPI Table?.

Exercise 6.9

True or False: If statistics are stale or have not been collected on the NUSI column(s),
the optimizer may choose to do a full-table scan, as it does not have updated data
demographics.

A. True
B. False

Feedback:

To review this topic, click Choosing a Unique or Non-Unique Primary Index or Data
Access without a Primary Index.

Teradata Certification


Now that you have learned about the Teradata Database basics, consider the first
level of Teradata Certification, Teradata Certified Professional. Information on the
Teradata Certified Professional Program (TCPP) including exam objectives,
practice questions, test center locations, and registration information is located
on the Teradata Certified Professional Program (TCPP) website. Candidates for
the Teradata Certified Professional Certification must pass the Teradata 12 Basics
Certification exam administered at Prometric testing centers listed on the TCPP
website.

We recommend you review the WBT content and the practice questions located
on the TCPP website before signing up for the official Teradata 12 Basics
Certification exam.

