Read Lock:
Read locks are used to ensure consistency during read operations. Several users may hold concurrent read locks on the same data, during which no modification of the data is permitted.
Write Lock:
Write locks enable users to modify data while locking out all other users except readers who are not concerned about data consistency (access-lock readers). Until a write lock is released, no new read or write locks are allowed.
Access Lock:
Access locks can be specified by users who are not concerned about data consistency. An access lock allows data to be read while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small single-row changes. Access locks are sometimes called stale-read locks, i.e. you may get stale data that has not been updated yet.
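The access-lock behaviour above is typically requested with the LOCKING modifier; a minimal sketch (the table and column names are hypothetical):

```sql
-- Dirty read: do not wait for, or block, concurrent writers
LOCKING TABLE Sales_Fact FOR ACCESS
SELECT store_id, SUM(sales_amt)
FROM Sales_Fact
GROUP BY store_id;
```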
=========================================================================
In a multi-statement request, all statements are sent to the Parser simultaneously and executed in parallel.
In SQL Assistant, place all the queries in the query window and press F5; all of them will be executed in parallel. Make sure each SQL statement ends with a semicolon (;).
=========================================================================
SLEEP specifies the number of minutes FastLoad waits before retrying a logon. The default is 6 minutes.
The statement must be placed before LOGON.
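A sketch of where SLEEP sits at the top of a FastLoad script, alongside the related TENACITY and SESSIONS settings (the tdpid, user, and password are hypothetical):

```sql
SESSIONS 4;      /* number of FastLoad sessions          */
TENACITY 2;      /* hours to keep retrying the logon     */
SLEEP 10;        /* minutes to wait between retries      */
LOGON tdpid/load_user,load_password;
```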
=========================================================================
=========================================================================
What are Teradata statistics?
Statistics collection is essential for the optimal performance of the Teradata query optimizer.
The query optimizer relies on statistics to help it determine the best way to access data.
Statistics also help the optimizer ascertain how many rows exist in tables being queried and predict how
many rows will qualify for given conditions.
A lack of statistics, or outdated statistics, might result in the optimizer choosing a less-than-optimal method for accessing data tables.
Points:
1: Once COLLECT STATISTICS is run on a table (on an index or column), where is this information stored so that the optimizer can refer to it?
Ans: Collected statistics are stored in DBC.TVFields or DBC.Indexes. However, you cannot query these two tables directly.
2: How often should statistics be collected for a table that is frequently updated?
Answer: You need to refresh statistics when 5 to 10% of the table's rows have changed. Collecting statistics can be quite resource-consuming for large tables, so it is advisable to schedule the job at an off-peak period, normally after approximately 10% of the data has changed.
3: Once statistics have been collected on the table, how can I be sure that the optimizer considers them before execution? I.e., until the next COLLECT STATISTICS, will the optimizer refer to them?
Ans: Yes, the optimizer will use the statistics for the query execution plan if they are available. That is why stale statistics are dangerous: they may mislead the optimizer.
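The collection and inspection steps above can be sketched as follows (table and column names are hypothetical):

```sql
-- Collect (or refresh) statistics on a column and on an index
COLLECT STATISTICS ON Sales_Fact COLUMN (store_id);
COLLECT STATISTICS ON Sales_Fact INDEX (sale_date);

-- Review what has been collected and when
HELP STATISTICS Sales_Fact;
```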
=========================================================================
What is a HOT AMP?
When the workload is not distributed across all the AMPs, only a few AMPs end up overburdened with the
work. This is a hot AMP condition.
This typically occurs when the volume of data you are dealing with is high and
(a). You are trying to retrieve the data in a TERADATA table which is not well distributed across the AMPs
on the system (bad Primary Index)
OR
(b). When you are trying to join on a column with highly non-unique values
OR
(c). When you apply the DISTINCT operator on a column with highly non-unique values
=========================================================================
What are different SPACES available in Teradata?
There are mainly 3 types of spaces available.
Perm Space
Temp Space
Spool space
Perm Space : All databases have a defined upper limit of permanent space. Permanent space is used for storing the data rows of tables. Perm space is not pre-allocated; it represents a maximum limit rather than a reserved amount.
Spool Space :All databases also have an upper limit of spool space. If there is no limit defined for a
particular database or user, limits are inherited from parents.
Theoretically, a user could use all unallocated space in the system for their query. Spool space is
temporary space used to hold intermediate query results or formatted answer sets to queries. Once the
query is complete, the spool space is released.
Example: You have a database with total disk space of 100GB. You have 10GB of user data and an
additional 10GB of overhead. What is the maximum amount of spool space available for queries?
Answer: 80GB. All of the remaining space in the system is available for spool
Temp Space : The third type of space is temporary space. Temp space is used for global temporary tables (volatile tables use spool space), and their contents remain available to the user until the session is terminated. Tables created in temp space will survive a restart.
=========================================================================
How Does Teradata Store Rows?
Teradata uses hash partitioning and distribution to randomly and evenly distribute data across all AMPs.
The rows of every table are distributed among all AMPs - and ideally will be evenly distributed among all
AMPs.
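One way to inspect how evenly a candidate primary index hashes across the AMPs is the HASHROW/HASHBUCKET/HASHAMP idiom; a sketch (table and column names are hypothetical):

```sql
-- Count how many rows hash to each AMP for a candidate PI column
SELECT HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS amp_no
     , COUNT(*) AS row_cnt
FROM Customer
GROUP BY 1
ORDER BY 2 DESC;
```

Large differences between the highest and lowest row counts indicate skew, i.e. a poor PI choice.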
The down-AMP recovery journal starts when an AMP goes down; it records changes to fallback rows so the data can be restored to the AMP when it returns.
Fallback provides redundant data: a copy of each row is kept on another AMP in the cluster, so even if one AMP goes down your queries are not affected. The queries use the fallback rows, and the down AMP is updated from the recovery journal when it comes back online.
=========================================================================
Which one will take care when a NODE goes down?
In the event of node failure, all virtual processors can migrate to another available node in the clique.
All nodes in the clique must have access to the same disk arrays.
=========================================================================
What is the use of the EXPLAIN plan?
The EXPLAIN facility allows you to preview how Teradata will execute a requested query. It returns a
summary of the steps the Teradata RDBMS would perform to execute the request.
EXPLAIN also discloses the strategy and access method to be used, how many rows will be involved,
and its cost in minutes and seconds. Use EXPLAIN to evaluate query performance
and to develop an alternative processing strategy that may be more efficient. EXPLAIN works on any
SQL request. The request is fully parsed and optimized, but not run. The complete plan is returned to the
user in readable English statements.
EXPLAIN provides information about locking, sorting, row selection criteria, join strategy and conditions,
access method, and parallel step processing. EXPLAIN is useful for performance tuning, debugging, prevalidation of requests, and for technical training.
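Using EXPLAIN is as simple as prefixing the request; a sketch (the table name is hypothetical):

```sql
-- The request is parsed and optimized but not run;
-- the optimizer's plan is returned as English text,
-- including locks, access methods, and estimated rows and cost.
EXPLAIN
SELECT store_id, SUM(sales_amt)
FROM Sales_Fact
GROUP BY store_id;
```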
=========================================================================
The newer ANSI-standard COALESCE can also convert a NULL to a zero. More generally, it can convert a NULL value to any data value.
The COALESCE searches a value list, ranging from one to many values, and returns the first Non-NULL
value it finds.
At the same time, it returns a NULL if all values in the list are NULL.
To use COALESCE, the SQL must pass the name of a column to the function. The data in the column is then compared to NULL. Although one column name is all that is required, normally more than one column is passed to it. Additionally, a literal value, which is never NULL, can be listed last to provide a default value if all of the preceding column values are NULL.
SELECT
COALESCE (<column> [, <column> ...])
,( COALESCE(<column> [, <literal>]) )
FROM <table-name> GROUP BY 1 ;
=========================================================================
When will you create PPI and when will you create secondary indexes?
Partitioned Primary Indexes are created to divide the table into partitions based on ranges or values, as required. This is effective for larger tables partitioned on date and integer columns. There is no extra overhead on the system (no special subtables are created, etc.).
Secondary indexes are created on a table as an alternate way to access data. This is the second-fastest method of retrieving data from a table, next to the primary index; however, index subtables are created.
PPIs and secondary indexes avoid full-table scans: they access only a defined set of data on the AMPs.
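Creating and dropping secondary indexes can be sketched as follows (table, column, and index names are hypothetical):

```sql
-- Unique secondary index (USI): two-AMP access path
CREATE UNIQUE INDEX idx_email (email) ON Customer;

-- Non-unique secondary index (NUSI): all-AMP subtable access
CREATE INDEX idx_city (city) ON Customer;

-- Secondary indexes can be dropped at any time
DROP INDEX idx_city ON Customer;
```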
=========================================================================
When do you choose a primary index and when do you choose a secondary index?
Primary index will be chosen at the time of table creation. This will help us in data distribution, data
retrieval and join operations.
Secondary indexes can be created and dropped at any time. They are used as an alternate path to
access data other than the primary index.
=========================================================================
What is the difference between a user and a database?
A user performs actions whereas a database is passive. Users have passwords and startup strings;
databases do not. Users can log on to the Teradata Database, establish sessions, and submit SQL
statements; databases cannot.
Creator privileges are associated only with a user because only a user can log on and submit a CREATE
statement. Implicit privileges are associated with either a database or a user because each can hold an
object and an object is owned by the named space in which it resides.
=========================================================================
When a PI is not mentioned on a table, how will Teradata consider the PI for that table?
If you don't specify a PI at table create time then Teradata must choose one. For instance, if the DDL is
ported from another database that uses a Primary Key instead of a Primary Index, the CREATE TABLE
contains a PRIMARY KEY (PK) constraint. Teradata is smart enough to know that Primary Keys must be
unique and cannot be null. So, the first level of default is to use the PRIMARY KEY column(s) as a UPI.
If the DDL defines no PRIMARY KEY, Teradata looks for a column defined as UNIQUE. As a second
level default, Teradata uses the first column defined with a UNIQUE constraint as a UPI.
If none of the above attributes are found, Teradata uses the first column defined in the table as a NON-UNIQUE PRIMARY INDEX (NUPI).
=========================================================================
What is a covered query in Teradata?
If all the columns referenced by a SELECT query are present in a join index, the index is said to cover the query, and the optimizer can satisfy it from the index alone without touching the base table. Such queries are called covered queries.
=========================================================================
What is identity column?
Teradata has supported identity columns on tables since around V2R6.x. These columns differ from Oracle's sequence concept in that the number assigned is not guaranteed to be sequential. The identity column in Teradata is simply used to guarantee row uniqueness.
Example:
CREATE MULTISET TABLE MyTable
(
ColA INTEGER GENERATED BY DEFAULT AS IDENTITY
(START WITH 1
INCREMENT BY 20),
ColB VARCHAR(20) NOT NULL
)
UNIQUE PRIMARY INDEX pidx (ColA);
Granted, ColA may not be the best primary index for data access or joins with other tables in the data
model. It just shows that you could use it as the PI on the table.
=========================================================================
How to implement UPSERT logic in Teradata using SQL?
Teradata provides the MERGE-INTO statement, which works as UPSERT logic: if the ON condition matches, the row is updated; otherwise it is inserted. The match condition takes a form such as:
ON (Target.dept_no = 20)
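A fuller sketch of a MERGE-based UPSERT, building on the ON condition above (the table and column names are hypothetical, and real MERGE statements carry additional restrictions, such as matching on the target's primary index):

```sql
MERGE INTO Department AS Target
USING (SELECT 20 AS dept_no, 'Finance' AS dept_name) AS Src
ON (Target.dept_no = Src.dept_no)
WHEN MATCHED THEN
  UPDATE SET dept_name = Src.dept_name
WHEN NOT MATCHED THEN
  INSERT (dept_no, dept_name)
  VALUES (Src.dept_no, Src.dept_name);
```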
=========================================================================
What is a value-ordered NUSI?
When we define a value-ordered NUSI on a column, the rows in the secondary index subtable are sorted by the secondary index value rather than by row hash. The sort column must be a single numeric or DATE column of four bytes or less.
Value-ordered NUSIs are used for range queries and to avoid full-table scans on large tables.
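A value-ordered NUSI can be sketched as follows (table, column, and index names are hypothetical):

```sql
-- ORDER BY VALUES sorts the index subtable by sale_date,
-- making range predicates on sale_date efficient
CREATE INDEX idx_sale_date (sale_date)
ORDER BY VALUES (sale_date)
ON Sales_Fact;
```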
=========================================================================
What are the DBQL tables?
The Database Query Log (DBQL) tables are tables in the DBC database that store a history of the queries submitted to the system.
This history can grow very large, so these tables should be purged when the data is no longer needed.
=========================================================================
What is the difference between sample and top?
The Sampling function (SAMPLE) permits a SELECT to randomly return rows from a Teradata database
table.
It allows the request to specify either an absolute number of rows or a percentage of rows to return.
Additionally, it provides an ability to return rows from multiple samples.
The TOP clause is used to specify the number of rows to return. The TOP clause can be very useful on large tables with thousands of rows, because returning a large number of rows can impact performance.
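The two can be sketched side by side (the table name is hypothetical):

```sql
-- Roughly 10 percent of the rows, chosen randomly
SELECT * FROM Sales_Fact SAMPLE 0.10;

-- Exactly 10 rows (the first 10 the AMPs return; no randomness implied)
SELECT TOP 10 * FROM Sales_Fact;
```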
=========================================================================
What is the difference between ZEROIFNULL and NULLIFZERO?
The ZEROIFNULL function returns zero when the input value is NULL.
The NULLIFZERO function returns NULL when the input value is zero.
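Their behaviour in a single query (a sketch; the CAST gives the NULL literal a type):

```sql
SELECT ZEROIFNULL(CAST(NULL AS INTEGER))  -- returns 0
     , NULLIFZERO(0)                      -- returns NULL
     , ZEROIFNULL(7)                      -- non-NULL passes through as 7
     , NULLIFZERO(7);                     -- non-zero passes through as 7
```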
=========================================================================
What is RANGE_N?
RANGE_N is used in a partitioned primary index definition to specify the ranges of column values that are assigned to each partition.
The number of partitions = the number of ranges specified + NO RANGE + UNKNOWN
NO RANGE -> for values that do not belong to any specified range
UNKNOWN -> for values such as NULLs
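A RANGE_N-partitioned table can be sketched as follows (table and column names are hypothetical):

```sql
CREATE TABLE Sales_Fact
( sale_id    INTEGER
, sale_date  DATE
, sales_amt  DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
              EACH INTERVAL '1' MONTH,
    NO RANGE, UNKNOWN );
```

This yields 12 monthly partitions plus the NO RANGE and UNKNOWN partitions, 14 in total.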
=========================================================================
What is Casting in Teradata?
Casting converts a value from one data type to another, using the ANSI CAST function.
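A minimal sketch of CAST:

```sql
SELECT CAST('123' AS INTEGER)               -- character string to number
     , CAST(98.6 AS VARCHAR(10))            -- number to character string
     , CAST(DATE '2024-01-15' AS TIMESTAMP(0));  -- date to timestamp
```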
=========================================================================
When will you go for a hash index?
In general terms, a hash index organizes search keys with their associated pointers into a hash file structure: a hash function is applied to a search key to identify a bucket, and the key and its associated pointers are stored in that bucket (or in overflow buckets).
Strictly speaking, hash indices are only secondary index structures, since if a file itself is organized using hashing, there is no need for a separate hash index structure on it.
In Teradata specifically, a hash index resembles a single-table join index: it stores a redundant copy of selected columns, possibly hashed on a different column, and is worth considering when it can cover frequent queries and avoid redistributing the base table.
=========================================================================
Difference between Global and Volatile Temporary Tables? Where and how can they be used?
Global Temporary Tables
Definitions are stored permanently in the data dictionary and can be materialized by any user with the necessary DML privileges
An instance can be dropped manually any time during a session with DROP TEMPORARY TABLE
A very large number of global tables can prolong logoff, because materialized global tables are dropped automatically at session end.
Volatile Tables
Definitions are kept in cache and go away at session end or during a Teradata Database restart
Local to a session
Can be dropped manually any time during a session with DROP TABLE
A very large number of volatile tables can prolong logoff, because volatile tables are dropped
automatically at session end.
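The two kinds of temporary table can be sketched as follows (table and column names are hypothetical):

```sql
-- Definition stored in the data dictionary;
-- each session materializes its own instance
CREATE GLOBAL TEMPORARY TABLE gt_stage
( id INTEGER, amount DECIMAL(12,2) )
ON COMMIT PRESERVE ROWS;

-- Definition kept only in cache for this session;
-- table and definition vanish at session end
CREATE VOLATILE TABLE vt_stage
( id INTEGER, amount DECIMAL(12,2) )
ON COMMIT PRESERVE ROWS;
```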
=========================================================================
What are the Partitioning Rules:
1)A table can have up to 65,535 Partitions.
=========================================================================
What are the Disadvantages of partitioned tables
1)Rows in a partitioned table are 2 bytes longer.
2)Access via the Primary Index may take longer.
3)Full table joins to a NPPI table with the same PI may take longer.
=========================================================================
Difference between Set and Multiset Table?
The SET table does not allow duplicate rows while the multiset table allow duplicate rows. For a SET
table, to enhance the performance and to enforce uniqueness, a UPI or a USI can be created on a
particular column. For a multiset table, a NUPI or NUSI can be created on a particular column to enhance
performance.
=========================================================================
What are the Advantages and Disadvantages of multi-column PI?
Advantage
*As many as 64 columns may be included in an index.
*Multi-column Primary Indexes may provide better distribution.
Disadvantage
*More columns = less usability
*The PI can only be used when values for all PI columns are provided in the SQL statement.
*Partial values do not generate a useful hash result.
=========================================================================
What is Value List Compression? How would you implement compression?
Value List Compression cannot compress an existing column, but it can be used to add a compressed column to a table.
Use the ALTER TABLE statement to add compressed columns and reduce the number of I/O operations.
Consider the following:
*Set the column default value to most frequent value.
*Compress to the default value.
*This is especially useful for sparsely populated columns.
*Overhead is not high.
*The I/O savings correlates to the percentage of data compressed out of a row.
Value List Compression (VLC) provides the Teradata database with the capacity to support multiple-value compression for fixed-width columns. When you specify a value or list of values, the system suppresses any data matching a compress value from the row. This saves disk space: a smaller physical row size results in fewer data blocks, fewer I/Os, and improved overall performance.
Because VLC allows you to specify a list of compress values for a column, the system suppresses data
when one of the specified values exists in the column. Up to 255 distinct values (plus NULL) may be
compressed per fixed-width column.
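Multi-value compression can be sketched in DDL as follows (table and column names, and the compressed values, are hypothetical):

```sql
CREATE TABLE Customer
( cust_id  INTEGER
, state    CHAR(2)  COMPRESS ('CA','TX','NY')  -- frequent values stored once in the table header
, status   CHAR(1)  COMPRESS ('A')             -- single dominant value compressed out of rows
)
PRIMARY INDEX (cust_id);
```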
=========================================================================
What have you done to resolve a running-out-of-spool problem?
To resolve a running-out-of-spool problem:
1) Check whether the column involved in the WHERE clause/joining column is an index column.
2) Check whether statistics have been collected on the index column; if not, collect stats on it.
3) Check MaxSpool and PeakSpool from DBC.DiskSpace on a per-AMP basis and check for data skew.
4) Check for an improper or missing join condition.
5) If, despite these steps, the query still runs out of spool, explicitly increase the spool space.
=========================================================================
What are the limitations of Fast load?
1) No Secondary Indexes are allowed on the target table: Usually a UPI or NUPI is used in Teradata to distribute the rows evenly across the AMPs. Secondary indexes are stored in a subtable block, often on a different AMP from the data row.
2) No Referential Integrity is allowed: Referential integrity defined on a table would take more system checking to enforce the referential constraints.
3) No Triggers are allowed at load time: FastLoad is focused on loading data at high speed, so triggers are not allowed.
4) Duplicate rows (in MULTISET tables) are not supported: MULTISET tables allow duplicate rows, but FastLoad discards duplicates while loading into them.
5) No AMPs may go down (i.e., go offline) while FastLoad is processing: The down AMP must be repaired before the load process can be restarted.
6) No more than one data type conversion is allowed per column: Data type conversions cause high resource utilization on the system.
=========================================================================
What are the limitations of Multi load?
1) Unique Secondary Indexes are not supported on a target table: Like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs). Unlike FastLoad, however, it does support Non-Unique Secondary Indexes (NUSIs), because the index subtable row is on the same AMP as the data row.
2) Referential Integrity is not supported: Referential integrity defined on a table would take more system checking to enforce the referential constraints.
3)Triggers are not supported at load time: Disable all the Triggers prior to using it.
4) No concatenation of input files is allowed: It could impact a restart if the files were concatenated in a different sequence or data was deleted between runs.
5)No Join Indexes: All the join indexes must be dropped before running a MultiLoad and then recreate
them after the load is completed
6)Will not process aggregates, arithmetic functions or exponentiation:If you need data conversions or
math, you might be better off using an INMOD to prepare the data prior to loading it
=========================================================================
Explain the phases in MultiLoad?
Phase 1: Preliminary Phase: Basic setup; the script is parsed, sessions are established, and the work and error tables are created.
Phase 2: DML Transaction Phase: The DML statements (INSERT/UPDATE/DELETE) are sent to the AMPs and stored in the work tables.
Phase 3: Acquisition Phase: The input data is read, converted into internal format, and distributed to the appropriate AMPs' work tables.
Phase 4: Application Phase: In this phase all DML operations are applied to the target tables.
Phase 5: Cleanup Phase: Table locks are released and all the intermediate work tables are dropped.
=========================================================================
What are the limitations of Tpump?
No more than four IMPORT commands may be used in a single load task.
Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date,
not the default format of yy.
On some network attached systems, the maximum file size when using TPump is 2GB.
=========================================================================
How many types of error tables are there in FastLoad?
Each FastLoad requires two error tables, which are automatically created during the run. They are populated only with errors that occur during the load.
The first error table is for translation errors or constraint violations.
For example, a column is defined as INTEGER but the data arriving from the source is in CHAR format, i.e. wrong data.
The second error table is for errors caused by duplicate values for Unique Primary Indexes.
=========================================================================
What are Quantiles?
Quantiles are used to divide a number of rows into a number of partitions of roughly equal size. The most
common quantile is the percentile which is based on a value of 100. It is also not uncommon to have
quartiles (based on 4), tertiles (based on 3) or deciles (based on 10). Notice that by default, both the
quantile column and the quantile value itself will be output in ascending sequence.
Aggregations may not be combined with OLAP functions, thus a derived or temporary table is sometimes
necessary to produce the aggregation before the function is applied. The standard form of the quantile
function is:
QUANTILE (quantile_constant,sortlist)
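The QUANTILE form above can be sketched as follows (table and column names are hypothetical):

```sql
-- Assign each employee a salary percentile (0-99)
SELECT employee_id
     , salary
     , QUANTILE(100, salary) AS salary_pctile
FROM Employee;
```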
=========================================================================
What is WIDTH_BUCKET ?
The WIDTH_BUCKET function assigns partition numbers to values associated with a column or
expression, based on a specified range and based on the number of partitions desired.
Like many OLAP and aggregate functions, WIDTH_BUCKET is used to extract statistical information from
sample populations.
The format of the function is as follows:
WIDTH_BUCKET (column_expr,lower_bound, upper_bound ,partition_count)
=========================================================================
What is Random function?
The RANDOM function may be used to generate a random number within a specified range.
RANDOM (lower_limit, upper_limit) returns a random number between the lower and upper limits, inclusive. Both limits must be specified; otherwise a random number between 0 and approximately 4 billion is generated.
Limitations:
RANDOM is non-ANSI standard
RANDOM may be used in a SELECT list or a WHERE clause, but not both
RANDOM may be used in Updating, Inserting or Deleting rows
RANDOM may not be used with aggregate or OLAP functions
RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause
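A sketch of RANDOM in a SELECT list (table and column names are hypothetical):

```sql
-- Assign each customer a random bucket from 1 to 100,
-- e.g. to split rows into test groups
SELECT cust_id, RANDOM(1, 100) AS bucket
FROM Customer;
```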
=========================================================================
http://teradatatuning.blogspot.in/2013/11/curing-slow-inserts-and-create-tables-i.html
How to identify
Hash collisions can be found easily by using the PRISE Tuning Assistant tool, or by the following method:
Explanation
If we use a "SET" type of table (the default setting), Teradata ensures that there will be no perfectly identical records in the table. This is ensured by comparing each inserted/updated record with the existing ones.
Teradata's "hash filesystem" allows a very effective shortcut: only those records whose RowID (hash) is equal must be compared; otherwise at least the PI fields must differ.
If we have chosen a UNIQUE Primary Index, or a non-unique one on field(s) that are almost unique, then the "SET comparison" is restricted to zero or one record in most cases.
Solution
Unfortunately, a good solution requires modifying the table structure.
Option 1: Change the table type to MULTISET. This will eliminate the duplication checks, but the automatic de-duplication is lost as well: if a process relied on the de-duplication of the SET table, you have to replace it with programmed de-duplication (GROUP BY, LEFT JOIN, ...).
Option 2: Change the table's PI to a unique or nearly unique column set. Be prudent and consider the workload as well (joins, WHERE conditions, GROUP BY expressions, etc.).
Summary
Tables with a strongly non-unique PI are highly exposed to hash-collision-slowed INSERTs/CREATE TABLEs (or UPDATEs), even if they are not skewed. Use a "more unique" PI, or a MULTISET table.
A typical mistake: a CREATE TABLE ... AS SELECT ... that lacks the PRIMARY INDEX() clause. In this case Teradata chooses the first column as the PI, which often causes terrible performance.
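A quick check for how collision-prone a candidate PI is (a sketch; table and column names are hypothetical):

```sql
-- Rows sharing a row hash must all be compared on INSERT into a SET table;
-- large counts here signal a hash-collision risk for the candidate PI.
SELECT HASHROW(order_status) AS row_hash
     , COUNT(*) AS cnt
FROM Orders
GROUP BY 1
ORDER BY 2 DESC;
```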
What's next
Next post will discuss Multi Value Compress (MVC) optimization.