Teradata Performance Optimization

* Teradata itself decides whether to use an index or not. If you are not careful, you
spend time in table updates maintaining an index which is not used at all (one cannot give
the query optimizer hints to use some index, though collecting statistics may affect
the optimizer strategy).
* In the MP-RAS environment, look at the script "/etc/gsc/bin/perflook.sh". This will
provide a system-wide snapshot in a series of files. The GSC uses this data for incident
analysis.
* When using an index one must make sure that the index condition is met in the
subqueries (using IN, nested queries, or derived tables).
* Proper index use is indicated by the explain log entry "a ROW HASH MATCH
SCAN across ALL-AMPS".
* If the index is not used, the result of the analysis is a 'FULL TABLE SCAN', where the
run time grows as the size of the history table grows.
* Keeping index information up to date is a time/space consuming issue. Sometimes Teradata
does much better when you "manually" imitate the index by just building it from scratch.
* Keeping up a join index might help, but you cannot multiload to a table which is a part of
the join index. Loading with 'tpump' or pure 'SQL' is OK but does not perform as well.
Dropping and re-creating a join index with a big table takes time and space.
* When your Teradata "explain" gives '25' steps for your query (even without the
update of the results) and the actual query is a join of six or more tables:
Case example:
We had already given up updating the secondary indexes, because we have not had
much use for them.
After some trial and error we ended up with a strategy where the actual "purchase
frequency analysis" is never made "directly" against the history table.
Instead:
1) There is a "one-shot" run to build the initial "customer's previous purchase" from the
"purchase history" - it takes time, but that time is saved later.
2) The purchase frequency is calculated by joining the "latest purchase" with the
"customer's previous purchase".
3) When the "latest purchase" rows are inserted into the "purchase history", the "customer's
previous purchase" table is dropped and recreated by merging the "customer's previous
purchase" with the "latest purchase".
4) Following these steps, the performance is still not very fast (about 25 minutes on our
two-node system for a batch of almost 1.000.000 latest receipts), but it is tolerable now.
(We also tested adding both the previous and latest purchase to the same table, but
because its size was on average much bigger than the pure "latest purchase", the
self-join was slower in that case.)
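
The steps of this case can be sketched as a toy model in plain Python. This is only an illustration under assumptions: dictionaries stand in for the tables, the names `build_previous`, `purchase_gaps`, and `merge_previous` are made up, and "days since the previous purchase" is assumed as the frequency measure. The real work would of course be done in Teradata SQL.

```python
from datetime import date

def build_previous(history):
    """Step 1: one-shot pass over the full history, keeping each customer's last purchase date."""
    previous = {}
    for customer, day in history:
        if customer not in previous or day > previous[customer]:
            previous[customer] = day
    return previous

def purchase_gaps(latest, previous):
    """Step 2: join the latest purchases with the previous-purchase table (gap in days)."""
    return {c: (d - previous[c]).days for c, d in latest.items() if c in previous}

def merge_previous(previous, latest):
    """Step 3: rebuild the previous-purchase table by merging in the latest purchases."""
    merged = dict(previous)
    for customer, day in latest.items():
        if customer not in merged or day > merged[customer]:
            merged[customer] = day
    return merged

# Hypothetical sample data: (customer, purchase date)
history = [("c1", date(2000, 1, 3)), ("c1", date(2000, 1, 10)), ("c2", date(2000, 1, 5))]
latest = {"c1": date(2000, 1, 17), "c3": date(2000, 1, 17)}

previous = build_previous(history)         # expensive, done once
gaps = purchase_gaps(latest, previous)     # never touches the full history
previous = merge_previous(previous, latest)
print(gaps)
```

The point of the design is that the large history is scanned only once; every later run joins two much smaller sets.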
*********
MANAGING CONCURRENT WORKLOADS
Integrated e-commerce efforts present many warehouse challenges. Here's
how Teradata can help.
The word e-commerce means many things to many people. Although for some it
connotes only the Web, the real value of e-commerce can only be realized when all
channels of a business are integrated and have full access to all customer information and
transactions. In fact, to me, e-commerce means using the rich technology available today
to bring added value to the customer and additional value to the business through all
customer interaction channels.
Under this definition of e-commerce, an active warehouse is at the epicenter, providing
the storage and access for decision making in the e-commerce world. As more and more
companies adopt active warehousing for this purpose, data warehouse workloads are
expanding and changing.
If your warehouse relies on a Teradata DBMS, you'll find that handling the challenge of
high-volume, widely varying, disparate service-level workloads is one of its core
competencies. One of the biggest concerns I hear from customers is how to deal with the
quickly rising number of concurrent queries and concurrent users that can result from
active warehousing and e-commerce initiatives. Expected service levels vary widely
among different groups of users, as do query types. And, of course, the entire workload
must scale upward linearly as the demand increases, ideally with a minimum of effort
required from users and systems staff. Here's a look at some of the most frequent
questions I receive on the subject of mixed workloads and concurrency requirements.

How do I balance the work coming in across all nodes of my Teradata
configuration?
You don't. Teradata automatically balances sessions across all nodes to evenly distribute
work across the entire parallel configuration. Users connect to the system as a whole
rather than a specific node, and the system uses a balancing algorithm to assign their
sessions to a node. Balancing requires no effort from users or system administrators.
Does Teradata balance the work queries cause?
The even distribution of data is the key to parallelism and scalability in Teradata. Each
query request is sent to all units of parallelism, each of which has an even portion of the
data to process, resulting in even work distribution across the entire system.
For short queries and update flow typical of Web interactions, the optimizer recognizes
that only a single unit of parallelism is needed. A query coordinator routes the work to
the unit of parallelism needed to process the request. The hashing algorithm does not
cluster related data, but spreads it out across the entire system. For example, this month's
data and even today's data is evenly distributed across all units of parallelism, which
means the work to update or look at that data is evenly distributed.
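
As a rough illustration of hash-based distribution, the sketch below maps row keys to units of parallelism. It is a toy model under stated assumptions: MD5 is only a stand-in for Teradata's own hashing algorithm, and `NUM_UNITS` and `unit_for` are hypothetical names chosen for this example.

```python
import hashlib
from collections import Counter

NUM_UNITS = 8  # assumed number of units of parallelism

def unit_for(key, num_units=NUM_UNITS):
    """Map a row key to a unit of parallelism via a hash (MD5 is a stand-in only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_units

# Sequential keys (for example, today's order numbers) are not clustered:
# they end up spread over all units in roughly equal portions.
rows_per_unit = Counter(unit_for(order_id) for order_id in range(100_000))
print(sorted(rows_per_unit.items()))

# A single-key request only needs the one unit that owns that key.
print(unit_for(4711))
```

Because the owning unit is computed from the key alone, a short single-row request can be routed without consulting any other unit.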
Will many concurrent requests cause bottlenecks in query coordination?
Query coordination is carried out by a fully parallel parsing engine (PE) component.
Usually, one or more PEs are present on each node. Each PE handles the requests for a
set of sessions, and sessions are spread evenly across all configured PEs. Each PE is
multithreaded, so it can handle many requests concurrently. And each PE is independent
of the others with no required cross-coordination. The number of users logged on and
requests in flight are limited only by the number of PEs in the configuration.
How do you avoid bottlenecks when the query coordinator must retrieve
information from the data dictionary?
In Teradata, the DBMS itself manages the data dictionary. Each dictionary table is simply
a relational table, parallelized across all nodes. The same query engine that manages user
workloads also manages the dictionary access, using all nodes for processing dictionary
information to spread the load and avoid bottlenecks. The PE even caches recently used
dictionary information in memory. Because each PE has its own cache, there is no
coordination overhead. The cache for each PE learns the dictionary information most
likely to be needed by the sessions assigned to it.
With a large volume of work, how can all requests execute at once?
As in any computer system, the total number of items that can execute at the same time is
always limited to the number of CPUs available. Teradata uses the scheduling services
Unix and NT provide to handle all the threads of execution running concurrently. Some
requests might also exist on other queues inside the system, waiting for I/O from the disk
or a message from the BYNET, for example. Each work item runs in a thread; each
thread gets a turn at the CPU until it needs to wait for some external event or until it
completes the current work. Teradata configures several units of parallelism in each SMP
node. Each unit of parallelism contains many threads of execution that aren't restricted to
a particular CPU; therefore, every thread gets to compete equally for the CPUs in the
SMP node.
There is a limit, of course, to the number of pieces of work that can actually have a thread
allocated in a unit of parallelism. Once that limit is reached, Teradata queues work for the
threads. Each thread is context free, which means that it is not assigned to any session,
transaction, or request. Therefore, each thread is free to work on whatever is next on the
queue. The unit of work on the queue is a processing step for a request. Combining the
queuing of steps with context-free threads allows Teradata to share the processing service
equally across all the concurrent requests in the system. From the users' point of view, all
the requests in the system are running, receiving service, and sharing system resources.
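
The idea of queuing steps for context-free threads can be sketched with a small deterministic simulation. This is an assumption-laden toy, not Teradata's scheduler: `run_steps` is a hypothetical function, every step is assumed to take one "round", and a finished step simply queues the request's next step at the back.

```python
from collections import deque

def run_steps(requests, num_threads):
    """Simulate context-free threads serving a shared queue of request steps.

    requests: dict mapping request name -> number of processing steps.
    Returns a list of rounds; each round lists the (request, step) pairs executed.
    """
    queue = deque((name, 1) for name in requests)  # first step of every request
    rounds = []
    while queue:
        # Each free thread takes whatever step is next on the queue.
        batch = [queue.popleft() for _ in range(min(num_threads, len(queue)))]
        rounds.append(batch)
        for name, step in batch:
            if step < requests[name]:
                queue.append((name, step + 1))  # the request's next step waits its turn
    return rounds

rounds = run_steps({"A": 3, "B": 1, "C": 2}, num_threads=2)
for number, batch in enumerate(rounds, start=1):
    print(number, batch)
```

With only two threads, all three requests still make progress in interleaved fashion, since no thread is tied to a particular request.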
How does Teradata avoid resource contention and the resulting
performance and management problems?
Teradata algorithms are very resource efficient. Other DBMSs optimize for single-query
performance by giving all resources to the single query. But Teradata optimizes for
throughput of many concurrent queries by allocating resources sparingly and using them
efficiently. This kind of optimization helps avoid wide performance variations that can
occur depending on the number of concurrent queries.
When faced with a workload that requires more system resources than are available,
Teradata tunes itself to that workload. Thrashing, a common performance failure mode in
computer systems, occurs when the system has fewer resources than the current workload
requires and begins using more processing time to manage resources than to do the work.
With most databases, a DBA would tune the system to avoid thrashing. However,
Teradata adjusts automatically to workload changes by adjusting the amount of running
work and internally pushing back incoming work. Each unit of parallelism manages this
flow control mechanism independently.
If all concurrent work shares resources evenly, how are different service
levels provided to different users?
The Priority Scheduler Facility (PSF) in Teradata manages service levels among different
parts of the workload. PSF allows granular control of system resources. The system
administrator can define up to five resource partitions; each partition contains four
available priorities. Together, they provide 20 allocation groups (AGs) to which portions
of the workload are assigned by an attribute of the logon ID for the user or application.
The administrator assigns each AG a portion of the total system resources and a
scheduling policy.
For example, the administrator can assign short queries from the Web site a guaranteed
20 percent of system resources and a high priority. In contrast, the administrator might
assign medium priority and 10 percent of system resources to more complex queries with
lower response-time requirements. Similarly, the administrator might assign data mining
queries a low priority and five percent of the total resources, effectively running them in
the background. You can define policies so that the resources adjust to the work in the
system. For example, you could allow data mining queries to take up all the resources in
the system if nothing else is running.
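
The arithmetic behind the allocation groups, and one possible "adjust to the work in the system" policy, can be sketched as follows. The proportional rescaling in `effective_shares` is an assumption made for illustration, as are the group names; the text does not say how the scheduler actually redistributes unused resources.

```python
RESOURCE_PARTITIONS = 5        # from the text
PRIORITIES_PER_PARTITION = 4   # from the text
ALLOCATION_GROUPS = RESOURCE_PARTITIONS * PRIORITIES_PER_PARTITION
print(ALLOCATION_GROUPS)

def effective_shares(assigned, active):
    """Rescale assigned percentages over the groups that currently have work.

    Assumed policy: resources of idle groups are handed out in proportion
    to the assigned percentages of the active groups.
    """
    total = sum(assigned[group] for group in active)
    return {group: 100.0 * assigned[group] / total for group in active}

# Hypothetical group names with the percentages used in the example above.
assigned = {"web_short": 20, "complex": 10, "data_mining": 5}

# Nothing else is running: data mining may use the whole system.
print(effective_shares(assigned, ["data_mining"]))

# Web queries arrive: data mining drops back to a fifth of the active total.
print(effective_shares(assigned, ["web_short", "data_mining"]))
```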
Unlike other scheduling utilities, PSF is fully integrated into the DBMS, not managed at
the task or thread level, which makes it easier to use for parallel database workloads.
Because PSF is an attribute of the session, it follows the work wherever it goes in the
system. Whether that piece of work is executed by a single thread in a single unit of
parallelism or in 2,000 threads in 500 units of parallelism, PSF manages it without system
administrator involvement.
CPU scheduling is a primary component of PSF, using all the normal techniques (such as
quantum size, CPU queues by priority, and so on). However, PSF is endemic throughout
the Teradata DBMS. There are many queues inside a DBMS handling a large volume
mixed workload. All of those queues are prioritized based on the priority of the work.
Thus, a high priority query entered after several lower priority requests that are awaiting
their turn to run will go to the head of the queue and will be executed first. I/O is
managed by priority. Data warehouse workloads are heavy I/O users, so a large query
performing a lot of I/O could hold up a short, high-priority request. PSF puts the
high-priority request I/Os to the head of the queue, helping to deliver response time goals.
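
A prioritized queue of this kind can be sketched with Python's `heapq` module. The class name `PriorityWorkQueue` and the convention that a lower number means a higher priority are assumptions for this example, not a description of PSF internals.

```python
import heapq
import itertools

class PriorityWorkQueue:
    """Queue in which higher-priority work overtakes earlier, lower-priority work."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: arrival order within a priority

    def submit(self, priority, request):
        # Assumed convention: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def next_request(self):
        return heapq.heappop(self._heap)[2]

queue = PriorityWorkQueue()
queue.submit(2, "large report A")
queue.submit(2, "large report B")
queue.submit(0, "web lookup")  # arrives last, but has the highest priority

served = [queue.next_request() for _ in range(3)]
print(served)
```

The arrival counter keeps requests of equal priority in first-come order, so lower-priority work is delayed but not reordered among itself.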
Data warehouse databases often set the system environment to allow for
fast scans. Does Teradata performance suffer when the short work is
mixed in?
Because Teradata was designed to handle a high volume of concurrent queries, it doesn't
count on sequential scans to produce high performance for queries. Although other
DBMS products see a large fall in request performance when they go from a single large
query to multiple queries or when a mixed workload is applied, Teradata sees no such
performance change. Teradata never plans on sequential access in the first place. In fact,
Teradata doesn't even store the data for sequential accesses. Therefore, random accesses
from many concurrent requests are just business as usual.
Sync scan algorithms provide additional optimization. When multiple concurrent requests
are scanning or joining the same table, their I/O is piggybacked so that only a single I/O
is performed to the disk. Multiple concurrent queries can run without increasing the
physical I/O load, leaving the I/O bandwidth available for other parts of the workload.
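
The saving from piggybacked scans can be estimated with a toy model. The assumptions are loud ones: time is measured in block reads, every scanner needs one full pass over the table and advances at the same rate, and a scanner that joins while a scan is in progress shares its reads and then continues alone for the part it missed. The function names are hypothetical.

```python
def separate_scan_reads(num_blocks, join_times):
    """Every request reads every block of the table itself."""
    return num_blocks * len(join_times)

def shared_scan_reads(num_blocks, join_times):
    """Physical reads when overlapping scans of the same table share their I/O.

    join_times[i] is the point (counted in block reads) at which scanner i starts.
    """
    reads = 0
    covered_until = None  # point up to which an in-progress scan already reads blocks
    for start in sorted(join_times):
        end = start + num_blocks
        if covered_until is None or start >= covered_until:
            reads += num_blocks           # no scan in progress: start a new one
            covered_until = end
        elif end > covered_until:
            reads += end - covered_until  # piggyback, then read only the missed part
            covered_until = end
    return reads

starts = [0, 250, 600]  # three requests scanning the same 1,000-block table
print(separate_scan_reads(1000, starts))
print(shared_scan_reads(1000, starts))
```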
What if work demand exceeds Teradata's capabilities?
There are limits to how much work the engine can handle. A successful data warehouse
will almost certainly create a demand for service that is greater than the total processing
power available on the system. Teradata always puts into execution any work presented
to the DBMS.
If the total demand is greater than the total resources, then controls must be in place
before the work enters the DBMS. When your warehouse reaches this stage, you can use
Database Query Manager (DBQM) to manage the flow of user requests into the
warehouse. DBQM, inserted between the users' ODBC applications and the DBMS,
evaluates each request and then applies a set of rules created by the system administrator.
If the request violates any of the rules, DBQM notifies the user that the request is denied
or deferred to a later time for execution.
Rules can include, for example, system use levels, query cost parameters, time of day,
objects accessed, and authorized users. You can read more about DBQM in a recent
Teradata Review article ("Field Report: DBQM," Summer 1999, available online at
www.teradatareview.com/summer99/truet.html).
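
Rule-based admission of this kind can be sketched as a small gatekeeper function. Everything here is hypothetical and only illustrates the idea of evaluating a request against administrator-defined rules before it reaches the database; the rule names, thresholds, and the `Request` fields are not DBQM's actual rule format.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    estimated_cost: float  # assumed cost estimate for the query
    hour: int              # hour of day at submission, 0-23
    tables: tuple          # objects the query accesses

# Hypothetical rule set created by the administrator.
AUTHORIZED_USERS = {"alice", "bob"}
RESTRICTED_TABLES = {"payroll"}
MAX_PEAK_COST = 100.0
PEAK_HOURS = range(8, 18)

def evaluate(request):
    """Return 'run', 'defer', or 'deny' for a request, checking the rules in order."""
    if request.user not in AUTHORIZED_USERS:
        return "deny"
    if any(table in RESTRICTED_TABLES for table in request.tables):
        return "deny"
    if request.estimated_cost > MAX_PEAK_COST and request.hour in PEAK_HOURS:
        return "defer"  # expensive work waits until off-peak hours
    return "run"

print(evaluate(Request("alice", 5.0, 10, ("sales",))))
print(evaluate(Request("bob", 500.0, 10, ("sales",))))
print(evaluate(Request("mallory", 1.0, 10, ("sales",))))
```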
How do administrators and DBAs stay on top of complex mixed
workloads?
The Teradata Manager utility provides a single operational system view for
administrators and DBAs. The tool provides real-time performance, logged past
performance, users and queries currently executing, management of the schema, and
more.
STAYING ACTIVE
The active warehouse is a busy place. It must handle all decision making for the
organization, including strategic, long-range data mining queries, tactical decisions for
daily operations, and event-based decisions necessary for effective Web sites.
Nevertheless, managing this diversity of work does not require a staff of hundreds
running a complex architecture with multiple data marts, operational data stores, and a
multitude of feeds. It simply requires a database management system that can manage
multiple workloads at varying service levels, scale with the business, and provide 24x7
availability year round with a minimum of operational staff.
Use COMPRESS on whichever attributes possible. This helps in reducing
IO and hence improves performance, especially for attributes having lots of
NULL values/unique known values.
COLLECT STATISTICS on a daily basis (after every load) in order to
improve performance.
Drop and recreate secondary indices before and after every load. This
helps in improving load performance (if critical).
Regularly check for EVEN data distribution across all AMPs using
Teradata Manager or through Queryman.
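
Once per-AMP row counts (or space figures) for a table have been obtained with either tool, the evenness of the distribution can be summarized in a single number. The helper below is a minimal sketch; `skew_percent` is a hypothetical name, and the formula (how far the average AMP falls short of the busiest one) is one common way to express skew, not an official Teradata metric.

```python
def skew_percent(rows_per_amp):
    """Percentage by which the average AMP load falls short of the busiest AMP.

    0 means perfectly even; values near 100 mean one AMP holds almost everything.
    """
    busiest = max(rows_per_amp)
    average = sum(rows_per_amp) / len(rows_per_amp)
    return 100.0 * (1 - average / busiest)

print(skew_percent([1000, 1000, 1000, 1000]))  # evenly distributed table
print(skew_percent([4000, 0, 0, 0]))           # all rows on a single AMP
```

Since every all-AMP step finishes only when the busiest AMP finishes, a high value points at a poorly chosen primary index.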
Check the combination of CPUs, AMPs, PEs, and nodes for performance
optimization.
Each AMP can handle 80 tasks and each PE can handle 120 sessions.
MLOAD: Customize the number of sessions for each MLOAD job
depending on the
1. Number of concurrent MLOAD jobs
2. Number of PEs in the system
e.g.
SCENARIO 1
# of AMPs = 10
# of max load jobs handled by Teradata = 5 (parameter which
can be set; values: 5 to 15)
# of sessions per load job = 1 (parameter that can be set at
global or at each MLOAD script level)
# of PEs = 1
So 10*5*1 = 50 + 10 (2 per job overhead) = 60 is the max
sessions on the Teradata box.
This is LESS than 120, which is the max # of sessions a PE can
handle.
SCENARIO 2
# of AMPs = 16
# of max load jobs handled by Teradata = 15
# of sessions per load job = 1
# of PEs = 1
So 16*15*1 = 240 + 30 (2 per job overhead) = 270 (max
sessions on the Teradata box).
This is MORE than 120, which is the max # of sessions a PE can
handle.
Hence the MLOAD jobs fail, in spite of the use of the SLEEP and
TENACITY features.
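
The session arithmetic of the two scenarios can be checked with a few lines of Python. The constants come from the text (120 sessions per PE, 2 overhead sessions per load job); the function names are made up for this sketch, and treating the PE limit as scaling linearly with the number of PEs is an assumption.

```python
import math

PE_SESSION_LIMIT = 120  # sessions one PE can handle (from the text)
OVERHEAD_PER_JOB = 2    # extra sessions per load job (from the scenarios)

def mload_sessions(amps, jobs, sessions_per_job):
    """Sessions needed: AMPs * concurrent jobs * sessions setting, plus per-job overhead."""
    return amps * jobs * sessions_per_job + OVERHEAD_PER_JOB * jobs

def fits(amps, jobs, sessions_per_job, pes):
    """True if the PEs can handle the sessions (limit assumed to scale with PE count)."""
    return mload_sessions(amps, jobs, sessions_per_job) <= PE_SESSION_LIMIT * pes

def pes_needed(amps, jobs, sessions_per_job):
    return math.ceil(mload_sessions(amps, jobs, sessions_per_job) / PE_SESSION_LIMIT)

print(mload_sessions(10, 5, 1), fits(10, 5, 1, pes=1))    # scenario 1
print(mload_sessions(16, 15, 1), fits(16, 15, 1, pes=1))  # scenario 2
print(pes_needed(16, 15, 1))
```

Under these assumptions, scenario 2 would need either fewer concurrent jobs or at least three PEs.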
Use the SLEEP and TENACITY features of MLOAD for scheduling
MLOAD jobs.
Check the TABLEWAIT parameter. If omitted, it can cause immediate
load job failure if you submit two MLOAD loads that are trying to
update the same table.
JOIN INDEX: Check the limit on the number of fields for a join index (max
16 fields). It may vary by version.
A join index is like building the table physically. Hence its advantage is
BETTER performance, since the data is physically stored and not calculated ON THE
FLY. The cons are LOADING time (MLOAD needs join indices to be dropped
before loading) and additional space, since it is a physical table.
