
TERADATA ARCHITECTURE

1). WELCOME TO TERADATA DATABASE
Covers an introduction to the Teradata Database, explains the concepts of data warehouses and data marts, and elaborates on relational database concepts and components.

2). TERADATA DATABASE & DATA WAREHOUSE ARCHITECTURE
Covers details of data warehouses and data marts, describes and compares the different kinds of data marts, and details the Teradata Database system and its components (BYNET, AMPs, Parsing Engine, vprocs, etc.).

3). CLIENT ACCESS
Covers Teradata interfacing with network-attached clients and channel-attached clients, and gives details on the different Teradata utilities and when to use each.

4). TERADATA SQL
Covers basic Teradata SQL syntax.

5). DATA STRUCTURE
Explains Teradata Database objects, with details about spool space, perm space, etc., with examples.

6). DATA PROTECTION
Explains in detail the various data protection methods used by the Teradata system, such as RAID, Fallback, Journals, Accesses, etc.

7). INDICES
Describes the what, when, and how of the several kinds of indices used in Teradata Database systems.

1. WELCOME TO TERADATA DATABASE

Objectives


After completing this module, you should be able to:

- Describe the Teradata Database.
- Describe the advantages of the Teradata Database.
- Define the terms associated with relational databases.
- Describe the advantages of a relational database.

What Is the Teradata Database?


The Teradata Database is a relational database management system (RDBMS) that drives a company's data warehouse. The Teradata Database provides the foundation to give a company the power to grow, to compete in today's dynamic marketplace, and to evolve the business by getting answers to a new generation of questions. The Teradata Database's scalability allows the system to grow as the business grows, from gigabytes to terabytes and beyond. The Teradata Database's unique technology has been proven at customer sites across industries and around the world.

The Teradata Database is an open system, compliant with ANSI standards. It is currently available on UNIX MP-RAS and Windows 2000 operating systems. The Teradata Database is a large database server that accommodates multiple client applications making inquiries against it concurrently. Various client platforms access the database through a TCP/IP connection or across an IBM mainframe channel connection.

The ability to manage large amounts of data is accomplished using the concept of parallelism, wherein many individual processors perform smaller tasks concurrently to accomplish an operation against a huge repository of data. To date, only parallel architectures can handle databases of this size.

How Is the Teradata Database Used?


Each Teradata Database implementation can model a company's business. The ability to keep up with rapid changes in today's business environment makes the Teradata Database an ideal foundation for many applications, including:

- Enterprise data warehousing
- Active data warehousing
- Customer relationship management
- Internet and E-Business
- Data marts

1) Enterprise Data Warehouse


Data warehousing is a process for properly assembling and managing data from various servers to answer business-critical questions. The Teradata Database is ideal for enterprise data warehousing, which is commonly characterized by:

- Multiple subject areas
- Many concurrent users
- Many concurrent queries, including ad-hoc queries
- Large quantity of tables
- Hundreds of gigabytes (and terabytes) of detail data
- Historical data stored (months or years)

Without an enterprise data warehouse, a financial institution may be able to identify profitable customers for separate products such as mortgages or credit cards, but not know the overall profitability of each customer. An enterprise data warehouse brings together the different subject areas into a central repository, creating "one single view of the business" for a complete picture of the customer. An enterprise data warehouse environment built on the Teradata Database simplifies the system maintenance task, resulting in a lower total cost of ownership. In addition, the Teradata Database's ability to handle large-scale, decision-support queries against huge volumes of detail data makes it the obvious choice for companies wanting to start at any level and grow.

2) Active Data Warehouse


The active data warehouse extends a company's ability beyond historical data and strategic decisions to bring the decision-making capability to front-line personnel. Tactical decisions such as "Who should get the empty seat on this airplane?" or "What should I offer this customer to keep her from leaving, based on her history with our company?" can be made more effectively with the right information. With an active data warehouse, employees who interact directly with customers and suppliers are empowered with information-based decision making at their fingertips. The Teradata Warehouse supports active data warehousing with:

- Capability to handle thousands of additional users and mixed workloads
- High availability and reliability to support mission-critical applications
- Scalability to accommodate an increase in the amount of data, the number of data sources, and the number of applications supported in the data warehouse environment

3) Customer Relationship Management


Customer Relationship Management solutions help companies capture and analyze data to maximize customer acquisition, retention, and profitability. You can use the Teradata Database's detailed data and analysis capabilities to identify and optimize business relationships with the highest potential of profitability and growth. Examples include:

- A telephone company can conduct and refine marketing programs targeted at a certain type of profitable customer.
- A supermarket can create incentives based on specific combinations of products that customers tend to buy together.
- A bank can recognize changes in a customer's life circumstances, such as a new baby or a college-bound son or daughter, and offer timely services such as a new home loan, mortgage insurance, additional checking account, extra credit card, or student loan.
- A retailer can run a department store credit card sales program and filter out those customers who already have that card.

The NCR CRM solution consists of software, professional and customer services, and the Teradata Database to create, maintain, and enhance customer relationships.

4) Internet and E-Business


The Teradata Database provides a single repository for customer information that helps E-Businesses build and maintain one-to-one customer relationships that are critical to their success on the Internet. The Teradata Database supports the fast-paced style of E-Business by allowing many concurrent users to ask complicated questions as they think of them, and get quick answers. The Teradata Database allows E-Businesses to:

- Capture massive amounts of click-stream data.
- Enable multiple users to ask complex questions of the customers' click-stream data with near real-time response.
- Protect customers' privacy with consumer opt-in/opt-out preferences and the ability for consumers to check and revise their information stored on the Teradata Database through the Internet or a company call center.

5) Data Mart

A data mart is a special-purpose subset of a company's enterprise data used by a particular department, function, or application. Often, these single-subject area data marts contain data that was aggregated or transformed in some way to better handle the requests of a specific user community. Vendors implement data marts using different architectures:

- Independent data marts: Created directly from operational systems to an individual data store.
- Dependent data marts: Created from detail data in the data warehouse. This still requires movement and transformation of data, but may provide better performance for some specific user queries.
- Logical data marts: Existing parts of the data warehouse, not separate physical structures. Because in theory the data warehouse contains the detail data of the entire enterprise, a logical data mart would then provide the specific information for a specific user community. With the proper technology, this can be an ideal way to remove the need for massive data loading and transforming.

Independent and dependent data marts are architectures endorsed by other database vendors and tend to be associated with higher maintenance costs for physically moving and maintaining the data, inconsistent data (and resulting inconsistent decisions), and indirect ways to get the complete picture of the data. The Teradata Database is ideal for the logical data mart environment, where different user communities view subsets of a single repository of enterprise data.
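One common way to picture a logical data mart is as a database view over the warehouse tables. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than a Teradata system, and the table `sales`, the view `toys_mart`, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical single repository of enterprise detail data (SQLite, not Teradata).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (department, amount) VALUES (?, ?)",
    [("toys", 20.0), ("toys", 35.5), ("garden", 12.0)],
)

# A logical data mart: a view over the warehouse table, not a physical copy.
conn.execute(
    "CREATE VIEW toys_mart AS "
    "SELECT sale_id, amount FROM sales WHERE department = 'toys'"
)

# New detail rows are visible through the view without any reload step.
conn.execute("INSERT INTO sales (department, amount) VALUES ('toys', 4.5)")
toys_total = conn.execute("SELECT SUM(amount) FROM toys_mart").fetchone()[0]
print(toys_total)
```

Because the view stores no data of its own, the toy department's user community always sees the current detail rows, which is the property the logical approach relies on.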

What Makes the Teradata Database Unique?


In this Web-Based Training, you will learn about many features that make the Teradata Database, an RDBMS, right for business-critical applications. To start with, this section covers these key features:

- Single data store
- Scalability
- Unconditional parallelism (parallel architecture)
- Ability to model the business
- Mature, parallel-aware Optimizer

1) Single Data Store

The Teradata Database acts as a single data store, with multiple client applications making inquiries against it concurrently. Instead of replicating a database for different purposes, with the Teradata Database you store the data once and use it for all clients. The Teradata Database provides the same connectivity for an entry-level system as it does for a massive enterprise data warehouse.

2) Scalability

"Linear scalability" means that as you add components to the system, the performance increase is linear. Adding components allows the system to accommodate increased workload without decreased throughput. Linear scalability enables the system to grow to support more users, data, queries, and complexity of queries without experiencing performance degradation. As the configuration grows, the performance increase is linear, with a slope of 1.

The Teradata Database was the first commercial database system to scale to and support a trillion bytes of data. The origin of the name Teradata is "tera-," which is derived from Greek and means "trillion." The chart below lists the meaning of the prefixes:

Prefix   Exponent   Meaning
kilo-    10^3       1,000 (thousand)
mega-    10^6       1,000,000 (million)
giga-    10^9       1,000,000,000 (billion)
tera-    10^12      1,000,000,000,000 (trillion)
peta-    10^15      1,000,000,000,000,000 (quadrillion)
exa-     10^18      1,000,000,000,000,000,000 (quintillion)

The Teradata Database can scale from 100 gigabytes to over 100 terabytes of data on a single system without losing any performance capability. The Teradata Database's scalability provides investment protection for a customer's growth and application development. The Teradata Database is the only database that is truly scalable, and this extends to data loading with the use of parallel loading utilities. The Teradata Database provides automatic data distribution, and no reorganizations of data are needed. The Teradata Database is scalable in multiple ways, including hardware, complexity, and concurrent users.

Hardware
Growth is a fundamental goal of business. An MPP Teradata Database system easily accommodates that growth whenever it happens. The Teradata Database runs on highly optimized NCR servers in the following configurations:

- SMP: Symmetric multiprocessing platforms manage gigabytes of data to support an entry-level data warehousing system.
- MPP: Massively parallel processing systems can manage hundreds of terabytes of data. You can start small with a couple of nodes, and later expand the system as your business grows.

With the Teradata Database, you can increase the size of your system without replacing:

- Databases: When you expand your system, the data is automatically redistributed through the reconfiguration process, without manual interventions such as sorting, unloading and reloading, or partitioning.
- Platforms: The Teradata Database's modular structure allows you to add components to your existing system.
- Data model: The physical and logical data models remain the same regardless of data volume.
- Applications: Applications you develop for Teradata Database configurations will continue to work as the system grows, protecting your investment in application development.

Complexity
The Teradata Database is adept at complex data models that satisfy the information needs throughout an enterprise. The Teradata Database efficiently processes increasingly sophisticated business questions as users realize the value of the answers they are getting. It has the ability to perform large aggregations during query run time and can perform up to 64 joins in a single query.

Concurrent Users
As is proven in every Teradata Database benchmark, the Teradata Database can handle the most concurrent users, who are often running multiple, complex queries. The Teradata Database has the proven ability to handle from hundreds to thousands of users on the system simultaneously. Adding many concurrent users typically reduces system performance. However, adding more components can enable the system to accommodate the new users with equal or even better performance.

3) Unconditional Parallelism

The Teradata Database provides exceptional performance using parallelism to achieve a single answer faster than a non-parallel system. Parallelism uses multiple processors working together to accomplish a task quickly.

An example of parallelism can be seen at an amusement park, as guests stand in line for an attraction such as a roller coaster. As the line approaches the boarding platform, it typically will split into multiple, parallel lines. That way, groups of people can step into their seats simultaneously. The line moves faster than if the guests step onto the attraction one at a time. At the biggest amusement parks, the parallel loading of the rides becomes essential to their successful operation.

Parallelism is evident throughout a Teradata Database, from the architecture to data loading to complex request processing. The Teradata Database processes requests in parallel without mandatory query tuning. The Teradata Database's parallelism does not depend on limited data quantity, column range constraints, or specialized data models: the Teradata Database has "unconditional parallelism."
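The divide-and-combine idea behind the amusement park analogy can be sketched in a few lines of Python. This is a toy illustration, not a description of how the Teradata Database distributes work: the helper names `partition` and `parallel_sum` are hypothetical, and Python threads only show the shape of the technique (for CPU-bound work they do not run truly in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, workers):
    """Deal rows round-robin into one slice per worker."""
    return [rows[i::workers] for i in range(workers)]


def parallel_sum(rows, workers=4):
    """Each worker totals its own slice; the partial totals are then combined."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, partition(rows, workers)))
    return sum(partials)


amounts = list(range(1, 101))
total = parallel_sum(amounts)
print(total)
```

Each worker sees only its own slice of the rows, like one of the parallel boarding lines, and a final step merges the partial results into the single answer.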

4) Ability to Model the Business

A data warehouse built on a business model contains information from across the enterprise. Individual departments can use their own assumptions and views of the data for analysis, yet these varying perspectives have a common basis for a "single view of the business." With the Teradata Database's centrally located, logical architecture, companies can get a cohesive view of their operations across functional areas to:

- Find out which divisions share customers.
- Track products throughout the supply chain, from initial manufacture, to inventory, to sale, to delivery, to maintenance, to customer satisfaction.
- Analyze relationships between results of different departments.
- Determine if a customer on the phone has used the company's website.
- Vary levels of service based on a customer's profitability.

You get consistent answers from the different viewpoints above using a single business model, not functional models for different departments. In a functional model, data is organized according to what is done with it. But what happens if users later want to do some analysis that has never been done before? When a system is optimized for one department's function, the other departments' needs (and future needs) may not be met.

A Teradata Database allows the data to represent a business model, with data organized according to what it represents, not how it is accessed, so it is easy to understand. The data model should be designed without regard to usage and be the same regardless of data volume. With a Teradata Database as the enterprise data warehouse, users can ask new questions of the data that were never anticipated, throughout the business cycle and even through changes in the business environment.

A key Teradata Database strength is its ability to model the customer's business. The Teradata Database's business models are truly normalized, avoiding the costly star schema and snowflake implementations that many other database vendors use. The Teradata Database can do Star Schema and other types of relational modeling, but Third Normal Form is the methodology Teradata Division recommends to customers. The Teradata Database's competitors typically implement Star Schema or Snowflake models either because they are implementing a set of known queries in a transaction processing environment, or because their architecture limits them to that type of model.

Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model. The Teradata Database supports normalized logical models because it is able to perform 64 table joins and large aggregations during queries.

5) Mature, Parallel-Aware Optimizer


The Teradata Database's Optimizer is the most robust in the industry, able to handle:

- Multiple complex queries
- Joins per query
- Unlimited ad-hoc processing

The Optimizer is parallel-aware, meaning that it has knowledge of system components (how many nodes, vprocs, etc.). It determines the least expensive plan (time-wise) to process queries fast and in parallel. The Optimizer is further explained in the next module.

What Is a Relational Database?


A database is a collection of permanently stored data that is:

- Logically related (the data was created for a specific purpose).
- Shared (many users may access the data).
- Protected (access to the data is controlled).
- Managed (the data integrity and value are maintained).

The Teradata Database is a relational database. Relational databases are based on the relational model, which is founded on mathematical Set Theory. The relational model uses and extends many principles of Set Theory to provide a disciplined approach to data management. A relational database is designed to:

- Represent a business and its business practices.
- Be extremely flexible in the way that it can be selected and used.
- Be easy to understand.
- Model the business, not the applications.
- Allow businesses to quickly respond to changing conditions.

Relational databases present data as a set of tables. A table is a two-dimensional representation of data that consists of rows and columns. According to the relational model, a valid table does not have to be populated with data rows; it just needs to be defined with at least one column.

A relational database is a set of logically related tables. Tables are logically related to each other by a common field, so information such as customer telephone numbers and addresses can exist in one table, yet be accessible for multiple purposes. The example below shows customer, order, and billing statement data, related by a common field. The common field of Customer ID lets you look up information such as a customer name for a particular statement number, even though the data exists in two different tables.

Relational databases are more flexible than other types, so businesses are able to respond more quickly to changing conditions.
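As a minimal sketch of how a common field relates two tables, the following uses Python's built-in sqlite3 module (an assumption for illustration, not Teradata itself). The table layouts, the sample rows, and the helper `customer_name_for_statement` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE statement (statement_no INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Ann Lee', '555-0100'), (2, 'Raj Patel', '555-0101');
INSERT INTO statement VALUES (9001, 2, 120.0), (9002, 1, 75.5);
""")


def customer_name_for_statement(statement_no):
    """Follow the common customer_id field from a statement to its customer."""
    row = conn.execute(
        "SELECT c.name FROM statement s "
        "JOIN customer c ON c.customer_id = s.customer_id "
        "WHERE s.statement_no = ?",
        (statement_no,),
    ).fetchone()
    return row[0] if row else None


print(customer_name_for_statement(9001))
```

The customer name is stored once, in the customer table, and the join on the common field makes it available to any query that starts from a statement number.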

1) Rows
Each row contains all the columns in the table. A row is one instance of all columns, and each table can contain only one row format. The order of rows is arbitrary and does not imply priority, hierarchy, or significance. Each row represents an occurrence of an entity defined by the table. An entity is a person, place, or thing about which the table contains information. In this example, the entity is the employee and each row represents a single employee.

2) Columns
Each column contains "like data," such as only part names, or only supplier names, or only employee numbers. In the example below, the Last_Name column contains last names only, and nothing else. The data in the columns is atomic data, so a telephone number might be divided into three columns (the area code, the prefix, and the suffix) so that the customer data can be analyzed according to area code, etc. Missing data values would be represented by "nulls." Within a table, the column position is arbitrary.
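To make the idea of atomic columns concrete, here is a small sketch that splits a telephone number into the three parts named above and then counts customers by area code. It assumes 10-digit North American style numbers; the function `split_phone` and the sample numbers are hypothetical.

```python
from collections import Counter


def split_phone(phone):
    """Split a 10-digit number into (area_code, prefix, suffix)."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return digits[:3], digits[3:6], digits[6:]


phones = ["(937) 555-0142", "937-555-0199", "212 555 0107"]

# With the area code held as its own atomic value, grouping by it is trivial.
by_area_code = Counter(split_phone(p)[0] for p in phones)
print(by_area_code)
```

Once each part lives in its own column, an analysis by area code needs no string parsing at query time.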

3) Primary Key
In the relational model, a Primary Key (PK) is used to designate a unique identifier for each row when you design a database. A Primary Key can be composed of one or more columns. In the example below, the Primary Key is the employee number.

Primary Key Rules


Rules governing how Primary Keys must be defined and how they function are:
Rule 1: A Primary Key is required.
Rule 2: A Primary Key value must be unique.
Rule 3: The Primary Key value cannot be NULL.
Rule 4: The Primary Key value should not be changed.
Rule 5: The Primary Key column should not be changed.
Rule 6: A Primary Key may be any number of columns.

Rule 1: A Primary Key Is Required


In the logical model, each table requires a Primary Key because that is how each row is able to be uniquely identified. Each table must have one, and only one, Primary Key. In any given row, the value of the Primary Key uniquely identifies the row. The Primary Key may span more than one column, but even then, there is only one Primary Key.

Rule 2: Unique PK


Within the column(s) designated as the Primary Key, the values in each row must be unique. No duplicate values are allowed. The Primary Key's purpose is to uniquely identify a row. In a multi-column Primary Key, the combined value of the columns must be unique, even if an individual column in the Primary Key has duplicate values.

Rule 3: PK Cannot Be NULL


Within the Primary Key column, each row must have a Primary Key value and cannot be NULL (without a value). Because NULL is indeterminate, it cannot "identify" anything.

Rule 4: PK Value Should Not Change


Primary Key values should not be changed. If you changed a Primary Key, you would lose all historical tracking of that row.

Rule 5: PK Column Should Not Change


Additionally, the column(s) designated as the Primary Key should not be changed. If you changed a Primary Key, you would lose all the information relating that table to other tables.

Rule 6: No Column Limit


In the relational model, there is no limit to the number of columns that can be designated as the Primary Key, so it may consist of one or more columns. In the example below, the Primary Key consists of three columns: EMPLOYEE NUMBER, LAST NAME, and FIRST NAME.
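Rules 2, 3, and 6 can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in (an assumption; Teradata syntax and behavior may differ), and the employee rows are made up. The key columns are declared NOT NULL explicitly because SQLite does not always reject NULLs in primary key columns on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_number INTEGER NOT NULL,
    last_name       TEXT    NOT NULL,
    first_name      TEXT    NOT NULL,
    PRIMARY KEY (employee_number, last_name, first_name)
)
""")


def try_insert(row):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1006, "Stein", "John")),      # first row is accepted
    try_insert((1006, "Stein", "Maria")),     # columns repeat, combined value is new
    try_insert((1006, "Stein", "John")),      # duplicate combined value
    try_insert((None, "Kanieski", "Carol")),  # NULL in a key column
]
print(results)
```

The second insert succeeds even though two of its three key columns repeat earlier values, because only the combined value has to be unique.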

Foreign Key
A Foreign Key (FK) is an identifier that links related tables. A Foreign Key defines how two tables are related to each other. Each Foreign Key references a matching Primary Key in another table in the database. For example, in the table below, the Department Number column that is a Foreign Key actually exists in another table as a Primary Key.

Having tables related to each other gives users the flexibility to look at the data in different ways, without the database administrator having to manage and maintain many tables of duplicate data for different applications.

Foreign Key Rules


Rules governing how Foreign Keys must be defined and how they operate are:
Rule 1: Foreign Keys are optional.
Rule 2: A Foreign Key value may be non-unique.
Rule 3: The Foreign Key value may be NULL.
Rule 4: The Foreign Key value may be changed.
Rule 5: A Foreign Key may be any number of columns.
Rule 6: Each Foreign Key must exist as a Primary Key in the related table.

Rule 1: Optional FKs


Foreign Keys are optional; not all tables have them. Tables that do have them can have multiple Foreign Keys because a table can relate to multiple other tables. In fact, a table can have an unlimited number of Foreign Keys. In the example table below:

- The Department Number Foreign Key relates to the Department Number Primary Key in the Department Table, elsewhere in the database.
- The Job Code FK relates to the Job Code PK in the Job Code Table, elsewhere in the database.

Having tables related to each other makes a relational database flexible so that different users can look up information they need, while simplifying the database administration so the data doesn't have to be duplicated for each purpose or application.

Rule 2: Unique or Non-Unique FKs


Duplicate Foreign Key values are allowed. More than one employee could be assigned to the same department.

Rule 3: FKs Can Be NULL


NULL (missing) Foreign Key values are allowed. For example, under special circumstances, an employee might not be assigned to a department.

Rule 4: FK Value Can Change


Foreign Key values may be changed. For example, if Arnando Villegas moves from Department 403 to Department 587, the Foreign Key value in his row would change.

Rule 5: FK Has No Column Limit


The Foreign Key may consist of one or more columns. A multi-column Foreign Key is used to relate to a multi-column Primary Key in the related table. In the relational model, there is no limit to the number of columns that can be designated as a Foreign Key.

Rule 6: FK Must Be PK in Related Table


Each Foreign Key must exist as a Primary Key in the related table. A department number that does not exist in the Department Table would be invalid as a Foreign Key value in the Employee Table. This rule can apply even if the Foreign Key is NULL, or missing. Remember, a missing value is defined as a non-value; there is no value present. So the rule could be better stated: if a value exists in the Foreign Key column, it must match a Primary Key value in the related table.
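The Foreign Key rules above can be checked with a short sketch. It again uses Python's built-in sqlite3 module as a stand-in for a relational database (an assumption; this is not Teradata DDL). The department numbers 403 and 587 come from the example under Rule 4, while the department names and employee numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE department (department_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    employee_number   INTEGER PRIMARY KEY,
    department_number INTEGER REFERENCES department (department_number)
);
INSERT INTO department VALUES (403, 'Education'), (587, 'Sales');
""")


def try_sql(sql):
    """Return True if the statement is accepted, False if a key rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


checks = {
    "duplicate FK values": try_sql("INSERT INTO employee VALUES (1, 403)")
    and try_sql("INSERT INTO employee VALUES (2, 403)"),
    "NULL FK": try_sql("INSERT INTO employee VALUES (3, NULL)"),
    "changed FK": try_sql(
        "UPDATE employee SET department_number = 587 WHERE employee_number = 1"
    ),
    "unmatched FK": try_sql("INSERT INTO employee VALUES (4, 999)"),
}
print(checks)
```

Only the last statement is rejected: a value is present in the Foreign Key column but matches no Primary Key in the department table, which is exactly the restated form of Rule 6.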

2. TERADATA DATABASE & DATA WAREHOUSE ARCHITECTURE

Objectives
After completing this module, you should be able to:

- Identify the different types of enterprise data processing.
- Define a data warehouse, active data warehouse, and a data mart.
- List and define the different types of data marts.
- Explain the advantages of detail data over summary data.
- Describe the overall Teradata Database parallel architecture.
- List and describe major Teradata Database hardware and software components and their functions.
- Explain how the architecture helps to maintain high availability and reliability for Teradata Database users.

Evolution to Active Data Warehousing

Data Warehouse Usage Evolution
There is an information evolution happening in the data warehouse environment today. Changing business requirements have placed demands on data warehousing technology to do more things faster. Data warehouses have moved from back room strategic decision support systems to operational, business-critical components of the enterprise. As your company evolves in its use of the data warehouse, what you need from the data warehouse evolves, too.

Stage 1 Reporting: The initial stage typically focuses on reporting from a single view of the business to drive decision-making across functional and/or product boundaries. Questions are usually known in advance, such as a weekly sales report.

Stage 2 Analyzing: Focus on why something happened, such as why sales went down or discovering patterns in customer buying habits. Users perform ad-hoc analysis, slicing and dicing the data at a detail level, and questions are not known in advance.

Stage 3 Predicting: Sophisticated analysts heavily utilize the system to leverage information to predict what will happen next in the business to proactively manage the organization's strategy. This stage requires data mining tools and building predictive models using historical detail. As an example, users can model customer demographics for target marketing.

Stage 4 Operationalizing: Providing access to information for immediate decision-making in the field enters the realm of active data warehousing. Stages 1 to 3 focus on strategic decision-making within an organization. Stage 4 focuses on tactical decision support. Tactical decision support is not focused on developing corporate strategy, but rather on supporting the people in the field who execute it. Examples: 1) Inventory management with just-in-time replenishment, 2) Scheduling and routing for package delivery, 3) Altering a campaign based on current results.

Stage 5 Active Warehousing: The larger the role an ADW plays in the operational aspects of decision support, the more incentive the business has to automate the decision processes. You can automate decision-making when a customer interacts with a web site. Interactive customer relationship management (CRM) on a web site or at an ATM is about making decisions to optimize the customer relationship through individualized product offers, pricing, content delivery and so on. As technology evolves, more and more decisions become executed with event-driven triggers to initiate fully automated decision processes. Example: determine the best offer for a specific customer based on a real-time event, such as a significant ATM deposit.

Active Data Warehouse


Data warehouses are beginning to take on mission-critical roles supporting CRM, one-to-one marketing, and minute-to-minute decision-making. Data warehousing requirements have evolved to demand a decision capability that is not just oriented toward corporate staff and upper management, but actionable on a day-to-day basis. Decisions such as when to replenish Barbie dolls at a particular retail outlet may not be strategic at the level of customer segmentation or long-term pricing strategies, but when executed properly, they make a big difference to the bottom line. We refer to this capability as "tactical" decision support. Tactical decisions are the drivers for day-to-day management of the business. Businesses today want more than just strategic insight from their data warehouse implementations; they want better execution in running the business through more effective use of information for the decisions that get made thousands of times per day.

The origin of the active data warehouse is the timely, integrated store of detail data available for analytic business decision-making. It is only from that source that the additional traits needed by the active data warehouse can evolve. These new "active" traits are supplemental to data warehouse functionality. For example, the work mix in the database still includes complex decision support queries, but expands to take on short, tactical queries, background data feeds, and possibly event-driven updates all at the same time. Data volumes and user concurrency levels may explode upward beyond expectation. Restraints may need to be placed on the longer, analytical queries in order to guarantee tactical work throughput. While accessing the detail data directly remains an important opportunity for analytical work, tactical work may thrive on shortcuts and summaries, such as operational data store (ODS) level information. And for both strategic and tactical decisions to be useful to the business, today's data, this hour's data, even this minute's data has to be at hand.

The Teradata Database is positioned exceptionally well for stepping up to the challenges related to high availability, large multi-user workloads, and handling complex queries that are required for an active data warehouse implementation. The Teradata Database technology supports the evolving business requirements by providing high performance and scalability for:

- Mixed workloads (both tactical and strategic queries) for mission-critical applications
- Large amounts of detail data
- Concurrent users

The Teradata Database provides 7x24 availability and reliability, as well as continuous updating of information so data is always fresh and accurate.

Evolution of Data Processing


Traditionally, data processing has been divided into two categories: on-line transaction processing (OLTP) and decision support systems (DSS). For either, requests are handled as transactions. A transaction is a logical unit of work, such as a request to update an account. An RDBMS is used in the following main processing environments:

- DSS
- OLTP
- OLAP
- Data Mining

Decision Support Systems (DSS)
In a decision support environment, users submit requests to analyze historical detail data stored in the tables. The results are used to establish strategies, reveal trends, and make projections. A database used as a decision support system (DSS) usually receives fewer, very complex, ad-hoc queries that may involve numerous tables. Decision support systems include batch reports, which roll up numbers to give the business the big picture, and they have evolved over time. Instead of pre-written scripts, users now require the ability to do ad hoc queries, analysis, and predictive what-if type queries that are often complex and unpredictable in their processing. These types of questions are essential for long range, strategic planning. DSS systems often process huge volumes of detail data.

On-line Transaction Processing (OLTP)
Unlike the DSS environment, an on-line transaction processing (OLTP) environment typically has users accessing current data to update, insert, and delete rows in the data tables. OLTP is typified by a small number of rows (or records) or a few of many possible tables being accessed in a matter of seconds or less. Very little I/O processing is required to complete the transaction. This type of transaction takes place when we take out money at an ATM. Once our card is validated, a debit transaction takes place against our current balance to reflect the amount of cash withdrawn. This type of transaction also takes place when we deposit money into a checking account and the balance gets updated.

We expect these transactions to be performed quickly. They must occur in real time.

On-line Analytical Processing (OLAP)
OLAP is a modern form of analytic processing within a DSS environment. OLAP tools (e.g. from companies like Microstrategy and Cognos) provide an easy to use Graphical User Interface to allow "slice and dice" analysis along multiple dimensions (e.g. products, locations, sales teams, inventories, etc.). With OLAP, the user may be looking for historical trends, sales rankings or seasonal inventory fluctuations for the entire corporation. Usually, this involves a lot of detail data to be retrieved, processed and analyzed. Therefore, response time can be in seconds or minutes.

Data Mining
Data Mining (predictive modeling) involves analyzing moderate to large amounts of detailed historical data to detect behavioral patterns (e.g. buying, attrition, or fraud patterns) that are then used to predict future behavior. An "analytic model" is built from the historical data (Phase 1: minutes to hours) incorporating the detected patterns. The model is then applied against current detail data ("scoring") to predict likely outcomes (Phase 2: seconds or less). Likely outcomes, for example, include scores on likelihood of purchasing a product, switching to a competitor, or being fraudulent.

Advantages of Using Summary Data

Until recently, most business decisions were based on summary data. The problem is that summarized data is not as useful as detail data and cannot answer some questions with accuracy. With summarized data, peaks and valleys are leveled when the peaks fall at the end of a reporting period and are cut in half.

Here's another example. Think of your monthly bank statement that records checking account activity. If it only told you the total amount of deposits and withdrawals, would you be able to tell if a certain check had cleared? To answer that question you need a list of every check received by your bank. You need detail data. Decision support, that is, answering business questions, is the real purpose of databases.
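The bank statement example can be sketched in a few lines of Python. This is only an illustration: the `Check` record, the check numbers, and the amounts are made up, and the point is simply that a total cannot answer a question about one specific check.

```python
from collections import namedtuple

# Hypothetical detail record: one row per check the bank received.
Check = namedtuple("Check", ["number", "amount"])

# Illustrative detail data for one statement period (invented values).
cleared_checks = [Check(101, 250.00), Check(102, 75.50), Check(104, 1200.00)]


def summarize(checks):
    """Summary data: only the total survives, the individual checks are gone."""
    return sum(c.amount for c in checks)


def has_cleared(checks, number):
    """Detail data can answer a question about one specific check."""
    return any(c.number == number for c in checks)


total_withdrawals = summarize(cleared_checks)
```

With only `total_withdrawals` in hand there is no way to tell whether check 103 cleared; `has_cleared(cleared_checks, 103)` needs the detail rows.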

To answer business questions, decision-makers must have four things:

The right data
Enough detail data
Proper data structure
Enough computer power to access and produce reports on the data

Consider your own business and how it uses data. Is that data detailed or summarized? If it's summarized, are there questions it cannot answer?

The Data Warehouse

A data warehouse is a central, enterprise-wide database that contains information extracted from the operational systems. A data warehouse has a centrally located logical architecture which minimizes data synchronization and provides a single view of the business. Data warehouses have become more common in corporations where enterprise-wide detail data may be used in on-line analytical processing to make strategic and tactical business decisions. Warehouses often carry many years' worth of detail data so that historical trends may be analyzed using the full power of the data.

Many data warehouses get their data directly from operational systems so that the data is timely and accurate. While data warehouses may begin somewhat small in scope and purpose, they often grow quite large as their utility becomes more fully exploited by the enterprise.

Data warehousing is a process, not a product. It is a technique to properly assemble and manage data from various sources to answer business questions not previously possible or known.

Data Marts
A data mart is a special purpose subset of enterprise data used by a particular department, function or application. Data marts may have both summary and detail data for a particular use rather than for general use. Usually the data has been pre-aggregated or transformed in some way to better handle the particular type of requests of a specific user community.

Independent Data Marts
Independent data marts are created directly from operational systems, just as a data warehouse is. In the data mart, the data is usually transformed as part of the load process. Data might be aggregated, dimensionalized or summarized historically, as the requirements of the data mart dictate.

Logical Data Marts
Logical data marts are not separate physical structures or a data load from a data warehouse, but rather are an existing part of the data warehouse. Because in theory the data warehouse contains the detail data of the entire enterprise, a logical view of the warehouse might provide the specific information for a given user community, much as a physical data mart would. Without the proper technology, a logical data mart can be a slow and frustrating experience for end users. With the proper technology, it removes the need for massive data loading and transforming, making a single data store available for all user needs.

Dependent Data Marts
Dependent data marts are created from the detail data in the data warehouse. While having many of the advantages of the logical data mart, this approach still requires the movement and transformation of data but may provide a better vehicle for performance-critical user queries.

Data Mart Pros and Cons

Independent Data Marts
Independent data marts are usually the easiest and fastest to implement and their payback value can be almost immediate. Some corporations start with several data marts before deciding to build a true data warehouse. This approach has several inherent problems:

While data marts have obvious value, they are not a true enterprise-wide solution and can become very costly over time as more and more are added.
A major problem with proliferating data marts is that, depending on where you look for answers, there is often more than one view of the business.
They do not provide the historical depth of a true data warehouse.
Because data marts are designed to handle specific types of queries from a specific type of user, they are often not as good at "what if" queries as a data warehouse would be.

Logical Data Marts
Logical data marts overcome most of the limitations of independent data marts. They provide a single view of the business. There is no historical limit to the data and "what if" querying is entirely feasible. The major drawback to logical data marts is the lack of physical control over the data. Because data in the warehouse is not pre-aggregated or dimensionalized, performance against the logical mart will not usually be as good as against an independent mart. However, use of parallelism in the logical mart can overcome some of the limitations of the non-transformed data.

Dependent Data Marts
Dependent data marts provide all advantages of a logical mart and also allow for physical control of the data as it is extracted from the data warehouse. Because dependent marts use the warehouse as their foundation, they are generally considered a better solution than independent marts, but they take longer and are more expensive to implement.

A Teradata Database System
A Teradata Database system contains one or more nodes. A node is a term for a processing unit under the control of a single operating system. The node is where the processing occurs for the Teradata Database. There are two types of Teradata Database systems:

Symmetric multiprocessing (SMP) - An SMP Teradata Database has a single node that contains multiple CPUs sharing a memory pool.
Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger, MPP implementation of a Teradata Database. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.

To manage a Teradata Database system, you use:

SMP system: System Console (keyboard and monitor) attached directly to the SMP node
MPP system: Administration Workstation (AWS)

To access a Teradata Database system, a user typically logs on through one of multiple client platforms (channel-attached mainframes or network-attached workstations). Client access is discussed in the next module.

Node Components
A node is a basic building block of a Teradata Database system, and contains a large number of hardware and software components. A conceptual diagram of a node and its major components is shown below. Hardware components are shown on the left side of the node and software components are shown on the right side.

Application
An application is software that accesses the Teradata Database. It can run on various platforms:

Channel-attached client
LAN-attached client
Node

Channel Driver
Channel driver software is the means of communication between the PEs and applications running on channel-attached (mainframe) clients.

Gateway
The Teradata Gateway software is the means of communication between the PEs (on the node) and applications running on:

Network-attached clients
A node in the system

AMP
AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps from PEs (Parsing Engines) and perform database functions to retrieve or update data. Each AMP is associated with one virtual disk (vdisk), where the data is stored. An AMP manages only its own vdisk, not the vdisk of any other AMP.

PE
PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the requests into steps. The PEs send the steps to the AMPs and subsequently return the answer to the client.

Teradata Database
The Teradata Database is a relational database management system (RDBMS) that runs as a Trusted Parallel Application (TPA) on the operating system. A TPA implements virtual processors and runs on the operating system with PDE. The software components of the Teradata Database include:

Channel Driver
Teradata Gateway
AMP
PE

A Teradata Database system can have some nodes with the Database software, and some nodes without it.

Nodes that contain the Teradata Database software are called "TPA nodes."
Nodes that do not contain Teradata Database software generally have applications installed on them. They are called "non-TPA nodes," with the software label "NOTPA" (pronounced, "no TPA").

PDE
The PDE (Parallel Database Extensions) software layer runs on the operating system on each node. This additional software layer was created by NCR to support the parallel environment.

Vdisk (Virtual Disk)

A vdisk (pronounced, "VEE-disk") is the logical disk space that is managed by an AMP. Depending on the configuration, a vdisk may not be contained on the node; however, it is managed by an AMP, which is always a part of the node.

Using the BYNET

The BYNET (pronounced, "bye-net") is a high-speed interconnect (network) that enables multiple nodes in the system to communicate. The BYNET handles the internal communication of the Teradata Database. All communication between PEs and AMPs is done via the BYNET. When the PE dispatches the steps for the AMPs to perform, they are dispatched onto the BYNET. The messages are routed to the appropriate AMP(s), where result sets and status information are generated. This response information is also routed back to the requesting PE via the BYNET. Depending on the nature of the dispatch request, the communication between nodes may be to all nodes (broadcast message) or to one specific node (point-to-point message) in the system.

1) BYNET Unique Features

The BYNET has several unique features:

Scalable: As you add more nodes to the system, the overall network bandwidth scales linearly. This linear scalability means you can increase system size without performance penalty, and sometimes even increase performance.
High performance: An MPP system typically has two BYNET networks (BYNET 0 and BYNET 1). Because both networks in a system are active, the system benefits from having full use of the aggregate bandwidth of both the networks.
Fault tolerant: Each network has multiple connection paths. If the BYNET detects an unusable path in either network, it will automatically reconfigure that network so all messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled and messages are re-routed to BYNET 1.
Load balanced: Traffic is automatically and dynamically distributed between both BYNETs.

2) BYNET Hardware and Software

The BYNET hardware and software handle the communication between the vprocs and the nodes.

Hardware: The nodes of an MPP system are connected with the BYNET hardware, consisting of BYNET boards and cables.
Software: The BYNET driver (software) is installed on every node. This BYNET driver is an interface between the PDE software and the BYNET hardware.

SMP systems do not contain BYNET hardware. The PDE and BYNET software emulate BYNET activity in a single-node environment.

3) Communication Between Nodes

The BYNET hardware can carry the following types of messages between nodes:

Broadcast message to all nodes
Point-to-point message from one node to another node

4) Communication Between Vprocs

On an MPP system, BYNET hardware is used to first send the communication across nodes (using either the point-to-point or broadcast messaging described previously). On an SMP system, this first step is unnecessary since there is only one node. Once a node receives a communication, vproc communication within the node is done by the PDE and BYNET software using the following types of messaging:

Point-to-point
Multicast
Broadcast

Point-to-Point Messages
With point-to-point messaging between vprocs, a vproc can send a message to another vproc on:

The same node (using PDE and BYNET software)
A different node using two steps:
1. Send a point-to-point message from the sending node to the node containing the recipient vproc. This is a communication between nodes using the BYNET hardware.
2. Within the recipient node, the message is sent to the recipient vproc. This is a point-to-point communication between vprocs using the PDE and BYNET software.
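The two cases can be pictured with a small routing sketch in Python. It is a toy model, not BYNET code: the `route` function, the hop labels, and the vproc-to-node layout are all assumptions made for illustration.

```python
def route(sender, recipient, node_of):
    """Return the hops a point-to-point vproc message takes.

    node_of maps a vproc name to the node hosting it (hypothetical layout).
    """
    hops = []
    if node_of[sender] != node_of[recipient]:
        # Step 1: node-to-node hop over the BYNET hardware (different nodes only).
        hops.append(("bynet_hardware", node_of[sender], node_of[recipient]))
    # Final step: delivery to the vproc inside the node by PDE and BYNET software.
    hops.append(("pde_bynet_software", node_of[recipient], recipient))
    return hops


# Illustrative layout: two vprocs on node1, one AMP on node2.
node_of = {"PE1": "node1", "AMP1": "node1", "AMP5": "node2"}
```

A message from `PE1` to `AMP1` stays inside `node1` and takes one software hop; a message from `PE1` to `AMP5` takes the hardware hop first and then the software hop inside `node2`.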

Cliques

A clique (pronounced, "kleek") is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique. The cabling determines which nodes are in which cliques: the nodes of a clique are connected to the disk array controllers of the same disk arrays.

Cliques Provide Resiliency

In the rare event of a node failure, cliques provide for data access through vproc migration. When a node resets, the following happens to the AMPs:
1. When the node fails, the Teradata Database restarts across all remaining nodes in the system.
2. The vprocs (AMPs) from the failed node migrate to the operational nodes in its clique.
3. Disks managed by the AMP remain available and processing continues while the failed node is being repaired.

Cliques in a System
Vprocs are distributed across all nodes in the system. Multiple cliques in the system should have the same number of nodes. The diagram below shows three cliques. The nodes in each clique are cabled to the same disk arrays. The overall system is connected by the BYNET. If one node goes down in a clique, the vprocs will migrate to the other nodes in the clique, so data remains available. However, system performance decreases due to the loss of a node. The amount of degradation depends on clique size: the fewer nodes in the clique, the more migrated vprocs each surviving node must absorb.
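Vproc migration within a clique can be sketched as follows. This is a simplified model under stated assumptions: the node and AMP names are invented, and the round-robin placement is an assumption, since the text does not say how migrated vprocs are spread over the surviving nodes.

```python
def migrate_vprocs(clique, failed_node):
    """Reassign the failed node's vprocs to the surviving nodes of its clique.

    clique maps node name -> list of vproc names.
    Round-robin placement is an assumption made for this sketch.
    """
    survivors = sorted(n for n in clique if n != failed_node)
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    result = {n: list(clique[n]) for n in survivors}
    for i, vproc in enumerate(clique[failed_node]):
        result[survivors[i % len(survivors)]].append(vproc)
    return result


# Hypothetical three-node clique with two AMPs per node.
clique = {
    "node1": ["AMP0", "AMP1"],
    "node2": ["AMP2", "AMP3"],
    "node3": ["AMP4", "AMP5"],
}
after = migrate_vprocs(clique, "node2")
```

In this example each surviving node goes from two AMPs to three, which is where the performance decrease comes from: the same disks are still served, but by fewer nodes.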

Trusted Parallel Application (TPA)

A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The Teradata Database is classified as a TPA. The four components of the Teradata Database TPA are:

AMP (Top Right)
PE (Bottom Right)
Channel Driver (Top Left)
Teradata Gateway (Bottom Left)

Teradata Database Software: PE

A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the Teradata Database, once a valid session has been established. Each PE can support a maximum of 120 sessions. The PE handles an incoming request in the following manner:
1. The Session Control component verifies the request for session authorization (user names and passwords), and either allows or disallows the request.
2. The Parser does the following:
o Interprets the SQL statement received from the application.
o Verifies SQL requests for the proper syntax and evaluates them semantically.
o Consults the Data Dictionary to ensure that all objects exist and that the user has authority to access them.
3. The Optimizer develops the least expensive plan (in terms of time) to return the requested response set. Processing alternatives are evaluated and the fastest alternative is chosen. This alternative is converted into executable steps, to be performed by the AMPs, which are then passed to the Dispatcher. The Optimizer is "parallel aware," meaning that it has knowledge of the system components (how many nodes, vprocs, etc.), which enables it to determine the fastest way to process the query. In order to maximize throughput and minimize resource contention, the Optimizer must know about system configuration, available units of parallelism (AMPs and PEs), and data demographics. The Teradata Database Optimizer is robust and intelligent, and enables the Teradata Database to handle multiple complex, ad-hoc queries efficiently.
4. The Dispatcher controls the sequence in which the steps are executed and passes the steps received from the Parser on to the BYNET for execution by the AMPs.
5. After the AMPs process the steps, the PE receives their responses over the BYNET.
6. The Dispatcher builds a response message and sends the message back to the user.
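The flow through the PE components can be sketched as a chain of small Python functions. This is a toy illustration only: every function name and data shape here is hypothetical, the "parser" understands a single statement form, and the "optimizer" simply emits one retrieve step per AMP rather than comparing plans.

```python
def session_control(request, sessions):
    """Step 1: allow or disallow the request based on its session."""
    if request["session"] not in sessions:
        raise PermissionError("request does not belong to a valid session")


def parse(request, data_dictionary):
    """Step 2: check syntax, then check that the object exists."""
    words = request["sql"].rstrip(";").split()
    if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
        raise SyntaxError("this sketch only understands SELECT * FROM <table>")
    table = words[3]
    if table not in data_dictionary:
        raise LookupError(f"object {table!r} does not exist")
    return table


def optimize(table, amp_count):
    """Step 3: turn the request into executable steps (here: one per AMP)."""
    return [{"amp": amp, "op": "retrieve", "table": table} for amp in range(amp_count)]


def dispatch(steps, amps):
    """Steps 4 to 6: send steps to the AMPs and build the response."""
    rows = []
    for step in steps:
        rows.extend(amps[step["amp"]](step))
    return rows


def make_amp(rows_by_table):
    """Hypothetical AMP: returns the rows it holds for the requested table."""
    return lambda step: rows_by_table.get(step["table"], [])


def handle_request(request, sessions, data_dictionary, amps):
    session_control(request, sessions)
    table = parse(request, data_dictionary)
    steps = optimize(table, len(amps))
    return dispatch(steps, amps)
```

Calling `handle_request` with two stand-in AMPs that each hold one `employee` row returns both rows, while a request from an unknown session is rejected before any parsing happens.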

Teradata Database Software: AMP

The AMP is a vproc that controls its portion of the data on the system. AMPs do the physical work associated with generating an answer set (output), including sorting, aggregating, formatting, and converting. The AMPs perform all database management functions on the required rows in the system. The AMPs work in parallel, each AMP managing the data rows stored on its single vdisk. AMPs are involved in data distribution and data access in different ways. AMPs perform all tasks in parallel, providing exceptional performance.

Data Distribution
When data is loaded, inserted, and updated, the AMP:

Receives incoming data from the PE.
Formats rows and distributes them on its vdisk.

Data Access
When data is accessed, the AMP retrieves the rows requested by the PE in the following manner:
1. The database management subsystem receives the steps from the Dispatcher over the BYNET.
2. The database management subsystem processes the steps. The subsystem on the AMP can:
o Lock databases and tables
o Create, modify, or delete definitions of tables
o Join tables
o Insert, delete, or modify rows within tables
o Sort, aggregate, or format data
o Retrieve information from definitions and rows from tables
3. The database management subsystem returns responses over the BYNET to the Dispatcher.
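The idea that each AMP works only on its own rows, in parallel with the others, can be sketched with a thread pool. This is a minimal illustration, assuming made-up sales rows and using Python threads as a stand-in for AMPs; it says nothing about how rows are actually assigned to vdisks.

```python
from concurrent.futures import ThreadPoolExecutor


def amp_sum(vdisk_rows):
    """Each AMP aggregates only the rows stored on its own vdisk."""
    return sum(amount for _store, amount in vdisk_rows)


def total_sales(vdisks):
    """Run one aggregation per AMP in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(vdisks)) as pool:
        partials = list(pool.map(amp_sum, vdisks))
    return sum(partials)


# Hypothetical rows (store, amount) spread over three vdisks.
vdisks = [
    [("store1", 100), ("store2", 250)],
    [("store3", 75)],
    [("store4", 300), ("store5", 25)],
]
```

No worker ever touches another worker's list, which mirrors the rule that an AMP manages only its own vdisk; only the small partial results are combined at the end.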

3. CLIENT ACCESS
Client Connections

Users can access data in the Teradata Database through an application on both channel-attached and network-attached clients. Additionally, the node itself can act as a client. Teradata client software is installed on each client (channel-attached, network-attached, or node) and communicates with RDBMS software on the node. You may occasionally hear either type of client referred to by the legacy term of "host," though this term is not typically used in documentation or product literature.

Channel-Attached Client

Channel-attached clients are IBM-compatible mainframe systems supported by the Teradata Database. The following software components installed on the mainframe are responsible for communications between client applications and the Channel Driver on a Teradata Database node:

Teradata Director Program (TDP) software to manage session traffic, installed on the channel-attached client.
Call-Level Interface (CLI), a library of routines that are the lowest-level interface to the Teradata Database.

Communication with the Teradata Database System
Communication from client applications on the mainframe goes through the mainframe channel, to the Host Channel Adapter on the node, to the Channel Driver software.

Network-Attached Client

The Teradata Database supports network-attached clients connected to the node over a LAN. The following software components installed on the network-attached client are responsible for communication between client applications and the Teradata Gateway on a Teradata Database node:

ODBC
CLIv2

Communication with the Teradata Database System
Communication from applications on the network-attached client goes over the LAN, to the Ethernet card on the node, to the Teradata Gateway software.

On the database side, the Teradata Gateway software and the PE provide the connection to the Teradata Database. The Teradata Database is configured with 2 LAN connections for redundancy. This ensures high availability.

Node

The node is considered a network-attached client. If you install application software on a node, it will be treated like an application on a network-attached client. In other words, communications from applications on the node go through the Teradata Gateway. An application on a node can be executed through:

System Console that manages an SMP system.
Remote login, such as over a network-attached client connection.

Request Processing

A request like the one above is processed a little differently, depending on whether the user is accessing the Teradata Database through a channel-attached or network-attached client:
1. SQL request is sent from the client to the appropriate component on the node:
o Channel-attached client: request is sent to Channel Driver (through the TDP).
o Network-attached client: request is sent to Teradata Gateway (through CLIv2 or ODBC).
2. Request is passed to the PE(s).
3. PEs parse the request into AMP steps.
4. PE Dispatcher sends steps to the AMPs over the BYNET.
5. AMPs perform operations on data on the vdisks.
6. Response is sent back to PEs over the BYNET.
7. PE Dispatcher receives response.
8. Response is returned to the client (channel-attached or network-attached).

Teradata Client Utilities

Teradata has a robust suite of client utilities that enable users and system administrators to enjoy optimal response time and system manageability. Various client utilities are available for tasks from loading data to managing the system. Teradata utilities leverage the Teradata Database's high performance capabilities and are fully parallel and scalable. The same utilities run on smaller entry-level systems, as well as the largest MPP implementations. Teradata Database client utilities include the following, described in this section:

Query Submitting Utilities
o BTEQ
o Teradata SQL Assistant
Load and Unload Utilities
o FastLoad
o MultiLoad
o TPump
o FastExport
o Teradata Warehouse Builder
Administrative Utilities
o Teradata Manager
o Teradata Dynamic Query Manager (TDQM)
o Teradata Analyst Pack
Archive Utilities
o ARC
o NetVault
o NetBackup

Query Submitting Utilities

The Teradata Database provides a number of tools that are front-end interfaces for submitting SQL queries. Two mentioned in this section are BTEQ and Teradata SQL Assistant.

BTEQ
BTEQ (Basic Teradata Query), often pronounced "BEE-teek," is a Teradata Database tool used for submitting SQL queries on all platforms. BTEQ provides the following functionality:

Standard report writing and formatting
Basic import and export of small amounts of data to and from the Teradata Database across all platforms. For tables of more than a few thousand rows, the Teradata Database load utilities are recommended for more efficiency.
Ability to submit SQL requests in the following ways:
o Interactive
o Batch

Teradata SQL Assistant

Teradata SQL Assistant (formerly known as Queryman) is an information discovery/query tool that runs on Microsoft Windows. Teradata SQL Assistant enables you to access the Teradata Database as well as other ODBC-based databases. Some of its features include:

Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft Access, and text files.
History of submitted SQL syntax, to help you build scripts for data mining and knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from ODBC-compliant databases. For tables of more than a few thousand rows, the Teradata Database load utilities are recommended for more efficiency.

Data Load and Unload Utilities

In a data warehouse environment, the database tables are populated from a variety of sources, such as mainframe applications, operational data marts, or other distributed systems throughout a company. These systems are the source of data such as daily transaction files, orders, usage records, ERP (enterprise resource planning) information, and Internet statistics. Teradata Division has a suite of data load and unload utilities optimized for use with the Teradata Database. They run on any of the supported client platforms:

Channel-attached client
Network-attached client
Node

Using Teradata Load and Unload Utilities
Teradata load and unload utilities are fully parallel. Because the utilities are scalable, they accommodate the size of the system. Performance is not limited by the capacity of the load and unload tools. The utilities have full restart capability. This feature means that if a load or unload job should be interrupted for some reason, it can be restarted from the last checkpoint, without having to start the job from the beginning. The load and unload utilities are:
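Restarting from the last checkpoint can be sketched as follows. This is a generic illustration of checkpoint and restart, not the actual mechanism of any Teradata utility: the `load` function, the in-memory `checkpoint` dictionary, and the `fail_at` switch used to simulate an interruption are all hypothetical.

```python
def load(rows, target, checkpoint, every=2, fail_at=None):
    """Load rows into target, recording progress in checkpoint["next"].

    On restart, rows written after the last checkpoint are discarded and
    the load resumes from the checkpointed position.
    """
    start = checkpoint.get("next", 0)
    del target[start:]  # roll back anything written after the last checkpoint
    for i in range(start, len(rows)):
        if fail_at is not None and i == fail_at:
            raise ConnectionError("simulated interruption")
        target.append(rows[i])
        if (i + 1) % every == 0:
            checkpoint["next"] = i + 1  # take a checkpoint
    checkpoint["next"] = len(rows)


rows = ["a", "b", "c", "d", "e"]
target, checkpoint = [], {}
try:
    load(rows, target, checkpoint, fail_at=3)  # interrupted at the fourth row
except ConnectionError:
    pass
load(rows, target, checkpoint)  # restart: resumes at row index 2, not 0
```

After the interruption the checkpoint points at row index 2, so the restart reloads only the rows from there on instead of starting the job from the beginning.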

FastLoad
MultiLoad
TPump
FastExport
Teradata Warehouse Builder

By default, you can run up to 15 instances of FastLoad, MultiLoad, and FastExport in any combination. There is no limit to the number of concurrent TPump jobs.
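The default concurrency rule can be expressed as a small admission check. This is only a sketch of the rule as stated in the text; the function name and the list of running jobs are hypothetical, and a real system enforces the limit itself.

```python
MAX_CONCURRENT = 15  # default limit quoted in the text
LIMITED = {"FastLoad", "MultiLoad", "FastExport"}  # utilities that share the limit


def can_start(utility, running):
    """Return True if one more job of this utility fits under the default rule."""
    if utility not in LIMITED:
        return True  # e.g. TPump: no limit on concurrent jobs
    in_use = sum(1 for u in running if u in LIMITED)
    return in_use < MAX_CONCURRENT


# Hypothetical workload: 10 FastLoad jobs and 5 FastExport jobs are running.
running = ["FastLoad"] * 10 + ["FastExport"] * 5
```

With those 15 jobs running, `can_start("MultiLoad", running)` is false because the three utilities share one pool, while `can_start("TPump", running)` is still true.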

FastLoad
Use the FastLoad utility to load data into empty tables. FastLoad can only work on one table at a time. FastLoad loads data into an empty table in parallel, using multiple sessions to transfer blocks of data. FastLoad achieves high performance by fully exploiting the resources of the system. After the data load is complete, the table can be made available to users.

MultiLoad
Use the MultiLoad utility to maintain tables by:

Inserting rows into a populated or empty table
Updating rows in a table
Deleting multiple rows from a table

MultiLoad can load multiple input files concurrently and work on up to five tables at a time, using multiple sessions. MultiLoad is optimized to apply multiple rows in block-level operations. MultiLoad usually is run during a batch window, and places a lock on the destination table(s) to prevent user queries from getting inconsistent results before the data load or update is complete.

TPump
Use TPump to:

Constantly load data into a table
Continuously load, update, or delete data in tables
Update lower volumes of data using fewer system resources than other load utilities
Vary the resource consumption and speed of the data loading activity over time

The TPump utility complements MultiLoad as a data loading utility. A major difference is that TPump uses row hash locks, which eliminates the need for table locks and "batch windows" typical with MultiLoad. Users can continue to run queries during TPump data loads. In addition, TPump is designed for smaller volumes of data than MultiLoad, and maintains up to 60 tables at a time. TPump has a dynamic throttle that operators can set to specify the percentage of system resources to be used for an operation. This enables operators to set when TPump should run at full capacity during low system usage, or within limits when TPump may affect other business users of the Teradata Database.
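The difference between a table lock and row hash locks can be sketched with two toy classes. Everything here is illustrative: the class names are hypothetical, and `zlib.crc32` is only a stand-in, since the text does not describe how a row hash is actually computed.

```python
import zlib


def row_hash(primary_key):
    """Stand-in hash function; not the database's real row hash algorithm."""
    return zlib.crc32(str(primary_key).encode())


class TableLockLoader:
    """MultiLoad-style: one lock covers the whole destination table."""

    def __init__(self):
        self.locked = False

    def blocks(self, primary_key):
        return self.locked  # every row is blocked while the table is locked


class RowHashLockLoader:
    """TPump-style: only the row hashes currently being changed are locked."""

    def __init__(self):
        self.locked_hashes = set()

    def lock_row(self, primary_key):
        self.locked_hashes.add(row_hash(primary_key))

    def blocks(self, primary_key):
        return row_hash(primary_key) in self.locked_hashes
```

With a table lock, a query for any row waits until the load finishes. With row hash locks, a query waits only if it needs a row whose hash is currently locked, which is why users can keep querying during a TPump load.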

FastExport
Use the FastExport utility to export data from multiple tables or views on the Teradata Database to a client-based application. You can export data from any table or view to which you have the SELECT access privilege. The destination for the exported data can be a:

Host file: A file on your channel-attached or network-attached client system
User-written application: An Output Modification (OUTMOD) routine you write to select, validate, and preprocess the exported data.

FastExport is a data extract utility. It transfers large amounts of data using block transfers over multiple sessions and writes the data to a host file on the network-attached or channel-attached client. Typically, FastExport is run during a batch window, and the tables being exported are locked.

Teradata Warehouse Builder

Teradata Warehouse Builder (TWB) is a data warehouse loading tool that enables data extraction, transformation and loading processes common to all data warehouses. Using built-in operators, Teradata Warehouse Builder combines the functionality of the Teradata utilities (FastLoad, MultiLoad, FastExport, and TPump) in a single parallel environment. Its extensible environment supports FastLoad INMODs, FastExport OUTMODs, and Access Modules to provide access to all the data sources you use today. There is a set of open APIs (Application Programmer Interface) to add third party or custom data transformation to Teradata Warehouse Builder scripts. Using multiple, parallel tasks, a single Teradata Warehouse Builder script can load data from disparate sources into the Teradata Database in the same job.

Teradata Warehouse Builder is scalable and enables end-to-end parallelism. The previous versions of utilities (like FastLoad) allow you to load data into the Teradata Database in parallel, but with a single input stream. Teradata Warehouse Builder allows you to run multiple instances of the extract, optional transformation, and load operators. You can have as many loads as you have sources in the same job. With multiple sources of data coming from multiple platforms, integration is important in a parallel environment.

Teradata Warehouse Builder eliminates the need for persistent storage. It stores data in data buffers so you no longer need to write data into a flat file. Since you don't need flat files, there is no longer a 2GB file limit.

Teradata Warehouse Builder provides a single, SQL-like scripting language, as well as a GUI to make scripting faster and easier. You can do the extract, some transformation, and loads all in one SQL-like scripting language. Once the dynamics of the language are learned, you can perform multiple tasks with a single script. You can use script converters to convert scripts on existing systems for utilities (FastLoad, MultiLoad, FastExport, and TPump) to Teradata Warehouse Builder scripts.
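The idea of several extract tasks feeding one load task through in-memory buffers, with no intermediate flat file, can be sketched with Python threads and a queue. This is a generic producer and consumer illustration, not Teradata Warehouse Builder scripting: the function names are hypothetical and `None` is used as an end-of-source marker, so source rows themselves must not be `None`.

```python
import queue
import threading


def producer(source_rows, buffer):
    """Extract task: push rows from one source into the shared buffer."""
    for row in source_rows:
        buffer.put(row)
    buffer.put(None)  # end-of-source marker (assumption of this sketch)


def run_job(sources):
    """Run one extract thread per source and a single consumer that loads rows."""
    buffer = queue.Queue()
    threads = [threading.Thread(target=producer, args=(rows, buffer)) for rows in sources]
    for t in threads:
        t.start()
    loaded, finished = [], 0
    while finished < len(sources):
        row = buffer.get()
        if row is None:
            finished += 1  # one more source has been fully read
        else:
            loaded.append(row)  # stand-in for the load step
    for t in threads:
        t.join()
    return loaded
```

Rows from the different sources arrive interleaved in whatever order the threads run, so callers should not rely on the order of the result, only on its contents.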

A single Teradata Warehouse Builder job can load data from multiple disparate sources into the Teradata Database, as indicated by the green arrow.

Teradata Warehouse Builder Operators

The operators are components that "plug" into the TWB infrastructure and actually perform the functions.

The FastLoad INMOD and FastExport OUTMOD operators support the current FastLoad and FastExport INMOD/OUTMOD features. The Data Connector operator is an adapter for the Access Module or non-Teradata files. The SQL Select and Insert operators submit the Teradata SELECT and INSERT commands. The Load, Update, Export, and Stream operators are similar to the current FastLoad, MultiLoad, FastExport, and TPump utilities, but built for the TWB parallel environment.

The INMOD and OUTMOD adapters, the Data Connector operator, and the SQL Select/Insert operators are included when you purchase the Infrastructure. The Load, Update, Export, and Stream operators are purchased separately.

To simplify these new concepts, let's compare the Teradata Warehouse Builder operators with the classic utilities that we just covered.

TWB Operator     Teradata Utility   Description
LOAD             FastLoad           A consumer-type operator that uses the Teradata FastLoad protocol. Supports error limits and Checkpoint/Restart. Both support Multi-Value Compression and PPI.
UPDATE           MultiLoad          Utilizes the Teradata MultiLoad protocol to enable job-based table updates. This allows highly scalable and parallel inserts and updates to a pre-existing table.
EXPORT           FastExport         A producer operator that emulates the FastExport utility.
STREAM           TPump              Uses multiple sessions to perform DML transactions in near real-time.
DataConnector    N/A                Emulates the Data Connector API. Reads external data files, writes data to external data files, reads an unspecified number of data files.
ODBC             N/A                Reads data from an ODBC Provider.

Administrative Utilities
Administrative utilities use a graphical user interface (GUI) to monitor and manage various aspects of a Teradata Database system. The administrative utilities are:

Teradata Manager
Teradata Dynamic Query Manager (TDQM)
Teradata Analyst Pack

Teradata Manager
Teradata Manager is a production and performance monitoring system that helps a DBA or system manager to monitor, control, and administer one or more Teradata Database systems through a GUI. Running on LAN-attached clients, Teradata Manager has a variety of tools and applications to gather, manipulate, and analyze information about each Teradata Database being administered.

Teradata Dynamic Query Manager (TDQM)

Teradata Dynamic Query Manager (TDQM), formerly known as Database Query Manager (DBQM), is a query workload management tool that dynamically tunes the Teradata Database. TDQM can run, suspend, reschedule, or reject a query based on current workload and set thresholds. For example, with TDQM a request can be scheduled to run periodically or during a specified time period without an active system connection. Results can be retrieved any time after the request has been submitted by TDQM and executed. TDQM can restrict queries based on factors such as:

Analysis control thresholds
Object control thresholds
Environmental factors

Teradata Analyst Pack

Teradata Analyst Pack is a suite of the following products.

Teradata Visual Explain
Teradata Visual Explain makes query plan analysis easier by providing the ability to capture and graphically represent the steps of the plan and perform comparisons of two or more plans. It is intended for application developers, database administrators, and database support personnel to better understand why the Teradata Database Optimizer chooses a particular plan for a given SQL query. All information required for query plan analysis, such as database object definitions, data demographics, and cost and cardinality estimates, is available through the Teradata Visual Explain interface. The tool is very helpful in identifying the performance implications of data skew and bad or missing statistics. Visual Explain uses a Query Capture Database to store query plans, which can then be visualized or manipulated with other Teradata Analyst Pack tools.

Teradata System Emulation Tool (Teradata SET)
Teradata SET simplifies the task of emulating a target system by providing the ability to export and import all information necessary to fake out the optimizer in a test environment. This information can be used along with Teradata's Target Level Emulation feature to generate query plans on the test system as if they were run on the target system. This feature is useful for verifying queries and reproducing optimizer-related issues in a test environment. Teradata SET allows the user to capture the following by database, query, or workload:

System cost parameters
Object definitions
Random AMP samples
Statistics
Query execution plans
Demographics

This tool does not export user data.

Teradata Index Wizard
Teradata Index Wizard automates the process of manual index design by recommending secondary indexes for a particular workload. Teradata Index Wizard provides a simple, easy-to-use graphical user interface (GUI) that guides the user through analyzing a database workload and provides recommendations for improving performance through the use of indexes.

Teradata Statistics Wizard
Teradata Statistics Wizard is a graphical tool that has been designed to automate the collection and re-collection of statistics, resulting in better query plans and helping the DBA to efficiently manage statistics. The Statistics Wizard enables the DBA to:

Specify a workload to be analyzed for recommendations specific to improving the performance of the queries in a workload.
Select an arbitrary database or selection of tables, indexes, or columns for analysis, collection, or re-collection of statistics.

As changes are made within a database, the Statistics Wizard identifies those changes and recommends which tables should have statistics collected, based on age of data and table growth, and what columns/indexes would benefit from having statistics defined and collected for a specific workload. The DBA is then given the opportunity to accept or reject the recommendations.
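
A recommendation accepted in the Statistics Wizard ends up as ordinary statistics-collection SQL. The following is a minimal sketch, assuming the employee table used in the SQL module of this course; the choice of department_number and hire_date is purely illustrative and is not output from the tool.

```sql
-- Collect (or re-collect) statistics on a column the workload filters on.
COLLECT STATISTICS ON employee COLUMN department_number;

-- Collect statistics on a second column; hire_date is an illustrative choice.
COLLECT STATISTICS ON employee COLUMN hire_date;

-- Review which statistics are currently defined for the table.
HELP STATISTICS employee;
```

Re-running the same COLLECT STATISTICS statements refreshes the statistics after the table has grown or its data has aged.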

Archival Utilities

Teradata has utilities specifically designed for data archive and recovery purposes. There are different utilities for channel-attached clients and network-attached clients.

Archiving on Channel-Attached Clients

In a channel-attached (mainframe) client environment, the Archive Recovery (ARC) utility is used to back up data. It supports commands written in Job Control Language (JCL). The ARC utility archives and restores database objects, allowing recovery of data that may have been damaged or lost. There are several scenarios where restoring objects from external media may be necessary:

Restoring non-Fallback tables after a disk failure.
Restoring tables that have been corrupted by batch processes that may have left the data in an uncertain state.
Restoring tables, views, or macros that have been accidentally dropped by the user.
Miscellaneous user errors resulting in damaged or lost database objects.

With the ARC utility you can copy a table and restore it to another Teradata Database. It is scalable and parallel, and can run on a channel-attached client (or network-attached client) or a node.

Archiving on Network-Attached Clients

In a network-attached client environment, the Archive Recovery (ARC) utility is used to back up data, along with either of the following tape storage subsystems:

NetVault (from BakBone Software Inc.)
NetBackup (from VERITAS Software Corporation)

NetVault and NetBackup have modules created for Teradata Database systems for use in a scalable, parallel, enterprise environment. They run on network-attached clients or a node (Microsoft Windows or UNIX MP-RAS). Data is backed up into the NetVault or NetBackup tape storage subsystems using the ARC utility.

4. TERADATA SQL

What Is SQL?

The Teradata Database is accessed using SQL (Structured Query Language). SQL is the industry standard access language for communicating with a relational database. SQL is a set-oriented language for relational database management. A user or application can use SQL statements to perform operations on the data and define how an answer set should be returned from an RDBMS. The Teradata Database supports two types of SQL:

ANSI SQL: Teradata SQL is compliant with ANSI standards (an industry standard).
Teradata SQL Extensions: NCR has added Teradata SQL extensions above and beyond standard SQL capabilities, including one-step SQL statements for complex administrative operations.

Teradata SQL Benefits

Teradata SQL is the set of SQL commands used with the Teradata Database. Some benefits of Teradata SQL are:

Parallel Execution: The Optimizer breaks up an SQL statement into tasks that can be executed in parallel to minimize resource contention. The design of the Teradata Database, along with its automatic data distribution, balances the workload and reduces bottlenecks.
ANSI Compliant: Teradata SQL is compliant with ANSI standards. If you have programs already written with ANSI-compliant SQL for a different relational database, you can run them with the Teradata Database as well.
High-Performance Extensions: NCR has added Teradata SQL extensions that are above and beyond the standard SQL capabilities, including one-step SQL statements for complex administrative operations.

Types of SQL Statements

SQL statements commonly are categorized as follows:

Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)

Data Definition Language (DDL)

Data Definition Language (DDL) is used to define and create Users, Databases, and the objects they contain (tables, views, macros, triggers, and stored procedures). Examples:

CREATE - Define a new Database, User, database object, or index.
DROP - Remove an existing Database, User, database object, index, or statistics.
ALTER - Change table structure and protection definition, or enable and disable triggers.
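
The three DDL verbs can be seen together in a short sketch. The table employee_phone and its columns are hypothetical names invented for this illustration; they are not part of the course's sample database.

```sql
-- CREATE: define a new table (hypothetical name and columns).
CREATE TABLE employee_phone
  (employee_number INTEGER NOT NULL,
   area_code       SMALLINT,
   phone           INTEGER)
UNIQUE PRIMARY INDEX (employee_number);

-- ALTER: change the table structure by adding a column.
ALTER TABLE employee_phone ADD extension INTEGER;

-- DROP: remove the table and its data.
DROP TABLE employee_phone;
```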

Data Manipulation Language (DML)

Data Manipulation Language (DML) is used to work with data, including tasks such as inserting data into a table, updating an existing record, or performing queries. Examples:

SELECT - Perform relational query functions (Select, Join, Union, Intersect, Minus).
INSERT - Place a new row into a table.
UPDATE - Modify values in an existing row.
DELETE - Remove a row from a table.
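
As a rough illustration of the four DML statements, the sketch below works on the employee table used in this module. The row values (employee 9001, "Sample", "Pat") are made up, and the sketch assumes the columns that are not listed accept nulls or have defaults.

```sql
-- INSERT: place a new row into the table (values are made up).
INSERT INTO employee (employee_number, last_name, first_name, department_number)
VALUES (9001, 'Sample', 'Pat', 401);

-- UPDATE: modify a value in the existing row.
UPDATE employee
SET department_number = 403
WHERE employee_number = 9001;

-- SELECT: query the row back.
SELECT employee_number, last_name, department_number
FROM employee
WHERE employee_number = 9001;

-- DELETE: remove the row again.
DELETE FROM employee
WHERE employee_number = 9001;
```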

Data Control Language (DCL)

Data Control Language (DCL) is used for administrative tasks such as granting and revoking privileges to database objects or controlling ownership of those objects. Examples:

GRANT - Give user privileges.
REVOKE - Remove user privileges.
GIVE - Transfer database ownership.
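
A brief sketch of the three DCL statements follows. The User and Database names (HR, Payroll, Marketing) are borrowed from the space-allocation example later in this course; placing the employee table in HR is an assumption made only for this illustration.

```sql
-- GRANT: give User Payroll read and update access to a table owned by HR.
GRANT SELECT, UPDATE ON HR.employee TO Payroll;

-- REVOKE: take the update privilege away again.
REVOKE UPDATE ON HR.employee FROM Payroll;

-- GIVE: transfer ownership of Database Marketing to User HR.
GIVE Marketing TO HR;
```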

The SELECT Statement

The SELECT statement is the most commonly used SQL statement. It is a DML statement that allows you to retrieve data from one or more tables. In its most common form, you specify certain rows to be returned as shown.
SELECT * FROM employee WHERE department_number = 401;

The asterisk, "*", is a "wild card" character. In this example, it specifies that when the result is displayed, we want to see all the columns of the rows where the department number is 401. The FROM clause specifies from which table in our database to retrieve the rows. The WHERE clause acts as a filter that passes only rows meeting the specified condition, in this case rows of employees in department 401.

NOTE: SQL does not require a trailing semicolon to end a statement, but the Basic Teradata Query (BTEQ) utility that can be used to enter SQL statements does. The semicolon is used in the examples, as if it were entered in BTEQ.

If you do not specify a WHERE clause, the query would return all columns and all rows from the employee table, for example:
SELECT * FROM employee;
EMPLOYEE_NUMBER  MANAGER_EMPLOYEE_NUMBER  DEPARTMENT_NUMBER  JOB_CODE  LAST_NAME  FIRST_NAME  HIRE_DATE  BIRTH_DATE  SALARY_AMOUNT
1006             1019                     301                312101    Stein      John        761015     531015      2945000
1008             1019                     301                312102    Kanieski   Carol       770201     580517      2925000
1005             0801                     403                431100    Ryan       Loretta     761015     550910      3120000
1004             1003                     401                412101    Johnson    Darlene     761015     460423      3630000
1007             1005                     403                432101    Villegas   Arnando     770102     370131      4970000
1003             0801                     401                411100    Trader     James       760731     470619      3785000

Returning a Subset of Columns

Instead of using the asterisk symbol to specify all columns, we could name specific columns separated by a comma:
SELECT employee_number, hire_date, last_name, first_name
FROM employee
WHERE department_number = 401;

Unsorted Results

Results include the columns named in the SQL statement. The results are unsorted unless you specify that you want them sorted in a certain way. How to retrieve ordered results is covered in the following section.
employee_number  hire_date  last_name  first_name
1004             76/10/15   Johnson    Darlene
1003             76/07/31   Trader     James
1013             77/04/01   Phillips   Charles
1010             77/03/01   Rogers     Frank
1022             79/03/01   Machado    Albert
1001             76/06/14   Hoover     William
1002             76/07/31   Brown      Alan

The ORDER BY Clause

To have your results displayed in a sorted order, use the ORDER BY clause, for example:
ORDER BY hire_date

Sort Order

Using this example, results are returned in ascending order. If a sort order is not specified, we get results in ascending order by default. To specify ascending or descending order, add ASC or DESC to the end of your ORDER BY clause. The following is an example of specifying the results in ascending order.
SELECT employee_number, last_name, first_name, hire_date
FROM employee
WHERE department_number = 401
ORDER BY hire_date ASC;

Output
employee_number  last_name  first_name  hire_date
1001             Hoover     William     76/06/14
1003             Trader     James       76/07/31
1002             Brown      Alan        76/07/31
1004             Johnson    Darlene     76/10/15
1010             Rogers     Frank       77/03/01
1013             Phillips   Charles     77/04/01
1022             Machado    Albert      79/03/01

Naming

Specify the column to sort on by either naming it directly (for example, hire_date) or by naming its position within the SELECT statement. Since hire_date is the fourth column in the SELECT clause, the following SQL statement is equivalent to the one in the example above:
ORDER BY 4 ASC

User Assistance Statements and Modifiers

SQL user assistance statements (and modifiers) vary widely from database vendor to database vendor. The Teradata Database's user assistance statements are commonly called Teradata extensions. These Teradata extensions are additions to the DDL, DML, and DCL statements in standard SQL, and make some operations less time consuming. This page discusses the following Teradata SQL user assistance commands:

HELP
HELP SESSION
SHOW

This page also discusses the statement modifier:

EXPLAIN

The HELP Statement

The HELP statement is used to display information about database objects. You can get help on the following:

HELP DATABASE
HELP USER
HELP TABLE
HELP VIEW
HELP MACRO
HELP TRIGGER
HELP PROCEDURE
HELP COLUMN
HELP INDEX
HELP STATISTICS
. . . and much more!

Example: HELP DATABASE databasename;
Displays all the objects in the specified database.

The HELP SESSION Statement

Use the HELP SESSION statement to see specific information about your SQL session.

Example: HELP SESSION;
Displays the user name with which you logged in, the log-on date and time, your default database, and other information related to your current session.

The SHOW Statement

Use the SHOW statement to display the data definition language (DDL) associated with database objects (tables, views, macros, triggers, or stored procedures). You can show the DDL for the following:

SHOW TABLE
SHOW VIEW
SHOW MACRO
SHOW TRIGGER
SHOW PROCEDURE
SHOW JOIN INDEX

Example: SHOW TABLE tablename;
Displays the CREATE TABLE statement that was used to create the specified table.

The EXPLAIN Modifier

The EXPLAIN modifier allows you to preview how the Teradata Database will execute an SQL request. It is a good way to see what database resources will be used in processing the request. Use the EXPLAIN modifier preceding any SQL statement to see a plan with:

English text describing a plan for how the statement will be processed.
An estimate of the number of rows involved.
A relative cost of the request.

The relative cost is shown in units of time, and should not be used to predict actual response time for an SQL request. This time estimate can be used to compare the duration of request processing relative to other plans. When you execute a request preceded by the EXPLAIN modifier, the request is not executed. Instead, the system:

Fully parses the request.
Optimizes the request.
Reports the complete plan for executing the request in readable English.

Example: EXPLAIN SELECT * FROM tablename;
Displays the steps involved in processing the request, SELECT * FROM the specified table.
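
To make the modifier concrete, here is a small sketch that applies EXPLAIN to the department 401 query from earlier in this module. The wording of the returned plan depends on the system, its data demographics, and the release, so no plan text is reproduced here.

```sql
-- The request is parsed and optimized but not executed; only the plan is returned.
EXPLAIN
SELECT employee_number, last_name, first_name
FROM employee
WHERE department_number = 401
ORDER BY last_name;
```

Running the same EXPLAIN before and after a change, such as collecting statistics, is one way to compare the relative time estimates of two plans.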

5. DATA STRUCTURE

A Teradata Database

In Teradata Database systems, the words "database" and "user" have specific definitions.

Database: The Teradata Definition
In Teradata, a "database" provides a logical grouping of information. A Teradata Database also plays a key role in space allocation and access control. A Teradata Database is a defined, logical repository that can contain objects, including:

Databases: A defined object that may contain a collection of Teradata Database objects.
Users: Databases that each have a logon ID and password for logging on to the Teradata Database.
Tables: Two-dimensional structures of columns and rows of data stored on the disk drives. (Require Perm Space)
Views: A virtual "window" to subsets of one or more tables or other views, pre-defined using a single SELECT statement. (Use no Perm Space)
Macros: Definitions of one or more Teradata SQL and report formatting commands. (Use no Perm Space)
Triggers: One or more Teradata SQL statements associated with a table and executed when specified conditions are met. (Use no Perm Space)
Stored Procedures: Combinations of procedural and non-procedural statements run using a single CALL statement. (Require Perm Space)

Note: A Database with no Perm Space can have views, macros, and triggers, but no tables or stored procedures.

These Teradata Database objects are created, maintained, and deleted using SQL, and are described in further detail in this section.

User: A Special Kind of Database
A User can be thought of as a collection of tables, views, macros, triggers, and stored procedures. A User is a specific type of Database, and has attributes in addition to the ones listed above:

Logon ID
Password

So, a User is the same as a Database except that a User can actually log on to the RDBMS. To log on to a Teradata Database, you need to specify a User to log on to (which is simply a Database with a password). You cannot log on to a Database because it has no password.

Note: In this course, we will use uppercase "U" for User and uppercase "D" for Database when referring to these specific Teradata Database objects.

Tables
A table in a relational database management system is a two-dimensional structure made up of columns and physical rows stored in data blocks on the disk drives. Each column represents an attribute of the table. Attributes identify, describe, or qualify the table. Each column is named and all the information contained within it is of the same type, for example, Department Number. Each row represents an instance of the table. A row could represent a particular person, thing, or event.

Views
A view is like a "window" into tables that allows multiple users to look at portions of the same base data. A view may access one or more tables, and may show only a subset of columns from the table(s). A view does not exist as a real table and does not occupy disk space. It serves as a reference to existing tables or views. A view is a logical structure with no actual data: it accesses data that is stored in a table and returns the requested rows from the table to the user.

User privileges determine which views a user can see, and what the user can do with each view. If you have the privileges to change data in a table, then you can change it through its associated view. The system makes the changes to the underlying table, and the changes are reflected in the view.

Views are useful in an enterprise-wide data warehouse environment. Higher levels of management in an organization may want to see the "big picture" contained in the single, large storehouse of information, but various departments want or need to see only the portion they are concerned with. Using views, all levels of the organization are still accessing the same underlying data, for consistent results. Views are often used for the following purposes:

A view can be defined for a user (or group of users) to have read-only access, insulating the original table from inadvertent or unwelcome changes.
A view can filter out extraneous columns for a user (or group of users). The view would contain a subset of table columns, or a combination of columns from different tables, that are appropriate for a specific task.
Views can simplify or standardize data access techniques for different users across the company.
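
The column-filtering and row-filtering uses of a view can be sketched as follows. The view name emp_401_v is hypothetical; the employee table and its columns are the ones used in the SQL module of this course.

```sql
-- Define a view that exposes only some columns, and only department 401 rows.
CREATE VIEW emp_401_v AS
SELECT employee_number, last_name, first_name, department_number
FROM employee
WHERE department_number = 401;

-- Query the view like a table; the rows are fetched from employee at run time.
SELECT *
FROM emp_401_v
ORDER BY last_name;
```

Because the view stores only its definition, it uses no Perm Space and always reflects the current contents of the underlying table.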

Macros
A macro is a Teradata Database extension to ANSI SQL that defines a sequence of prewritten Teradata SQL statements. Macros are pre-defined, stored sets of one or more SQL commands and/or report-formatting (BTEQ) commands. Macros can also contain comments.

Macros can be a convenient shortcut for executing groups of frequently-run or complex SQL statements (queries) or sets of operations. When you execute the macro, the statements execute as a single transaction. Macros reduce the number of keystrokes needed to perform a complex task. This saves you time, reduces the chance for errors, and reduces the communication volume to the Teradata Database.

Macros also have a performance benefit: they are automatically re-optimized each time they are run. As the database demographics evolve over time, you can be sure that the macros are optimized for the current data, not for data that existed when the macro was created. You can use the EXPLAIN modifier to compare a macro's execution plan as your data demographics change.

Macros can be executed interactively or by batch applications, and simplify access to the system. Macros are database objects. Because they are stored in the Teradata Database's Data Dictionary, they are available to all connected clients.

Macros also control access to the system. A Database Administrator can use macros to:

Limit the tasks a user can perform, for example, by giving the user access to only a macro and not a whole database.
Control which users can execute a macro.
Restrict users to specific rows and columns of the database through the macro code.

Macros are owned by a User or a Database and can be run by Users who have EXECUTE privileges. Parameters allow you to customize a macro to suit your individual needs at run time. To execute the macro, you use one EXECUTE statement, and the statements in the macro are processed as a single transaction. To work with macros, a User must have the following privileges:

EXEC
DROP
CREATE
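
A parameterized macro might look like the sketch below. The macro name dept_employees and its parameter dept are hypothetical; the employee table is the one from the SQL module.

```sql
-- Define a macro with one parameter; :dept refers to the parameter value.
CREATE MACRO dept_employees (dept INTEGER) AS
 (SELECT employee_number, last_name, first_name
  FROM employee
  WHERE department_number = :dept
  ORDER BY last_name;);

-- Run the macro for department 401 with a single EXECUTE (EXEC) statement.
EXEC dept_employees (401);
```

A User who holds only the EXECUTE privilege on this macro can list a department's employees without having direct access to the employee table.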

Triggers
A trigger is a set of SQL statements usually associated with a column or table that are programmed to be run (or "fired") when specified changes are made to the column or table. The pre-defined change is known as a triggering event, which causes the SQL statements to be processed.

As an example, a user with the appropriate privileges can create a trigger to keep company records consistent. The trigger would be associated with the DEPARTMENT table, which contains each department number in the company, as well as the employee number of the manager assigned to that department. This trigger has a triggering event and a triggered event:

Triggering event: When a new manager is assigned to a department, the manager's employee number changes for that department in the DEPARTMENT table.
Triggered event: SQL statements will be executed that update each affected employee's information in the EMPLOYEE table, which lists each employee, his or her employee number, and the employee number of his or her manager.
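
The DEPARTMENT/EMPLOYEE example above could be sketched roughly as follows. The trigger name, the correlation name new_dept, and the column names on the department table are assumptions made for illustration, and some releases may expect the column list after UPDATE OF to be written in parentheses.

```sql
-- Triggering event: the manager's employee number changes in DEPARTMENT.
CREATE TRIGGER dept_manager_trg
AFTER UPDATE OF manager_employee_number ON department
REFERENCING NEW AS new_dept
FOR EACH ROW
-- Triggered event: point every employee of that department at the new manager.
(UPDATE employee
 SET manager_employee_number = new_dept.manager_employee_number
 WHERE department_number = new_dept.department_number;);
```

This simple version also updates the manager's own row, so a production trigger would likely add a condition to exclude it.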

Stored Procedures
A stored procedure is a pre-defined set of statements invoked through a single CALL statement in SQL. While a stored procedure may seem like a macro, it is different in that it can contain:

Teradata SQL data manipulation statements (non-procedural)
Procedural statements (in the Teradata Database, referred to as Stored Procedure Language)

A Stored Procedure Benefit: Ability to Use SPL
A stored procedure provides the benefit of SPL control and condition handling statements unavailable in Teradata SQL. Teradata Database macros can contain only Teradata SQL statements. The combined functionality of Teradata SQL and SPL statements in stored procedures provides a computationally complete programming language. Examples of SPL functionality include:

IF-THEN-ELSE, DO WHILE, LOOP, BEGIN - END

Note that in the ANSI SQL standard, the procedural statements are included as part of SQL. In the Teradata Database, the procedural statements are allowed only in a stored procedure, so the terms SQL and SPL are used to differentiate between the non-procedural and procedural statements.

Another Stored Procedure Benefit: Less I/O Overhead
Both macros and stored procedures eliminate the overhead of sending commands from a client over a connection to the PE and down to the AMPs. The commands for macros and stored procedures are resident on the Teradata Database, so there is less I/O (input/output) traffic used to execute them.
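
A small stored procedure shows how SPL control statements wrap ordinary SQL. This is only a sketch: the procedure name salary_band, its parameters, the DECIMAL(10,2) type assumed for salary_amount, and the 40000 threshold are all invented for illustration.

```sql
CREATE PROCEDURE salary_band
  (IN  emp_no INTEGER,
   OUT band   VARCHAR(10))
BEGIN
  -- Local variable that receives the employee's salary.
  DECLARE sal DECIMAL(10,2);

  -- Non-procedural SQL: fetch one value into the local variable.
  SELECT salary_amount INTO :sal
  FROM employee
  WHERE employee_number = :emp_no;

  -- Procedural SPL: branch on the value (threshold is arbitrary).
  IF sal >= 40000 THEN
    SET band = 'HIGH';
  ELSE
    SET band = 'STANDARD';
  END IF;
END;

-- Invoke the procedure with a single CALL; the OUT argument is named in place.
CALL salary_band (1004, band);
```

The same logic could not be written as a macro, because macros can contain only Teradata SQL statements and have no IF-THEN-ELSE.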

Creating Databases and Users

In the Teradata Database, Databases (including that special category of Databases called Users) have attributes assigned to them:

Access Rights: Privileges that allow a User to perform operations (such as CREATE, DROP, and SELECT) against database objects. A User must have the correct access rights to a database object in order to access it.

Perm Space: The maximum amount of Permanent Space assigned and available to a User or Database to store tables. Unlike some other relational databases, the Teradata Database does not physically pre-allocate Perm Space for Databases and Users when they are defined. Only the Permanent Space limit is defined; the space is then consumed dynamically as needed. All Databases have a defined upper limit of Permanent Space.

Spool Space: The amount of space assigned and available to a User or Database to gather answer sets. For example, when executing a conditional query, qualifying rows are temporarily stored using Spool Space. Depending on how the system is set up, a single query could temporarily use all available System Space to store its result in spool. Permanent Space not being used for tables is available for Spool Space.

Temp Space: The amount of space used for global temporary tables. These results remain available to the User until the session is terminated. Tables created in Temp Space will survive a restart. Permanent Space not being used for tables is available for Temp Space as well as Spool Space.

A Logical Database Hierarchy

In a logical, hierarchical organization, Databases (including Users) are created subordinate to existing Databases or Users. The owning Database or User is called the parent. The subordinate Database or User is called the child. Permanent Space for the new Database or User comes from its immediate parent.

When the Teradata Database software is first installed, all Permanent Space is assigned to Database DBC (also a User in Teradata Database terminology, because you can log on to it with a userid and password). During installation, the following Databases are created:

Database Crashdumps (initially empty)
User SystemFE (with its views and macros)
User SysAdm (with its views and macros)

Because Database DBC is the immediate parent of these child Databases, Permanent Space limits for the children are subtracted from Database DBC.

Creating a New Database

After the initial setup, customer tables can then be created. One way to set up a database hierarchy would be to create a Database Administrator User directly subordinate to Database DBC. Most of the system Permanent Space would be assigned to the Database Administrator User. This setup gives you the freedom to have multiple administrators logging on to the Database Administrator User, and limits the number of people logging on directly to Database DBC (which has more access rights than any other User).

Next, all other Users and Databases would be created from the Database Administrator User, and their Permanent Space limits would be subtracted from the Database Administrator User's space limit. Your hierarchy would look like this:

Database DBC at the highest level, the parent of all other Databases (including Users).
User SysDBA (we called it SysDBA; you can assign it any name) with the majority of the system's Perm Space assigned to it.
All customer Databases and Users in the system created from User SysDBA.

Each table, view, macro, stored procedure, and trigger is owned by a Database (or User). You specify the owning Database when creating the objects. For example, when creating a table, you specify the table's owner in the CREATE TABLE statement. If no owner is specified, the system uses the User you are logged on to as the table's owner.

Maximum Perm Space Allocations: An Example

Below is an example of how Permanent Space limits for Users and Databases come from the immediate parent User or Database. In this case, the User SysDBA has 500 GB of maximum Permanent Space assigned to it.

The User HR is created from SysDBA with 200 GB of maximum Permanent Space. The 200 GB for HR is subtracted from SysDBA, which becomes 300 GB (500 GB minus 200 GB).

The User Payroll is created as a child of HR with 100 GB of Permanent Space. The 100 GB for Payroll is subtracted from HR, which becomes 100 GB (200 GB minus 100 GB).

At a different level under SysDBA, Database Marketing is created as a child of SysDBA, with 100 GB of maximum Permanent Space. The 100 GB for Marketing comes from its parent, SysDBA, which becomes 200 GB (300 GB minus 100 GB).
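
The hierarchy in this example could be created with statements along the following lines. This is a sketch under stated assumptions: the passwords are placeholders that would have to satisfy the site's password rules, and 200E9 and 100E9 express the limits in bytes using decimal gigabytes.

```sql
-- HR is carved out of SysDBA: SysDBA's limit drops from 500 GB to 300 GB.
CREATE USER HR FROM SysDBA AS
  PERM = 200E9,
  PASSWORD = hr_temp_pw;

-- Payroll is carved out of HR: HR's limit drops from 200 GB to 100 GB.
CREATE USER Payroll FROM HR AS
  PERM = 100E9,
  PASSWORD = payroll_temp_pw;

-- Marketing is a Database (no password), also carved out of SysDBA,
-- leaving SysDBA with 200 GB.
CREATE DATABASE Marketing FROM SysDBA AS
  PERM = 100E9;
```

The FROM clause names the immediate parent, which is where the new object's Permanent Space limit is subtracted.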

Spool Space

Maximum Spool Space
As mentioned previously in "Creating Databases and Users," Spool Space is working space used to hold intermediate answer sets. Any Perm Space currently unassigned is available as Spool Space.

Defining Spool Space is not required when Users and Databases are created. If it is not defined, the Spool Space for the User or Database is inherited from its parent. Thus, if no Spool Space limit were defined for any Users or Databases, an erroneous SQL request could create a "runaway transaction" that unintentionally consumes all of a system's resources. For this reason, defining maximum Spool Space for a User or Database is highly recommended.

The Spool Space limit for a Database or User is not subtracted from its immediate parent, but the Database or User's maximum spool allocation can only be as large as that of its immediate parent. For example:

Database A has a Spool Space limit of 500 GB.
Database B is created as a child of Database A. The maximum Spool Space that can be allocated to Database B is 500 GB.
Database C is created as another child of Database A. The maximum Spool Space that can be allocated to Database C is also 500 GB.

Because Spool Space is working space, temporarily used and released by the system as needed, the total maximum Spool Space allocated for all the Databases and Users on the system can actually exceed the total system disk space. But this is not the amount of Spool Space actually consumed.
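
The Database A, B, and C example can be written out as a sketch. The names Database_A, Database_B, and Database_C stand in for the A, B, and C of the text, PERM = 0 is an assumption chosen to keep the focus on spool, and the sketch presumes Database_A already exists with a 500 GB spool limit.

```sql
-- Each child may be given a spool limit up to its parent's 500 GB;
-- the limits are not subtracted from Database_A.
CREATE DATABASE Database_B FROM Database_A AS
  PERM = 0,
  SPOOL = 500E9;

CREATE DATABASE Database_C FROM Database_A AS
  PERM = 0,
  SPOOL = 500E9;

-- A spool limit can be tightened later to guard against runaway requests.
MODIFY DATABASE Database_C AS SPOOL = 100E9;
```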

Consuming Spool Space

The maximum Spool Space for a Database (or User) is merely an upper limit of the Spool Space that the Database can use while processing a transaction. There are two limits to Spool Space utilization:

The maximum Spool Space assigned to a User or Database. If a transaction is going to exceed its assigned limit, it is aborted and an error message is given stating that the maximum Spool Space was exceeded.
Physical limitation of disk space. For a specific transaction, the system can only use the amount of Spool Space actually available on the disk drives at that particular time, whether a maximum spool limit has been defined or not. If a job is going to exceed the Spool Space available on the system, an error message is given stating that there is not enough space to process the job.
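The inheritance rule and the two limits can be sketched in a few lines of Python. This is only an illustration, not Teradata code: the SpaceOwner class, the check_spool_request function, and the error texts are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


class SpoolError(Exception):
    """Raised when a request cannot get the spool space it needs."""


@dataclass
class SpaceOwner:
    """Hypothetical stand-in for a Teradata Database or User."""
    name: str
    parent: Optional["SpaceOwner"] = None
    spool_limit_gb: Optional[float] = None  # None means "inherit from parent"

    def effective_spool_limit(self) -> float:
        # Walk up the hierarchy until an explicit limit is found.
        owner = self
        while owner is not None:
            if owner.spool_limit_gb is not None:
                return owner.spool_limit_gb
            owner = owner.parent
        return float("inf")  # no limit defined anywhere: the "runaway" case


def check_spool_request(owner: SpaceOwner, needed_gb: float, free_disk_gb: float) -> None:
    """Apply the two limits in order: assigned maximum, then physical space."""
    if needed_gb > owner.effective_spool_limit():
        raise SpoolError(f"{owner.name}: maximum Spool Space exceeded")
    if needed_gb > free_disk_gb:
        raise SpoolError(f"{owner.name}: not enough space to process the job")


db_a = SpaceOwner("Database_A", spool_limit_gb=500)
db_b = SpaceOwner("Database_B", parent=db_a)  # inherits 500 GB
db_c = SpaceOwner("Database_C", parent=db_a)  # also 500 GB; nothing is subtracted
```

Note that the limits of Database_B and Database_C add up to more than the limit of Database_A. That is expected, because a spool limit is a ceiling and not an allocation.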

As the amount of Permanent Space used to store data varies over a long period of time, so will the amount of space available for spool (working space).

Temporary Space

Temporary Space is Permanent Space currently not being used. Temporary Space is the amount of space used for global temporary tables, and these results remain available to the User until the session is terminated. Tables created in Temp Space will survive a restart.

Data Dictionary

The Data Dictionary is a set of relational tables that contains information about the RDBMS and database objects within it. It is the metadata or "data about the data" for a Teradata Database. The Data Dictionary resides in Database DBC. Some of the major items it tracks are:

Disk space
Access authorizations
Ownership
Data definitions

Disk Space
The Data Dictionary stores information about how much space is allocated for perm and spool for each Database and User. The table below shows an example of Data Dictionary information for space allocations. In this example, the Users Payroll and Benefits have no Permanent Space allocated or consumed because they do not contain tables.

Access
The Data Dictionary also stores information about which Users can access which database objects. System Administrators often are responsible for archiving the system. In the example below, it is likely that the SysAdm User would have access to the tables in the Employee and Crashdumps databases, as well as other objects. When you grant and revoke access to any User for any database object, privileges are stored in the Data Dictionary.

Owners
The Data Dictionary also stores information about which Databases and Users own each database object.

Definitions
The Data Dictionary stores definitions of all database objects, their names, and their place in the hierarchy.

For macros, the Data Dictionary also stores the actual SQL statements of the macro. While stored procedures also contain statements (SQL and SPL statements), the statements for each stored procedure are kept in a separate table and distributed among the AMPs (like regular user data), not kept in the Data Dictionary.

6. DATA PROTECTION

Protecting Data

Several types of data protection are available with the Teradata Database. All the data protection methods shown on this page are covered in further detail later in this module.

RAID

Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level. It uses groups of disk drives called "arrays" to ensure that data is available in the event of a failed disk drive or other component. The word "redundant" implies that either data, functions, and/or components have been duplicated in the array's architecture. The industry has agreed on six RAID configuration levels (RAID 0 through RAID 5). The classifications do not imply superiority of one mode over the other, but differentiate how data is stored on the disk drives. With the Teradata Database, the two RAID technologies used most commonly are RAID 1 and RAID 5. On systems using EMC disk drives, RAID 5 is called RAID S.

Fallback

Fallback is a Teradata Database feature that protects data against AMP failure. As shown later in this module, Fallback uses groups of AMPs that provide for data availability and consistency if an AMP is unavailable.

Locks
Temporary locks can be placed on data to prevent multiple users from simultaneously changing it:

Exclusive Lock
Write Lock
Read Lock
Access Lock

Journals
The Teradata Database has journals that can be used for specific types of data or process recovery:

Permanent Journals
Recovery Journals

RAID 1
RAID 1 is a data protection scheme that uses mirrored pairs of disks to protect data from a single drive failure.

RAID 1: Effects on Your System
RAID 1 requires double the number of disks because every data block has an identical copy. RAID 1 is usually faster than RAID 5. The highest level of data protection is RAID 1 with Fallback.

RAID 1: How It Works

RAID 1 protects against a single disk failure using the following principles:

Mirroring
Reading

Mirroring: RAID 1 maintains a duplicate disk for each disk in the system.

Note: If you configure more than one pair of disks per AMP, the RDAC stripes the data across both the regular and mirror disks. Note that while striped mirroring (also called RAID 1 + 0) is available, it is not recommended for use with the Teradata Database. Striped mirroring is a method used to create parallelism for a non-parallel environment. Because the Teradata Database is already parallel, there is no benefit gained from using striped mirrors.

Reading: Using both copies of the data, the system reads data blocks from the first available disk.

RAID 1: How It Handles Failures

If a disk fails, the Teradata Database is unaffected and the following are each handled in a different way:

Reads
Writes
Replacements

Reads: When a drive is down, the system reads the data from the functional drive only. There is a minor performance penalty because reads can occur from only one drive instead of two.

Writes: When a drive is down, the system writes to the functional drive. No mirror image exists at this time.

Replacements: After you replace the failed disk, the disk array controller automatically reconstructs the data on the new disk from the mirror image. Normal system performance is affected during the reconstruction of the failed disk.

RAID 5
RAID 5 is a data protection scheme that uses disk arrays to protect data from the failure of a single drive.

Note: RAID S is the name for RAID 5 implemented on EMC disk drives.

Disk arrays contain the following major components:

SCSI bus
Physical disks
Disk array controllers

For maximum availability and performance, the Teradata Database always uses dual redundant disk array controllers. Having two disk array controllers adds a level of protection in case one controller fails, and provides parallelism for disk access.

RAID 5: Effects on Your System
The number of disks per rank varies from vendor to vendor. The number of disks in a rank impacts space utilization:

4 drives per rank requires a 33% increase in data space.
5 drives per rank requires a 25% increase in data space.

RAID 5 also requires some performance overhead during a write operation, because it has to read the data, then calculate and write the parity.
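The two percentages follow from one drive's worth of capacity in each rank being given over to parity. A small Python sketch of that arithmetic is shown here; the function name raid5_parity_overhead is made up for this illustration.

```python
def raid5_parity_overhead(drives_per_rank: int) -> float:
    """Extra space needed for parity, as a fraction of the data space."""
    if drives_per_rank < 3:
        raise ValueError("a RAID 5 rank needs at least 3 drives")
    # One drive's worth of capacity holds parity; the rest holds data.
    return 1 / (drives_per_rank - 1)


for n in (4, 5):
    print(f"{n} drives per rank: {raid5_parity_overhead(n):.0%} increase")
```

By comparison, RAID 1 mirrors every block, so its increase in data space is always 100%.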

RAID 5: How It Works

RAID 5 uses a data parity scheme to provide data protection.

Rank: For the Teradata Database, RAID 5 uses the concept of a rank, which is a set of disks working together. Note that the disks in a rank are not directly cabled to each other.

Parity: In RAID 5, data is handled as follows:

Data is striped across a rank of disks (spread across the disk drives) one segment at a time, using a binary "exclusive-or" (XOR) algorithm.
Parity is also striped across all disk drives, interleaved with the data. A "parity byte" is an extra byte written to a drive in a rank.
The process of writing data and parity to the disk drives includes a read-modify-write operation for each new segment:
1. Read existing data on the disk drives in the rank.
2. Read existing parity in that rank for the corresponding segment.
3. Calculate the parity: existing data + new data + existing parity = new parity (where + denotes the XOR operation).
4. Write new data.
5. Write new parity.
If one of the disk drives in the rank becomes unavailable, the system uses the parity byte to calculate the missing data from the down drive so the system can remain operational. With a rank of 4 disks, if a disk fails, any missing data block may be reconstructed using the other 3 disks.

In the example below, data bytes are written to disk drives 1, 2, and 3. The system calculates the parity byte using the binary XOR algorithm and writes it to disk drive 4.
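The XOR arithmetic can be tried out directly. The Python sketch below is an illustration only: the byte values are arbitrary, the parity function is a name chosen for this example, and the parity segment is simply kept in a variable rather than interleaved across drives as a real disk array controller would do.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR the given equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data segments on drives 1-3; the parity segment goes to drive 4.
data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]
p = parity(data)

# Lose drive 2: rebuild its segment from the surviving data and the parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]

# Read-modify-write for a new segment on drive 1:
# new parity = existing data XOR new data XOR existing parity.
new_segment = b"\xf0\x01"
new_p = parity([data[0], new_segment, p])
assert new_p == parity([new_segment, data[1], data[2]])
```

The last assertion shows why the read-modify-write shortcut works: XORing the old segment out of the parity and the new segment in gives the same result as recomputing parity over the whole rank.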

RAID 5: How It Handles Failures

If a disk fails, the Teradata Database is unaffected and the following are each handled in different ways:

Reads
Writes
Replacements

Reads: Data is reconstructed on-the-fly as users request data using the binary XOR algorithm.

Writes: When a drive is down, the system writes to the functional drives, but not to the failed drive.

Replacements: After you replace the failed disk, the disk array controller automatically reconstructs the data on the new disk, using known data values to calculate the missing data. Normal system performance is affected during reconstruction of the failed disk.

Disk Allocation

The operating system, PDE, and the Teradata Database do not recognize the physical disk hardware. Each software component recognizes and interacts with different components of the data storage environment:

Operating system: Recognizes a logical unit (LUN). The operating system recognizes the LUN as its "disk," and is not aware that it is actually writing to spaces on multiple disk drives. This technique enables the use of RAID technology to provide data availability without affecting the operating system.
PDE: Translates LUNs into vdisks using slices (in UNIX) or partitions (in Microsoft Windows) in conjunction with a Teradata utility:
o UNIX - pdeconfig
o Microsoft Windows - PUT (Parallel Upgrade Tool)
Teradata Database: Recognizes a virtual disk (vdisk). Using vdisks instead of direct connections to physical disk drives enables the use of RAID technology without affecting the Teradata Database.

Creating LUNs
Space on the physical disk drives is organized into LUNs. The RAID level determines how the space is organized. For example, if you are using RAID 5, a LUN includes a region of space from each of the physical disk drives in a rank.

Pdisks: User Data Space

After a LUN is created, it is divided into partitions.

In UNIX systems, a LUN consists of one partition, which is further divided into slices:
o Boot slice (a very small slice, taking up only 35 sectors)
o User slices for storing data. These user slices are called "pdisks" in the Teradata Database.

In Microsoft Windows systems, a LUN consists of multiple partitions; there are no slices. Thus, LUNs in Microsoft Windows do not have a boot slice. Instead, they contain a "Master Boot Record" that includes information such as the partition layout. The partitions store data and are called "pdisks" in the Teradata Database.

In summary, pdisks are the user slices (UNIX) or partitions (Microsoft Windows) and are used for storage of the tables in a database. A LUN may have one or more pdisks.

Assigning Pdisks to AMPs

The pdisks (user slices or partitions, depending on the operating system) are assigned to an AMP through the software. No cabling is involved. The combined space on the pdisks is considered the AMP's vdisk. An AMP manages only its own vdisk (disk space assigned to it), not the vdisk of any other AMP. All AMPs can then work in parallel, processing their portion of the data.

Each AMP in the system is assigned one vdisk. Although numerous configurations are possible, generally all pdisks from a rank (RAID 5) or mirrored pair (RAID 1) are assigned to the same AMP for optimal performance. However, an AMP recognizes only the vdisk. The AMP has no control over the physical disks or ranks that compose the vdisk.

Fallback

Fallback is a Teradata Database feature that protects data in the case of an AMP vproc failure. Fallback is used to guarantee the maximum availability of data. You can use Fallback protection on a table-by-table basis. It is especially useful in applications that require high availability. Fallback protects your data by storing a second copy of each row of a table on an alternate, Fallback AMP in the same cluster. If an AMP fails, the system accesses the Fallback rows to meet requests. Fallback provides AMP fault tolerance at the table level. With Fallback tables, if one AMP fails, all table data is still available. Users may continue to use Fallback tables without any loss of available data. During table creation or after a table is created, you may specify whether or not the system should keep a Fallback copy of the table. If Fallback is specified, it is automatic and transparent. Fallback guarantees that the two copies of a row will always be on different AMPs. If either AMP fails, the alternate row copy is still available on the other AMP.

Fallback: Effects on Your System
Fallback has the following effects on your system:

Space
In addition to the original database size, you need space for:

Fallback-protected tables (100% additional storage space for each Fallback-protected table)
RAID protection of Fallback-protected tables

Performance
There will be twice as much input/output (I/O) for Inserts, Updates, and Deletes of rows in Fallback-protected tables. No extra I/O is required for Select operations, as the Fallback I/O is performed in parallel with the Primary I/O.

Fallback benefits include:

Adds a level of protection beyond RAID disk array protection.
Can be specified on a table-by-table basis to protect data requiring the highest availability.
Permits access to data while an AMP is off-line.
Automatically restores data that was changed during the AMP off-line period.

The highest level of data protection is Fallback and RAID 1.

Fallback: Software Tools

The following Teradata utilities are used to recover a failed AMP:

Vproc Manager: Enables you to:
o Display and modify vproc states.
o Initiate Teradata Database restarts.
Table Rebuild: Reconstructs tables on an AMP from data on other AMPs in the cluster.
Recovery Manager: Lets you monitor recovery processing.

Fallback: How It Works

Fallback is accomplished by grouping AMPs into clusters. When a table is defined as Fallback-protected, the system stores a second copy of each row in the table on the disk space managed by an alternate "Fallback AMP" in the AMP cluster. Below is a cluster of four AMPs. Each AMP has a combination of Primary and Fallback data rows:

Primary Data Row: A record in a database table that is used in normal system operation.
Fallback Data Row: The online backup copy of a Primary data row that is used in the case of an AMP failure.

Write: Each Primary data row has a duplicate Fallback data row on another AMP. The Primary and Fallback data rows are written in parallel.

P = Primary, F = Fallback

Read: When you access data and all AMPs are operational, only Primary rows are read.

More Clusters: The diagram below shows how Fallback data is distributed among multiple clusters.

P = Primary, F = Fallback

Fallback: How It Handles Failures

If two physical disks fail in the same RAID 5 rank or RAID 1 mirrored pair, the associated AMP vproc fails. Fallback protects against the failure of a single AMP in a cluster. If two AMPs in a cluster fail, the system halts and must be restarted manually.

Reads: When an AMP fails, the system reads all rows it needs from the disk space of the remaining AMPs in the cluster. If the system needs to find a Primary row from the failed AMP, it reads the Fallback copy of that row, which is on the disk space of another AMP.

Writes: A failed AMP is not available, so the system cannot access any of the failed AMP's disk space. Copies of all its rows are available on disk space for other AMPs in the cluster (either as Primary or Fallback rows), and are updated there.

Replacement: Repairing the failed AMP requires replacing the failed physical disks and bringing the AMP online. Once the AMP is online, the system uses the Fallback data on the other AMPs to automatically re-create data on the newly replaced disks.

Journals for Data Availability

The following journals are kept on the system to provide data availability in the event of a component or process failure in the system:

Permanent Journals
Recovery Journals

Permanent Journals
Permanent Journals are an optional feature of the Teradata Database to provide an additional level of data protection. You specify the use of Permanent Journals at the table level. It provides full-table recovery to a specific point in time. It also can reduce the need for costly and time-consuming full-table backups. Permanent Journals are tables stored on disk arrays like user data is, so they take up additional disk space on the system. The Teradata Database Administrator maintains the Permanent Journal entries (deleting, archiving, and so on).

How Permanent Journals Work
A Database (object) has a maximum of one Permanent Journal. When you create a table with Permanent Journaling, you must specify whether the Permanent Journal will capture:

Before images -- for rollback to "undo" a set of changes to a previous state.
After images -- for rollforward to "redo" to a specific state.

You can also specify that the system keep both before images and after images. In addition, you can choose that the system captures:

Single images (the default) -- this means that the Permanent Journal table is not Fallback protected.
Dual images -- this means that the Permanent Journal table is Fallback protected.

The Permanent Journal captures images concurrently with standard table maintenance and query activity. The additional disk space required may be calculated in advance to ensure adequate resources. Periodically, the Teradata Database Administrator can dump the Permanent Journal to external media, thus reducing the need for full-table backups since only changes are backed up and not the entire database.

Recovery Journals

The Teradata Database uses Recovery Journals to automatically maintain data integrity in the case of:

An interrupted transaction (Transient Journal)
An AMP failure (Down-AMP Recovery Journal)

Recovery Journals are created, maintained, and purged by the system automatically, so no DBA intervention is required. Recovery Journals are tables stored on disk arrays like user data is, so they take up additional disk space on the system.

Transient Journal
A Transient Journal maintains data integrity when in-flight transactions are interrupted (due to aborted transactions, system restarts, and so on). Data is returned to its original state after transaction failure. A Transient Journal is used during normal system operation to keep "before images" of changed rows so the data can be restored to its previous state if the transaction is not completed. This happens on each AMP as changes occur. When a transaction is started, the system automatically stores a copy of all the rows affected by the transaction in the Transient Journal until the transaction is committed (completed). Once the transaction is complete, the "before images" are purged. In the event of a transaction failure, the "before images" are reapplied to the affected tables and deleted from the journal, and the "rollback" operation is completed.

Down-AMP Recovery Journal
A Down-AMP Recovery Journal allows continued system operation while an AMP is down (for example, when two disk drives fail in a rank or mirrored pair). A Down-AMP Recovery Journal is used with Fallback-protected tables to maintain a record of write transactions (updates, creates, inserts, deletes, etc.) on the failed AMP while it is unavailable. A Down-AMP Recovery Journal starts automatically after the loss of an AMP in a cluster. Any changes to the data on the failed AMP are logged into the Down-AMP Recovery Journal by the other AMPs in the cluster. When the failed AMP is brought back online, the restart process includes applying the changes in the Down-AMP Recovery Journal to the recovered AMP. The journal is discarded once the process is complete, and the AMP is brought online, fully recovered.
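The "before image" idea behind the Transient Journal can be illustrated with a minimal Python sketch. The Transaction class below is hypothetical and works on an in-memory dictionary; it only shows the principle of saving a row's prior state before changing it, not how the Teradata Database implements its journal.

```python
class Transaction:
    """Keeps "before images" so an interrupted transaction can be undone."""

    _MISSING = object()  # marks "the row did not exist before the change"

    def __init__(self, rows):
        self.rows = rows      # the table, modeled as a dict of key -> row
        self.journal = []     # (key, before_image) pairs in order of change

    def put(self, key, value):
        # Save the before image first, then apply the change.
        self.journal.append((key, self.rows.get(key, self._MISSING)))
        self.rows[key] = value

    def commit(self):
        self.journal.clear()  # before images are purged on completion

    def rollback(self):
        # Reapply before images newest-first, then empty the journal.
        for key, before in reversed(self.journal):
            if before is self._MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


table = {"emp_1": "Dept 100"}
txn = Transaction(table)
txn.put("emp_1", "Dept 200")
txn.put("emp_2", "Dept 300")
txn.rollback()  # the table is back to its original state
```

Replaying the images newest-first matters when the same row is changed more than once in a transaction, since the oldest image is the one that must win.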

Locks

Locking prevents multiple users who are trying to access or change the same data simultaneously from violating data integrity. This concurrency control is implemented by locking the target data. Locks are automatically acquired during the processing of a request and released when the request is terminated.

Levels of Locking

Locks may be applied at three levels:

Database Locks: Apply to all tables and views in the database.
Table Locks: Apply to all rows in the table.
Row Hash Locks: Apply to a group of one or more rows in a table.

Types of Locks

The four types of locks are described below.

Exclusive
Exclusive locks are applied only to databases or tables, never to rows. They are the most restrictive type of lock. With an exclusive lock, no other user can access the database or table. Exclusive locks are used rarely, most often when structural changes are being made to the database. An exclusive lock on a database or table prevents other users from obtaining the following types of locks on the locked data:

Exclusive locks
Write locks
Read locks
Access locks

Write
Write locks enable users to modify data while maintaining data consistency. While the data has a write lock on it, other users can obtain an access lock only. During this time, all other locks are held in a queue until the write lock is released. Write locks prevent other users from obtaining the following locks on the locked data:

Exclusive locks
Write locks
Read locks

Read
Read locks are used to ensure consistency during read operations. Several users may hold concurrent read locks on the same data, during which time no data modification is permitted. Read locks prevent other users from obtaining the following locks on the locked data:

Exclusive locks
Write locks

Access
Access locks can be specified by users unconcerned about data consistency. The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small, single-row changes. Access locks are sometimes called "stale read" locks, because you may get "stale data" that has not been updated. Access locks prevent other users from obtaining the following locks on the locked data:

Exclusive locks
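The four "prevents" lists above amount to a lock compatibility table. A compact Python rendering of that table follows; the BLOCKS dictionary and the can_grant function are names chosen for this sketch, and queueing of blocked requests is not modeled.

```python
# For each held lock type: the lock types it prevents others from obtaining.
BLOCKS = {
    "exclusive": {"exclusive", "write", "read", "access"},
    "write": {"exclusive", "write", "read"},
    "read": {"exclusive", "write"},
    "access": {"exclusive"},
}


def can_grant(requested, held):
    """True if `requested` is compatible with every lock already held."""
    return all(requested not in BLOCKS[lock] for lock in held)


print(can_grant("access", ["write"]))  # access is allowed alongside a write lock
print(can_grant("read", ["write"]))    # read must wait for the write lock
```

Reading the table column-wise gives the other view of the same rule: an access lock request is refused only when an exclusive lock is held.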

7. INDICES

Indexes in the Teradata Database

Indexes are used to access rows from a table without having to search the whole table. In the Teradata Database, an index is made up of one or more columns in a table. Once Teradata Database indexes are selected, they are maintained by the system. While other vendors may require data partitioning or index maintenance, these tasks are unnecessary with the Teradata Database. In the Teradata Database, there are two types of indexes:

Primary Indexes define the way the data is distributed.
Primary Indexes and Secondary Indexes are used to locate the data rows more efficiently than scanning the whole table.

You specify which column or columns are used as the Primary Index when you create a table. Secondary Index columns can be specified when you create a table or at any time during the life of the table.

Data Distribution
When the Primary Index for a table is well chosen, the table rows are evenly distributed across the AMPs for the best performance. The way to guarantee even distribution of data is by choosing a Primary Index whose columns contain unique values. The values do not have to be evenly spaced, or even "truly random," they just have to be unique to be evenly distributed. The even distribution enables each AMP to be responsible for only a subset of the rows in a table. If the data is evenly distributed, the work is evenly divided among the AMPs so they can work in parallel and complete their processing about the same time. Even data distribution is critical to performance because it optimizes the parallel access to the data.

Unevenly distributed data, also called "skewed data," causes slower response time as the system waits for the AMP(s) with the most data to finish their processing. The slowest AMP becomes a bottleneck. If distribution is skewed, an all-AMP operation will take longer than if all AMPs were evenly utilized.
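The difference between unique and heavily duplicated Primary Index values is easy to see in a simulation. In the Python sketch below, MD5 modulo the number of AMPs is an assumption that stands in for the Teradata hashing algorithm and hash map; the function names and the sample values are likewise invented for the illustration.

```python
import hashlib
from collections import Counter


def amp_for(pi_value, amp_count):
    # Assumption: MD5 modulo the AMP count stands in for the Teradata
    # hashing algorithm and hash map.
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % amp_count


def rows_per_amp(pi_values, amp_count=8):
    """Count how many rows land on each AMP for the given PI values."""
    counts = Counter(amp_for(value, amp_count) for value in pi_values)
    return [counts.get(amp, 0) for amp in range(amp_count)]


unique_pi = range(100_000)                   # e.g. employee numbers
skewed_pi = [i % 5 for i in range(100_000)]  # e.g. only five department numbers

print(rows_per_amp(unique_pi))  # roughly equal counts on every AMP
print(rows_per_amp(skewed_pi))  # at most five AMPs hold any rows at all
```

In the skewed case some AMPs hold no rows while others hold tens of thousands, so an all-AMP operation on that table finishes only when the busiest AMP does.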

When data is loaded into the Teradata Database:

The system automatically distributes the data across the AMPs based on row content (the Primary Index values).
The distribution is the same regardless of the data volume being loaded. In other words, large tables are distributed the same way as small tables.

Data is not distributed in any particular order. The benefits of having unordered data are that they don't need any maintenance to preserve order, and they are independent of any query being submitted. The automatic, unordered distribution of data eliminates tasks for a Teradata Database Administrator that are necessary with some other relational database systems. The DBA does not waste time on labor-intensive data maintenance tasks.

Teradata Database Manageability

One of the key benefits of the Teradata Database is its manageability. The list of tasks that Teradata Database Administrators do not have to do is long, and illustrates why the Teradata Database system is so easy to manage and maintain compared to other databases.

Things Teradata Database Administrators Never Have to Do
Teradata Database Administrators never have to do the following tasks:

Reorganize data or index space.
Pre-allocate table/index space.
Physical partitioning of disk space.
o While it is possible to have partitioned indexes in the Teradata Database, they are not required.
Pre-prepare data for loading (convert, sort, split, etc.).
Unload/reload data spaces due to expansion. With the Teradata Database, the data can be redistributed on the larger configuration with no offloading and reloading required.
Write or run programs to split input source files into partitions for loading.

With the Teradata Database, the workload for creating a table of 100 rows is the same as creating a table with 1,000,000,000 rows. Teradata Database Administrators know that if data doubles, the system can expand easily to accommodate it. The Teradata Database provides huge cost advantages, especially when it comes to staffing Database Administrators. Customers tell us that their DBA staff requirements for administering non-Teradata databases are three to four times higher.

How Other Databases Store Rows and Manage Data
Even data distribution is not easy for most databases to do. Many databases use range distribution, which creates intensive maintenance tasks for the DBA. Others may use indexes as a way to select a small amount of data to return the answer to a query. They use them to avoid accessing the underlying tables if possible. The assumption is that the index will be smaller than the tables so they will take less time to read. Because they scan indexes and use only part of the data in the index to search for answers to a query, they can carry extra data in the indexes, duplicating data in the tables. This way they do not have to read the table at all in some cases. As you will see, this is not nearly as efficient as the Teradata Database's method of data storage and access. Other DBAs have to ask themselves questions like:

How should I partition the data?
How large should I make the partitions?
Where do I have data contention?
How are the users accessing the data?

Many other databases require the DBAs to manually partition the data. They might place an entire table in a single partition. The disadvantage of this approach is it creates a bottleneck for all queries against that data. It is not the most efficient way to either store or access data rows.

With other databases, adding, updating and deleting data affects manual data distribution schemes, thereby reducing query performance and requiring reorganization. A Teradata Database provides high performance because it distributes the data evenly across the AMPs for parallel processing. No partitioning or data re-organizations are needed. With the Teradata Database, your DBA can spend more time with users developing strategic applications to beat your competition!

Primary Index
A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP and a location on the AMP's disks. It is also used to access rows without having to search the entire table. A Primary Index operation is always a one-AMP operation. You specify the column(s) that comprise the Primary Index for a table when the table is created. For a given row, the Primary Index value is the combination of the data values in the Primary Index columns.

Choosing a Primary Index for a table is perhaps the most critical decision a database designer makes, because this choice affects both data distribution and access.

Primary Index Rules

The following rules govern how Primary Indexes implemented in a Teradata Database must be defined as well as how they function:

Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value can be modified.
Rule 5: The Primary Index of a populated table cannot be modified.
Rule 6: A Primary Index has a limit of 64 columns.

Rule 1: One PI Per Table

Each table must have a Primary Index. The Primary Index is the only way for the system to determine where a row will be physically stored. While a Primary Index may be composed of multiple columns, the table can have only one (single- or multiple-column) Primary Index.

Rule 2: Unique or Non-Unique PI

There are two types of Primary Index:

Unique Primary Index (UPI) - For a given row, the combination of the data values in the columns of a Unique Primary Index are not duplicated in other rows within the table, so the columns used are unique. This uniqueness guarantees even data distribution and direct access. For example, in the case where old employee numbers are sometimes recycled, the combination of the Last Name and Employee Number columns would be a UPI. With a UPI, there is no duplicate row checking done during a load, which makes it a faster operation.

Non-Unique Primary Index (NUPI) - For a given row, the combination of the data values in the columns of a Non-Unique Primary Index can be duplicated in other rows within the table. So, there can be more than one row with the same PI value. A NUPI can cause skewed data, but in specific instances can still be a good Primary Index choice. For example, either the Department Number column or the Hire Date column might be a good choice for a NUPI if you will be accessing the table most often via these columns.

Rule 3: PI Can Be NULL

If the Primary Index is unique, you could have one row with a null value. If you have multiple rows with a null value, the Primary Index must be Non-Unique.

Rule 4: PI Value Can Be Modified

The Primary Index value can be modified. In the table below, if Loretta Ryan changes departments, the Primary Index value for her row changes. When you update the index value in a row, the Teradata Database re-hashes it and redistributes the row to its new location based on its new index value.

Rule 5: PI Cannot Be Modified

The Primary Index of a populated table cannot be modified. In the event that you need a new Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table. The ALTER TABLE statement allows you to change the PI of a table only if the table is empty.

Rule 6: PI Has 64-Column Limit

You can designate a Primary Index that is composed of 1 to 64 columns.

SQL Syntax for Creating a Primary Index

Every table must have a Primary Index, which is normally specified in the CREATE TABLE statement in SQL. If you do not specify a Primary Index in the CREATE TABLE statement, the system will use the Primary Key as the Primary Index. If a Primary Key has not been specified, the system will choose the first unique column. If there are no unique columns, the system will use the first column in the table and designate it as a Non-Unique Primary Index.

Creating a Unique Primary Index

The SQL syntax to create a Unique Primary Index is:

CREATE TABLE sample_1
  (col_a INT,
   col_b INT,
   col_c INT)
UNIQUE PRIMARY INDEX (col_b);

Creating a Non-Unique Primary Index

The SQL syntax to create a Non-Unique Primary Index is:

CREATE TABLE sample_0
  (col_x INT,
   col_y INT,
   col_z INT)
PRIMARY INDEX (col_x);

Modifying the Primary Index of a Table

As mentioned in the Primary Index rules, you cannot modify the Primary Index of a table. In the event that you need a new Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table.

Data Mechanics of Primary Indexes

This section describes how Primary Indexes are used in:

Data distribution
Data access

Distributing Rows to AMPs

The Teradata Database uses hashing to randomly and evenly distribute data across all AMPs for balanced performance. For example, in a two-clique system, data is hashed across all AMPs in the system for even data distribution, which results in evenly distributed workloads. Each AMP is designed to hold a portion of the rows of each table. An AMP is responsible for the storage, maintenance, and retrieval of the data under its control. The Teradata Database's automatic hash distribution eliminates costly data maintenance tasks. Rows are distributed to AMPs during the following operations:

Loading data into a table (one or more rows, using a data loading utility)
Inserting or updating rows (one or more rows, using SQL)
Changing the system configuration (redistribution of data, caused by reconfigurations to add or delete AMPs)

When loading data or inserting rows, the data being affected by the load or insert is not available to other users until the transaction is complete. During a reconfiguration, no data is accessible to users until the system is operational in its new configuration.

Row Distribution Process

The process the system uses for inserting a row on an AMP is described below:

1. The system uses the Primary Index value in each row as input to the hashing algorithm.
2. The output of the hashing algorithm is the row hash value (in this example, 646).
3. The system looks at the hash map, which identifies the specific AMP where the row should be stored (in this example, AMP 3).
4. The row is stored on the target AMP.
o UPI: The system automatically checks for duplicate UPI values when rows are loaded or inserted. If a row already exists with the UPI value, the new row is not added.
o NUPI: The system does not check for duplicate NUPI values. If a row already exists with the NUPI value, the new row is added to the same AMP.

Duplicate Row Hash Values

It is possible for the hashing algorithm to end up with the same row hash value for two different rows. There are two ways this could happen:

Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate NUPI values will produce the same row hash value.
Hash synonym: Also called a hash collision, this occurs when the hashing algorithm calculates an identical row hash value for two different Primary Index values. Hash synonyms are very rare. When using a Unique Primary Index, you will still get uniform data distribution.

To differentiate each row in a table, every row is assigned a unique Row ID. The Row ID is the combination of the row hash value and a uniqueness value.

Row ID = Row Hash Value + Uniqueness Value

The uniqueness value is used to differentiate between rows whose Primary Index values generate identical row hash values. In most cases, only the row hash value portion of the Row ID is needed to locate the row.

When each row is inserted, the AMP adds the Row ID, stored as a prefix of the row. The first row inserted with a particular row hash value is assigned a uniqueness value of 1. The uniqueness value is incremented by 1 for any additional rows inserted with the same row hash value.
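The row distribution steps and the Row ID rule can be sketched in Python. This is a toy model under stated assumptions, not the Teradata Database's implementation: the CRC-32 hash, the 16-bucket hash map, the four AMPs, and the names row_hash, amp_for, HASH_MAP, and Table are all hypothetical.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4        # assumed number of AMPs
NUM_BUCKETS = 16    # assumed hash-map size, kept tiny for readability

# Hypothetical hash map: every hash bucket is owned by exactly one AMP.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """Stand-in for the hashing algorithm: CRC-32 of the index value."""
    return zlib.crc32(repr(pi_value).encode("utf-8"))


def amp_for(pi_value):
    """Steps 1-3: hash the index value, then look up the AMP in the hash map."""
    return HASH_MAP[row_hash(pi_value) % NUM_BUCKETS]


class Table:
    def __init__(self, unique_pi):
        self.unique_pi = unique_pi
        self.amps = defaultdict(list)  # AMP number -> [(row_id, pi_value, row)]

    def insert_row(self, pi_value, row):
        """Step 4: store the row on its AMP and return its Row ID."""
        h = row_hash(pi_value)
        stored = self.amps[amp_for(pi_value)]
        same_hash = [entry for entry in stored if entry[0][0] == h]
        if self.unique_pi and any(entry[1] == pi_value for entry in same_hash):
            return None                   # UPI: duplicate value, row not added
        row_id = (h, len(same_hash) + 1)  # (row hash, uniqueness value)
        stored.append((row_id, pi_value, row))
        return row_id


nupi_table = Table(unique_pi=False)
first = nupi_table.insert_row(401, "first row for department 401")
second = nupi_table.insert_row(401, "second row for department 401")
# first and second share a row hash; their uniqueness values are 1 and 2.

upi_table = Table(unique_pi=True)
accepted = upi_table.insert_row(1001, "employee 1001")
rejected = upi_table.insert_row(1001, "employee 1001 again")  # returns None
```

Because the AMP is derived only from the index value, every row with the same Primary Index value lands on the same AMP, which is why a NUPI with many duplicates can skew the distribution.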

Duplicate Rows

A duplicate row is a row in a table whose column values are identical to another row in the same table. In other words, the entire row is the same, not just an index. Although duplicate rows are not allowed in the relational model (because every Primary Key must be unique), the Teradata Database does allow duplicate rows because the capability is a part of the ANSI standard. Because duplicate rows are allowed in the Teradata Database, how does it affect the UPI, which, by definition, is unique? When you create a table, the following definitions determine whether or not it can contain duplicate rows:

MULTISET tables: May contain duplicate rows. The Teradata Database will not check for duplicate rows.
SET tables: The default. The Teradata Database checks for and does not permit duplicate rows. If a SET table is created with a Unique Primary Index, the check for duplicate rows is replaced by a check for duplicate index values.
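The difference between the three checks can be illustrated with a small Python sketch. It assumes rows are plain dictionaries held in a list; the function insert_row and its parameters are hypothetical names used only for illustration.

```python
def insert_row(rows, new_row, table_kind="SET", upi_column=None):
    """Return True if new_row was stored, False if it was rejected."""
    if upi_column is not None:
        # Unique Primary Index: only the index values are compared.
        if any(r[upi_column] == new_row[upi_column] for r in rows):
            return False
    elif table_kind == "SET" and new_row in rows:
        # SET table without a UPI: the entire row is compared.
        return False
    # MULTISET table without a UPI: no duplicate check at all.
    rows.append(new_row)
    return True


row = {"emp_id": 1, "dept": 401}

set_table = []
insert_row(set_table, dict(row))
set_dup = insert_row(set_table, dict(row))                  # rejected

multiset_table = []
insert_row(multiset_table, dict(row), "MULTISET")
multiset_dup = insert_row(multiset_table, dict(row), "MULTISET")  # stored

upi_table = []
insert_row(upi_table, dict(row), "SET", "emp_id")
# A different row with the same index value is still rejected.
upi_dup = insert_row(upi_table, {"emp_id": 1, "dept": 403}, "SET", "emp_id")
```

The UPI check is cheaper than the full-row check because it compares one value per candidate row instead of every column.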

Accessing a Row With a Primary Index

When a user submits an SQL request using the table name and Primary Index, the request becomes a one-AMP operation, which is the most direct and efficient way for the system to find a row. The process is explained below.

Hashing Process

1. The primary index value goes into the hashing algorithm.
2. The output of the hashing algorithm is the row hash value.
3. The hash map points to the specific AMP where the row resides.
4. The PE sends the request directly to the identified AMP.
5. The AMP locates the row(s) on its vdisk.
6. The row data is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

Choosing a Unique or Non-Unique Primary Index

Criteria for choosing a Primary Index include:

Uniqueness: A UPI is often a good choice because it:
o Guarantees even data distribution.
o Eliminates duplicate row checking during a load, which makes it a faster operation.
A NUPI with few duplicate values could provide good (if not perfectly uniform) distribution, and might meet the other criteria better.

Known Access Paths - Use in value access: Retrievals, updates, and deletes that specify the Primary Index are much faster than those that do not. Because a Primary Index is a known access path to the data, it is best to choose column(s) that will be frequently used for access. For example, the following SQL statement would directly access a row based on the equality WHERE clause:

SELECT * FROM employee WHERE employee_ID = 36C4)'&41

A NUPI may be a better choice if the access is based on another, mostly unique column. For example, the table may be used by the Mail Room to track package delivery. In that case, a column containing room numbers or mail stops may not be unique if employees share offices, but may be a better choice for access.

Join Performance - Use in join access: SQL requests that use a JOIN statement perform the best when the join is done on a Primary Index. Consider Primary Key and Foreign Key columns as potential candidates for Primary Indexes. For example, if the Employee table and the Payroll table are related by the Employee ID column, then the Employee ID column could be a good Primary Index choice for one or both of the tables. For join performance, a NUPI can be a better choice than a UPI.

Non-volatile values: Look for columns where the values do not change frequently. For example, in an Invoicing table, the outstanding balance column for all customers probably has few duplicates, but probably changes too frequently to make a good Primary Index. A customer ID, statement number, or other more stable columns may be better choices.

When choosing a Primary Index, try to find the column(s) that best fit these criteria and the business need. What do you think are key considerations in choosing a Primary Index? (Choose three.)

A. Column(s) containing unique (or nearly unique) values for uniform distribution.
B. Column(s) with values in sequential order for best load and access performance.
C. Column(s) frequently used in queries to access data or to join tables.
D. Column(s) with values that are stable (do not change frequently), to minimize redistribution of table rows.
E. Column(s) with many duplicate values for redundancy.

Partitioned Primary Index

The Teradata Database provides an indexing mechanism called Partitioned Primary Index (PPI). PPI is used to:

Improve performance for large tables when you submit queries that specify a range constraint.
Reduce the number of rows to be processed by using a technique called partition elimination.
Increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints.
Instantly drop old data and rapidly add new data.
Avoid full-table scans without the overhead of a Secondary Index.

How Does PPI Work?

Data distribution with PPI is still based on the Primary Index: the Primary Index produces the hash value, which determines which AMP gets the row.

With PPI, the ORDER in which the rows are stored on the AMP is affected. Using the traditional method, No Partitioned Primary Index (NPPI), the rows are stored in row hash order.

4 AMPs with Orders Table Defined with NPPI

Using PPI, the rows are stored first by partition and then by row hash. In our example, there are four partitions. Within the partitions, the rows are stored in row hash order.

4 AMPs with Orders Table Defined with PPI on O_Date

Data Storage Using PPI

To store rows using PPI, specify Partitioning in the CREATE TABLE statement. The query will run through the hashing algorithm as normal, and come out with the Base Table ID, the Partition number(s), the Row Hash, and the Primary Index values.

Access Without a PPI

Let's say you have a table with Store information by Location and did not use a PPI. If you query on Location 3 on this NPPI table, the entire table will be scanned to find records for that Location (Full-Table Scan).

QUERY: SELECT * FROM Employee_NPPI WHERE Location_Number = 3;
PLAN: ALL-AMPs - Full-Table Scan

Access With a PPI

In the same example for a PPI table, you would partition the table with as many Locations as you have (or will soon have in the future). Then if you query on Location 3, each AMP will use partition elimination and each AMP only has to scan partition 3 for the query. This query will run much faster than the Full-Table Scan in the previous example. Also, if you had a NPPI table with a Secondary Index, there would not be a Full-Table Scan, but there would be the overhead of using a Secondary Index, which is not a factor in a PPI table.

QUERY: SELECT * FROM Employee WHERE Location_Number = 3;
PLAN: ALL-AMPs - Single Partition Scan
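The two storage orders and the effect of partition elimination can be sketched in Python for the rows held by a single AMP. This is an illustrative model only: the sample rows, the CRC-32 stand-in hash, and the function names are assumptions, and a binary search over the partition numbers stands in for whatever mechanism the database uses to find a partition.

```python
import zlib
from bisect import bisect_left, bisect_right


def row_hash(pi_value):
    """Stand-in hashing algorithm (assumption): CRC-32 of the index value."""
    return zlib.crc32(repr(pi_value).encode("utf-8"))


# Rows held by ONE AMP: (employee_id, location_number).
rows = [(101, 3), (102, 1), (103, 3), (104, 2), (105, 4), (106, 1)]

# NPPI: rows are stored in row hash order.
nppi_rows = sorted(rows, key=lambda r: row_hash(r[0]))
# PPI: rows are stored first by partition, then by row hash.
ppi_rows = sorted(rows, key=lambda r: (r[1], row_hash(r[0])))


def full_table_scan(stored, location):
    """Examine every row; return (matching rows, rows examined)."""
    matches = [r for r in stored if r[1] == location]
    return matches, len(stored)


def single_partition_scan(stored, location):
    """Jump to the one partition that can match and read only its rows."""
    partitions = [r[1] for r in stored]
    start = bisect_left(partitions, location)
    end = bisect_right(partitions, location)
    return stored[start:end], end - start


nppi_matches, nppi_examined = full_table_scan(nppi_rows, 3)
ppi_matches, ppi_examined = single_partition_scan(ppi_rows, 3)
# Same answer set either way; 6 rows examined without PPI, 2 with PPI.
```

Both plans still involve every AMP; the saving comes from each AMP reading one partition instead of its whole share of the table.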

Secondary Index

A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan. Secondary indexes do not affect how rows are distributed among the AMPs. You can drop and recreate secondary indexes dynamically, as they are needed. Unlike Primary Indexes, Secondary Indexes are stored in separate subtables that require extra overhead in terms of disk space and maintenance, which is handled automatically by the system. So, Secondary Indexes do require some system resources.

In what instances would it be a good idea to define a Secondary Index for a table? (This information will be covered in this module, but here is a preview.)

The Primary Index exists for even data distribution and data access, but a Secondary Index is defined to efficiently generate monthly reports based on a different set of columns.
The Product table is accessed by the retailer (who accesses data based on the retailer's product code column), and by a vendor (who accesses the same data based on the vendor's product code column).
The table already has a Unique Primary Index, but a second column must also have unique values. The column is specified as a Unique Secondary Index (USI) to enforce uniqueness on the second column.
All of the above.

Secondary Index Rules

Several rules that govern how Secondary Indexes must be defined and how they function are:

Rule 1: Secondary Indexes are optional.
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be changed.
Rule 6: A Secondary Index has a limit of 64 columns.

Rule 1: Optional SI

While a Primary Index is required, a Secondary Index is optional. If one path to the data is sufficient, no Secondary Index need be defined. You can define 0 to 32 Secondary Indexes on a table for multiple data access paths. Different groups of users may want to access the data in various ways. You can define a Secondary Index for each heavily used access path.

Rule 2: Unique or Non-Unique SI

Like Primary Indexes, Secondary Indexes can be unique or non-unique.

A Unique Secondary Index (USI) serves two possible purposes:

o Enforces uniqueness in a column or group of columns. The database will check USIs to see if the values are unique. For example, if you have chosen different columns for the Primary Key and Primary Index, you can make the Primary Key a USI to enforce uniqueness on the Primary Key.
o Speeds up access to a row (data retrieval speed). Accessing a row with a USI requires one or two AMPs, which is less direct than a UPI (one AMP) access, but more efficient than a full-table scan.

A Non-Unique Secondary Index (NUSI) is usually specified to prevent full-table scans, in which every row of a table is read. The Optimizer determines whether a full-table scan or NUSI access will be more efficient, then picks the best method. Accessing a row with a NUSI requires all AMPs.

Rule 3: SI Can Be NULL

As with the Primary Index, the Secondary Index column may contain NULL values.

Rule 4: SI Value Can Be Modified

The values in the Secondary Index column may be modified as needed.

Rule 5: SI Can Be Changed

Secondary Indexes can be changed. Secondary Indexes can be created and dropped dynamically as needed. When the index is dropped, the system physically drops the subtable that contained it.

Rule 6: SI Has 64-Column Limit

You can designate a Secondary Index that is composed of 1 to 64 columns. To use the Secondary Index below, the user would specify both Budget and Manager Employee Number.

Other Secondary Indexes

Join Index

Join indexes have several uses:

Define a pre-join table on frequently joined columns (with optional aggregation) without denormalizing the database.
Create a full or partial replication of a base table with a primary index on a foreign key column table to facilitate joins of very large tables by hashing their rows to the same AMP as the large table.
Define a summary table without denormalizing the database.

You can define a join index on one or several tables. Single-table join index functionality is an extension of the original intent of join indexes, hence the confusing adjective "join" used to describe a single-table join index.

Sparse Index

Any join index, whether simple or aggregate, multi-table or single-table, can be sparse. A sparse join index uses a constant expression in the WHERE clause of its definition to narrowly filter its row population. This is known as a Sparse Join Index.

Hash Index

Hash indexes are used for the same purposes as single-table join indexes. Hash indexes create a full or partial replication of a base table with a primary index on a foreign key column table to facilitate joins of very large tables by hashing them to the same AMP. You can define a hash index on one table only. Hash indexes are not indexes in the usual sense of the word. They are base tables that cannot be accessed directly by a query.

Value-Ordered NUSI

Value-ordered NUSIs are very efficient for range conditions and conditions with an inequality on the secondary index column set. Because the NUSI rows are sorted by data value, it is possible to search only a portion of the index subtable for a given range of key values. Thus, the major advantage of a value-ordered NUSI is in the performance of range queries.

Value-ordered NUSIs have the following limitations:

The sort key is limited to a single numeric column.
The sort key column cannot exceed four bytes.
They count as two indexes against the total of 32 non-primary indexes you can define on a base or join index table.
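Why value ordering helps range queries can be shown with a short Python sketch. It assumes an index subtable on one AMP held as a list sorted by a numeric sort key; the integer dates in YYYYMMDD form, the row labels, and the function range_lookup are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index subtable on one AMP: (hire date as an integer, row label),
# kept sorted by the numeric sort key.
index_rows = sorted([
    (20001130, "row-a"),
    (20010115, "row-b"),
    (20010320, "row-c"),
    (20010630, "row-d"),
    (20011201, "row-e"),
])


def range_lookup(index_rows, low, high):
    """Return the row labels whose key lies in [low, high].

    Only the slice between the two binary-search positions is read,
    instead of the whole index subtable.
    """
    values = [value for value, _ in index_rows]
    start = bisect_left(values, low)
    end = bisect_right(values, high)
    return [row_id for _, row_id in index_rows[start:end]]


hired_first_half_2001 = range_lookup(index_rows, 20010101, 20010630)
# -> ["row-b", "row-c", "row-d"]
```

An index ordered by hash value offers no such shortcut, because neighboring key values hash to unrelated positions.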

Using Secondary Indexes

In the table below, users will be accessing data based on the Department Name column. The values in that column are unique, so it has been made a USI for efficient access. In addition, the company wants reports on how many departments each manager is responsible for, so the Manager Employee Number can also be made a secondary index. It has duplicate values, so it is a NUSI.

How Secondary Indexes Are Stored

Secondary indexes are stored in index subtables. The subtables for USIs and NUSIs are distributed differently:

USI: The Unique Secondary Indexes are hash distributed separately from the data rows, based on their USI value. (As you remember, the base table rows are distributed based on the Primary Index value.) The subtable row may be stored on the same AMP or a different AMP than the base table row, depending on the hash value.
NUSI: The Non-Unique Secondary Indexes are stored in subtables on the same AMPs as their data rows. This reduces activity on the BYNET and essentially makes NUSI queries an AMP-local operation: the processing for the subtable and base table is done on the same AMP. However, in all NUSI access requests, all AMPs are activated because the non-unique value may be found on multiple AMPs.

Data Access Without a Primary Index

You can submit a request without specifying a Primary Index and still access the data. The following access methods do not use a Primary Index:

Unique Secondary Index (USI)
Non-Unique Secondary Index (NUSI)
Full-Table Scan

Accessing Data with a USI

When a user submits an SQL request using the table name and a Unique Secondary Index, the request becomes a one- or two-AMP operation, as explained below.

USI Access

1. The SQL is submitted, specifying a USI (in this case, a customer number of 56).
2. The hashing algorithm calculates a row hash value (in this case, 602).
3. The hash map points to the AMP containing the subtable row corresponding to the row hash value (in this case, AMP 2).
4. The subtable indicates where the base row resides (in this case, row 778 on AMP 4).
5. The message goes back over the BYNET to the AMP with the row and the AMP accesses the data row (in this case, AMP 4).
6. The row is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

As shown in the example above, accessing data with a USI is typically a two-AMP operation. However, it is possible that the subtable row and base table row could end up being stored on the same AMP, because both are hashed separately. If both were on the same AMP, the USI request would be a one-AMP operation.
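The USI access steps can be modeled with a minimal Python sketch. It is an illustration under assumptions, not the database's internals: four AMPs, a CRC-32 stand-in for the hashing algorithm, dictionaries for the base table and the index subtable, and the hypothetical names amp_for, insert, and usi_lookup.

```python
import zlib

NUM_AMPS = 4  # assumed number of AMPs


def amp_for(value):
    """Stand-in for hashing algorithm plus hash map (assumption)."""
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_AMPS


# Per-AMP storage: base rows are keyed by Primary Index value, and
# USI subtable rows map a USI value to the location of its base row.
base = {amp: {} for amp in range(NUM_AMPS)}
usi_subtable = {amp: {} for amp in range(NUM_AMPS)}


def insert(pi_value, usi_value, row):
    base_amp = amp_for(pi_value)             # base row hashed on the PI
    base[base_amp][pi_value] = row
    index_amp = amp_for(usi_value)           # subtable row hashed on the USI
    usi_subtable[index_amp][usi_value] = (base_amp, pi_value)


def usi_lookup(usi_value):
    """Return (row, set of AMPs touched) for a USI value."""
    index_amp = amp_for(usi_value)           # steps 2-3: find subtable AMP
    entry = usi_subtable[index_amp].get(usi_value)
    if entry is None:
        return None, {index_amp}
    base_amp, pi_value = entry               # step 4: where the base row is
    row = base[base_amp][pi_value]           # step 5: fetch the base row
    return row, {index_amp, base_amp}


insert(pi_value=1001, usi_value=56, row={"emp": 1001, "customer_number": 56})
row, amps_touched = usi_lookup(56)
# amps_touched holds two AMPs, or one if both hashes land on the same AMP.
```

The lookup never touches more than two AMPs, whatever the system size, because both the subtable row and the base row are located by hashing.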

Accessing Data with a NUSI

When a user submits an SQL request using the table name and a Non-Unique Secondary Index, the request becomes an all-AMP operation, as explained below.

NUSI Access

1. The SQL is submitted, specifying a NUSI (in this case, a last name of "Adams").
2. The hashing algorithm calculates a row hash value for the NUSI (in this case, 567).
3. All AMPs are activated to find the hash value of the NUSI in their index subtables. The AMPs whose subtables contain that value become the participating AMPs in this request (in this case, AMP1 and AMP2). The other AMPs discard the message.
4. Each participating AMP locates the row IDs (row hash value plus uniqueness value) of the base rows corresponding to the hash value (in this case, the base rows corresponding to hash value 567 are 640, 222, and 115).
5. The participating AMPs access the base table rows, which are located on the same AMP as the NUSI subtable (in this case, one row from AMP 1 and two rows from AMP 2).
6. The qualifying rows are sent over the BYNET to the PE, and the PE sends the answer set on to the client application (in this case, three qualifying rows are returned).
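The NUSI steps can be sketched the same way in Python. The model assumes four AMPs, a CRC-32 stand-in hash, and an index subtable kept on the same AMP as the base rows it points to; Amp, insert, and nusi_lookup are hypothetical names, and the employee numbers are made up for the example.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # assumed number of AMPs


def amp_for(value):
    """Stand-in for hashing algorithm plus hash map (assumption)."""
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_AMPS


class Amp:
    def __init__(self):
        self.base = {}                 # PI value -> base row
        self.nusi = defaultdict(list)  # NUSI value -> PI values on THIS AMP


amps = [Amp() for _ in range(NUM_AMPS)]


def insert(pi_value, last_name):
    amp = amps[amp_for(pi_value)]      # base row placed by its PI hash
    amp.base[pi_value] = {"emp": pi_value, "last_name": last_name}
    amp.nusi[last_name].append(pi_value)  # index entry stays AMP-local


def nusi_lookup(last_name):
    """Ask every AMP; return (qualifying rows, participating AMP numbers)."""
    rows, participating = [], []
    for number, amp in enumerate(amps):        # step 3: all AMPs activated
        hits = amp.nusi.get(last_name, [])
        if hits:                               # others discard the message
            participating.append(number)
            rows.extend(amp.base[pi] for pi in hits)  # steps 4-5, AMP-local
    return rows, participating


for emp, name in [(640, "Adams"), (222, "Adams"), (115, "Adams"), (300, "Smith")]:
    insert(emp, name)

adams_rows, participating_amps = nusi_lookup("Adams")
# Three qualifying rows come back; only AMPs holding an "Adams" row participate.
```

Every AMP receives the request, but no index-to-base-row traffic crosses between AMPs, which matches the AMP-local behavior described above.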

Accessing Data Without Indexes

In the Teradata Database, you can access data on any column, whether that column is an index or not. You can ask any question, of any data, at any time. If the request does not use a defined index, the Teradata Database does a full-table scan. A full-table scan is another way to access data without using Primary or Secondary Indexes. In evaluating an SQL request, the Optimizer examines all possible access methods and chooses the one it believes to be the most efficient. While Secondary Indexes generally provide a more direct access path, in some cases the Optimizer will choose a full-table scan because it is more efficient. A request could turn into a full-table scan when:

An SQL request searches on a NUSI column with many duplicates. For example, if a request using last names in a Customer database searched on the very prevalent "Smith" in the United States, then the Optimizer may choose a full-table scan to efficiently find all the many matching rows in the result set.
An SQL request uses a non-equality WHERE clause on an index column. For example, if a request searched an Employee database for all employees whose annual salary is greater than $100,000, then a full-table scan would be used, even if the Salary column is an index. In this example, a full-table scan can be avoided by using an equality WHERE clause on a defined index column.
An SQL request uses a range WHERE clause on an index column. For example, if a request searched an Employee database for all employees hired between January 2001 and June 2001, then a full-table scan would be used, even if the Hire_Date column is an index.

For all requests, you must specify a value for each column in the index or the Teradata Database will do a full-table scan. A full-table scan is an all-AMP operation. Every data block must be read and each data row is accessed only once. As long as the choice of Primary Index has caused the table rows to distribute evenly across all of the AMPs, the parallel processing of the AMPs working simultaneously can accomplish the full-table scan quickly. However, if a Primary Index causes skewed data distribution, all-AMP operations will take longer. While full-table scans are impractical and even disallowed on some commercial database systems, the Teradata Database routinely permits ad-hoc queries with full-table scans.
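The effect of skew on an all-AMP operation comes down to simple arithmetic, sketched here in Python. The model assumes the AMPs scan in parallel at the same fixed rate, so the busiest AMP sets the elapsed time; the row counts, the rate of 1,000 rows per second, and the function scan_time are invented for the example.

```python
def scan_time(rows_per_amp, rows_per_second=1000):
    """Elapsed seconds for a parallel scan: the busiest AMP finishes last."""
    return max(rows_per_amp) / rows_per_second


even = [2500, 2500, 2500, 2500]    # 10,000 rows spread evenly over 4 AMPs
skewed = [7000, 1000, 1000, 1000]  # the same 10,000 rows, badly skewed

even_seconds = scan_time(even)      # 2.5
skewed_seconds = scan_time(skewed)  # 7.0
```

Both tables hold the same number of rows, yet the skewed one takes almost three times as long in this model because three AMPs sit idle while the fourth finishes.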

Summary of Keys and Indexes

Some fundamental differences between Keys and Indexes are shown below:

| Keys | Indexes |
|---|---|
| A relational modeling convention used in a logical data model. | A Teradata Database mechanism used in a physical database design. |
| Uniquely identify a row (Primary Key). | Used for row distribution (Primary Index). |
| Establish relationships between tables (Foreign Key). | Used for row access (Primary Index and Secondary Index). |

While most commercial database systems use the Primary Key as a way to retrieve data, a Teradata Database system does not. In a Teradata Database system, you use the Primary Key only when designing a database, as a mechanism for maintaining referential integrity according to relational theory. The Teradata Database itself does not require keys in order to manage the data, and can function fully with no awareness of Primary Keys. The Teradata Database's parallel architecture uses Primary Indexes to distribute and access the data rows. A Primary Index is always required when creating a Teradata Database table. A Primary Index may include the same columns as the Primary Key, but does not have to. In some cases, you may want the Primary Key and Primary Index to be different. For example, a credit card account number may be a good Primary Key, but customers may prefer to use a different kind of identification to access their accounts.

Rules for Keys and Indexes

A summary of the rules for keys (in the relational model) and indexes (in the Teradata Database) is shown below.

| Rule | Primary Key | Foreign Key | Primary Index | Secondary Index |
|---|---|---|---|---|
| 1 | One PK | Multiple FKs | One | 0 to 32 SIs |
| 2 | Unique values | Unique or non-unique | Unique or non-unique | Unique or non-unique |
| 3 | No NULLs | NULLs allowed | NULLs allowed | NULLs allowed |
| 4 | Values should not change | Values may be changed | Values may be changed (redistributes row) | Values may be changed |
| 5 | Column should not change | Column should not change | Column cannot be changed (drop and recreate table) | Index may be changed (drop and recreate index) |
| 6 | No column limit | No column limit | 64-column limit | 64-column limit |
| 7 | n/a | FK must exist as PK in the related table | n/a | n/a |

Defining Primary and Foreign Keys in the Teradata Database

Although Primary Indexes are required and Primary Keys are not, you do have the option to define a Primary Key or Foreign Key for any table. When you define a Primary Key in a Teradata Database table, the RDBMS will implement the specified column(s) as an index. Because a Primary Key requires unique values, a defined Primary Key is implemented as one of the following:

Unique Primary Index (if the DBA did not specify the Primary Index in the CREATE TABLE statement)
Unique Secondary Index (if columns other than the Primary Index are chosen)

When a Primary Key is defined in Teradata SQL and implemented as an index, the rules that govern that type of index now apply to the Primary Key. For example, in relational theory, there is no limit to the number of columns in a Primary Key. However, if you specify a Primary Key in Teradata SQL, the 64-column limit for indexes now applies to that Primary Key.

THANK YOU
