
Data structure denormalization with Oracle9i

Data normalization is the process of removing redundancy from tables, with the goal of minimizing duplication within the database design. Although normalization was an excellent technique during the 1980s, when disk space was very expensive, the rules have changed in the 21st century, with disk costs dramatically lower. Today, deliberately adding redundancy is an important aspect of designing high-performance Oracle databases. Introducing redundancy to avoid costly table joins can dramatically improve the speed at which Oracle SQL queries are serviced. The challenge for the Oracle design professional is to choose the database design that ensures SQL queries are serviced as quickly as possible. Instead of removing redundancy, the Oracle designer controls its introduction using specific rules.

When to add redundancy

Essentially, the introduction of redundancy is a function of the size of the redundant column and the frequency with which the column is updated. The ideal candidates for redundant duplication are table columns that meet the following criteria:

1. The introduction of redundancy will eliminate the need to repeatedly join two tables together.
2. The data column is small.
3. The data column is static and rarely updated.

Planned data denormalization

Oracle was one of the first databases to introduce tools for planned data denormalization. As hard drives became cheaper throughout the 1990s, Oracle recognized that significant performance improvements could be gained by deliberately introducing redundant data items into Oracle table and index structures.

Snapshots

One of Oracle's first forays into data redundancy was the introduction of Oracle snapshots. With Oracle's advanced replication option, copies of tables could be made on remote database servers and refreshed at specific intervals. This redundant duplication of Oracle tables across widely dispersed geographical areas ensured that users could retrieve information quickly from a local server without the need to travel across a large network. (A minimal snapshot sketch appears at the end of this overview.)

VARRAYs

Oracle also allows the introduction of redundant information using VARRAY table structures. In a VARRAY table, Oracle provides for non-first-normal-form data structures by storing repeating groups of values directly within a single Oracle table row. This avoids the overhead of joining the base table to a subordinate table to retrieve the solution set.
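As promised above, here is a minimal sketch of classic snapshot DDL. It assumes a remote database reachable through a database link named sales_link and a base table named customer; both names are illustrative, not from the article:

-- Local, redundant copy of a remote table, refreshed hourly
create snapshot customer_local
  refresh complete
  start with sysdate
  next sysdate + 1/24
as
  select * from customer@sales_link;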

Let's look at a simple VARRAY table example. Assume we have a Student table for a university. One of the table requirements is storing student SAT and ACT scores. Students may take each test only three times, so the test scores form a very small repeating group with a fixed maximum number of values. Using traditional database design structures, we would be required to create a Test_scores table and join the Student table with it to see both the student data and the repeating values of their SAT and ACT test scores. Using Oracle8 VARRAY tables, you can create a table structure where the repeating groups are stored within the Oracle table itself (Figure A). A minimal sketch of such a structure follows.

Figure A: An Oracle8 VARRAY table
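A hedged sketch of what Figure A depicts; the type and column names (test_score_arr, student, and so on) are illustrative assumptions, not taken from the article:

-- Holds up to three repeating test scores inside the student row
create or replace type test_score_arr as varray(3) of number;

create table student (
  student_id   number primary key,
  student_name varchar2(40),
  sat_scores   test_score_arr,   -- repeating group stored in-row
  act_scores   test_score_arr
);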

Note that frequently updated or large data columns can be cumbersome in Oracle VARRAY tables. In the example of our test scores, however, the VARRAY table allows Oracle to retrieve both the student and test information within a single disk I/O operation. Another important VARRAY table characteristic is that the repeating information may be stored in presorted order. Upon retrieval, the information will always be displayed in sorted order. This alleviates the additional overhead of re-sorting the test scores every time a student row is retrieved.

Oracle materialized views

After the popularity of snapshot replication, Oracle recognized that complex queries could be prebuilt to provide end users with the illusion of instantaneous response time. The precompilation process allowed five-way table joins, complex presummarization of aggregation operations, and a host of other time-consuming and I/O-expensive SQL queries to be precalculated. Basically, materialized views boil down to a "build it now or build it later" philosophy. Using this philosophy, you can preexecute Oracle queries in anticipation of the end user's query, thereby allowing the end user to retrieve complex information with a single disk I/O. However, simply prebuilding complex queries is only a portion of the answer. A mechanism had to be created to make Oracle SQL aware that a query had been prebuilt and to tell it to use the precreated summary. Oracle called this feature query rewrite. When the Oracle parameter query_rewrite_enabled is set, Oracle automatically checks for materialized views whenever it notices a match between an incoming SQL statement and a prebuilt aggregate. If Oracle notices that the information has been presummarized, the cost-based optimizer goes directly to the presummarized information, thus saving thousands of expensive disk I/Os. For data warehouse applications and Oracle systems requiring complex SQL queries, materialized views can be the difference between subsecond response times and queries that run for 30 minutes.

Here is a simple example of a materialized view (note that the aggregate requires a group by clause, which appears to have been dropped from the original listing):

create materialized view sum_sales
  build immediate
  refresh complete
  enable query rewrite
as
  select product_nbr, sum(sales) sum_sales
  from sales
  group by product_nbr;

When any query summarizes sales, that query will be dynamically rewritten to reference the summary table:

alter session set query_rewrite_enabled = true;
set autotrace on

select sum(sales) from sales;

In the execution plan for this query, we see that the sum_sales table is referenced:

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=83)
1    0   SORT (AGGREGATE)
2    1     TABLE ACCESS (FULL) OF 'SUM_SALES' (Cost=1 Card=423 Bytes=5342)

Materialized views, being redundant, need to be updated when their base tables change. Just as a snapshot needs to specify a refresh interval, an Oracle materialized view has to specify the rate at which it is re-created when any of the information that constitutes the view has changed. Oracle offers a wealth of options for the frequency of rebuilding the views, ranging from instantaneous rebuilds (commit refresh) to more sophisticated refresh intervals that can be tuned to the volatility of the base data.

Conclusion

Because disk prices are falling by orders of magnitude every year, Oracle professionals are increasingly willing to introduce redundancy into their Oracle data models to improve performance. A third-normal-form database design in the 21st century may be very efficient from a disk-storage point of view, but it will perform poorly because everything has to be rebuilt from its atomic pieces every time a query is executed. Using Oracle's denormalization tools such as replication, VARRAY tables, and materialized views, the Oracle database designer can deliberately introduce redundancy into the data model, thereby avoiding the expensive table joins and large-table full-table scan operations that would otherwise be required to recompute the information at runtime.

BENEFITS OF DENORMALIZED RELATIONAL DATABASE TABLES

ABSTRACT

Heuristics for denormalizing relational database tables are examined with the objective of improving processing performance for data insertions, deletions, and selections. Client-server applications necessitate consideration of denormalized database schemas as a means of achieving good system performance where the client-server interface is graphical (GUI) and network capacity is limited by the network channel.

INTRODUCTION

Relational database table design efforts encompass both the conceptual and physical modeling levels of the three-schema architecture. Conceptual diagrams, either entity-relationship or object-oriented, are a precursor to designing relational table structures. CASE tools will generate relational database tables at least to the third normal form (3NF) based on conceptual models, but they have not advanced to the point that they produce table structures that guarantee acceptable levels of system processing performance. As firms move away from the mainframe toward cheaper client-server platforms, managers face a different set of issues in the development of information systems. One critical issue is the need to continue to provide a satisfactory level of system performance, usually reflected by system response time, for mission-critical, on-line transaction processing systems. A fully normalized database schema can fail to provide adequate system response time due to excessive table join operations.

It is difficult to find formal guidance in the literature that outlines approaches to denormalizing a database schema, a process also termed usage analysis. This paper focuses on identifying heuristics, or rules of thumb, that designers may use when denormalizing a relational schema. Denormalization must balance the need for good system response time with the need to maintain data while avoiding the various anomalies associated with denormalized table structures. Denormalization goes hand-in-hand with the detailed analysis of critical transactions through view analysis. View analysis must include the specification of primary and secondary access paths for the tables that comprise end-user views of the database. Additionally, denormalization should only be attempted under conditions that allow designers to collect detailed performance measurements for comparison with system performance requirements [1]. Further, designers should only denormalize during the physical design phase and never during conceptual modeling. Relational database theory provides guidelines for achieving an idealized representation of data and data relationships. Conversely, client-server systems require physical database design measures that optimize performance for specific target systems under less than ideal conditions [1]. The final database schema must be adjusted for characteristics of the environment such as hardware, software, and other constraints.

We recommend following three general guidelines for denormalization [1]. First, perform a detailed view analysis in order to identify situations where an excessive number of table joins appears to be required to produce a specific end-user view. While no hard rule exists for defining "excessive," any view requiring more than three joins should be considered a candidate for denormalization. Beyond this, system performance testing that simulates the production environment is necessary to prove the need for denormalization.

Second, the designer should attempt to reduce the number of foreign keys in order to reduce index maintenance during insertions and deletions. Reducing foreign keys is closely related to reducing the number of relational tables. Third, the ease of data maintenance provided by normalized table structures must also be provided by the denormalized schema. Thus, a satisfactory approach should not require excessive programming code (triggers) to maintain data integrity and consistency.

DENORMALIZING FIRST NORMAL FORM (1NF) TO UNNORMALIZED TABLES

There are more published formal guidelines for denormalizing 1NF than for any other normal form. This stems from the fact that repeating fields occur fairly often in business information systems; therefore, designers and relational database theoreticians have been forced to come to grips with the issue. It is helpful to examine an example problem for those new to the concepts of normalization and denormalization. Consider a situation where a customer has several telephone numbers that must be tracked. The third normal form (3NF) solution and the denormalized table structure are given below, with telephone number (Phone1, Phone2, Phone3, ...) as a repeating field:

3NF:
CUSTOMER (CustomerId, CustomerName, ...)
CUST_PHONE (CustomerId, Phone)

Denormalized:
CUSTOMER (CustomerId, CustomerName, Phone1, Phone2, Phone3, ...)

Which approach is superior? The answer is that it depends on the processing and coding steps needed to store and retrieve the data in these tables. In the denormalized solution, code must be written (or a screen designed) to enable any new telephone number to be stored in any of the three Phone fields. Clearly this is not a difficult task and can be easily accomplished with modern CASE tools; still, this denormalized solution is usually only appropriate if one can guarantee that a customer will have a limited, finite number of telephone numbers, or if the firm makes a management decision not to store more than "X" telephone numbers. Both solutions provide good data retrieval of telephone numbers, as indicated by the following SQL statements. The denormalized solution requires a simpler, smaller index structure for the CustomerId key field. The normalized solution would require at least two indices for the Cust_Phone table: one on the composite key to ensure uniqueness, and one on the CustomerId field to provide a fast primary access path to records (a sketch of these indices follows the queries).

Select * from Cust_Phone where CustomerId = '3344';

Select * from Customer where CustomerId = '3344';
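A hedged sketch of the index structures just described; the index names are assumptions, and on some platforms the composite index alone may serve both purposes:

-- Normalized design: composite uniqueness plus a fast access path
create unique index cust_phone_pk on Cust_Phone (CustomerId, Phone);
create index cust_phone_cust_idx on Cust_Phone (CustomerId);

-- Denormalized design: a single index on the key suffices
create unique index customer_pk on Customer (CustomerId);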

If the customer name is also required in the query, then the denormalized solution is superior because no table join is involved:

Select *
from Cust_Phone CP, Customer C
where CP.CustomerId = '3344'
and CP.CustomerId = C.CustomerId;

The effort required to extract telephone numbers for a specific customer based on the customer name is also greater for the 3NF solution, as a table join is required:

Select *
from Cust_Phone CP, Customer C
where C.CustomerName = 'Tom Thumb'
and CP.CustomerId = C.CustomerId;

If the telephone number fields represent different types of telephones (for example, a voice line, a dedicated fax line, and a dedicated modem line), then the appropriate table structures are:

3NF:
CUSTOMER (CustomerId, CustomerName, ...)
CUST_PHONE (CustomerId, Phone)
PHONE (Phone, PhoneType)

Denormalized:
CUSTOMER (CustomerId, CustomerName, VoicePhone, FaxPhone, ModemPhone, ...)

Again, the denormalized solution is simpler for data storage and retrieval, and, as before, the only factor favoring a 3NF solution is the number of potential telephone numbers that an individual customer may have. The Phone table would be at least 30 to 50 percent the size of the Cust_Phone table. The decision to denormalize is most crucial when the Customer table is large in a client-server environment; for example, one hundred thousand customers, each having two or three telephone numbers. The join of Customer, Cust_Phone, and Phone may be prohibitive in terms of processing efficiency.

DENORMALIZING SECOND NORMAL FORM (2NF) TO 1NF

The well-known order entry modeling problem involving Customers, Orders, and Items provides a realistic situation where denormalization is possible without significant data maintenance anomalies. Consider the following 3NF table structures:

3NF:
CUSTOMER (CustomerId, CustomerName, ...)
ORDER (OrderId, OrderDate, DeliveryDate, Amount, CustomerId)
ORDERLINE (OrderId, ItemId, QtyOrdered, OrderPrice)
ITEM (ItemId, ItemDescription, CurrentPrice)

The many-to-many relationship between the Order and Item entities represents a business transaction that occurs repeatedly over time. Further, such transactions, once complete, are essentially "written in stone," since the transaction records would never be deleted. Even if an order is partially or fully deleted at the request of the customer, other tables not shown above would be used to record the deletion of an item or an order as a separate business transaction for archive purposes. The OrderPrice field in the Orderline table represents the storage of data that is time-sensitive. The OrderPrice is stored in the Orderline table because this monetary value may differ from the value in the CurrentPrice field of the Item table, since prices may change over time. While the OrderPrice field is functionally determined by the combination of ItemId and OrderDate, storing the OrderDate field in the Orderline table is not necessary, since the OrderPrice is recorded at the time that the transaction takes place (a sketch of this capture step follows). Therefore, while Orderline is not technically in 3NF, most designers would consider the above solution 3NF for all practical purposes. A true 3NF alternative would store price data in an ItemPriceHistory table, but such a solution is not central to the discussion of denormalization.
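As a hedged sketch of how the time-sensitive OrderPrice might be captured at order time; the bind-variable names are illustrative assumptions:

-- Copy the item's current price into the order line as the sale occurs
insert into Orderline (OrderId, ItemId, QtyOrdered, OrderPrice)
select :order_id, ItemId, :qty, CurrentPrice
from Item
where ItemId = :item_id;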

In the proposed denormalized 1NF solution shown below (the Customer and Order tables remain unchanged), the Item.ItemDescription field is duplicated in the Orderline table. This solution violates second normal form (2NF) since the ItemDescription field is fully determined by ItemId alone and not by the full key (OrderId + ItemId). Again, this denormalized solution must be evaluated for its potential effect on data storage and retrieval.

Denormalized 1NF:
ORDERLINE (OrderId, ItemId, QtyOrdered, OrderPrice, ItemDescription)
ITEM (ItemId, ItemDescription, CurrentPrice)

Orderline records for a given order are added to the appropriate tables at the time that an order is made. Since the OrderPrice for a given item is retrieved from the CurrentPrice field of the Item table at the time that the sale takes place, the additional processing required to retrieve and store the ItemDescription value in the Orderline table is negligible. The additional expense of storing Orderline.ItemDescription must be weighed against the processing required to produce various end-user views of these data tables. Consider the production of a printed invoice. The 3NF solution requires joining four tables to produce an invoice view of the data. The denormalized 1NF solution requires only the Customer, Order, and Orderline tables. Clearly, the Order table will be accessed by an indexed search based on the OrderId field. Similarly, the retrieval of records from the Orderline and Item tables may also be via indexes; still, the savings in processing time may offset the cost of the extra storage.

An additional issue concerns the storage of consistent data values for the ItemDescription field in the Orderline and Item tables. Suppose, for example, the item description of a record in the Item table is changed from "Table" to "Table, Mahogany." Is there a need to also update corresponding records in the denormalized Orderline table? Clearly an argument can be made that these kinds of data maintenance transactions are unnecessary, since the maintenance of the non-key ItemDescription field in the Orderline table is not critical to processing the order.

DENORMALIZING THIRD NORMAL FORM (3NF) TO 2NF

An example of denormalizing from a 3NF to a 2NF solution can be found by extending the above example to include data for salespersons. The relationship between the Salesperson and Order entities is one-to-many (many orders can be processed by a single salesperson, but an order is normally associated with one and only one salesperson). The 3NF solution for the Salesperson and Order tables is given below, along with a denormalized 2NF solution.

3NF:
SALESPERSON (SalespersonId, SalespersonName, ...)
ORDER (OrderId, OrderDate, DeliveryDate, Amount, SalespersonId)

Denormalized 2NF:
SALESPERSON (SalespersonId, SalespersonName, ...)
ORDER (OrderId, OrderDate, DeliveryDate, Amount, SalespersonId, SalespersonName)

Note that the SalespersonId in the Order table is a foreign key linking the Order and Salesperson tables. Denormalizing the table structures by duplicating the SalespersonName in the Order table results in a solution that is 2NF because the non-key SalespersonName field is determined by the non-key SalespersonId field. What is the effect of this denormalized solution? By using view analysis for a typical printed invoice or order form, we may discover that most end-user views require printing a salesperson's name, not their identification number, on order invoices and order forms.
Thus, the 3NF solution requires joining the Salesperson and Order tables, in addition to the Customer and Orderline tables from the denormalized 1NF solution given in the preceding section, in order to meet processing requirements. As before, a comparison of the 3NF solution and the denormalized 2NF solution reveals that the salesperson name could easily be recorded in the denormalized Order table at the time that the order transaction takes place. Once a sales transaction takes place, the salesperson credited with making the sale is very unlikely to change. One should also question the need to maintain the consistency of data between the Order and Salesperson tables. In this situation, we find that the requirement to support name changes for salespeople is very small and occurs, for the most part, only when a salesperson changes names due to marriage. Furthermore, the necessity of updating the Order table in such a situation is a decision for management to make. It is entirely conceivable that such data maintenance activities may be ignored, since the important issue for salespersons usually revolves around whether or not they get paid their commission, and the denormalized 2NF solution supports payroll activities as well as the production of standard end-user views of the database.

DENORMALIZING HIGHER NORMAL FORMS

The concept of denormalization also applies to the higher-order normal forms (fourth normal form, 4NF, and fifth normal form, 5NF), but occurrences of denormalization in these situations are rare. Recall that denormalization is used to improve processing efficiency but should not be used where there is a risk of incurring excessive data maintenance problems. Denormalizing from 4NF to a lower normal form would almost always lead to excessive data maintenance problems. By definition, we normalize to 4NF to avoid the problem of having to add multiple records to a single table as a result of a single transaction. Data anomalies associated with 4NF violations tend to arise only when sets of binary relationships between three entities have been incorrectly modeled as a ternary relationship. The resulting 4NF solution, when modeled in the form of an E-R diagram, usually results in two binary one-to-many relationships. If denormalization offers the promise of improving performance among the entities that are paired in these binary relationships, then the guidance given earlier under the individual 1NF, 2NF, and 3NF sections applies; thus, denormalization with 4NF does not require new heuristics. While denormalization may also be used in 5NF modeling situations, the tables that result from the application of 5NF principles are rarely candidates for denormalization. This is because the number of tables required for data storage has already been minimized. In essence, the 5NF modeling problem is the mirror image of the 4NF problem. A 5NF anomaly arises only when a database designer has modeled what should be a ternary relationship as a set of two or more binary relationships.

SUMMARY

This article has described situations where denormalization can lead to improved processing efficiency. The objective is to improve system response time without incurring a prohibitive amount of additional data maintenance. This is especially important for client-server systems. Denormalization requires thorough system testing to prove the effect that denormalized table structures have on processing efficiency. Furthermore, unforeseen ad hoc queries may be adversely affected by denormalized table structures. Denormalization must be accomplished in conjunction with a detailed analysis of the tables required to support various end-user views of the database. This analysis must include the identification of primary and secondary access paths to data.
Additional consideration may be given to table partitioning, which goes beyond the issues that surround table normalization. Horizontal table partitioning may improve performance by minimizing the number of rows involved in table joins. Vertical partitioning may improve performance by minimizing the size of the rows involved in table joins. A detailed discussion of table partitioning may be found elsewhere [1]. A hedged sketch of horizontal partitioning appears below.
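For illustration only, horizontal partitioning can be expressed in Oracle with range partitioning; the table and partition names here are assumptions, not from the paper:

-- Rows are split horizontally by OrderDate, so joins and scans
-- touch only the partitions a query actually needs
create table order_hist (
  OrderId   number,
  OrderDate date,
  Amount    number
)
partition by range (OrderDate) (
  partition p_1998 values less than (to_date('01-JAN-1999','DD-MON-YYYY')),
  partition p_1999 values less than (to_date('01-JAN-2000','DD-MON-YYYY'))
);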

Oracle Varray and Nested table performance

Oracle Tips by Burleson Consulting

April 24, 2002; updated 1 November 2008

Oracle offers a variety of data structures to help create robust database systems. Oracle supports the full use of binary large objects (BLOBs), nested tables, non-first-normal-form table structures (VARRAY tables), and object-oriented table structures. It even treats flat data files as if they were tables within the Oracle database. For a full treatment of the performance of nested tables vs. varray tables, see my book "Oracle Tuning: The Definitive Reference".

It is a challenge for many Oracle design professionals to know when to use these Oracle data model extensions. This article provides a brief review of advanced Oracle topics and how they are used to design high-performance Oracle databases. Oracle's ability to support object types (sometimes called user-defined datatypes) has profound implications for Oracle design and implementation. User-defined datatypes enable the database designer to:

- Create aggregate datatypes - Aggregate datatypes are datatypes that contain other datatypes. For example, you could create a type called FULL_ADDRESS that contains all of the subfields necessary for a complete mailing address (a sketch follows this list).
- Nest user-defined datatypes - Datatypes can be placed within other user-defined datatypes to create data structures that can be easily reused within Oracle tables and PL/SQL. For example, you could create a datatype called CUSTOMER that contains a datatype called CUSTOMER_DEMOGRAPHICS, which in turn contains a datatype called JOB_HISTORY, and so on.
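Here is a hedged sketch of such an aggregate type; the field names are illustrative assumptions:

create or replace type FULL_ADDRESS as object (
  street   varchar2(40),   -- each subfield of a complete mailing address
  city     varchar2(30),
  state    char(2),
  zip_code varchar2(10)
);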

One of the new user-defined data types in the Oracle object-relational model is a "pointer" data type. Essentially, a pointer is a unique reference to a row in a relational table. The ability to store these row IDs inside a relational table extends the traditional relational model and enhances the ability of an object-relational database to establish relationships between tables. The new abilities of pointer data types include:

Referencing "sets" of related rows in other tables - It is now possible to violate first normal form and have a cell in a table that contains a pointer to repeating table values. For example, an EMPLOYEE table could contain a pointer called JOB_HISTORY_SET, which in turn could contain pointers to all of the relevant rows in a JOB_HISTORY table. This technique also lets you prebuild aggregate objects, such that you could preassemble all of the specific rows that comprise the aggregate table. Allow "pointers" to non-database objects in a flat file - For example, a table cell could contain a pointer to a flat file that contains a non-database object such as a picture in .gif or .jpeg format. The ability to establish one-to-many and many-to-many data relationships without relational foreign keys - This would alleviate the need for relational

JOIN operations, because table columns could contain references to rows in other tables. By dereferencing these pointers, you could retrieve rows from other tables without ever using the time-consuming SQL JOIN operator.
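As a minimal sketch of a pointer (REF) column, assuming hypothetical type and table names not taken from the article:

create or replace type job_history_t as object (
  job_title  varchar2(40),
  start_date date
);

create table job_history of job_history_t;

-- This table points at job_history rows instead of carrying a foreign key
create table emp_ref (
  last_name varchar2(40),
  last_job  ref job_history_t   -- a pointer to one job_history row
);

-- Dereference the pointer instead of coding a join
select e.last_name, deref(e.last_job)
from emp_ref e;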

Data model extension capabilities

The Oracle table data model extensions provide the following capabilities:

- Modeling real-world objects - It is no longer necessary for the relational database designer to model complex objects in their smallest components and rebuild them at run time. Using Oracle's object-oriented constructs, real-world objects can have a concrete existence, just like C++ objects. Oracle can use arrays of pointers to represent these complex objects.
- Removing unnecessary table joins - This is achieved by deliberately introducing redundancy into the data model. Queries that once required complex and time-consuming table joins can now be satisfied in a single disk I/O operation.
- Coupling of data and behavior - One of the important constructs of object orientation is the tight coupling of object behaviors with the objects themselves. In Oracle, a member method can be created upon the Oracle object, and all processes that manipulate the object are encapsulated inside Oracle's data dictionary. This functionality has huge benefits for the development of all Oracle systems. Prior to the introduction of member methods, each Oracle developer was essentially a custom craftsman writing custom SQL to access Oracle information. By using member methods, all interfaces to the Oracle database are performed using pretested methods with known interfaces. In this way, the Oracle developer's role changes from custom craftsman to more of an assembly-line coder: you simply choose from a list of prewritten member methods to access Oracle information. (A minimal member-method sketch appears after this list.)
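The following is a minimal member-method sketch; the type and method names are assumptions, not from the article:

create or replace type customer_t as object (
  cust_id   number,
  cust_name varchar2(40),
  -- behavior travels with the data: a pretested accessor
  member function display_name return varchar2
);
/
create or replace type body customer_t as
  member function display_name return varchar2 is
  begin
    return upper(cust_name);
  end;
end;
/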

Object orientation and Oracle

Oracle offers numerous choices for introducing object-oriented data model constructs into relational database design, including the ability to dereference table row pointers, abstract data types, and limited polymorphism and inheritance support. In Oracle, data model constructs used in C++ or Smalltalk programming can be translated directly into an Oracle structure. In addition, Oracle supports abstract data typing, whereby you create customized data types with the same strong typing inherent in the standard Oracle data types such as NUMBER, CHAR, VARCHAR, and DATE. For example, below is an Oracle table created with abstract data types and a VARRAY of prior employers (the full_mailing_address_type and prior_employer_name_arr types are defined later in this article):

CREATE OR REPLACE TYPE employee AS OBJECT (
  last_name       varchar(40),
  full_address    full_mailing_address_type,
  prior_employers prior_employer_name_arr
);

create table emp of employee;

Next, we use extensions to standard Oracle SQL to populate these abstract data types:

insert into emp values (
  'Burleson',
  full_mailing_address_type('7474 Airplane Ave.','Rocky Ford','NC','27445'),
  prior_employer_name_arr(
    employer_name('IBM'),
    employer_name('ATT'),
    employer_name('CNN')
  )
);

Oracle nested tables

Using the Oracle nested table structure, subordinate data items can be directly linked to the base table by using one of Oracle's newer constructs: the object ID (OID). One of the remarkable extensions of Oracle is the ability to reference Oracle objects directly by using pointers, as opposed to joining relational tables. Proponents of the object-oriented database model criticize standard relational databases because of the requirement to reassemble an object every time it is used. (They make statements such as, "It doesn't make sense to dismantle your car every time you are done driving it and rebuild the car each time you want to drive it.")

Nested and varray tables use internal pointers

Oracle has moved toward allowing complex objects to have a concrete existence. In order to support the concrete existence of complex objects, Oracle introduced the ability to build arrays of pointers with row references directly to Oracle tables. Just as a C++ program can use the char** data structure to hold a pointer to an array of pointers, Oracle allows similar constructs whereby the components of complex objects reside in real tables with pointers to the subordinate objects. At runtime, Oracle simply dereferences the pointers, and the complex object can be quickly rebuilt from its component pieces.

A nested table example

In this example, a nested table is used to represent a repeating group for previous addresses. Whereas a person is likely to have a small number of previous employers, most people have a larger number of previous addresses. First, we create a type using our full_mailing_address_type:

create type prev_addrs as object (
  prior_address full_mailing_address_type
);

Next, we create the nested object:

create type nested_address as table of prev_addrs;

Now, we create the parent table with the nested table:

create table emp1 (
  last_name       char(40),
  current_address full_mailing_address_type,
  prev_address    nested_address
)
nested table prev_address store as nested_prev_address return as locator;

A nested table appears to be a part of the master table; internally, it is a separate table. The store as clause allows the DBA to give the nested table a specific name, and the named storage table can then be indexed, as sketched below.
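As a hedged example (the index name is an assumption), the storage table can be indexed on its hidden NESTED_TABLE_ID column, which links child rows back to their parent row:

-- Speeds up the lookup of a parent row's nested address rows
create index nested_prev_address_idx
  on nested_prev_address (nested_table_id);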

The nested_prev_address subordinate table can be indexed just like any other Oracle table. Also, notice the use of the return as locator SQL syntax. In many cases, returning the entire nested table at query time can be time-consuming. The locator enables Oracle to use the pointer structures to dereference pointers to the location of the nested rows. A pointer dereference happens when you take a pointer to an object and ask the program to display the data the pointer is pointing to. In other words, if you have a pointer to a customer row, you can dereference the OID and see the data for that customer. The link to the nested table uses an Oracle OID instead of a traditional foreign key value.

A varray table example

Before Oracle8, we would have had to represent repeating groups in a table in a clumsy and inelegant fashion (the original listing names the first column full_name with an address type, which appears to be a typo for full_address):

create table employee (
  last_name               varchar(40),
  full_address            full_mailing_address_type,
  previous_employer_one   varchar(40),
  previous_employer_two   varchar(40),
  previous_employer_three varchar(40)
);

We begin by creating an Oracle type to hold the repeating group of prior employers:

CREATE OR REPLACE TYPE employer_name AS OBJECT (e_name varchar(40));

CREATE OR REPLACE TYPE prior_employer_name_arr AS VARRAY(10) OF employer_name;

Next, we create the employee type, embedding our varray of prior employers:

CREATE OR REPLACE TYPE employee AS OBJECT (
  last_name       varchar(40),
  full_address    full_mailing_address_type,
  prior_employers prior_employer_name_arr
);

Next, we create the emp table, using the employee type:

SQL> create table emp of employee;

Table created.

Now we insert rows into the object table. Note the use of the full_mailing_address_type reference for the ADT and the specification of the repeating groups of previous employers:

insert into emp values (
  'Burleson',
  full_mailing_address_type('7474 Airplane Ave.','Rocky Ford','NC','27445'),
  prior_employer_name_arr(
    employer_name('IBM'),
    employer_name('ATT'),
    employer_name('CNN')
  )
);

insert into emp values (
  'Lavender',
  full_mailing_address_type('7474 Bearpond Ave.','Big Lick','NC','17545'),
  prior_employer_name_arr(
    employer_name('Oracle'),
    employer_name('Sybase'),
    employer_name('Computer Associates')
  )
);

Next, we perform the select SQL. Note that we can select all of the repeating groups with a single reference to the prior_employers column:

select p.prior_employers
from emp p
where p.last_name = 'Burleson';

PRIOR_EMPLOYERS(E_NAME)
--------------------------------------------------------------------------
PRIOR_EMPLOYER_NAME_ARR(EMPLOYER_NAME('IBM'), EMPLOYER_NAME('ATT'), EMPLOYER_NAME('CNN'))

This output can be difficult to interpret because of the nature of the repeating groups. In the example below, we use the built-in TABLE function, which flattens out the repeating groups and redisplays the information:

column l_name heading "Last Name" format a20;

SELECT emp.last_name l_name, prior_emps.*
FROM emp emp, table(emp.prior_employers) prior_emps
WHERE emp.last_name = 'Burleson';

Here we see the flattened output from the query; the parent row's information is replicated onto each output row:

Last Name            E_NAME
-------------------- ----------------------------------------
Burleson             IBM
Burleson             ATT
Burleson             CNN

Performance of Oracle nested and varray tables

To fully understand advanced Oracle design, we need to look at the SQL performance ramifications of using object extensions. Overall, the performance of Abstract Data Type (ADT) tables is the same as that of any other Oracle table, but we do see significant performance differences when implementing varray tables and nested tables:

- ADT tables - Creating user-defined datatypes simplifies Oracle database design. Using ADTs also provides uniform data definitions for common data items. There is no downside for SQL performance, and the only downside for SQL syntax is the requirement that all references to ADTs be fully qualified.
- Nested tables - Nested tables have the advantage of being indexed, and the repeating groups are separated into another table so as not to degrade the performance of full-table scans. Nested tables allow for an essentially unlimited number of repeating groups. However, it sometimes takes longer to dereference the OID to access the nested table entries than to perform ordinary SQL table join operations. Most Oracle experts see no compelling benefit to using nested tables over traditional table joins.
- Varray tables - Varray tables have the benefit of avoiding costly SQL joins, and they can maintain the order of the varray items based upon the sequence in which they were stored. However, the longer row length of varray tables causes full-table scans to run longer, and the items inside the varray cannot be indexed. More importantly, varrays cannot be used when the number of repeating items is unknown or very large. Varray tables are also problematic because the nonstandard SQL is clumsy and hard to use:

SQL> SELECT * FROM person;

NAME                           DOB
------------------------------ ---------
ADDRESS_V
----------------------------------------------
ADDRESS_N
-------------------------------------------------------
Jones                          01-JAN-60
ADDRESSES_V('Line 1', 'Line 2', 'Line 3')
ADDRESSES_N('Line 1', 'Line 2', 'Line 3')

Conclusion

The evolution of Oracle into an object-relational database has provided a huge number of extensions to the relational database model. It is the challenge of all Oracle design professionals to use these extensions to improve the performance and maintainability of Oracle databases. Relational professionals can no longer stay content with a basic understanding of relational algebra. Successful Oracle designers must master all object-oriented concepts, including abstract data typing, nested tables, and varray tables, as well as the unique data structure extensions that make Oracle clearly one of the fastest and most robust databases in the marketplace.
