Design of files and use of auxiliary storage devices

Introduction
Information systems in business are file- and database-oriented. Data are accumulated into files that are processed and maintained by the system. Databases draw on data accumulated in transaction and other kinds of files and are designed to share data among different applications. The systems analyst is responsible for designing the files, determining their contents, and selecting a method for organizing the data. At the same time, if the proposed applications will draw on database resources, the analyst must develop the means of interacting with the database.

Basic File Terminology


Data Item: Individual elements of data are called data items. Each data item is identified by name and has a specific value associated with it. The association of a value with a field creates one instance of the data item. Data items can comprise subitems or subfields. For example, a date is often used as a single data item consisting of three subitems: month, day, and year.

Record: The complete set of related data items pertaining to an entry is a record. Each field has a defined length and type. When the number and size of the data items in a record are constant for every record, the record is called a fixed-length record. Variable-length records are less common in most business applications than fixed-length designs.
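
A fixed-length record can be sketched with Python's standard `struct` module. The field names and widths below are hypothetical, chosen only to illustrate that every record occupies the same number of bytes regardless of the values it holds.

```python
import struct

# Hypothetical layout (an assumption, not taken from the text):
# 9-byte account number, 4-byte check number, 20-byte payee,
# 8-byte amount in cents. "<" disables padding between fields.
CHECK_RECORD = struct.Struct("<9sI20sq")

def pack_check(account: str, check_no: int, payee: str, cents: int) -> bytes:
    """Encode one check as a fixed-length record; short text is null-padded."""
    return CHECK_RECORD.pack(
        account.encode("ascii"),
        check_no,
        payee.encode("ascii"),
        cents,
    )

def unpack_check(raw: bytes):
    """Decode a fixed-length record back into its data items."""
    account, check_no, payee, cents = CHECK_RECORD.unpack(raw)
    return (
        account.decode("ascii"),
        check_no,
        payee.rstrip(b"\x00").decode("ascii"),
        cents,
    )

record = pack_check("123456789", 101, "City Power Co", 4550)
```

Here `CHECK_RECORD.size` is 41 bytes for every record, so the position of the n-th record in a file can be computed as `n * CHECK_RECORD.size`.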

Record Key: To distinguish one specific record from another, systems analysts select one data item in the record that is likely to be unique in all records of a file and use it for identification purposes. This item, called the record key, key attribute, or simply key, is already part of the record, not additional data added to it just for the purpose of identification.

Entity: An entity is any person, place, thing or event of interest to the organization and about which data are captured, stored, or processed. Patients and tests are entities of interest in hospitals, while banking entities include customers and checks.

File: A file is a collection of related records. Each record in a file is included because it pertains to the same entity. A file of checks, for example, contains only check records. The number of records in the file determines the file size. If each record is fixed-length and uses 200 characters of storage, a file of six records uses 6 times 200, or 1,200, characters of storage.

Databases: A database is an integrated collection of data stored in different types of records, in a way that makes them accessible to multiple applications. The interrelation of the records derives from the relationships in the data, not from their physical storage location. Records for different entities are typically stored in a database (whereas a file stores records for a single entity). In a university database, for example, records for students, courses, and faculty are interrelated in the same database. Databases do not eliminate the need for files in an information system. Different types of files are still needed to capture the details of events and business activities, to prepare reports, or to store data that are not in the database.

Data Structure Diagrams

Purpose: Data structure diagrams are graphic tools that show the logical data structure requirements of an information system application. They serve four purposes:
1. Verify information requirements.
2. Describe the data associated with entities.
3. Show the relationships between entities.
4. Communicate the data requirements to a file designer or database administrator.
Each data item either identifies the entity or describes an important attribute. Data structure diagrams organize the data.

Notation: A common notation is used in preparing data structure diagrams. Entities are represented by rectangles, with the entity name at the top and a list of attributes (data items or fields) describing the entity. Each entity is identifiable by a key attribute, which by convention is the first data item listed.

Use in file design: The use of data structure diagrams requires the analyst to address the important questions about the entity being described: What data items will uniquely identify an occurrence of the entity? By what means will information about the entity be accessed? What other data items describe the attributes of the entity?

Entity name: Check
Key: Account number
Data items: Check number, Date, Payee, Amount of transaction

Figure: Data structure diagram for the checking example

The figure shows a simple data structure diagram for the checking example. As the illustration shows, the record key, which in this case is the account number, uniquely identifies the account. Other details, including check number, date, payee, and amount of the transaction, are attributes. Analyzing the use of the checking information through the data structure diagram shows that the actual check number must also be used for identification purposes. Since the account number uniquely identifies the account but does not describe transactions involving it, a combined key of account number and check number must be used to trace individual transactions.
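
The need for a combined key can be illustrated with a short sketch. The sample records and the helper name `find_check` are made up for the illustration; only the data item names come from the diagram.

```python
from collections import Counter

# Hypothetical check records; the field names mirror the diagram's data items.
checks = [
    {"account": "1001", "check_no": 101, "date": "2024-01-05", "payee": "Landlord", "amount": 950.00},
    {"account": "1001", "check_no": 102, "date": "2024-01-09", "payee": "Grocer", "amount": 82.40},
    {"account": "2002", "check_no": 101, "date": "2024-01-09", "payee": "Utility", "amount": 61.15},
]

# Neither field is unique by itself in this sample:
# account 1001 and check number 101 each appear twice.
account_counts = Counter(c["account"] for c in checks)
check_no_counts = Counter(c["check_no"] for c in checks)

# The combined key (account number, check number) identifies one transaction.
by_combined_key = {(c["account"], c["check_no"]): c for c in checks}

def find_check(account: str, check_no: int):
    """Return the matching check record, or None if there is none."""
    return by_combined_key.get((account, check_no))
```

Looking up `find_check("2002", 101)` returns the single Utility check, even though check number 101 also exists on account 1001.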

Types of Files

There are four main types of files: master files, transaction files, table files, and report files.

Master File

A master file is a collection of records about an important aspect of an organization's activities. It may contain data describing the current status of specific events or business indicators. For example, the master file in an accounts payable system shows the balance owed to every vendor and supplier from whom the organization purchases supplies or services. A second type of master file reflects the history of events affecting a particular entity.

Transaction File

A transaction file is a temporary file with two purposes:
1. Accumulating data about events as they occur.
2. Updating master files to reflect the results of current transactions.
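
As a rough illustration of how accumulated transactions update a master file, the sketch below applies a batch of vendor transactions to a mapping of balances. The vendor names, the amounts, and the function name `apply_transactions` are assumptions made for the example; amounts are held in cents to avoid rounding problems.

```python
def apply_transactions(master, transactions):
    """Return a new master mapping with each transaction amount applied."""
    updated = dict(master)  # leave the old master untouched
    for vendor, amount_cents in transactions:
        # A vendor that is not yet in the master starts from a zero balance.
        updated[vendor] = updated.get(vendor, 0) + amount_cents
    return updated

# Hypothetical accounts payable master: balance owed per vendor, in cents.
master = {"ACME": 120_000, "GLOBEX": 45_500}

# Hypothetical transaction file: purchases are positive, payments negative.
transactions = [("ACME", 30_000), ("GLOBEX", -45_500), ("INITECH", 9_900)]

new_master = apply_transactions(master, transactions)
```

Once the run is complete the transaction list has served its purpose, which is why the text describes the transaction file as temporary.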

Table File

Table files contain reference data used in processing transactions, updating master files, or producing output. Table files conserve storage space and ease program maintenance by storing in a file data that otherwise would be included in programs or master file records.

Report File

Report files are temporary files used when printing time is not available for all the reports produced, a situation that frequently arises in overlapped processing (the capability of a computer to carry out input, processing, and output operations simultaneously, which increases throughput considerably). The computer writes the report or document to a file on magnetic disk or tape, where it remains until it can be printed. This process is known as spooling: output that cannot be printed when it is produced is spooled into a report file.

Other Files

Other kinds of files play a role in information systems. A backup file is a copy of a master, transaction, or table file made to ensure that a duplicate is available if anything happens to the original. Archival files, copies made for long-term storage of data, usually are stored away from the computer center to prevent their being inadvertently accessed or retrieved for use, thus ensuring their preservation.

Methods of file organization


Sequential organization, direct-access organization, and indexed organization

Sequential organization
Sequential organization is the simplest way to store and retrieve records in a file. In a sequential file, records are stored one after another without concern for the actual value of the data in the records. The first record is stored at the beginning of the file. The second is stored right after the first, the third after the second, and so on. This order never changes in sequential file organization.

Reading sequential files: To read a sequential file, the system always starts at the beginning of the file and reads its way up to the wanted record, one record at a time.
Searching for records: Sequential files do not use physical record keys; records are accessed in the order of their appearance in the file.
Evaluation of sequential files: Sequential organization is a good method when every record in a file must be accessed. It is also acceptable if, on average, about one-half of the records in the file will be used. On the other hand, where the requirement is to find one particular record in a very large file, sequential organization becomes a disadvantage.
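
The cost of finding one record in a sequential file can be sketched as follows. The function name and sample records are hypothetical; the point is only that the number of reads grows with the position of the wanted record.

```python
def sequential_find(records, key_field, wanted):
    """Scan from the first record onward; return (record, records_read)."""
    reads = 0
    for record in records:
        reads += 1
        if record[key_field] == wanted:
            return record, reads
    return None, reads  # the whole file was read without a match

# Hypothetical file of five records, stored in arrival order.
file_records = [
    {"check_no": 104, "payee": "Grocer"},
    {"check_no": 101, "payee": "Landlord"},
    {"check_no": 103, "payee": "Utility"},
    {"check_no": 105, "payee": "Pharmacy"},
    {"check_no": 102, "payee": "Bookshop"},
]

found, reads_needed = sequential_find(file_records, "check_no", 105)
missing, reads_for_miss = sequential_find(file_records, "check_no", 999)
```

Finding check 105 takes four reads because it is the fourth record, and a search for a record that does not exist reads all five.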

Direct-Access organization
Direct-access files are keyed files. They associate a record with a specific key value and a particular storage location. All records are stored by key at addresses rather than by position. If the program knows the record key, it can determine the location address of a record and retrieve it independently of every other record in a file. In general, if fewer than 10 percent of the records in a file will be needed during a typical processing run, the file should not be established as a sequential file. On the other hand, if more than 40 percent of the records will be accessed, the analyst should select the sequential organization.
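
The rule of thumb above can be written as a small helper. The function name and the return strings are assumptions, and the text gives no guidance for the range between 10 and 40 percent, so the sketch reports that case separately rather than guessing.

```python
def suggest_organization(fraction_accessed: float) -> str:
    """Apply the 10 percent / 40 percent rule of thumb to one processing run."""
    if not 0.0 <= fraction_accessed <= 1.0:
        raise ValueError("fraction_accessed must be between 0 and 1")
    if fraction_accessed < 0.10:
        return "non-sequential"   # fewer than 10 percent of records needed
    if fraction_accessed > 0.40:
        return "sequential"       # more than 40 percent of records needed
    return "no guidance"          # assumption: the text leaves this range open
```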

Using the record key as the storage address is called direct addressing. When this method can be used, it is simple and quick. However, the requirements of this method often prevent its use. Direct addressing requires a data set with the following characteristics:
1. The key set (i.e., the range of key values assigned) is in a dense, ascending order with few unused values, since each unused value means wasted storage space. Few open gaps in key values are therefore wanted.
2. The record keys correspond to the numbers of the storage addresses: there is a storage address for each actual or possible key value in the file, and there are no duplicate key values.
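
A minimal sketch of direct addressing, assuming integer keys in a known range: the key, offset by the lowest key value, is used directly as the slot number. The class and method names are hypothetical.

```python
class DirectFile:
    """Store records in slots addressed directly by key value."""

    def __init__(self, lowest_key: int, highest_key: int):
        self.lowest_key = lowest_key
        # One slot per possible key value, whether it is used or not.
        self.slots = [None] * (highest_key - lowest_key + 1)

    def address(self, key: int) -> int:
        """The storage address is simply the key's offset in the key range."""
        offset = key - self.lowest_key
        if not 0 <= offset < len(self.slots):
            raise KeyError(key)
        return offset

    def store(self, key: int, record) -> None:
        addr = self.address(key)
        if self.slots[addr] is not None:
            raise ValueError(f"duplicate key {key}")
        self.slots[addr] = record

    def fetch(self, key: int):
        return self.slots[self.address(key)]

    def unused(self) -> int:
        """Count slots reserved for key values that hold no record."""
        return sum(slot is None for slot in self.slots)

accounts = DirectFile(1000, 1009)
accounts.store(1003, {"owner": "Lee"})
```

With a single record stored in a range of ten key values, `accounts.unused()` reports nine empty slots, which shows why a sparse key set wastes storage under this method.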

Hash addressing: When direct addressing is not possible but direct access is necessary, the analyst specifies the alternative access method of hashing. Hashing (also called key transformation or randomizing) refers to the process of deriving a storage address from a record key. An algorithm (an arithmetic procedure) is devised to change a key value into another value that serves as a storage address. (The data value in the record itself does not change.)

Types of hashing algorithms:
Division by prime: A simple hashing algorithm for changing a social security number into a suitable storage address follows:
1. Strip off the first three digits of the number: 456821455 becomes 821455.
2. Divide the new key by a prime number. Here we are using 41.
3. Use modular division, keeping the remainder rather than the quotient.
4. The remainder, 20, is the storage location.
Folding: Split the key into pieces and process them further (add, subtract, divide, etc.): 821 + 455 = 1276 (storage location).
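
The division and folding methods can be sketched as below. The function names are hypothetical, and splitting the key into two equal halves for folding is an assumption based on the 821 + 455 example.

```python
def strip_prefix(ssn: str) -> int:
    """Drop the first three digits of a nine-digit number."""
    return int(ssn[3:])

def division_hash(key: int, prime: int = 41) -> int:
    """Modular division: the remainder serves as the storage address."""
    return key % prime

def folding_hash(key: int) -> int:
    """Split the key's digits into two halves and add them."""
    digits = str(key)
    half = len(digits) // 2
    return int(digits[:half]) + int(digits[half:])

key = strip_prefix("456821455")   # 821455
by_division = division_hash(key)  # 821455 = 41 * 20035 + 20
by_folding = folding_hash(key)    # 821 + 455
```

For the key 821455, `by_division` is 20 and `by_folding` is 1276.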

Extraction: Select specific digits from the key and process them with the remaining digits: 814 (1st, 3rd, 4th digits) - 255 (2nd, 5th, 6th digits) = 559 (storage location).
Squaring: Multiply the number by itself and then apply other hashing methods. 821,455 * 821,455 = 674,788,317,025; the example works with the leading eight digits, 67,478,831.
Fold the first half with the second half: 6747 + 8831 = 15,578 (storage location).
Alternatively, extract the 1st and 2nd digits of that result and add them to the other digits: 15 + 578 = 593 (storage location).
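
The extraction and squaring examples can be checked with a short sketch. The function names are hypothetical, and keeping only the leading eight digits of the square is an assumption inferred from the figures in the example, not a rule stated in the text.

```python
def extraction_hash(key: int) -> int:
    """Subtract digits 2, 5, 6 from digits 1, 3, 4 of a six-digit key."""
    d = str(key)
    if len(d) != 6:
        raise ValueError("this sketch expects a six-digit key")
    selected = int(d[0] + d[2] + d[3])   # 1st, 3rd, 4th digits
    remaining = int(d[1] + d[4] + d[5])  # 2nd, 5th, 6th digits
    return selected - remaining

def squaring_hash(key: int, keep: int = 8) -> int:
    """Square the key, keep the leading digits, then fold the two halves."""
    leading = str(key * key)[:keep]      # assumption: truncate to `keep` digits
    half = keep // 2
    return int(leading[:half]) + int(leading[half:])

def fold_leading_digits(value: int, lead: int = 2) -> int:
    """Add the first `lead` digits of a value to its remaining digits."""
    d = str(value)
    return int(d[:lead]) + int(d[lead:])

by_extraction = extraction_hash(821455)          # 814 - 255
by_squaring = squaring_hash(821455)              # 6747 + 8831
folded_again = fold_leading_digits(by_squaring)  # 15 + 578
```

For the key 821455 these give 559, 15578, and 593 respectively.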

Indexed organization
A third way of accessing records is through an index. The basic form of an index includes a record key and a storage address for a record. To find a record when its storage address is unknown (unlike direct addressing and hashing structures, where the address is derived from the key), it is necessary to scan the records. However, the search will be faster if an index is used, since it takes less time to search an index than an entire file of data.

Characteristics of an Index
An index is a separate file from the master file to which it pertains. Each record in the index contains only two items of data: a record key and a storage address. To find a specific record when the file is stored under an indexed organization, the index is first searched to find the key of the record wanted. When it is found, the corresponding storage address is noted and then the program accesses the record directly. This method uses a sequential scan of the index, followed by direct access to the appropriate record. The index helps speed the search compared with a sequential file, but it is slower than direct addressing.
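
The two-step lookup described above can be sketched with an in-memory byte stream standing in for the data file. The record size, the sample data, and the function names are assumptions made for the illustration; the sketch also assumes each record's text fits within the fixed record size.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length in bytes

def build_file(records):
    """Write records in arrival order and build a (key, address) index."""
    data = io.BytesIO()
    index = []
    for key, text in records:
        index.append((key, data.tell()))  # address = current byte offset
        data.write(text.encode("ascii").ljust(RECORD_SIZE))
    return data, index

def indexed_read(data, index, wanted):
    """Scan the index sequentially, then access the record directly."""
    for key, address in index:
        if key == wanted:
            data.seek(address)
            return data.read(RECORD_SIZE).decode("ascii").rstrip()
    return None  # key not present in the index

data_file, key_index = build_file([
    (101, "Landlord"),
    (205, "City Power Co"),
    (150, "Grocer"),
])
payee = indexed_read(data_file, key_index, 205)
```

Only the small index entries are scanned; the data file itself is touched once, at the address found in the index.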
