
INTRODUCTION TO DATA PROCESSING

1.0 Objectives
By the end of this lesson you should understand:
• Data Processing
• Data & Information
• Types of Data
• Input, Processing and Output
• Architecture of Computer System
• Input Devices
• Output Devices

1.1 Introduction

Data processing is any computer process that converts data into information.
The processing is usually assumed to be automated and running on a
mainframe, minicomputer, microcomputer, or personal computer. Because
data are most useful when well-presented and actually informative, data-
processing systems are often referred to as information systems to
emphasize their practicality. Nevertheless, both terms are roughly
synonymous, performing similar conversions; data-processing systems
typically manipulate raw data into information, and likewise information
systems typically take raw data as input to produce information as output.

To better market their profession, computer programmers and systems
analysts who might once, for example during the 1970s, have called the
computer systems they produce data-processing systems now more often
describe them by a term that includes the word information, such as
information systems, information technology systems, or management
information systems.

In the context of data processing, data are defined as numbers or characters
that represent measurements from the real world. A single datum is a single
measurement from the real world. Measured information is then
algorithmically derived and/or logically deduced and/or statistically calculated
from multiple data. Information is defined as either a meaningful answer to a
query or a meaningful stimulus that can cascade into further queries.

More generally, the term data processing can apply to any process that
converts data from one format to another, although data conversion would be
the more accurate term. From this perspective, data processing becomes the
process of converting information into data and of converting data back into
information. The distinction is that conversion does not require a question
(query) to be answered. For example, a sentence in English is typed as a
series of key-presses, which the keyboard reports as hardware-oriented
integer codes. These are converted (encoded) into ASCII integer codes, so that
the computer can process each one not as a raw, amorphous integer but as a
meaningful character in a natural language's set of graphemes. Finally, the
codes are converted (decoded) for display as characters drawn in a font on
the computer display. The example shows a stage-by-stage conversion: the
making and breaking of an electrical contact at the keyboard starts as
largely meaningless hardware-oriented integer data and becomes ever more
meaningful information as processing proceeds toward the human being.
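
The stage-by-stage conversion described above can be sketched in a few lines of Python. This is only an illustration: the hardware key codes in the table are invented for the example, and real keyboards use their own code sets.

```python
# Hypothetical hardware key codes (assumed for illustration only).
SCAN_TO_ASCII = {0x10: 72, 0x11: 105}  # assumed codes for 'H' and 'i'

def decode_keypresses(scan_codes):
    """Encode raw key codes as ASCII integers, then decode them to text."""
    ascii_codes = [SCAN_TO_ASCII[code] for code in scan_codes]
    text = "".join(chr(code) for code in ascii_codes)
    return ascii_codes, text

ascii_codes, text = decode_keypresses([0x10, 0x11])
print(ascii_codes)  # [72, 105]
print(text)         # Hi
```

Each step adds meaning: the first list is hardware data, the second is a standard character code, and the final string is something a person can read.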

A more conventional use of the term data processing is the case of a
business that has collected numerous data concerning an aspect of its
operations. This multitude of data must be presented in meaningful,
easy-to-access form for the managers, who then use that information to
increase revenue or to decrease cost. That conversion and presentation of
data as information is typically performed by a data-processing application.

When the domain from which the data are harvested is a science or a branch
of engineering, data processing and information systems are considered too
broad, and the more specialized term data analysis is typically used. Data
analysis focuses on the highly specialized and highly accurate algorithmic
derivations and statistical calculations that are less often seen in the
typical general business environment. This divergence of culture shows in the
numerical representations each typically uses: data processing's
measurements are usually represented by integers or by fixed-point or
binary-coded decimal representations of real numbers, whereas most of data
analysis's measurements are represented by floating-point representations
of real numbers.
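
The practical difference between the two representations can be seen in a short Python sketch. It uses the standard `decimal` module as a stand-in for the fixed-point or decimal arithmetic typical of business systems; the amounts are made up.

```python
from decimal import Decimal

# Floating point: binary fractions cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2

# Decimal arithmetic: values are held as exact decimal digits.
decimal_total = Decimal("0.10") + Decimal("0.20")

# Integer arithmetic on the smallest unit (e.g. cents) is also exact.
cents_total = 10 + 20

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True
print(cents_total == 30)                 # True
```

This is why money is usually held as integers or decimals, while scientific measurements, which are approximate anyway, are usually held as floating-point numbers.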

Practically all naturally occurring processes can be viewed as examples of
data processing systems. For instance, "real world" information in the form
of pressure, light, etc. is converted by a human observer's senses into
electrical signals in the nervous system, which we recognise as touch, sound,
and vision. Even the interaction of non-living systems may be viewed in this
way, as rudimentary information processing systems. Conventional usage,
however, restricts the terms data processing and information systems to
the algorithmic derivations, logical deductions, and statistical calculations
that recur perennially in general business environments, rather than applying
them in the more expansive sense of all conversions of real-world
measurements into real-world information in, say, an organic biological
system or even a scientific or engineering system.

1.1.1 Data

Data are any facts, numbers, or text that can be processed by a computer.
Today, organizations are accumulating vast and growing amounts of data in
different formats and different databases. This includes:
• operational or transactional data, such as sales, cost, inventory,
payroll, and accounting

• non-operational data, such as industry sales, forecast data, and
macroeconomic data

• metadata - data about the data itself, such as logical database
design or data dictionary definitions

1.1.2 Information

The patterns, associations, or relationships among all this data can provide
information. For example, analysis of retail point of sale transaction data can
yield information on which products are selling and when.
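
As a minimal sketch of this idea, the Python below turns a handful of point-of-sale transactions into two pieces of information: which product sells most and when sales peak. The transactions are invented for illustration.

```python
from collections import Counter

# Made-up transactions: (hour of day, product, quantity sold)
transactions = [
    (9, "bread", 2),
    (9, "milk", 1),
    (12, "bread", 3),
    (17, "milk", 4),
    (17, "bread", 1),
]

units_by_product = Counter()
units_by_hour = Counter()
for hour, product, quantity in transactions:
    units_by_product[product] += quantity
    units_by_hour[hour] += quantity

best_seller = units_by_product.most_common(1)[0]
busiest_hour = units_by_hour.most_common(1)[0]
print(best_seller)   # ('bread', 6)
print(busiest_hour)  # (17, 5)
```

The individual rows are data; the two summaries are information a manager could act on.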

1.1.3 Types of Data

Think about any collected data that you have experience of; for example,
weight, sex, ethnicity, job grade, and consider their different attributes. These
variables can be described as categorical or quantitative.

Table 1.1 summarizes data types and their associated measurement levels,
with some examples. It is important to appreciate that the appropriate methods
for summary and display depend on the type of data being used. The same is
true when choosing an appropriate statistical test.

Type of data        Level of measurement                  Examples
------------------  ------------------------------------  ------------------------------

Categorical         Nominal (no inherent order in         Eye color, ethnicity, diagnosis
                    categories)

                    Ordinal (categories have inherent     Job grade, age groups
                    order)

                    Binary (2 categories – special        Gender
                    case of above)

Quantitative        Discrete (usually whole numbers)      Size of household (ratio)
(Interval/Ratio)
(NB units of        Continuous (can, in theory, take      Temperature °C/°F (no absolute
measurement used)   any value in a range, although        zero) (interval)
                    necessarily recorded to a             Height, age (ratio)
                    predetermined degree of precision)

Table 1.1 Types of Data
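
To illustrate why the summary method depends on the type of data, the sketch below applies a different summary to each kind of variable using Python's standard `statistics` module. The sample values and the numeric coding of job grades are assumptions made for the example.

```python
from statistics import mean, median, mode

eye_color = ["brown", "blue", "brown", "green"]  # nominal: only the mode makes sense
job_grade = [1, 3, 2, 2, 5]                      # ordinal (coded grades): median is safe
height_cm = [160.0, 170.0, 180.0]                # continuous, ratio: mean is meaningful

most_common_color = mode(eye_color)
middle_grade = median(job_grade)
average_height = mean(height_cm)

print(most_common_color)  # brown
print(middle_grade)       # 2
print(average_height)     # 170.0
```

Taking the mean of eye colors is impossible, and the mean of job grades is questionable because the gaps between grades need not be equal.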

1.2. Input, Processing and Output

Whenever a computer is used it must work its way through three basic stages
before any task can be completed. These are input, processing and output. A
computer works through these stages by running a program. A program is a
set of step-by-step instructions which tells the computer exactly what to do
with the input in order to produce the required output.
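
The three stages can be seen in even a very small program. The sketch below is illustrative only: the marks are made up, and a fixed string stands in for text that would normally arrive from the keyboard.

```python
def process(marks):
    """Processing: turn raw marks (data) into an average (information)."""
    return sum(marks) / len(marks)

# Input: raw text as an input device might supply it.
raw_input_text = "40 55 70"
marks = [int(m) for m in raw_input_text.split()]

# Processing: follow the program's instructions.
average = process(marks)

# Output: present the result in a form useful to the user.
report = f"Average mark: {average:.1f}"
print(report)  # Average mark: 55.0
```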

1.2.1 Input

The input stage of computing is concerned with getting the data needed by
the program into the computer. Input devices are used to do this. The most
commonly used input devices are the mouse and the keyboard.

1.2.2 Processing

The program contains instructions about what to do with the input. During the
processing stage the computer follows these instructions using the data which
has just been input. What the computer produces at the end of this stage, the
output, will only be as good as the program's instructions and the data
supplied to it. In other words, if garbage is put into the program, garbage
is what will come out of the computer. This is known as GIGO, or Garbage In,
Garbage Out.
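
A short sketch makes GIGO concrete. The program below is perfectly correct, yet one mistyped value (an assumed typing error, 500 instead of 50) is enough to make its output useless.

```python
def average_age(ages):
    """A correct program: it faithfully averages whatever it is given."""
    return sum(ages) / len(ages)

good_input = [30, 40, 50]
garbage_input = [30, 40, 500]  # 500 is a typing error for 50

good_output = average_age(good_input)
garbage_output = average_age(garbage_input)
print(good_output)     # 40.0
print(garbage_output)  # 190.0
```

The computer cannot tell that 190.0 is nonsense; only checks on the input can prevent it.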

1.2.3 Output

The output stage of computing is concerned with giving out processed data as
information in a form that is useful to the user. Output devices are used to do
this. The most commonly used output devices are the screen, which is also
called a monitor or VDU (visual display unit), and the printer.

1.3. Architecture of Computer System

The Central Processing Unit (CPU) is the 'brain' of the computer. It is where
all the searching, sorting, calculating and decision making takes place. The
CPU collects the raw data from various input devices (such as a keyboard or
mouse) and converts it into useful information by carrying out software
instructions. The result of all that work is then sent to output devices such
as monitors and printers.

The CPU is a microprocessor - a silicon chip - composed of tiny electrical
switches called 'transistors'. The speed at which the processor carries out its
operations is measured in megahertz (MHz) or gigahertz (GHz), where 1 GHz is
1000 MHz. Other things being equal, the higher the clock speed, the faster the
computer can process information. A common CPU today runs at around 3 GHz or
more.
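
The units can be checked with a little arithmetic. The sketch below converts the 3 GHz figure quoted above into MHz and into the time taken by one clock cycle.

```python
MHZ = 1_000_000       # one megahertz = one million cycles per second
GHZ = 1_000_000_000   # one gigahertz = one thousand million cycles per second

clock_hz = 3 * GHZ
clock_in_mhz = clock_hz // MHZ   # 3000 MHz
cycle_time_ns = 1e9 / clock_hz   # about 0.33 nanoseconds per cycle

print(clock_in_mhz)              # 3000
print(round(cycle_time_ns, 2))   # 0.33
```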

The Intel Pentium processor and the Athlon are examples of CPUs.

Figure 1.1 Block diagram of CPU

1.3.1 The Control Unit (CU)

The Control Unit (CU) co-ordinates the work of the whole computer system.

It has three main jobs:

1. It controls the hardware attached to the system. The Control Unit
monitors the hardware to make sure that the commands given to it by the
current program are activated.

2. It controls the input and output of data, so all the signals go to the right
place at the right time.

3. It controls the flow of data within the CPU.

1.3.2 The Immediate Access Store (IAS)

The Immediate Access Store (IAS) holds the data and programs needed at
that instant by the Control Unit. The CPU reads data and programs kept on
the backing store and stores them temporarily in the IAS.

The CPU needs to do this because the backing store is much too slow to run
data and programs from directly. For example, let's pretend that a modern CPU
was slowed down so that it carried out one instruction per second; on the
same scale the hard disk (i.e. the backing store) would take about 3 months
to supply the data it needs!

So the trick is to bring enough of the data and programs into the fast
Immediate Access Store to keep the CPU busy.
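
The "3 months" figure can be reproduced with rough numbers. Both timings below are assumptions chosen for illustration (about 1 nanosecond per instruction and about 8 milliseconds per hard-disk access), not measurements from the text.

```python
cpu_instruction_ns = 1        # assumed: about 1 nanosecond per instruction
disk_access_ns = 8_000_000    # assumed: about 8 milliseconds per disk access

# How many instructions could run while waiting for one disk access?
ratio = disk_access_ns // cpu_instruction_ns

# If one instruction is stretched to 1 second, the disk takes `ratio` seconds.
scaled_disk_days = ratio / 86_400   # 86,400 seconds in a day

print(ratio)                    # 8000000
print(round(scaled_disk_days))  # 93
```

Roughly 93 days is about three months, which is the scale of the gap the IAS is there to hide.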

1.3.3 The Arithmetic and Logic Unit (ALU)

The Arithmetic and Logic Unit (ALU) is where the computer processes data by
either manipulating it or acting upon it. It has two parts:

1. Arithmetic part - carries out calculations on data, such as 3 + 2.

2. Logic part - carries out logic and comparison operations on data, for
example working out whether one data value is bigger than another.
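
The two parts of the ALU can be modelled as a toy function. This is a sketch only: the operation names ("ADD", "SUB", "GT", "EQ") are invented for the example and do not correspond to any particular processor's instruction set.

```python
def alu(operation, a, b):
    """Toy ALU: arithmetic operations return numbers, logic ones return True/False."""
    if operation == "ADD":   # arithmetic part
        return a + b
    if operation == "SUB":   # arithmetic part
        return a - b
    if operation == "GT":    # logic part: is a bigger than b?
        return a > b
    if operation == "EQ":    # logic part: are a and b equal?
        return a == b
    raise ValueError(f"unknown operation: {operation}")

print(alu("ADD", 3, 2))  # 5
print(alu("GT", 3, 2))   # True
print(alu("EQ", 3, 2))   # False
```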

1.4. Input Devices

Constant research in computer hardware has produced a large number of input
devices. Recall that before data can be processed by the computer they must
be translated into machine-readable form and entered into the computer by an
input device. Here we introduce a variety of input devices.

1.4.1 Keyboard

The keyboard is the most widely used input device and is used to enter data
or commands into the computer. It has a set of alphabet keys, a set of digit
keys, and various function keys, and is divided into four main areas:

• Function keys across the top
• Letter keys in the main section
• A numeric keypad on the right
• Cursor movement and editing keys between the main section and the
numeric keypad.

The layout of the letters on a keyboard is standard across many countries and
is called the QWERTY layout. The name comes from the first six keys on the
top row of the alphabetic characters.

Some keyboards come with added keys for using the Internet and others have
an integrated wrist support. Ergonomic keyboards have been developed to
reduce the risk of repetitive strain injury to workers who use keyboards for
long periods of time.

The computer's processor scans the keyboard hundreds of times per second
to see if a key has been pressed. When a key is pressed, a digital code is
sent to the Central Processing Unit (CPU). This digital code is translated into
ASCII code (American Standard Code for Information Interchange).

For example, pressing the 'A' key produces the binary code 01100001
representing the lower case letter 'a'. Holding down the shift key at the same
time produces the binary code 01000001 representing the upper case letter
'A'.
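
These codes are easy to verify in Python, where `ord` gives a character's code and `format(..., "08b")` writes it as eight binary digits.

```python
codes = {}
for ch in ("a", "A"):
    code = ord(ch)                      # character -> integer code
    codes[ch] = format(code, "08b")     # integer -> 8-bit binary string
    print(ch, code, codes[ch])
# a 97 01100001
# A 65 01000001

# The two codes differ in a single bit, worth 32.
difference = ord("a") - ord("A")
print(difference)  # 32
```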

Advantages:

• Most computers have a keyboard attached
• It is a reliable method for data input of text and numbers
• A skilled typist can enter data very quickly
• Specialist keyboards are available

Disadvantages:

• It is very easy to make mistakes when typing data in
• It can be very time consuming to enter data using a
keyboard, especially if you are not a skilled typist
• It is very difficult to enter some data, for example, details of
diagrams and pictures
• It is very slow to access menus and not flexible when you want
to move objects around the screen
• Difficult for people unable to use keyboards through
paralysis or muscular disorder

1.4.2 Mouse
A mouse is the most common pointing device that you will
come across. Moving the mouse around on the desk lets you
control the movement and position of the on-screen cursor.

Buttons on the mouse let you select options from menus and drag objects
around the screen. Pressing a mouse button produces a 'mouse click'. You
might have heard the expressions 'double click', 'click and drag' and 'drag and
drop'.

Mechanical mice use a small ball located underneath them to work out the
direction in which you are moving the mouse. The movement of the ball causes two
rollers to rotate inside the mouse; one records the movement in a north-south
direction and the other records the east-west movement. The mouse monitors
how far the ball turns and in what direction and sends this information to the
computer to move the pointer.
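
The way roller movements become pointer movements can be sketched as follows. The screen size and the roller counts are assumptions for the example; one roller's count is treated as east-west movement and the other's as north-south movement.

```python
def move_pointer(position, roller_counts, width=800, height=600):
    """Apply one report of (east-west, north-south) roller counts to the pointer."""
    x, y = position
    dx, dy = roller_counts
    # Keep the pointer inside the (assumed) 800 x 600 screen.
    x = min(max(x + dx, 0), width - 1)
    y = min(max(y + dy, 0), height - 1)
    return x, y

pointer = (400, 300)
for counts in [(10, 0), (0, -5), (-500, 0)]:
    pointer = move_pointer(pointer, counts)
print(pointer)  # (0, 295)
```

The last report would push the pointer off the left edge, so it is held at the screen boundary.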

Advantages:

• Ideal for use with desktop computers.


• Usually supplied with a computer so no additional cost.
• All computer users tend to be familiar with using them.

Disadvantages:

• They need a flat space close to the computer.
• The mouse cannot easily be used with laptop, notebook or
palmtop computers. (These need a tracker ball or a touch-sensitive
pad called a touch pad.)

1.4.3 Trackball

A tracker ball is like an upside down mouse with the ball on top.
Turning the ball with your hand moves the pointer on the
screen. It has buttons like a standard mouse, but requires very little space to
operate and is often used in conjunction with computer aided design. You will
often find a small tracker ball built into laptop computers in place of the
conventional mouse.

Advantages:

• Ideal for use where flat space close to the computer is limited.
• Can be useful with laptops as they can be built into the
computer keyboard or clipped on.

Disadvantages:

• Not supplied as standard, so there is an additional cost and users
have to learn how to use them

1.4.4 Joystick

A joystick is similar to a tracker ball in operation, except that you have a
stick which is moved rather than a rolling ball.

Joysticks are used to play computer games. You can move a standard joystick
in any one of eight directions. The joystick tells the computer in which
direction it is being pulled and the computer uses this information to (for
example) move a racing car on screen. A joystick may also have several
buttons which can be pressed to trigger actions such as firing a missile.

Advantages:

• There is an immediate feel of direction due to the movement
of the stick

Disadvantages:
• Some people find the joystick more difficult to control than other point and
click devices. This is probably because more arm and wrist movement is
required to control the pointer than with a mouse or tracker ball.
• Joysticks are not particularly strong and can break easily when used with
games software.

1.4.5 Touch Screen

Touch screens do a similar job to concept keyboards. A grid of light beams or
fine wires criss-crosses the computer screen. When you touch the screen with
your finger, the beams are blocked and the computer 'senses' where you have
pressed. Touch screens can be used to choose options which are displayed on
the screen.
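
How blocked beams become a screen position can be sketched like this. The beam numbering is an assumption for the example: beams are numbered from the left and from the top, and a fingertip may block more than one beam in each direction.

```python
def locate_touch(blocked_columns, blocked_rows):
    """Return the (column, row) at the centre of the blocked beams, or None."""
    if not blocked_columns or not blocked_rows:
        return None  # nothing is touching the screen
    column = sum(blocked_columns) / len(blocked_columns)
    row = sum(blocked_rows) / len(blocked_rows)
    return column, row

touch = locate_touch([11, 12, 13], [4, 5])
print(touch)                  # (12.0, 4.5)
print(locate_touch([], []))   # None
```

Because a finger covers several beams, the position is only approximate, which fits the accuracy limits listed among the disadvantages.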

Touch screens are easy to use and are often found as input devices in public places such
as museums, building societies (ATMs), airports or travel agents. However, they are not
commonly used elsewhere since they are not very accurate, are tiring to use for a long
period and are more expensive than alternatives such as a mouse.

Advantages:

• Easy to use
• Software can alter the screen while it is running, making it more flexible than
a concept keyboard with a permanent overlay
• No extra peripherals are needed apart from the touch screen monitor
itself.
• No experience or competence with computer systems is needed to be
able to use it.

Disadvantages:
• Not suitable for inputting large amounts of data
• Not very accurate, selecting detailed objects can be difficult with fingers
• Tiring to use for a long period of time
• More expensive than alternatives such as a mouse.
• Touch screens are not robust and can soon become faulty.

1.4.6 Digital Camera

A digital camera looks very similar to a traditional camera. However, unlike
photographic cameras, digital cameras do not use film. Inside a digital camera
is an array of light sensors. When a picture is taken, the different colors
that make up the picture are converted into digital signals (binary) by
sensors placed behind the lens.

Most digital cameras let you view the image as soon as you have taken the picture
and, if you don't like what you see, it can be deleted. The image can then be stored
in the camera's RAM or on a floppy disk. Later, the pictures can be transferred onto a
computer for editing using photo imaging software.

The amount of memory taken up by each picture depends on its resolution. The
resolution is determined by the number of dots which make up the picture: the
greater the number of dots which make up the picture, the clearer the image.
However, higher resolution pictures take up more memory (and are more
expensive!).

Resolutions range from about 3 million pixels (3 megapixels) up to 12 megapixels.
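
The link between resolution and memory can be estimated with simple arithmetic. The sketch assumes 3 bytes per pixel (24-bit color) and no compression; real cameras usually compress images, so stored files are smaller.

```python
def uncompressed_size_bytes(megapixels, bytes_per_pixel=3):
    """Estimate raw image size; 3 bytes per pixel is an assumed color depth."""
    return megapixels * 1_000_000 * bytes_per_pixel

small = uncompressed_size_bytes(3)    # 3 megapixel picture
large = uncompressed_size_bytes(12)   # 12 megapixel picture

print(small)           # 9000000
print(large)           # 36000000
print(large // small)  # 4
```

Four times the pixels means four times the memory, which is why higher resolution pictures fill the camera sooner.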

Digital cameras are extremely useful for tasks such as producing newsletters.

There is often a digital camera built into mobile phones that operates in exactly the
same way as a standard one.

Advantages:
• No film is needed and there are no film developing costs
• Unwanted images can be deleted straight away
• You can edit, enlarge or enhance the images
• Images can be incorporated easily into documents, sent by e-mail or
added to a website.

Disadvantages:

• Digital cameras are generally more expensive than ordinary cameras.
• Images often have to be compressed to avoid using up too much
expensive memory.
• When the memory is full, the images must be downloaded to a computer
or deleted before any more can be taken.

1.4.7 Scanner

A scanner is another way in which we can capture still images or text to be
stored and used on a computer. Images are stored as 'pixels'.

A scanner works by shining a beam of light on to the surface of the object
you are scanning. This light is reflected back on to a sensor that detects
the color of the light.

The reflected light is then digitized to build up a digital image.

Scanner software usually allows you to choose between a high resolution (very high
quality images taking up a lot of memory) and lower resolutions.

Special software can also be used to convert images of text into actual text data
which can be edited by a word processor. This is called
"Optical Character Recognition" or OCR.

There are two types of scanner:

• Flatbed Scanner
• Handheld Scanner

The most popular type of scanner is the flatbed. It works in a similar way to
a photocopier. Flatbed scanners can scan larger images and are more accurate
than handheld scanners.

Handheld scanners are usually only a few inches wide and are rolled across
the document to be scanned. They perform the same job but the amount of
information that can be scanned is limited by the width of the scanner and the
images produced are not of the same quality as those produced by flatbed
scanners.

Advantages:

• Flatbed scanners are very accurate and can produce images
with a far higher resolution than a digital camera
• Any image can be converted from paper into digital format
and later enhanced and used in other computer documents.

Disadvantages:

• Images can take up a lot of memory space.
• The quality of the final image depends greatly upon the
quality of the original document.

1.4.8 Graphics Tablets


Graphics tablets are often used by graphics designers and illustrators. Using
a graphics tablet a designer can produce much more accurate drawings on
the screen than they could with a mouse or other pointing device.

A graphics tablet consists of a flat pad (the tablet) on which you draw with a
special pen. As you draw on the pad the image is created on the screen.
Drawings created using a graphics tablet can be accurate to within
hundredths of an inch.

The 'stylus' or pen that you use may have buttons on it that act like a set of
mouse buttons. Sometimes, instead of a stylus a highly accurate mouse-like
device called a puck is used to draw on the tablet.

Advantages:

• In a design environment where it is more natural to draw
diagrams with pencil and paper, it is an effective method of
inputting the data into the computer.

Disadvantages:

• Not as good as a mouse for clicking on menu items.

1.5. Output Devices

Once data has been input into a computer and processed, it is of little use
unless it can be retrieved quickly and easily from the system. To allow this, the
computer must be connected to an output device.

The most common output devices are computer monitors and printers.
However, output can also be to a modem, a plotter, speakers, a computer
disk, another computer or even a robot.

1.5.1 Monitor

A monitor (or "screen") is the most common form of output from a computer. It
displays information in a similar way to a television screen.

On a typical computer the monitor may measure 17 inches (43 cm) diagonally
across its display area. Larger monitors make working at a computer easier on
the eyes. Of course, the larger the screen, the higher its cost! Typical
larger sizes are 19, 20 and 21 inches.

Part of the quality of the output on a monitor depends on the resolution it is
capable of displaying. Other factors include how much contrast it has, its
viewing angle and how fast it refreshes the screen. For example, a good
computer game needs a fast screen refresh so you can see all the action.

The picture on a monitor is made up of thousands of tiny colored dots called
pixels. The quality and detail of the picture on a monitor depends on the
number of pixels that it can display. The more dense the pixels, the greater
the clarity of the screen image.

A PC monitor contains a matrix of dots of Red, Green and Blue, known as
RGB. These can be blended to display millions of colors.

This is one RGB pixel of light

R + B = M (magenta)
B + G = C (cyan)
G + R = Y (yellow)

R + G + B = W (white)
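
The mixing rules above can be checked with a small sketch in which each color is a triple of red, green and blue intensities. The 0 to 255 intensity scale is an assumption (it is the common 8 bits per channel convention).

```python
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

def mix(*colors):
    """Additively mix colors channel by channel, capping each channel at 255."""
    return tuple(min(sum(channel), 255) for channel in zip(*colors))

magenta = mix(RED, BLUE)
cyan = mix(BLUE, GREEN)
yellow = mix(GREEN, RED)
white = mix(RED, GREEN, BLUE)

print(magenta)  # (255, 0, 255)
print(cyan)     # (0, 255, 255)
print(yellow)   # (255, 255, 0)
print(white)    # (255, 255, 255)
```

With 256 levels per channel there are 256 × 256 × 256 possible combinations, which is where "millions of colors" comes from.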

The two most common types of monitor are a cathode-ray tube (CRT) monitor
and a liquid crystal display (LCD).

Liquid Crystal Display (or "TFT" Display)

An LCD is smaller and lighter than a CRT (see below), which
makes it ideal for use with laptops, PDAs and palmtops.
Even desktop computers now use LCDs, since their price
has become comparable to that of CRT monitors.
Liquid Crystal is the material used to create each pixel on the screen. The
material has a special property - it can 'polarize' light depending on the
electrical charge across it. Charge it one way and all the light passing through
it is set to "vertical" polarity, charge it another way and the light polarity is set
to "horizontal". This feature allows the pixels to be created. Each tiny cell of
liquid crystal is a pixel.

TFT (Thin Film Transistor) is the device within each pixel that sets the
charge. These displays are therefore sometimes called "Liquid Crystal
Displays", referring to the material they use, and sometimes "TFT displays",
referring to the tiny transistors that make them work.

LCDs use much less power than a CRT monitor.

Cathode Ray Tube

The CRT works in the same way as a television - it contains an electron gun
at the back of the glass tube. This fires electrons at groups of phosphor
dots which coat the inside of the screen. When the electrons strike the
phosphor dots they glow to give the colors.

Advantages of monitors

• Relatively cheap
• Reliable
• Can display text and graphics in a wide range of colours
• As each task is processed, the results can be displayed
immediately on the screen
• Output can be scrolled backwards and forwards easily.
• Quiet
• Do not waste paper

Disadvantages of monitors:
• No permanent copy to keep - the results will disappear
when the computer is switched off.
• Unsuitable for users with visual problems.
• Only a limited amount of information can be displayed at any
one time
• Screens are made of glass and can be very fragile.

1.5.2 Printers
Printers are output devices. They are dedicated to creating paper copies from
the computer.

Printers can produce text and images on paper. The paper can be separate
sheets, such as A4, A5 or A3, or some printers can print on continuous
(fanfold) paper that feeds through the machine.

[Figure: continuous paper with holes on the edges, used by dot matrix
printers. After you print on fanfold paper, you have to separate the pages
and tear off the edge strips.]

[Figure: a ream of A4 paper]

Very specialist printers can also print on plastic or even textiles such as T-
shirts.

Some printers are dedicated to producing only black and white output. Their
advantage is that they are often faster than a color printer because
effectively there is only one color to print (black).

Color printers are dedicated to creating text and images in full color. Some
types can even produce photographs when special paper is used.

There are three main types of printer that you need to know about: laser,
dot matrix and inkjet. You will be expected to understand the main
differences between them, i.e. purchase costs, running costs, quality and
speed.

1.5.3 Plotter

These are output devices that can produce high quality line diagrams on
paper. They are often used by engineers, architects and scientific
organizations to draw plans, diagrams of machines and printed circuit boards.

A plotter differs from a printer in that it draws images using a pen that can
be lowered, raised and moved across the page to form continuous lines. The
electronically controlled pen is moved by two computer-controlled motors. The
pen is lowered onto and lifted off the page by switching an electromagnet on
and off.

The paper is handled in different ways depending on the type of plotter.

Flatbed plotters hold the paper still while the pens move.

Drum plotters roll the paper over a cylinder.

Pinch-roller plotters are a mixture of the two.

Advantages:
• Drawings are of the same quality as if an expert drew them
• Larger sizes of paper can be used than would be found
on most printers

Disadvantages:

• Plotters are slower than printers, drawing each line separately.
• They are often more expensive to buy than printers.
• Although drawings are completed to the highest quality, they
are not suitable for text (although text can be produced).
• There is a limit to the amount of detail these plotters can
produce, although there are "pen-less" plotters; these are
used for high-density drawings such as printed circuit
board layouts.
• In recent years, cheaper printers that can handle A3 and
A2 sized paper have resulted in a decline in the need for smaller
plotters.

1.6 Summary

• Data processing is any computer process that converts data
into information.
• Data are any facts, numbers, or text that can be processed
by a computer.
• The patterns, associations, or relationships among all this data
can provide information.
• The CPU is a microprocessor - a silicon chip - composed of
tiny electrical switches called 'transistors'.
• The keyboard is the most widely used input device and is
used to enter data or commands to the computer.
• A joystick is similar to a tracker ball in operation except you
have a stick which is moved rather than a rolling ball.
• Graphics tablets are often used by graphics designers
and illustrators.
• The most common output devices are computer monitors
and printers.
• Metadata is data about the data itself, such as logical database
design or data dictionary definitions.
• The resolution of a digital camera ranges from about 3 million
pixels (3 megapixels) up to 12 megapixels.

1.7 Key words

• Operational Data - Operational or transactional data, such as
sales, cost, inventory, payroll, and accounting.
• Non-operational Data - Data such as industry sales, forecast
data, and macroeconomic data.
• Input - The input stage of computing is concerned with getting
the data needed by the program into the computer.
• Output - The output stage of computing is concerned with giving
out processed data as information in a form that is useful to the user.
• Pixels - The picture on a monitor is made up of thousands of tiny
colored dots called pixels.

1.8 Self Assessment Questions (SAQ)

• What do you mean by information? How is it different from data?
Explain.
• Explain the process of input-processing-output with the help
of suitable examples.
• Explain the architecture of a computer system.
• Explain what is meant by the term input device. Give three
examples of input devices, with possible advantages and
disadvantages of each.
• Explain what is meant by the term output device. Give three
examples of output devices, with possible advantages and
disadvantages of each.
• What are the different types of printers? How is a plotter different
from a printer?

1.9 References/Suggested Readings
• Computer Fundamentals, P.K. Sinha, BPB Publications, 2004
• Sams Teach Yourself COBOL in 24 Hours, Hubbell, Sams, Dec 1998
• Structured COBOL Methods, Noll P, Murach, Sep 1998
• ICT for You, Stephen Doyle, Nelson Thornes, 2003
• Information and Communication Technology, Denise Walmsley,
Hodder Murray, 2004
• Information Technology, P Evans, BPB Publications, 2000

LESSON 2
CONCEPTS OF FILES

2.0 Objectives

By the end of this lesson you should understand:


• File
• File Contents
• Operations on the file
• File Organization
• Storing Files
• Backing-up files
• File Terminology
• Data Capturing
• Data Verification
• Data Validation

2.1. Introduction

A computer file is a piece of arbitrary information, or a resource for storing
information, that is available to a computer program and is usually based on
some kind of durable storage. A file is durable in the sense that it remains
available for programs to use after the current program has finished.

Computer files can be considered as the modern counterpart of the files of
printed documents that traditionally existed in offices and libraries.

2.1.1. File contents

As far as the operating system is concerned, a file is in most cases just a
sequence of binary digits. At a higher level, where the content of the file
is being considered, these binary digits may represent integer values or text
characters. It is up to the program using the file to understand the meaning
and internal layout of information in the file and present it to a user as a
document, image, song, or program.
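
The point that meaning lies with the program, not with the stored bits, can be shown with two bytes read in two different ways. The byte values are chosen for illustration.

```python
raw = bytes([72, 105])   # two bytes as they might sit in a file

# One program may treat the bytes as ASCII text...
as_text = raw.decode("ascii")

# ...while another treats the same bytes as one 16-bit integer.
as_integer = int.from_bytes(raw, byteorder="big")   # 72 * 256 + 105

print(as_text)      # Hi
print(as_integer)   # 18537
print(len(raw))     # 2 (the size in bytes)
```

Neither reading is more "correct"; the file format agreed between the programs decides which applies.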

At any instant in time, a file has a size, normally expressed in
bytes, that indicates how much storage is associated with the file.

Information in a computer file can consist of smaller packets of information
(often called records or lines) that are individually different but share some
trait in common. For example, a payroll file might contain information
concerning all the employees in a company and their payroll details; each
record in the payroll file concerns just one employee, and all the records have
the common trait of being related to payroll. This is very similar to placing
all payroll information into a specific filing cabinet in an office that does
not have a computer. A text file may contain lines of text, corresponding to
printed lines on a piece of paper.

The way information is grouped into a file is entirely up to the person
designing the file. This has led to a plethora of more or less standardized file
structures for all imaginable purposes, from the simplest to the most complex.
Most computer files are used by computer programs. These programs create,
modify and delete files for their own use on an as-needed basis. The
programmers who create the programs decide what files are needed, how
they are to be used and (often) their names.

In some cases, computer programs manipulate files that are made visible to
the computer user. For example, in a word-processing program, the user
manipulates document files that she names herself. The content of the
document file is arranged in a way that the word-processing program
understands, but the user chooses the name and location of the file, and she
provides the bulk of the information (such as words and text) that will be
stored in the file.

Files on a computer can be created, moved, modified, grown, shrunk and
deleted. In most cases, computer programs that are executed on the
computer handle these operations, but the user of a computer can also
manipulate files if necessary. For instance, Microsoft Word files are normally
created and modified by the Microsoft Word program in response to user
commands, but the user can also move, rename, or delete these files directly
by using a file manager program such as Windows Explorer (on Windows
computers).

2.1.2. Operations on the file

The basic operations on a file are:

• Opening a file to use its contents
• Reading or updating the contents
• Committing updated contents to durable storage
• Closing the file, thereby losing access until it is opened again
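
The four operations can be tried out in a few lines. The sketch below is only
an illustration in Python (the lesson itself does not prescribe a language);
the function name save_and_reload and the file name notes.txt are made up for
the example.

```python
import os
import tempfile

def save_and_reload(path: str, text: str) -> str:
    # Open the file for writing (it is created if it does not exist).
    f = open(path, "w", encoding="utf-8")
    try:
        f.write(text)          # update the contents (buffered in memory first)
        f.flush()              # hand the buffered data to the operating system
        os.fsync(f.fileno())   # ask the OS to commit it to durable storage
    finally:
        f.close()              # close: no access until the file is opened again

    # Open the file again to read the contents back.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# Hypothetical usage with a temporary folder.
demo_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
print(save_and_reload(demo_path, "hello, file\n"))
```

Note that the contents survive the close and can be read by a later open,
which is what "durable" means in section 2.1.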

2.1.3 File Organization

2.1.3.1 Sequential file

Access to records in a Sequential file is serial. To reach a particular record,
all the preceding records must be read.

As we observed when the topic was introduced earlier in the course, the
organization of an unordered Sequential file means it is only practical to read
records from the file and add records to the end of the file (OPEN..EXTEND).
It is not practical to delete or update records.

While it is possible to delete, update and insert records in an ordered
Sequential file, these operations have some drawbacks.

2.1.3.1.1 Problems accessing ordered Sequential files

Records in an ordered Sequential file are arranged, in order, on some key
field or fields. When we want to insert, delete or amend a record we must
preserve the ordering. The only way to do this is to create a new file. In the
case of an insertion or update, the new file will contain the inserted or updated
record. In the case of a deletion, the deleted record will be missing from the
new file.

The main drawback to inserting, deleting or amending records in an ordered
Sequential file is that the entire file must be read and then the records written
to a new file. Since disk access is one of the slowest things we can do in
computing, this is very wasteful of computer time when only a few records are
involved.

For instance, if 10 records are to be inserted into a 10,000 record file, then
10,000 records will have to be read from the old file and 10,010 written to the
new file. The average time to insert a new record will thus be very great.

2.1.3.1.2 Inserting records in an ordered Sequential file

To insert a record in an ordered Sequential file:

1. All the records with a key value less than the record to be
inserted must be read and then written to the new file.
2. Then the record to be inserted must be written to the new file.

3. Finally, the remaining records must be written to the new file.
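
The three steps above can be sketched as follows. This is a minimal
illustration in Python, not COBOL, and it assumes a made-up record layout of
one record per line in the form "numeric key,data"; the names key_of and
insert_record are hypothetical.

```python
import os
import tempfile

def key_of(line: str) -> int:
    # Assumed record layout: "<numeric key>,<data>" on each line.
    return int(line.split(",", 1)[0])

def insert_record(old_path: str, new_path: str, new_line: str) -> None:
    new_key = key_of(new_line)
    inserted = False
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            # Step 2: write the new record just before the first larger key.
            if not inserted and key_of(line) > new_key:
                new.write(new_line + "\n")
                inserted = True
            # Steps 1 and 3: every existing record is copied unchanged.
            new.write(line)
        if not inserted:
            # The new key is the highest in the file, so it goes at the end.
            new.write(new_line + "\n")
```

Even for a single insertion every record of the old file is read and written
again, which is the drawback described in section 2.1.3.1.1.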

2.1.3.1.3 Deleting records from an ordered Sequential file

To delete a record in an ordered Sequential file:

1. All the records with a key value less than the record to be
deleted must be written to the new file.
2. When the record to be deleted is encountered it is not written to
the new file.
3. Finally, all the remaining records must be written to the new file.

2.1.3.1.4 Amending records in an ordered Sequential file

To amend a record in an ordered Sequential file:

1. All the records with a key value less than the record to be
amended must be read and then written to the new file.
2. Then the record to be amended must be read, the amendments
applied to it, and the amended record written to the new
file.
3. Finally, all the remaining records must be written to the new file.
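
Deletion and amendment follow the same copy-to-a-new-file pattern, so one
small routine can illustrate both. The sketch below is in Python and again
assumes a hypothetical "numeric key,data" line layout; the function name
rewrite and its parameters are invented for the example.

```python
import os
import tempfile
from typing import Callable, Optional

def rewrite(old_path: str, new_path: str, target_key: int,
            change: Optional[Callable[[str], str]]) -> None:
    """Copy old file to new file; delete the target record if change is
    None, otherwise write the amended version of it."""
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            key, data = line.rstrip("\n").split(",", 1)
            if int(key) != target_key:
                new.write(line)                       # copied unchanged
            elif change is not None:
                new.write(f"{key},{change(data)}\n")  # amended record
            # else: deletion, the record is simply not written
```

For example, rewrite(old, new, 20, None) drops record 20, while
rewrite(old, new, 20, str.upper) writes it back with its data in capitals.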

2.1.3.2 Relative File


As we have already noted, the problem with Sequential files is that access to
the records is serial. To reach a particular record, all the preceding records
must be read.

Direct access files allow direct access to a particular record in the file using a
key and this greatly facilitates the operations of reading, deleting, updating
and inserting records.

COBOL supports two kinds of direct access file organization: Relative and
Indexed.

2.1.3.2.1 Organization of Relative files

Records in relative files are organized on ascending Relative Record Number.
A Relative file may be visualized as a one-dimensional table stored on disk,
where the Relative Record Number is the index into the table. Relative files
support sequential access by allowing the active records to be read one after
another.

Relative files support only one key. The key must be numeric and must take a
value between 1 and the current highest Relative Record Number. Enough room
is allocated to the file to contain records with Relative Record Numbers
between 1 and the highest record number.

For instance, if the highest relative record number used is 10,000 then room
for 10,000 records is allocated to the file.

Figure 1 below contains a schematic representation of a Relative file. In this
example, enough room has been allocated on disk for 328 records. But although
there is room for 328 records in the current allocation, not all the record
locations contain records. The record areas labeled "free" have not yet had
record values written to them.

Figure 1: Relative File - Organization

2.1.3.2.2 Accessing records in a Relative file


To access a record in a Relative file a Relative Record Number must be
provided. Supplying this number allows the record to be accessed directly
because the system can use

• the start position of the file on disk,
• the size of the record, and
• the Relative Record Number

to calculate the position of the record.

Because the file management system only has to make a few calculations to
find the record position, the Relative file organization is the faster of the
two direct access file organizations available in COBOL. It is also the more
storage efficient.
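
The calculation can be written out explicitly. Since Relative Record Numbers
start at 1, the position is: start + (record number - 1) x record size. The
Python sketch below assumes fixed-length records of 16 bytes and a file that
starts at byte 0; these values and the function names are illustrative
assumptions, not part of any COBOL run-time system.

```python
import os
import tempfile

RECORD_SIZE = 16  # assumed fixed record length in bytes

def record_offset(start: int, record_size: int, rrn: int) -> int:
    # Record 1 sits at the start position, record 2 one record further on.
    return start + (rrn - 1) * record_size

def write_record(f, rrn: int, text: str) -> None:
    data = text.encode("ascii")
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long for its slot")
    f.seek(record_offset(0, RECORD_SIZE, rrn))    # jump straight to the slot
    f.write(data.ljust(RECORD_SIZE, b" "))        # pad to the fixed length

def read_record(f, rrn: int) -> str:
    f.seek(record_offset(0, RECORD_SIZE, rrn))
    # Slots never written to ("free") come back empty after stripping.
    return f.read(RECORD_SIZE).decode("ascii").rstrip(" \x00")
```

Writing record 3 before records 1 and 2 exist still reserves room for all
three slots, which mirrors the "free" record areas in Figure 1.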

2.1.3.3 Indexed Files

While the usefulness of a Relative file is constrained by its restrictive key,
Indexed files suffer from no such limitation.

Indexed files may have up to 255 keys, the keys can be alphanumeric and
only the primary key must be unique.

In addition, it is possible to read an Indexed file sequentially on any of its
keys.

2.1.3.3.1 Organization of Indexed files

An Indexed file may have multiple keys. The key upon which the data records
are ordered is called the primary key. The other keys are called alternate
keys.

Records in the Indexed file are sequenced on ascending primary key. Over
the actual data records, the file system builds an index. When direct access is
required, the file system uses this index to find, read, insert, update or delete,
the required record.

For each of the alternate keys specified in an Indexed file, an alternate index
is built. However, the lowest level of an alternate index does not contain actual
data records. Instead, this level is made up of base records which contain only
the alternate key value and a pointer to where the actual record is. These
base records are organized in ascending alternate key order.

As well as allowing direct access to records on the primary key or any of the
254 alternate keys, indexed files may also be processed sequentially. When
processed sequentially, the records may be read in ascending order on the
primary key or on any of the alternate keys.

Since the data records are held in ascending primary key sequence it is
easy to see how the file may be accessed sequentially on the primary key. It
is not quite so obvious how sequential access on the alternate keys is
achieved. This is covered in the unit on Indexed files.
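
A rough idea of how an alternate index allows sequential reading can be given
with a small in-memory model. The Python sketch below is a simplification made
for illustration only: a dictionary stands in for the data records and primary
index, a sorted list of (alternate key, primary key) pairs stands in for the
base records, and the class and method names are invented. Real indexed file
systems use multi-level indexes on disk.

```python
from bisect import insort

class IndexedFile:
    def __init__(self):
        self.records = {}     # primary key -> record (data area + primary index)
        self.alt_index = []   # sorted (alternate key, primary key) base records

    def write(self, primary, alternate, data):
        if primary in self.records:
            raise KeyError("primary key must be unique")
        self.records[primary] = (primary, alternate, data)
        # Alternate keys may repeat; the pair keeps them in ascending order.
        insort(self.alt_index, (alternate, primary))

    def read(self, primary):
        # Direct access on the primary key.
        return self.records[primary]

    def read_by_primary(self):
        return [self.records[k] for k in sorted(self.records)]

    def read_by_alternate(self):
        # Walk the base records in order and follow each pointer to the data.
        return [self.records[p] for _, p in self.alt_index]
```

Reading on the alternate key never re-sorts the data records; it only walks
the alternate index and follows the pointers.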

2.1.4 Organizing files and folders

Figure: Files and folders arranged in a hierarchy

In modern computer systems, files are typically accessed using names. In
some operating systems, the name is associated with the file itself. In others,
the file is anonymous, and is pointed to by links that have names. In the latter
case, a user can identify the name of the link with the file itself, but this is a
false analogue, especially where there exists more than one link to the same
file.

Files (or links to files) can be located in directories. However, more generally,
a directory can contain either a list of files, or a list of links to files. Within this
definition, it is of paramount importance that the term "file" includes
directories. This permits the existence of directory hierarchies. A name that
refers to a file within a directory must be unique. In other words, there must be
no identical names in a directory. However, in some operating systems, a
name may include a specification of type that means a directory can contain
an identical name to more than one type of object such as a directory and a
file.

In environments in which a file is named, a file's name and the path to the
file's directory must uniquely identify it among all other files in the computer
system: no two files can have the same name and path. Where a file is
anonymous, named references to it will exist within a namespace. In most
cases, any name within the namespace will refer to exactly zero or one file.
However, any file may be represented within any namespace by zero, one or
more names.

Any string of characters may or may not be a well-formed name for a file or a
link depending upon the context of application. Whether or not a name is well-
formed depends on the type of computer system being used. Early computers
permitted only a few letters or digits in the name of a file, but modern
computers allow long names (some up to 255 characters) containing almost any
combination of Unicode letters or Unicode digits, making it easier to
understand the purpose of a file at a glance. Some computer systems allow
file names to contain spaces; others do not. Certain characters, such as / or \,
are usually forbidden. Case-sensitivity of file names is determined by the
file system.

Most computers organize files into hierarchies using folders, directories, or
catalogs. (The concept is the same irrespective of the terminology used.)
Each folder can contain an arbitrary number of files, and it can also contain
other folders. These other folders are referred to as subfolders. Subfolders
can contain still more files and folders and so on, thus building a tree-like
structure in which one “master folder” (or “root folder”; the name varies from
one operating system to another) can contain any number of levels of other
folders and files. Folders can be named just as files can (except for the root
folder, which often does not have a name). The use of folders makes it easier
to organize files in a logical way.
2.1.5 Protecting files

Many modern computer systems provide methods for protecting files against
accidental and deliberate damage. Computers that allow for multiple users
implement file permissions to control who may or may not modify, delete, or
create files and folders. A given user may be granted only permission to
modify a file or folder, but not to delete it; or a user may be given permission
to create files or folders, but not to delete them. Permissions may also be
used to allow only certain users to see the contents of a file or folder.
Permissions protect against unauthorized tampering or destruction of
information in files, and keep private information confidential by preventing
unauthorized users from seeing certain files.

Another protection mechanism implemented in many computers is a read-only
flag. When this flag is turned on for a file (which can be accomplished by
a computer program or by a human user), the file can be examined, but it
cannot be modified. This flag is useful for critical information that must not be
modified or erased, such as special files that are used only by internal parts of
the computer system. Some systems also include a hidden flag to make
certain files invisible; this flag is used by the computer system to hide
essential system files that users must never modify.
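
As an illustration of a program turning the read-only flag on, the Python
sketch below clears the write-permission bits of a file using the standard os
and stat modules. The helper names set_read_only and is_read_only are made up
for the example, and the exact effect depends on the operating system; a
privileged (administrator) account may still be able to change the file.

```python
import os
import stat
import tempfile

def set_read_only(path: str) -> None:
    # Clear every write-permission bit and leave the other mode bits alone.
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def is_read_only(path: str) -> bool:
    # True when the owner's write bit is switched off.
    return not (os.stat(path).st_mode & stat.S_IWUSR)
```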

2.1.6 Storing files

In physical terms, most computer files are stored on hard disks: spinning
magnetic disks inside a computer that can record information indefinitely.
Hard disks allow almost instant access to computer files.

On large computers, some computer files may be stored on magnetic tape.
Files can also be stored on other media in some cases, such as writeable
compact discs, Zip drives, etc.
2.1.7 Backing up files

When computer files contain information that is extremely important, a
back-up process is used to protect against disasters that might destroy the
files. Backing up files simply means making copies of the files in a separate
location so that they can be restored if something happens to the computer, or
if they are deleted accidentally.

There are many ways to back up files. Most computer systems provide utility
programs to assist in the back-up process, which can become very time-
consuming if there are many files to safeguard. Files are often copied to
removable media such as writeable CDs or cartridge tapes. Copying files to
another hard disk in the same computer protects against failure of one disk,
but if it is necessary to protect against failure or destruction of the entire
computer, then copies of the files must be made on other media that can be
taken away from the computer and stored in a safe, distant location.
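
A back-up utility is, at its simplest, a copy to a separate location plus a way
of copying files back. The Python sketch below shows that idea using the
standard shutil module; the function names back_up and restore, and the use of
a label such as a day name for each back-up, are assumptions made for the
example.

```python
import os
import shutil
import tempfile

def back_up(source_dir: str, backup_root: str, label: str) -> str:
    # Copy the whole folder (files and subfolders) to backup_root/label.
    # copytree raises an error if that back-up folder already exists.
    target = os.path.join(backup_root, label)
    shutil.copytree(source_dir, target)
    return target

def restore(backup_dir: str, filename: str, dest_dir: str) -> None:
    # Copy one file back from the back-up, keeping its timestamps.
    shutil.copy2(os.path.join(backup_dir, filename),
                 os.path.join(dest_dir, filename))
```

In practice backup_root would be on removable media or another machine, for
the reasons given above.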

2.2. File Terminology

There are a few terms that you need to understand when learning about file
systems. These will be explained over the next couple of pages.

A file can store data or information in various formats. Suppose that in a
file the data is stored in tables just like the one below:
2.2.1 Records

As you saw previously, each table can hold a great deal of data.

Each table contains a lot of records.

A record is all of the data or information about one person or one thing.

In the table below, all of the information about each cartoon character is
stored in a 'row' or record.

What information could you find in the record for Cat Woman?

What do you think the database at your school stores records about? How
about the library? What records would be stored on that database?


2.2.2 Fields

Each table contains a lot of records.

A record is made up of lots of individual pieces of information. Look at Wonder
Woman's record; it stores her first name, last name, address, city and age.

Each of these individual pieces of information in a record is called a 'field'.

A 'field' is one piece of data or information about a person or thing.

What fields can you find about Tweety Bird?

What fields do you think would be stored in your student record on the school
database?

What fields would be stored in a book record in the library database?
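
The relationship between table, record and field can also be shown in code. In
the Python sketch below a table is a list of records and each record has the
five fields named above. The class name Person and all of the values
(addresses, cities, ages) are invented placeholders, not data from the table
in this lesson.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:
    # Each attribute is one field of the record.
    first_name: str
    last_name: str
    address: str
    city: str
    age: int

# A table is a collection of records; the values are made-up placeholders.
table = [
    Person("Wonder", "Woman", "1 Example Street", "Exampleton", 30),
    Person("Tweety", "Bird", "2 Example Street", "Exampleton", 5),
]

# The field names are the same for every record in the table.
field_names = [f.name for f in fields(Person)]
```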

2.3. Data Capturing


Any database or information system needs data entered into it, in order for it
to be of any use.

There are many methods which can be used to collect and enter data, some
manual, some automatic.

We will also look in particular detail at designing an effective paper-based
data capture form.

2.3.1 Direct Data Capturing

Here are some of the methods that can be used to capture data directly.

2.3.1.1 Barcode reader

A bar code reader uses visible red light to scan and 'read' the barcode. As the
red light shines across the light and dark bands of the barcode, the
reflected red light is also lighter and darker.

The Hand Scanner senses the reflected light and translates it into digital data.
The digital data is then input into the computer. The computer may display the
results on a screen and also input it into the correct fields in
the database.

Typical uses:

Shop - to find details on the product sold and price

Library - to record the ISBN of the book and the borrower's card number

Warehouse - to check the labels on boxes delivered against what is recorded
on the delivery sheet.

2.3.1.2. Magnetic ink character recognition (MICR)


The numbers at the bottom of a cheque are written in a special ink which
contains iron particles. This ink is magnetised and commonly called 'magnetic
ink'. It can be read by a special machine called a Magnetic Ink Character
Reader (MICR).

2.3.1.3 Optical Mark Readers (OMR)

An Optical Mark Reader is a scanning device that reads carefully placed
pencil marks on a specially designed form or document.

A simple pen or pencil mark is made on the form to indicate the correct
choice, e.g. on a multiple choice exam paper or on the National Lottery ticket
selection form.

The completed forms are scanned by an Optical Mark Reader (OMR) which
detects the presence of a mark by measuring the reflected light. Less light is
reflected where a mark has been made.

The OMR then interprets the pattern of marks into a data record and sends
this to the computer for storage, analysis and reporting.

This provides a very fast and accurate method of inputting large amounts of
data, provided the marks have been made accurately and clearly.

2.3.1.4 Optical Character Recognition (OCR)

Optical Character Recognition (OCR) enables the computer to identify written
or printed characters.

An OCR system consists of a normal scanner and some special software. The
scanner is used to scan the text from a document into the computer. The
software then examines the page and extracts the text from it, storing it in
a form that can be edited or processed by normal word processing software.

The ability to scan the characters accurately depends on how clear the writing
is. Scanners have been improved to be able to read different styles and sizes
of text as well as neat handwriting. Although they are often up to 95%
accurate, any text scanned with OCR needs careful checking because some
letters can be misread.

OCR is also used to automatically recognise postcodes on letters at sorting
offices.

2.3.1.5 Speech Recognition

The user talks into a microphone. The computer 'listens' to the speaker, then
translates that information to written words and phrases. It then displays the
text on to the monitor.

This process happens immediately, so as you say the words, they appear on the
screen. The software often needs some "training" in order for it to get used
to your voice, but after that it is simple to use.

2.3.2 Data Capture Forms

Although there are many methods of capturing data automatically, many
businesses prefer to capture it manually.

2.3.2.1 Paper-based data capture forms

This is the most commonly used method of collecting or capturing data.

People are given a form to fill in with their personal details, e.g. name,
address, telephone number, date of birth etc.

Once the form is completed, it is given to a member of staff who will enter
the data from it, into a database or information system.

2.3.2.2 Computerised data entry forms

A member of staff could type the information directly into a computerised data
entry form whilst the customer is with them. They ask the question in the order
it appears on the form and enter the answer using a keyboard.

More commonly though, the details will be typed in by copying what was written
on the paper-based data capture form. When this method is used, it is
important that the fields on both forms are laid out in the same order to
speed up the process of entering the data.

2.3.3 Designing Data Capture Form

A data capture form looks simple enough to design. Don't you just type out a
few questions, put in a couple of boxes for customers to fill in their
information and then print it out? No, it's not as simple as that. If you want
to collect good quality data, you need to think carefully about the design of
the form.

All forms should have the name of the organisation at the top.
They should also have an explanation to tell the customer what the form is for,
in this case 'membership application form', or 'data collection form', or
'customer details form' or something similar.

Lastly, they should give the customer instructions to tell them what they
should do with the form once they have completed it. Here it tells the person
filling the form in, to send it back to the address given.

Where possible, it is a good idea to try to limit the options that people can
enter. If you can manage to do this, then you can set up your computerised
system with a drop down box that gives all of the options on the form - making
it faster for staff to enter the data.

For example, the first form shown above limits the choice of title to 'Mr' or
'Miss'. This is sufficient in this case because it is an application form for a
children's youth club, so it is unlikely that there will be any 'Mrs', 'Dr' or
'Reverend'.

The second form gives people the different options for travel, they have to tick
one of the options since there isn't any room for them to write something
different. The same method has been used for types of lunches.

2.4. Verification

It was mentioned that validation cannot make sure that data you enter is
correct, it can only check that it is sensible, reasonable and allowable.
However, it is important that the data in your database is as accurate as
possible. Have you ever heard of the term 'Garbage in, garbage out' or
'GIGO'? This means that if you enter data that is full of mistakes (garbage in)
then when you want to search for a record you will get data with mistakes
presented to you (garbage out).

This is where Verification can help to make sure that the data in your
database contains as few mistakes as possible.

Verification means to check something twice.

Think about when you choose a new password, you have to type it in twice.
This lets the computer check if you have typed it exactly the same both times
and not made a mistake.

The data in your database can be verified or checked twice.

This can be done in different ways:

Somebody else can check the data on the screen for you against the original
paper documents

You could print out your table and check it against the original paper
documents

You could type in the data twice (like you do with your password), and get the
computer to check that both sets of data are identical.

Other methods of verification include control, batch or hash totals. To find out
more about these, visit the mini-website on Validation and Verification.
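
The "type it twice" method is easy to express as a program. The Python sketch
below compares two separately typed copies of the same data and reports which
fields do not match; the function names and the sample field names are
hypothetical.

```python
def double_entry_matches(first: str, second: str) -> bool:
    # Verification passes only if both entries are exactly the same.
    return first == second

def mismatched_fields(first: dict, second: dict) -> list:
    # Compare two typed copies of a record and list the fields that differ.
    all_fields = first.keys() | second.keys()
    return sorted(k for k in all_fields if first.get(k) != second.get(k))
```

For instance, comparing {"name": "Smith", "age": "87"} with
{"name": "Smiht", "age": "87"} reports the field "name", so only that field
has to be checked against the original paper document.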

2.5. Editing and Checking

As well as choosing the correct data types to try to reduce the number of
errors made when entering data into the database, there is another method
that can be used when setting up the table. This is called 'Validation'.

It is very important to remember that Validation cannot stop the wrong data
being entered, you can still enter 'Smiht' instead of 'Smith' or 'Brown' instead
of 'Green' or '78' instead of '87'.

What Validation can do, is to check that the data is sensible, reasonable and
allowable.

This page will not go into any great depth about different methods of
validation as there is a whole mini-website on Validation alone. Go and have a
look at it to find out more details about the best kind of Validation to use and
the reasons why.

Some of the types of Validation that you could set up for your database are:

Validation: Type Check
If the data type 'number' has been chosen, then only that type of data,
i.e. numbers, will be allowed to be entered.
Example: 2, 3, 4

If a field is only to accept certain choices, e.g. title might be restricted to
'Mr', 'Mrs', 'Miss' and 'Ms', then 'Dr' wouldn't be allowed.
Examples: Mr, Mrs, Miss, Ms
          Brown, Green, Blue, Yellow, Red

Validation: Range Check
A shop may only sell items between the price of £10.00 and £50.00. To stop
mistakes being made, a range check can be set up to stop £500.00 being
entered by accident.
Example: >=10 AND <=50

A social club may not want people below the age of 18 to be able to join.
Example: >=18

Notice the use of maths symbols:
> 'greater than'
< 'less than'
= 'equals'

Validation: Presence Check
There might be an important piece of data that you want to make sure is
always stored. For example, a school will always want to know an emergency
contact number, a video rental store might always want to know a customer's
address, and a wedding dress shop might always want a record of the bride's
wedding date.

A presence check makes sure that a critical field cannot be left blank; it
must be filled in.
Examples: School database: Emergency contact number
          DVLA database: Date test passed
          Electoral database: Date of birth
          Vet's database: Type of pet

Validation: Picture or Format Check
Some things are always entered in the same format. Think about a postcode: it
always has a letter, letter, number, number, number, letter and letter, e.g.
CV43 9PB. There may be the odd occasion where it differs slightly, e.g. a
Birmingham postcode B19 8WR, but the letters and numbers are still in the
same order.

A picture or format check can be set up to make sure that you can only put
letters where letters should be and numbers where numbers should be.
Examples: Postcode: CV43 9PB
          Telephone number: (01926) 615432
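
Each of the checks above can be written as a short rule. The Python sketch
below is one possible way of doing so; the function names are invented for the
example (in particular "lookup_check" is just a label used here for the
restricted list of choices), and the postcode picture only covers the two
shapes quoted in this lesson, not every real postcode.

```python
import re

def type_check(value: str) -> bool:
    # Only whole numbers made of the digits 0-9 are accepted.
    return re.fullmatch(r"[0-9]+", value) is not None

def lookup_check(value: str, allowed: set) -> bool:
    # The value must be one of a fixed list of choices.
    return value in allowed

def range_check(value: float, low: float, high: float) -> bool:
    # Equivalent to ">= low AND <= high".
    return low <= value <= high

def presence_check(value: str) -> bool:
    # A critical field must not be left blank.
    return value.strip() != ""

# Assumed picture: 1-2 letters, 1-2 digits, a space, 1 digit, 2 letters.
POSTCODE_PICTURE = re.compile(r"[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}")

def format_check(value: str) -> bool:
    return POSTCODE_PICTURE.fullmatch(value) is not None
```

Notice that range_check(78, 18, 120) is true even if the person is really 87:
validation shows only that a value is sensible, not that it is correct.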

2.6 Summary

• A computer file is a piece of arbitrary information, or resource for
storing information, that is available to a computer program and is
usually based on some kind of durable storage.
• Operations on a file include opening a file to use its contents,
reading or updating the contents, committing updated contents to
durable storage and closing the file, thereby losing access until it is
opened again.
• The main drawback to inserting, deleting or amending records
in an ordered Sequential file is that the entire file must be read and
then the records written to a new file.
• Direct access files allow direct access to a particular record in
the file using a key and this greatly facilitates the operations of reading,
deleting, updating and inserting records.
• An Indexed file may have multiple keys.
• In modern computer systems, files are typically accessed using
names.
• When computer files contain information that is extremely
important, a back-up process is used to protect against disasters that
might destroy the files.
• A member of staff could type the information directly into a
computerized data entry form whilst the customer is with them.
• Validation cannot make sure that the data you enter is correct; it
can only check that it is sensible, reasonable and allowable.
• Indexed files may have up to 255 keys, the keys can be
alphanumeric and only the primary key must be unique.

2.7 Key words

• File - A file is durable in the sense that it remains available for
programs to use after the current program has finished.
• Direct access file - COBOL supports two kinds of direct access file
organization: Relative and Indexed.
• Record - A record is all of the data or information about one
person or one thing.
• Field - A field is one piece of data or information about a person
or thing. A record is made up of many fields; for example, Wonder
Woman's record stores her first name, last name, address, city and age.
• OMR - An Optical Mark Reader is a scanning device that reads
carefully placed pencil marks on a specially designed form or
document.
• OCR - Optical Character Recognition (OCR) enables the
computer to identify written or printed characters.

2.8 Self Assessment Questions (SAQ)

• Define the term File. Explain the different types of operations
that can be performed on files with the help of suitable examples.
• Explain the architecture of file organization.
• What are different types of files? Explain insertion, modification
and deletion operation in context with these files types.
• What do you mean by field, record and table? Explain with the
help of suitable examples.
• Define the term Data Capturing. Explain different data capturing
techniques.
• Explain what is meant by the term back-up. Why is it
important to keep the back-up copy away from the computer system?
• When the contents of a file are changed, a transaction log is
often kept. Explain briefly the reason for the transaction log.
• Explain how the transaction file and the master file are used to
produce a new updated master file?
• Validation and Verification help to reduce the errors when
inputting data. Justify the statement.
• Explain the difference between validation and verification. Give
the names of three validations checks that can be used.

2.9 References/Suggested Readings

• Computer Fundamental, P.K. Sinha, BPB Publications – 2004
• Sams Teach Yourself COBOL in 24 Hours, Hubbell, Sams, Dec
1998
• Structured COBOL Methods, Noll P, Murach, Sep 1998
• ICT for you, Stephon Doyle, Nelson Thornes, 2003
• Information and Communication Technology, Denise Walmsley,
Hodder Murray 2004
• Information Technology, P Evans, BPB Publications, 2000
Author’s Name: Sh. Varun Kumar
Vetter’s Name: Prof. Dharminder Kumar

LESSON 3
DATA STORAGE
3.0 Objectives
At the conclusion of this lesson you should be able to know:
• Data Storage
• Storage Capacity
• Storage Devices
• Manual file System
• Types of Files
• File Recovery Procedure
• File Backup

3.1. Introduction
Unless you want to lose all of the work you have done on your computer, you
must have some means of storing the information.
There are various storage devices that will do this for you. Some of the
most common ones that you are likely to have come across are:
• hard disks,
• floppy disks,
• CD-ROMs,
• DVDs.

3.1.1. Storage Capacity

Storage capacity is measured in bytes. One byte contains 8 bits (Binary
Digits); a bit is the smallest unit of data that can be stored.
A bit is represented as a 1 or 0 - binary numbers.
A single byte (Binary term) equals a keyboard letter, number or symbol. If you
think of all of the files that you have saved on your computer and how many
characters (letters) you have written, you will need millions of bytes of
storage to keep your work safe.
We normally refer to the storage capacity of a computer in terms of Kilobytes
(kB), Megabytes (MB) and Gigabytes (GB) - (or even Terabytes on very large
systems!).

Quantity        Information

Bit             Smallest unit of data, either a 0 or a 1.

Byte            8 bits.
                This is the lowest 'data' level and is a series of 0s and 1s,
                e.g. 00111010 = 1 byte, with each 0 or 1 equal to 1 bit.
                Each keyboard character = 1 byte.

Kilobyte (kB)   1000 keyboard characters = 1000 bytes or 1 kB (kilobyte).
                In reality it is 1024 bytes which make a kilobyte, but
                generally people refer to 1000 bytes as a kB.

Megabyte (MB)   1000 kilobytes = 1 MB (1 million keyboard characters).
                Floppy disks have a capacity of 1.44 MB.
                CD-ROM disks have a capacity of 650 MB.

Gigabyte (GB)   1000 megabytes = 1 GB (1 billion characters).
                Single-sided DVD disks can typically hold 4.7 GB of data.

Terabyte (TB)   Equal to about 1,099,000,000,000 bytes, or 2^40 bytes.
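
The figures in the table can be checked with a little arithmetic. The Python
sketch below uses the binary (powers of two) sizes, where each unit is 1024
times the previous one; the constant and function names are chosen for this
example only.

```python
# Binary unit sizes: each step up multiplies by 2**10 = 1024.
KILOBYTE = 2 ** 10   # 1,024 bytes
MEGABYTE = 2 ** 20   # 1,048,576 bytes
GIGABYTE = 2 ** 30   # 1,073,741,824 bytes
TERABYTE = 2 ** 40   # 1,099,511,627,776 bytes

def bits_in(n_bytes: int) -> int:
    # One byte contains 8 bits.
    return n_bytes * 8

def bytes_for_text(text: str) -> int:
    # One byte per character holds for plain keyboard (ASCII) characters.
    return len(text.encode("ascii"))
```

So a 1,000-character plain-text note needs 1,000 bytes, just under one
kilobyte, and 2^40 works out at roughly 1.1 million million bytes.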

3.1.2. Read Only Memory (ROM)

Data stored in Read Only Memory (ROM) is not erased when the power is
switched off - it is permanent. This type of memory is also called 'non volatile
memory'.
A Motherboard within a PC may contain a ROM chip. This chip contains the
instructions required to start up the computer. Another name for this software
is the BIOS.
Whenever some data needs to be stored on a permanent basis, a ROM is the
best solution. For example, many car computers will contain ROM chips that
store the basic information required to run the car engine.

3.1.3 Random Access Memory (RAM)

In contrast to ROM, Random Access Memory is volatile memory. The data is
held on a chip, but only temporarily. The data disappears when the power is
switched off.
Have you ever forgotten to save your work before the
computer crashed? When you log back on, your work has
disappeared. This is because it was stored in RAM and
was erased when the PC switched off. However, if you had
saved your work from RAM to the hard disk, it would have
been safe!
A part of the RAM is allocated for the 'clipboard'. This is the
area that stores the information when you CUT, COPY and
PASTE from within programs such as Microsoft Word and
Excel.
As computer programs and operating systems have become more complex,
the size of RAM has increased. Today most computers are sold with either
256 MB or 512 MB of RAM.
3.1.4 Hard Disk

The hard disk drive is the storage device, rather like a filing cabinet, where
all the applications software and data are kept. Data stored on a hard disk
can be accessed much more quickly than data stored on a floppy disk.
A hard disk spins around thousands of times per minute inside its metal
casing, which is why it makes that whirring noise. Less than a hair's breadth
above the disk, a magnetic read and write head writes the 1s and 0s onto the
circular tracks beneath.

Most hard drives are installed out of the way inside the computer; however,
you can also purchase external drives that plug into the machine.
Modern hard drives are measured in gigabytes (GB). A typical hard disk drive
may be 120 GB. Some computers use two hard disks, with one hard disk
automatically making a backup copy of the other; another name for this is
disk mirroring.
Hard disk drives can turn up in some surprising places, for example:
• iPods (not the Nano) have a hard drive to store the music.
• Some games machines have them installed to allow games to be stored.
• They appear inside some "Personal Video Recorders" (PVRs) to act just like a
video recorder; the programs can then be burned onto DVD for permanent
storage if needed.

Advantages :
• Necessary to support the way your computer works
• Large storage capacity
• Stores and retrieves data much faster than a floppy disk or
CD-ROM
• Stored items not lost when you switch off the computer
• Usually fixed inside the computer so it does not get lost or damaged
• Cheap on a cost-per-megabyte basis compared to other storage
media.

Disadvantages:
• Far slower to access data than the ROM or RAM chips
because the read-write heads have to move to the correct part of
the disk first.
• Hard disks can crash which stops the computer from working
• Regular crashes can damage the surface of the disk, leading to
loss of data in that sector.
• The disk is fixed inside the computer and cannot easily be
transferred to another computer.

The hard disk shown below has a SCSI 'interface' which is one kind of
standard connection method. Other connection methods are "IDE" and
"SATA" interfaces. Each kind of interface has a different type of socket so they
cannot get mixed up accidentally.
3.1.5 Floppy Disk

Floppy disks are one of the oldest types of portable storage device still in use,
having been around since about 1980. They have lasted, whilst so many other
ideas have disappeared, because they are so handy to use. (See "Floppy
History" term in the box opposite for more information).
The floppy disk drive enables you to transfer small files between computers
and also to make backup copies to protect against lost work.
A floppy disk is made of a flexible substance called Mylar.
They have a magnetic surface which allows the recording of
data. Early floppy disks were indeed 'floppy', but the ones we
use now (3 1/2 inch) are protected by a hard plastic cover.
The disk turns in the drive allowing the read/write head to access the disk.
A standard floppy disk can store up to 1.44 MB of data, which is approximately
equivalent to 300 pages of A4 text. However, graphic images are often very
large, so you may well find that if you have used Word Art or a large picture,
your work will not fit onto a floppy disk.
All disks must be formatted before data can be written to the disk. Formatting
divides the disk up into sections or sectors onto which data files are stored.
Floppy disks are often sold pre-formatted.
Care should be taken when handling disks, to protect the data. The surface of
the disk should not be touched and they should be kept away from extreme
temperatures and strong magnetic fields such as may appear close to audio
speakers - otherwise you might find all your data has been wiped!

Advantages:
• Portable - small and lightweight
• Can provide a valuable means of backing up data
• Inexpensive
• Useful for transferring files between computers or home and
school.
• Private data can be stored securely on a floppy disk so that
other users on a network cannot gain access to it.
• Security tab to stop data being written over.
• Most computers have a floppy drive (although they are now becoming
less common)
• Can be written to many times.
Disadvantages:
• Not very strong - easy to damage
• Data can be erased if the disk comes into contact with a
magnetic field
• Quite slow to access and retrieve data.
• Can transport viruses from one machine to another
• Small storage capacity, especially if graphics need to be saved
• New computers are starting to be made without floppy drives
3.1.6 Zip Drive
The Zip drive is similar to a floppy drive but can store 100 MB of data, roughly
70 times more than a floppy. Some Zip disks store as much as 250 MB.
The Zip disk is slightly thicker than a floppy disk and needs a separate drive.
Zip disks are particularly useful for backing up important data or for moving
data easily from one computer to another. Data is compressed, thereby
reducing the size of files that are too large to fit onto a floppy disk.
Advantages:
• Stores more than a floppy disk
• Portable
Disadvantages:
• More expensive than floppies
• Drives to read the disks are not that common

3.1.7 Magnetic Tape

The amount of work you do on your computer at home can easily be backed
up onto floppy disks or DVD for safety. However, many organisations need to
back up large volumes of data, and floppy disks or DVDs are not the best
method for doing this. In some cases, terabytes of data may need to be stored
safely at low cost.
Examples of organisations that would hold this much information:
• Satellite imaging firms holding a huge backlog of images
• Movie companies holding their digitized films in archive
• Architecture, car and design firms holding thousands of CAD drawings
• Science organisations such as CERN holding the results of past experiments
• Weather organisations
So they tend to make their backup copies onto magnetic tape.
Magnetic tape comes in two forms:
• tape reels - these are fairly large and are usually used to back
up data from mainframe computers.
• cassettes or cartridges - these are fairly small in size but able to
hold enough data to back up the data held on a personal computer or a
small network.
Because it takes a long time to back up onto magnetic
tape, it may be done at night or over a weekend when
the computer network is not so busy.
The main advantage of using magnetic tape as backing
storage is that it is relatively cheap and can store large amounts of data.

3.2. Manual Filing System

We are all used to dealing with some sort of manual information system. In a
manual information system some of the data is the same on each file. This is
called data duplication and is one of the main problems with a manual filing
system. Data duplication means that more space is taken up by the files and
more work is needed to retrieve the information. The main problems arise in
the following situations:
• We may need to obtain information that is held on several files.
• As the data is not shared, a change in information would cause many files to
need updating.
• It is time consuming and wasteful.

To overcome these anomalies, computerized systems are used. The main
advantages of a computerized system are as follows:
• The information is stored only once.
• Files can be linked together.
• Access to the information is rapid and there is less chance of
the data becoming lost.

In Computerized systems, we can create data files, alter the data in these files
and extract the data from the files.
3.3. Types of files

There are mainly four types of files:

1. Master File

A master file is the most important file, as it is the most complete and
up-to-date version of a file. If a master file is lost or damaged and it is the
only copy, the whole system will break down.

2. Transaction file

Transaction files are used to hold temporary data which is used to update the
master file. A transaction is a piece of business, hence the name given as
transaction file. Transactions can occur in any order, so it is necessary to sort
a transaction file into the same order as the master file before it is used to
update the master file.
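The update run described above can be sketched in Python. This is a simplified illustration, not a description of any particular system: records are (key, value) pairs, and the action names "add", "change" and "delete" are assumptions made for this example.

```python
def update_master(master, transactions):
    """Apply a transaction file to a master file and return a NEW master file.

    master       -- list of (key, value) records, already sorted by key
    transactions -- list of (key, action, value) records in arrival order
    """
    # Sort the transaction file into the same key order as the master file.
    # sorted() is stable, so transactions on one key keep their arrival order.
    pending = sorted(transactions, key=lambda t: t[0])

    new_master = []
    m = 0  # position in the old master file
    t = 0  # position in the sorted transaction file
    while m < len(master) or t < len(pending):
        # Copy master records that come before the next transaction key.
        if t == len(pending) or (m < len(master) and master[m][0] < pending[t][0]):
            new_master.append(master[m])
            m += 1
            continue
        key = pending[t][0]
        # Start from the existing master record for this key, if there is one.
        if m < len(master) and master[m][0] == key:
            current = master[m][1]
            m += 1
        else:
            current = None
        # Apply every transaction for this key.
        while t < len(pending) and pending[t][0] == key:
            _, action, value = pending[t]
            current = None if action == "delete" else value
            t += 1
        if current is not None:
            new_master.append((key, current))
    return new_master


old_master = [(101, 5), (102, 0), (104, 7)]
todays_transactions = [(104, "change", 9), (103, "add", 4), (102, "delete", None)]
new_master = update_master(old_master, todays_transactions)
print(new_master)  # [(101, 5), (103, 4), (104, 9)]
```

Because both files are in the same order, each is read only once from start to finish, and the old master file is left untouched.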

3. Backup or Security file

Backup copies of files are kept in case the original is damaged or lost and
cannot be used. Because of the importance of the master file, backup copies
of it should be taken at regular intervals in case it is stolen, lost, damaged or
corrupted. Even if storage space is limited, you should always keep backup
copies of all important data.

4. Transaction Log File

Transactions are bits of business such as placing an order, updating the
stock, making a payment etc. If these transactions are performed in real time,
the data input will overwrite the previous data. This makes it impossible to
check past data and so would make it easy for people to commit fraud. A
record of transactions is therefore kept in the form of a transaction log file,
which shows all the transactions made over a certain period. Using the log you
can see what the data was before the changes were made, what the changes
were and who made them. Transaction log files therefore maintain security and
can also be used to recover transactions lost due to hardware failures.
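A transaction log of this kind might look like the following Python sketch. The field names ("user", "before", "after") and the helper function are assumptions made for illustration; real systems define their own log formats.

```python
import json
from datetime import datetime, timezone


def apply_with_log(records, log, key, new_value, user):
    """Overwrite records[key] and append a log entry describing the change."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,                # who made the change
        "key": key,
        "before": records.get(key),  # the data before the change (None if new)
        "after": new_value,          # the data after the change
    }
    log.append(json.dumps(entry))    # entries are only ever added, never edited
    records[key] = new_value


stock = {"widget": 10}
log = []
apply_with_log(stock, log, "widget", 7, user="asha")
apply_with_log(stock, log, "widget", 12, user="ravi")

history = [json.loads(line) for line in log]
for h in history:
    print(h["user"], h["before"], "->", h["after"])
```

Although the stock record itself only holds the latest value, the log still shows every earlier value and who changed it.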
In practice companies will keep several generations of files. This is because
there may be a problem (e.g. a disk crash) and the update runs may have to be
done again to re-create the current master file.

3.4. File Recovery Procedure

There is always a slight chance that data contained in a master file may be
destroyed. It could be destroyed by an inexperienced user, a power failure or
even theft. For a large company, the loss of vital data could prove disastrous.
But by creating different generations of files it is possible to recreate the
master file if it is lost.

The three generations of files are:

• Oldest master file, called the grandfather file
• New master file, called the father file
• The most up-to-date transaction file, called the son file

When a transaction file is used to update a master file, the process creates a
new master file.

Sometimes the old master file is referred to as the father file and the new
master file as the son file.
When the update is next run:
• the son file becomes the father file
• the father file becomes the grandfather file
• and so on.
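The rotation of generations can be sketched in Python using the naming in which the newest master file is the son. The file names are hypothetical, and keeping exactly three generations is an assumption of this example.

```python
from collections import deque


def run_update(generations, new_master):
    """Record the newest master file; the oldest drops out once three are held."""
    generations.appendleft(new_master)  # newest first: son, father, grandfather
    return list(generations)


# A deque with maxlen=3 silently discards the oldest entry when a fourth arrives.
generations = deque(maxlen=3)
for name in ["master_mon", "master_tue", "master_wed", "master_thu"]:
    son_father_grandfather = run_update(generations, name)

print(son_father_grandfather)  # ['master_thu', 'master_wed', 'master_tue']
```

After Thursday's run, Monday's file is no longer kept: Thursday's file is the son, Wednesday's the father and Tuesday's the grandfather.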
3.4.1 Backups

In the field of information technology, backup refers to the copying of data so
that these additional copies may be restored after a data loss event. Backups
are useful primarily for two purposes: to restore a computer to an operational
state following a disaster (called disaster recovery) and to restore small
numbers of files after they have been accidentally deleted or corrupted.
Backups differ from archives in the sense that archives are the primary copy
of data and backups are a secondary copy of data. Backup systems differ
from fault-tolerant systems in the sense that backup systems assume that a
fault will cause a data loss event and fault-tolerant systems assume a fault will
not. Backups are typically the last line of defense against data loss, and
consequently the least granular and the least convenient to use.
Since a backup system contains at least one copy of all data worth saving, the
data storage requirements are considerable. Organizing this storage space
and managing the backup process is a complicated undertaking.
Backup storage media
Regardless of the repository model that is used, the data has to be stored on
some data storage medium somewhere.
3.4.1.1 Magnetic tape
Magnetic tape has long been the most commonly used medium for bulk data
storage, backup, archiving, and interchange. Tape has typically had an order
of magnitude better capacity/price ratio when compared to hard disk, but
recently the ratios for tape and hard disk have become a lot closer. There are
myriad formats, many of which are proprietary or specific to certain markets
like mainframes or a particular brand of personal computers. Tape is a
sequential access medium, so even though access times may be poor, the
rate of continuously writing or reading data can actually be very fast. Some
new tape drives are even faster than modern hard disks.
3.4.1.2 Hard disk
The capacity/price ratio of hard disk has been rapidly improving for many
years. This is making it more competitive with magnetic tape as a bulk storage
medium. The main advantages of hard disk storage are the high capacity and
low access times.

3.4.1.3 Optical disk

A CD-R can be used as a backup device. One advantage of CDs is that they
can hold 650 MiB of data on a 12 cm (4.75") reflective optical disc. (This is
equivalent to 12,000 images or 200,000 pages of text.) They can also be
restored on any machine with a CD-ROM drive. Another common format is
DVD+R. Many optical disk formats are WORM type, which makes them useful
for archival purposes since the data can't be changed.
3.4.1.4 Floppy disk

During the 1980s and early 1990s, many personal/home computer users
associated backup mostly with copying floppy disks. The low data capacity of
a floppy disk makes it an unpopular choice in 2006.
Solid state storage
Also known as flash memory, thumb drives, USB keys, compact flash, smart
media, memory stick, Secure Digital cards, etc., these devices are relatively
costly for their low capacity, but offer excellent portability and ease-of-use.
Remote backup service
As broadband internet access becomes more widespread, remote backup
services are gaining in popularity. Backing up via the internet to a remote
location can protect against some worst-case scenarios, such as someone's
house burning down and destroying any backups along with everything else. A
drawback to remote backup is that the internet connection is usually
substantially slower than local data storage devices, so this can be a problem
for people with large amounts of data. It also carries the risk of losing
control over personal or sensitive data.
Approaches to backing up files
Deciding what to back up at any given time is a harder process than it seems.
By backing up too much redundant data, the data repository will fill up too
quickly. If we don't back up enough data, critical information can get lost. The
key concept is to back up only files that have changed.

3.4.2 Copying files


Just copy the files in question somewhere.
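As an illustration, copying files can be done with Python's standard shutil module. The helper name copy_files and the folder and file names are invented for this sketch, which uses a temporary folder so that it can be run safely.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files(files, backup_dir):
    """Copy each file into backup_dir and return the paths of the copies."""
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 copies the contents and also tries to keep the timestamps.
    return [Path(shutil.copy2(f, dest / Path(f).name)) for f in files]


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "report.txt"
    original.write_text("quarterly figures")
    copies = copy_files([original], Path(tmp) / "backup")
    copied_text = copies[0].read_text()
    copied_folder = copies[0].parent.name

print(copied_folder, copied_text)  # backup quarterly figures
```

In real use the backup folder would be on a different disk or device from the originals.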

3.4.3 File System dump


Copy the file system that holds the files in question somewhere. This usually
involves un-mounting the file system and running a program like dump. This is
also known as a raw partition backup. This type of backup has the possibility
of running faster than a backup that simply copies files. A feature of some
dump software is the ability to restore specific files from the dump image.
Identification of changes
Some file systems have an archive bit for each file that says it was recently
changed. Some backup software looks at the date of the file and compares it
with the last backup, to determine whether the file was changed.
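The date-comparison approach can be sketched as follows. The function name and the timestamp value are assumptions for this example; the file times are set artificially so that the result does not depend on when the code is run.

```python
import os
import tempfile
from pathlib import Path


def changed_since(folder, last_backup_time):
    """Names of files in folder modified after last_backup_time (epoch seconds)."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_mtime > last_backup_time
    )


with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.txt"
    new_file = Path(tmp) / "new.txt"
    old_file.write_text("unchanged")
    new_file.write_text("edited")

    last_backup = 1_000_000_000  # hypothetical time of the last backup
    os.utime(old_file, (last_backup - 60, last_backup - 60))  # modified before it
    os.utime(new_file, (last_backup + 60, last_backup + 60))  # modified after it

    to_back_up = changed_since(tmp, last_backup)

print(to_back_up)  # ['new.txt']
```

Only new.txt needs to be copied in the next backup run.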

3.4.4 Block Level Incremental

A more sophisticated method of backing up changes to files is to back up only
the blocks within the file that changed. This requires a higher level of
integration between the file system and the backup software.
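The idea of finding changed blocks can be sketched in Python by comparing a checksum of each block. This is a simplified illustration working on data in memory, with a deliberately tiny block size; it is not how any particular backup product works, and it does not handle a file that has become shorter.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_hashes(data: bytes):
    """Checksum of each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(old: bytes, new: bytes):
    """Return {block_number: new_block} for blocks that differ from the old data."""
    old_hashes = block_hashes(old)
    changes = {}
    for idx, h in enumerate(block_hashes(new)):
        if idx >= len(old_hashes) or h != old_hashes[idx]:
            start = idx * BLOCK_SIZE
            changes[idx] = new[start:start + BLOCK_SIZE]
    return changes


old_version = b"AAAABBBBCCCC"
new_version = b"AAAAXXXXCCCCDD"
delta = changed_blocks(old_version, new_version)
print(delta)  # {1: b'XXXX', 3: b'DD'}
```

Only blocks 1 and 3 need to be saved, rather than the whole file.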
3.4.5 Versioning file system

A versioning file system keeps track of all changes to a file and makes those
changes accessible to the user. This is a form of backup that is integrated into
the computing environment.

3.4.6 Backing up on-line databases

An on-line database is constantly being updated. To make sure no data is lost
in the event of hardware failure, special back-up methods are used.
Transaction logging and RAID (Redundant Array of Inexpensive Disks) are
two commonly used methods.

Transaction logging involves storing the details of each update in a
transaction log file. A "before" and "after" image of each updated record is also
saved. If any part of the database is destroyed, an up-to-date copy can be
recreated by a utility program using the transaction log file and the before and
the after image of updated records.
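A much simplified version of such a recovery utility is sketched below in Python. The record layout (a "key" with "before" and "after" images, where None means the record did not exist) is an assumption made for this example.

```python
def recover(backup_copy, log_entries):
    """Rebuild the current database from an older backup plus the log."""
    database = dict(backup_copy)      # start from the last good backup
    for entry in log_entries:         # replay the updates in the order logged
        if entry["after"] is None:
            database.pop(entry["key"], None)  # the update deleted the record
        else:
            database[entry["key"]] = entry["after"]
    return database


def undo(database, log_entries):
    """Roll updates back, newest first, using the 'before' images."""
    restored = dict(database)
    for entry in reversed(log_entries):
        if entry["before"] is None:
            restored.pop(entry["key"], None)  # the record did not exist before
        else:
            restored[entry["key"]] = entry["before"]
    return restored


backup = {"acc1": 100, "acc2": 50}
log = [
    {"key": "acc1", "before": 100, "after": 80},
    {"key": "acc3", "before": None, "after": 20},
    {"key": "acc2", "before": 50, "after": None},
]
current = recover(backup, log)
print(current)  # {'acc1': 80, 'acc3': 20}
```

The "after" images bring an old backup forward to the present, while the "before" images allow updates to be reversed.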
RAID involves keeping several copies of a database on different disks at the
same time. Whenever a record is updated the same changes are made to
each copy of the database. This is so that if one disk fails the data will still be
safe on the others.
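The mirroring idea described here can be imitated in a few lines of Python. This is only a toy model: the "disks" are ordinary dictionaries in memory and the class name is invented for the example, whereas real RAID is handled by hardware or the operating system.

```python
class MirroredStore:
    """Keeps an identical copy of every record on several 'disks'."""

    def __init__(self, copies=2):
        self.disks = [{} for _ in range(copies)]

    def write(self, key, value):
        for disk in self.disks:   # the same change is made to every copy
            if disk is not None:
                disk[key] = value

    def fail(self, index):
        self.disks[index] = None  # simulate one disk failing

    def read(self, key):
        for disk in self.disks:   # use the first copy that still works
            if disk is not None:
                return disk[key]
        raise IOError("all copies lost")


store = MirroredStore(copies=2)
store.write("order-17", "paid")
store.fail(0)                     # the first disk breaks down
survivor = store.read("order-17")
print(survivor)  # paid
```

The record is still readable after one disk fails because the second copy was updated at the same time.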

3.4.7 Advice

The more important the data that are stored in the computer the greater is the
need for backing up these data.
A backup is only as useful as its associated restore strategy.
Storing the copy near the original is unwise, since many disasters such as
fire, flood and electrical surges are likely to cause damage to the backup at
the same time.
Automated backup should be considered, as manual backups are affected by
human error.
3.4.8 Rules for Backing up

a) Never keep back-up disks near the computer.
b) If you hold a lot of data which would be very expensive to recreate, then you
should invest in a fireproof safe to protect your back-ups against theft and fire.
c) Keep at least one set of back-up disks in a different place.

3.5 Summary
• Storage capacity is measured in bytes.
• We normally refer to the storage capacity of a computer in terms
of Kilobytes (kB), Megabytes (MB) and Gigabytes (GB) - (or even
Terabytes on very large systems!).
• A Hard disk spins around thousands of times per minute inside
its metal casing, which is why it makes that whirring noise.
• Floppy disks are one of the oldest types of portable storage
devices still in use, having been around since about 1980.
• A master file is the most important file, as it is the most complete
and up-to-date version of a file. If a master file is lost or damaged and it
is the only copy, the whole system will break down.
• A transaction is a piece of business, hence the name given as
transaction file.
• When a transaction file is used to update a master file, the
process creates a new master file.
• A more sophisticated method of backing up changes to files is to
only backup the blocks within the file that changed.
• A versioning file system keeps track of all changes to a file and
makes those changes accessible to the user.
• The amount of work you do on your computer at home can
easily be backed up onto floppy disks or DVD for safety.

3.6 Key words


• Transaction File - Transaction files are used to hold temporary
data which is used to update the master file.
• Back-up - In the field of information technology, backup refers to
the copying of data so that these additional copies may be restored
after a data loss event.
• Transaction logging - involves storing the details of each update
in a transaction log file.
• RAID - involves keeping several copies of a database on
different disks at the same time.
• Master File - the most complete and up-to-date version of a file. If a
master file is lost or damaged and it is the only copy, the
whole system will break down.
3.7 Self Assessment Questions (SAQ)
• What do you mean by storage capacity? How do we measure the
storage capacity of a computer system?
• List the differences between:
o RAM and ROM
o Megabyte and Gigabyte
• Explain what is meant by the term storage device. Give three
examples of storage devices, with possible advantages and
disadvantages of each.
• Explain different types of files with the help of suitable examples.
• Explain what is meant by the term file generations, with the help
of a suitable example.
• List some important rules for backing up files.
• Explain the process of taking a backup of an on-line database.
