& BIOINFORMATICS
BSCBT-604
This SIM has been prepared exclusively under the guidance of Punjab Technical University (PTU) and
reviewed by experts and approved by the concerned statutory Board of Studies (BOS). It conforms to the
syllabi and contents as approved by the BOS of PTU.
MS-Office.
Mapping in Book
Unit 1: Overview of Computers (Page 1-20)
Unit 2: Introduction of MS Word (Page 21-52)
Unit 3: Operating Systems (Page 53-73)
Unit 4: Bioinformatics Internet Applications (Page 75-97)
Contents
UNIT 1
OVERVIEW OF COMPUTERS
Introduction
Computer System
Basic Ideas and Terms
Components of a Computer System
Basic Architecture
Computer Organisation
Data Representation
Performance Factors
Summary
Keywords
Review Questions
Further Readings
UNIT 2
INTRODUCTION OF MS WORD
Introduction
New in Microsoft Word 2000
Starting of MS Word
Components of the MS Word Window
Start Working with Word Document
File-related Operations
Opening a New File
Save a Document
Reversing and Reapplying Commands
Summary
Keywords
Review Questions
Further Readings
UNIT 3
OPERATING SYSTEMS
Introduction
Computer Hardware and OS Interaction
Single User Single Processing System
Multiprogramming Operating System
Architecture and Design of OS
Interface Design and Implementation
Window Systems based on PCs
Summary
Keywords
Review Questions
Further Readings
UNIT 4
Unit 1 Overview of Computers
Unit Structure
Introduction
Computer System
Basic Architecture
Computer Organisation
Data Representation
Performance Factors
Summary
Keywords
Review Questions
Further Readings
Learning Objectives
At the conclusion of this unit, you will be able to:
Introduction
When we speak about computers, what exactly are we referring to? Many of us tend
to define a computer as a computing machine, like a calculator. However, this
definition ignores most of what a computer can do. In layman's terms, a
computer can be defined as a machine that generates some kind of
information from the data that is fed into it. The application areas of
computers are practically unlimited, and we find computers in almost every aspect of our lives, from
simple activities such as playing a video game to complex applications such as
weather forecasting. Let us take a simple example of
a person who needs to purchase a can of juice from a supermarket.
He walks into the supermarket, picks up a can of juice and proceeds to the cash
counter. The counter person scans the code printed on the label to generate a
bill. This scanning of the code is computerised. The man pays his bill with his credit
card and walks out of the supermarket. He has just used a computer, which will transfer the
cost of the can of juice from his bank account to the supermarket. The man then
moves across the street and enters the office of his travel agent. He tells the agent that
he plans to take a vacation and inquires about the places he could go. The
agent turns to his computer, presses a couple of keys and immediately gets a list of
prospective places. The agent has just used a database application of the
computer. The man selects a place and confirms his travel. The agent again turns to
his computer and moments later hands him the air tickets to that place. The agent
actually connected to another computer that made the reservation. The man then happily
returns to his office and decides to tell his wife about the vacation, so he
sends her an e-mail. The man has used a network application of the computer.
Numerous such examples can be cited. With the advent of technology, new
application domains of the computer are created every day. Soon it will be
impossible to imagine life without computers.
Computer System
A computer can simply be defined as a machine that generates some kind of
information from the data fed into it. The question that arises at this
point is: how does the computer actually generate the output? Certain
components of a computer do the job. At this stage, we can define a computer
as a device that manipulates the data fed into it to generate a desired output according
to a set of instructions given by the user. This definition clearly marks the
difference between a computer and a calculator: in simple terms, a calculator can be
regarded as a subset of a computer. Figure 1.1 shows a computer along with some
peripheral devices.
Types of Computer
Computers can be categorized based on their size and design. Modern computers
vary in size from machines that fill an entire room to devices small enough
to fit on your thumbnail with room to spare. In general, the larger
the system, the greater its processing speed, storage, cost and ability to handle
a large number of peripheral devices. Size also affects the number of users who
can work on the system simultaneously. At the lowest end of the size scale is the
microcomputer. Microcomputers are small devices that can be used to perform dedicated tasks such as
scanning the code on a can of juice. The more familiar personal computer is a kind of
microcomputer. A typical microcomputer is shown in Figure 1.2.
Next in line is the minicomputer, shown in Figure 1.3. Minicomputers are also small
general-purpose computers, but they can serve a number of users
simultaneously. They are generally more powerful and more expensive than
microcomputers, and range in size from a desktop unit to a small filing cabinet.
2. The contents of memory are addressed by location, regardless of the type of data
they contain.
3.
Generation of Computers
Computer production started in the 1940s, when the first electronic computers were
created. Since then, improvements and enhancements in the field of electronics
have had considerable influence on the design of computers, leading to what is known
today as the generations of computers.
First Generation
Dr. John Vincent Atanasoff and Clifford Berry created the first electronic computer,
which they called the Atanasoff-Berry Computer, or ABC. The ABC used vacuum tubes
for storage and for arithmetic and logical functions. This work was noticed by John W.
Mauchly, who in 1940-41 teamed up with J. Presper Eckert Jr. and organised the
construction of the ENIAC. The ENIAC, shown in Figure 1.6, was the first general-purpose
electronic computer to be put fully into operation.
4 Self-Instructional Material
Second Generation
The main disadvantages of the first-generation computers were that the vacuum
tubes, owing to their short life, had to be replaced frequently, and that they generated a lot
of heat. These computers also took up a lot of space, and programming them was a tedious
task because programs had to be written in machine language. In the 1950s, these
disadvantages led to the creation of computers that were much smaller and faster.
In addition, programming these computers was easier because they could be
programmed in high-level programming languages, which were more English-like
and easier to understand. The computers of this generation used solid-state
components such as the transistor, developed at Bell Laboratories. Some
computers of this generation are the LEO Mark III, the ATLAS and the IBM 7000 series,
shown in Figure 1.9.
Third Generation
The second-generation computers were well suited to either scientific or
non-scientific applications, but not both. Thus, in 1964, IBM announced the System/360
family of mainframes, in which each processor had a large set of built-in instructions.
Some of these instructions could be used effectively for scientific calculation, while
others were better suited to record-keeping applications. The computers of this
generation used Integrated Circuit (IC) technology. Since ICs were small in
size, these computers became smaller still. A typical IC is shown in
Figure 1.10.
Fourth Generation
As technology advanced, ICs became smaller and more and more components could
be packed into a single chip. These chips were called Large Scale
Integration (LSI) and Very Large Scale Integration (VLSI) chips, as shown in Figure 1.11.
Since the computers of this generation used these chips, their size was greatly
reduced, their speed increased and their cost decreased. The
computers of today are said to belong to the fourth generation.
Fifth Generation
It was predicted that by the early 21st century, computers would be able to behave like humans,
making interaction with them more human-like. They would be able to think and act on their
own. This situation is well depicted in the motion picture Terminator 2, where
computers act on their own judgment.
Program
We know that the computer is a digital device, so it can understand only
digital signals. These signals are generated based on the instructions that the user
feeds into the computer. A program can be defined as a collection of such
instructions.
Add A to B
Assign value to C
In the steps given above, Input asks the user to enter the values of A and B. The
computer then adds the value of A to the value of B and assigns the result to C. These
four instructions are collectively called a program.
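The same idea can be sketched in Python. This is only an illustration: the function name `add_program` is hypothetical, and the values of A and B are passed in as arguments rather than typed at the keyboard, so that the sketch can be run and checked directly.

```python
def add_program(a: float, b: float) -> float:
    """Mirror the steps of the example program (illustrative only)."""
    # Input: the values of A and B are supplied by the caller
    # Add A to B and assign the result to C
    c = a + b
    # Output: hand C back so that it can be displayed
    return c
```

For example, `add_program(2, 3)` returns `5`. In an interactive version, the two arguments would come from the keyboard, for instance through Python's built-in `input()` function.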
Information
Information can be termed a more useful and intelligible form of data. A program
operates on data in a specified format and transforms it into information. For
example, the bill that is produced after the user feeds in the data is information.
Hardware
Hardware is the term used to describe all the electronic and mechanical components
found inside a computer system. These components are activated as required to
execute a program. For example, the counter person scans the code of the
commodity and gets a printout of the bill. The scanner and the printer are two of the
many hardware components used in the process.
Software
Suppose the counter person is using a set of programs to generate the bill. These
programs are in turn executed by a set of underlying programs. Together, these
programs are called software. Software can be broadly divided into two categories:
application software and system software. Application software is created
to perform a specific task, for instance, generating bills. System software is
what runs the application software. It also provides an interface between the
application software and the hardware, thus enabling the application software to
access the hardware units. It can therefore be said that system software runs the
application software along with the hardware. The operating system is a very good
example of system software.
Example: Consider a situation where a man needs to purchase three cans of mango
juice. He goes into a supermarket, picks up three cans and brings them to
the counter, where the counter person scans the code printed on the cans. After scanning,
let us assume that the computer produces the result D20567, J002A, 3,
$5.00, $15.00, $15.00, 11/11/05. The counter person may find this difficult to understand
(unless he is used to it). This is called data. Now, if the counter person uses a program
that takes this data as input and generates a bill in the following format, the
data is said to be transformed into information. The entire process is explained in Figure 1.12.
Bill Number: D20567
Date of purchase: 11th Nov 2005

S. No.  Item Code  Item Name              Quantity  Rate   Amount
1.      J002A      Daily Mango Juice Can  3.00      $5.00  $15.00
                                                    Total  $15.00
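The transformation of data into information described above can be sketched as a small Python program. This is a minimal illustration under stated assumptions: the names `make_bill`, `ordinal` and `ITEM_NAMES` are hypothetical, the item catalogue is invented to hold the one item in the example, the scanner record is assumed to list bill number, item code, quantity, rate, amount, total and date in that order, and the date is assumed to be written as DD/MM/YY.

```python
from datetime import datetime

# Hypothetical item catalogue: maps item codes to item names
ITEM_NAMES = {"J002A": "Daily Mango Juice Can"}


def ordinal(day: int) -> str:
    """Return a day number with its English suffix, e.g. 11 -> '11th'."""
    if 10 <= day % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def make_bill(raw: str) -> str:
    """Turn one comma-separated scanner record (data) into a readable bill (information)."""
    fields = [f.strip() for f in raw.split(",")]
    bill_no, code, qty, rate, amount, total, date = fields
    # Assumes the date is recorded as DD/MM/YY
    when = datetime.strptime(date, "%d/%m/%y")
    name = ITEM_NAMES.get(code, "Unknown item")
    lines = [
        f"Bill Number: {bill_no}",
        f"Date of purchase: {ordinal(when.day)} {when:%b %Y}",
        f"{'S. No.':<8}{'Item Code':<11}{'Item Name':<23}{'Quantity':<10}{'Rate':<7}Amount",
        f"{'1.':<8}{code:<11}{name:<23}{float(qty):<10.2f}{rate:<7}{amount}",
        f"{'Total':>57}  {total}",
    ]
    return "\n".join(lines)


print(make_bill("D20567, J002A, 3, $5.00,$15.00, $15.00, 11/11/05"))
```

The program does not add any new facts; it only rearranges and labels the data so that the counter person can read it, which is exactly the difference between data and information.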
A computer system has four basic components:
1. Input
2. Storage
3. Processing
4. Output
An input component in a computer system is concerned with the data that is fed in;
with its help, data is actually entered into the computer. Several
input devices can be used to enter data, using a variety of technologies
ranging from key presses to voice. Input components have evolved from the
switches used in first-generation computers to modern,
sophisticated voice recognition systems. The keyboard and the mouse are good
examples of input components. Figure 1.13 shows a few input devices.
While data is being processed, it needs to be stored in the computer. It
is also necessary to store the programs that manipulate the data to
generate the output, which must be stored as well. The storage components deal with the
storage of programs and data. Storage is of two kinds: primary and secondary.
Primary storage components such as RAM are volatile; that is, they retain data only as long as they
are supplied with electricity, so they are used for temporary storage of data. Secondary
storage components, on the other hand, store data permanently. Two typical
storage components are shown in Figure 1.14.
Basic Architecture
The basic architecture of a computer system is explained with the help of Figure 1.17.
Computer Organisation
A computer must have a way to get information from the outside world and must
be able to communicate results to it. Programs and data must
be entered into computer memory for processing, and the results of
computations must be recorded or displayed for the user. The most familiar method
of entering information into a computer is a typewriter-like keyboard that
allows a person to enter alphanumeric information directly. Every time a key is
pressed, the terminal sends a binary-coded character to the computer. When input
information is transferred to the processor via a keyboard, the processor is idle
most of the time while waiting for the information to arrive. To use a computer
efficiently, programs and data are prepared in advance and
transferred to a storage medium such as a magnetic disk. The information on the disk is then transferred
into computer memory at a high rate. Results of programs are likewise transferred to
high-speed storage, from which they can later be sent to an output device.
Peripheral Devices
Devices that are under the direct control of the computer are said to be connected
online. These devices are designed to read information into or out of the memory
unit when the CPU gives a command. Input and output devices connected to the
computer are also called peripherals. Among the most common peripherals are
keyboards, display units and printers. Magnetic disks are peripherals that provide
auxiliary storage for the system.
Other input and output devices are digital incremental plotters, optical and magnetic
character readers, analog-to-digital converters, etc. Not all input comes from people,
and not all output is intended for people. Computers are used to control various
processes in real time, such as machine tooling, assembly line procedures, and
chemical and industrial processes. For such applications, a method must be provided
for sensing status conditions in the process and sending control signals to the process
being controlled.
Input Devices
Input devices provide an interface between users and the machine for inputting
data and instructions. One of the most common examples is the keyboard. Data can
also be input in many other forms: audio, visual, graphical, etc.
Some common input devices are listed below:
1. Keyboard
2. Mouse
3.
4. Joystick
5. Light pen
6. Scanner
7.
The data in any form is first digitized, i.e. converted into binary form, by the input
device before being fed to the Central Processing Unit (CPU).
Output Devices
Like input devices, output devices also provide an interface between the user
and the machine. A common example is the visual display unit (monitor) of a
personal computer. The output unit receives data from the CPU in the form of
binary bits and converts it into a form (graphical, audio, visual, etc.)
that the user can understand.
Some common output devices are:
(i) Visual Display Unit (Monitor)
(ii) Printers
(iii) Speakers
(iv) Secondary Storage Devices
The input and output units are collectively referred to as peripherals.
Memory
The memory system is at the heart of a computer system; it is the memory system that
makes a computer what it is. The input data, the instructions needed to manipulate
it and the output data are all stored in memory.
The memory unit is an essential part of any digital computer because a computer can process
data only if the data is stored somewhere in its memory. For example, if a computer has to
compute f(x) = sin x for a given value of x, x is first stored somewhere in memory,
and then a routine containing the program that calculates the sine of x is called.
Memory is thus an indispensable component of a computer.
No single kind of memory device is at once fast, large and cheap, so it is not adequate
to use a single memory. Several types of memory are therefore used within the same
computer system.
Storage Technologies
Various storage technologies have been developed that make use of the bi-stable properties
of different materials. The most popular technologies are described briefly below:
Electrical Storage
These storage devices use electric charge to store data and/or
instructions. RAM, ROM, PROM and a host of other fast primary memories
rely on this technology. Though they are fast, they have small capacity and high cost.
Magnetic Storage
Magnetic polarizations (north and south) of a magnetic substance are exploited to
store data and/or instructions in these types of memories. Most large-capacity,
relatively cheap storage devices, such as floppy disks and hard disks, fall into
this category.
Optical Storage
Optical storage devices record data as tiny marks (pits) on the surface of a disk, either
burned with a laser beam or pressed in during manufacture; a lower-power laser reads
the data back by detecting changes in the reflected light. CD-ROM
disks employ this technology for data storage. The capacity of such devices is very high
and the cost relatively low.
Main Memory
Main memory, also known as primary memory, is faster than secondary memory, and the CPU
communicates with it directly. Main memory holds the data currently
being processed by the CPU. It costs more than secondary memory because
high-speed memory requires sophisticated design techniques.
Secondary Memory
Apart from main memory, there is secondary memory, also called auxiliary memory, which is
slower than main memory and is used to provide backup storage. Main memory
gathers the data currently required for processing, and the CPU uses the data from there.
Cache Memory
Cache memory is the smallest and fastest memory component in the memory hierarchy of a digital
computer. It is placed between the main memory and the CPU and increases the speed
of processing.
It holds data in advance of processing by the CPU, so that the inherently fast CPU
receives a steady inflow of data instead of waiting for the slower main memory.
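The effect of a cache can be illustrated with a small Python simulation. This is a simplified sketch, not a model of real hardware: the class name `Cache` is hypothetical, a dictionary stands in for main memory, and the least-recently-used (LRU) replacement policy is an assumption chosen for illustration, since the text does not name one. The cache keeps copies of recently used items, so repeated requests are served without going back to main memory.

```python
from collections import OrderedDict


class Cache:
    """A tiny LRU cache placed in front of a slower main memory (illustrative only)."""

    def __init__(self, main_memory: dict, capacity: int):
        self.main_memory = main_memory   # the slower, larger store
        self.capacity = capacity         # number of items the cache can hold
        self.lines = OrderedDict()       # address -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address: int):
        if address in self.lines:
            # Hit: the data is already in the fast cache
            self.hits += 1
            self.lines.move_to_end(address)   # mark as most recently used
            return self.lines[address]
        # Miss: fetch from main memory and keep a copy in the cache
        self.misses += 1
        value = self.main_memory[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used item
        return value


memory = {address: address * 10 for address in range(100)}
cache = Cache(memory, capacity=4)
for address in [1, 2, 1, 3, 1, 2]:
    cache.read(address)
print(cache.hits, cache.misses)
```

With this access pattern, three of the six reads are served from the cache. The more often the CPU reuses the same data, the higher this hit count and the less time it spends waiting for main memory.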
The basic difference between the two memories is that the first is random access in nature:
access to a particular memory location does not depend on any sequence, so the
access time is small and the same for every location. In sequential access memory, however,
the time needed to reach a particular item depends on the location where it is stored.
For example, if an item is stored at location XX40F, then with sequential access every
location before XX40F must be passed over first, whereas with random access every location
takes the same amount of time to reach.
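The difference can be made concrete by counting how many locations must be visited to reach an item. In this sketch the function names are hypothetical, the sequential medium is assumed to start reading at location 0, and the unknown "XX" prefix of the address is ignored, so only the hexadecimal offset 40F is used.

```python
def sequential_access_steps(target: int) -> int:
    """Locations read to reach `target` on a sequential-access medium starting at 0."""
    steps = 0
    for _location in range(target + 1):
        steps += 1   # every location up to and including the target is read
    return steps


def random_access_steps(target: int) -> int:
    """Random-access memory reaches any location directly, in a single step."""
    return 1         # independent of the address


address = 0x40F      # hexadecimal offset 40F = decimal 1039
print(sequential_access_steps(address), random_access_steps(address))
```

Here the sequential medium reads 1040 locations, while random access needs one step for any address.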
Types of RAM
RAMs are of two types:
Static RAM: No refresh signal is required. The stored data is retained as long as
power is on and is lost as soon as power is switched off.
Dynamic RAM: The stored data leaks away even while power is on, so periodic
refresh signals must be given to maintain it.
Data Representation
The computer that we know is a digital device, which means that it functions
using digital signals. One property of digital signals is that they have
discrete voltage levels, as shown in Figure 1.18. This property of digital signals
proves to be very useful for representing data in the computer. Why is this so? The answer
lies in the fact that many electrical and electronic devices
can be in either of two possible states. For example, a simple switch can
be either on or off, and a bulb or a Light Emitting Diode can be either glowing or not glowing.
Therefore, representing data with these devices is very easy. At this point,
it is necessary to understand how the computer represents data. As we will see
shortly, the computer uses two values to represent data. These values are called
bits, short for binary digits, and they take the values 0 and 1. John von Neumann
suggested this convention. Since there are only two values with which data can be
represented in a computer, at the lowest level two discrete voltage values are used to
represent the bits. Thus, digital signals, with the property mentioned above, proved
to be ideal for computer systems.
2.
3. Converts the result back to its corresponding decimal equivalent and outputs the
result.
Compared with the decimal numbering system, the binary numbering system
differs in the number of digits used to represent a numeric value. The decimal
system uses ten digits, 0 to 9, whereas the binary system uses only two digits,
0 and 1. Table 1.1 below gives the decimal digits and their equivalent binary values.
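A table of this kind can be generated with Python's built-in `format()` function, whose `b` format code writes an integer in binary. The function name `binary_table` is hypothetical, and padding each value to four bits is a presentational choice made here so that the columns line up; the layout of Table 1.1 itself may differ.

```python
def binary_table(width: int = 4):
    """Pair each decimal digit 0-9 with its binary equivalent, zero-padded to `width` bits."""
    return [(n, format(n, f"0{width}b")) for n in range(10)]


for decimal, binary in binary_table():
    print(f"{decimal:>7}  {binary}")
```

Four bits are enough for every decimal digit, because the largest digit, 9, is 1001 in binary.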
In Table 1.1, we have seen the binary equivalents of decimal numbers. The question
that now arises is how to convert a decimal number into its binary equivalent and
vice versa or, more generally, how to convert a number in one base
to its equivalent in another base. Conversion techniques are available to
accomplish this task. When dealing with binary numbers, two more terms need to be
understood: MSB (Most Significant Bit) and LSB (Least Significant Bit).
These two bits play an important role in many other aspects of computing, such as
address calculation and bus optimization. The MSB is the digit at the leftmost
position in a binary number, and the LSB is the digit at the rightmost position.
Figure 1.19 shows the MSB and LSB in a binary number.
Figure 1.19
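Both definitions translate directly into code. In the following sketch the function names are hypothetical. The first version works on a binary number written as a string, taking its leftmost and rightmost digits; the second works on a positive integer, using Python's `int.bit_length()` and bitwise operators to extract the same two bits.

```python
from typing import Tuple


def msb_lsb(binary: str) -> Tuple[str, str]:
    """Return the (MSB, LSB) of a binary number written as a string."""
    digits = binary.strip()
    if not digits or any(d not in "01" for d in digits):
        raise ValueError(f"not a binary number: {binary!r}")
    return digits[0], digits[-1]          # leftmost digit, rightmost digit


def msb_lsb_int(n: int) -> Tuple[int, int]:
    """Return the (MSB, LSB) of a positive integer's binary form."""
    if n <= 0:
        raise ValueError("n must be positive")
    msb = n >> (n.bit_length() - 1)       # shift the leftmost bit down to position 0
    lsb = n & 1                           # mask off everything except the rightmost bit
    return msb, lsb


print(msb_lsb("11100"), msb_lsb_int(0b11100))
```

Note that the integer version always finds an MSB of 1, because leading zeros are not stored in an integer, while the string version reports whatever digit is written first.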
Example: To convert $11100_2$ to its octal equivalent, the binary number is grouped in sets
of three digits from the right, padding with a leading zero, as $011_2$ and $100_2$. From Table 1.2,
binary code 011 corresponds to octal digit 3 and binary code 100 corresponds to octal digit 4.
Therefore, $11100_2$ in octal is $34_8$.
Example: To convert $1101110_2$ to its octal equivalent, the binary number
is grouped in sets of three digits as $001_2$, $101_2$ and $110_2$. From Table 1.2, binary
code 001 corresponds to octal digit 1, binary code 101 corresponds to octal digit 5 and
binary code 110 corresponds to octal digit 6.
Therefore, $1101110_2$ in octal is $156_8$.
Similarly, the reverse of this method can be applied to obtain the binary code from an
octal number.
Example: To convert $523_8$ to its binary equivalent: from Table 1.2 we see that
octal digit 5 corresponds to 101 in binary, octal digit 2 corresponds to 010 in binary
and octal digit 3 corresponds to 011 in binary. Thus, $523_8$ in binary is $101010011_2$.
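The grouping method used in these examples can be written as two short Python functions. The names are hypothetical; the functions only rely on built-in `int(text, base)` parsing of a single group and `format(value, "03b")` to write a digit as three bits.

```python
def binary_to_octal(binary: str) -> str:
    """Convert a binary string to octal by grouping bits in threes from the right."""
    # Pad on the left with zeros so that the length is a multiple of 3
    width = -(-len(binary) // 3) * 3
    padded = binary.zfill(width)
    groups = [padded[i:i + 3] for i in range(0, width, 3)]
    # Each 3-bit group (000 ... 111) maps to one octal digit (0 ... 7)
    octal = "".join(str(int(group, 2)) for group in groups)
    return octal.lstrip("0") or "0"


def octal_to_binary(octal: str) -> str:
    """Convert an octal string to binary by replacing each digit with 3 bits."""
    binary = "".join(format(int(digit, 8), "03b") for digit in octal)
    return binary.lstrip("0") or "0"


print(binary_to_octal("11100"))     # the first example above
print(binary_to_octal("1101110"))   # the second example
print(octal_to_binary("523"))       # the reverse conversion
```

The three calls reproduce the worked examples: 34, 156 and 101010011.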
The hexadecimal (base-16) system differs from the octal and binary systems in that
letters are also used to represent numeric values: because our standard numbering
system has only ten digits, 0 to 9, the letters A to F stand for the values 10 to 15.
For conversion, the techniques described in section 1.4.5 can be used effectively. As in
the case of the octal numbering system, a relationship exists between the hexadecimal and
binary numbering systems. The number 16 is a power of 2 ($16 = 2^4$), so the
relationship is 4:1. This means that a group of 4 binary digits can be used to represent
one hexadecimal digit. Table 1.4 shows this relationship, similar to that of the octal
system.
Table 1.4: Decimal, Binary and Hexadecimal Numbers
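The 4:1 relationship can be checked with a short Python sketch. The names `binary_to_hex` and `hex_table` are hypothetical, and the assumption that the table covers the values 0 to 15 (all sixteen hexadecimal digits) is ours, since the rows of Table 1.4 are not reproduced here.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(binary: str) -> str:
    """Convert a binary string to hexadecimal by grouping bits in fours from the right."""
    width = -(-len(binary) // 4) * 4          # round the length up to a multiple of 4
    padded = binary.zfill(width)
    # Each 4-bit group (0000 ... 1111) selects one hexadecimal digit (0 ... F)
    digits = "".join(HEX_DIGITS[int(padded[i:i + 4], 2)] for i in range(0, width, 4))
    return digits.lstrip("0") or "0"


def hex_table():
    """Rows of (decimal, 4-bit binary, hexadecimal) for the values 0 to 15."""
    return [(n, format(n, "04b"), HEX_DIGITS[n]) for n in range(16)]


for row in hex_table():
    print(*row)
print(binary_to_hex("11010"))   # 0001 1010 -> 1A
```

For example, $11010_2$ is padded to 0001 1010, which gives the hexadecimal digits 1 and A, that is, $1A_{16}$ (26 in decimal).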
Conversion Techniques
Two methods are most frequently used to convert a number in one base
to another base: the Remainder Method and the Expansion Method,
explained below:
1. Remainder Method: This method can be used to convert a decimal number to its
equivalent value in any other base. The following steps are followed to carry
out the conversion. Let us assume that the number 14
is to be converted to its binary equivalent; the required base is therefore 2.
(i) Divide the number by the base and note the remainder.
(ii) Divide the quotient by the base and note the remainder.
(iii) Repeat step (ii) until the quotient cannot be divided further, that is, until the
quotient becomes smaller than the divisor.
(iv) The converted number is the undivided quotient followed by the sequence of
remainders, starting from the last one generated.
These steps are explained with examples.
Example: To convert decimal number 14 to its binary equivalent.
Step 1: 14 divided by 2; Quotient = 7; Remainder = 0
Step 2: 7 divided by 2; Quotient = 3; Remainder = 1
Step 3: 3 divided by 2; Quotient = 1; Remainder = 1
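The quotient 1 cannot be divided further, so the binary number is the quotient 1 followed by
the remainders from last to first (1, 1, 0), that is, 1110.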
Example: To convert decimal number 14 to its octal equivalent.
Step 1: 14 divided by 8; Quotient = 1; Remainder = 6
The quotient 1 is smaller than 8, so the octal number is 16 (the quotient 1 followed by the remainder 6).
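The four steps of the remainder method can be expressed as a short Python function. The name `to_base` is hypothetical, and limiting the base to 2 to 16 is our choice, made so that the digits 0 to 9 and the letters A to F suffice. Python's built-in `divmod()` returns the quotient and the remainder of one division.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number: int, base: int) -> str:
    """Convert a non-negative decimal integer to `base` (2 to 16) by the remainder method."""
    if number < 0 or not 2 <= base <= 16:
        raise ValueError("expected number >= 0 and 2 <= base <= 16")
    remainders = []
    quotient = number
    # Steps (i)-(iii): keep dividing while the quotient is not smaller than the base
    while quotient >= base:
        quotient, remainder = divmod(quotient, base)
        remainders.append(DIGITS[remainder])
    # Step (iv): the undivided quotient, then the remainders from last to first
    return DIGITS[quotient] + "".join(reversed(remainders))


print(to_base(14, 2), to_base(14, 8))
```

The call `to_base(14, 2)` produces the remainders 0, 1, 1 and the final quotient 1, giving 1110; `to_base(14, 8)` gives 16, matching both examples above.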
2. Expansion Method: This method can be applied to convert a number in any
base to its equivalent in base 10. To understand how it is carried out, consider an
example in which we convert the binary number 1001 to its equivalent decimal value.
Example:
$$1001_2 = 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 8 + 0 + 0 + 1 = 9$$
The following observations can be made from the above example:
(a) Each digit of the original number multiplies one component of the
expansion. That is, the leftmost component of the expansion is multiplied by 1,
followed by the other digits, 0, 0 and 1, exactly in the order in which they
appear in the original binary number.
(b) During expansion, the base of the number is raised to a power that starts
with 0 for the rightmost digit and is incremented by one for every digit that
occurs to its left.
Using the above guidelines, any binary number can be converted to its
equivalent decimal number.
Example:
$$1110_2 = 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 8 + 4 + 2 + 0 = 14$$
Example:
$$11010_2 = 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 16 + 8 + 0 + 2 + 0 = 26$$
To convert a number in any other base to base 10, only a minor modification to the
steps above is needed. For instance, to convert an octal number
to its decimal equivalent, the base 2 in the steps above is replaced by 8, as the
following example shows.
Example:
$$1101_8 = 1 \times 8^3 + 1 \times 8^2 + 0 \times 8^1 + 1 \times 8^0 = 512 + 64 + 0 + 1 = 577$$
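The expansion method follows observations (a) and (b) directly: each digit is multiplied by the base raised to its position, counting from 0 at the right. In this Python sketch the name `from_base` is hypothetical, and bases are limited to 2 to 16 by our choice of digit symbols.

```python
DIGITS = "0123456789ABCDEF"


def from_base(number: str, base: int) -> int:
    """Convert a number written in `base` (2 to 16) to decimal by the expansion method."""
    total = 0
    # The rightmost digit is multiplied by base^0, the next by base^1, and so on
    for position, digit in enumerate(reversed(number.upper())):
        value = DIGITS.index(digit)          # raises ValueError for unknown symbols
        if value >= base:
            raise ValueError(f"digit {digit!r} is not valid in base {base}")
        total += value * base ** position    # digit x base^position
    return total


print(from_base("1001", 2), from_base("11010", 2), from_base("1101", 8))
```

The results, 9, 26 and 577, agree with the worked examples, and can also be cross-checked against Python's built-in `int("1101", 8)`.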
Performance Factors
In today's world of easily available technology, it is not difficult for anyone to get a
computer system. Organisations and individuals alike are becoming
increasingly dependent on computers. The capability of these machines to solve
almost any type of problem and to make processes more efficient has made them
very popular. Over time, the one thing that has grown
exponentially in these systems is their performance.
Performance is what really sells these computers, so in this part we
will briefly cover certain factors that help us judge the performance
of a computer.
MIPS
MIPS stands for million instructions per second. It measures the number of machine
instructions a computer can execute in one second, so it can be regarded as a measure
of the computer's speed and power. However, there is no standard
way of measuring MIPS, since different instructions take different amounts of time to execute,
so MIPS ratings of different machines are not directly comparable. For example, early
Pentium-based systems were rated at around 100 MIPS.
Clock Speed
Clock speed is the rate at which the microprocessor executes instructions. Every computer has an
internal clock that regulates the rate at which instructions are
executed and synchronizes the other computer components. Clock speed is
measured in megahertz (MHz), that is, millions of clock cycles per second, or in
gigahertz (GHz) for modern processors.
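MIPS and clock speed are linked through the average number of clock cycles each instruction takes (cycles per instruction, CPI). The sketch below uses the standard textbook relations, CPU time = instruction count × CPI / clock rate and MIPS = instruction count / (execution time × 10^6); these formulas are not stated in this unit, and the program size, CPI and clock rate are hypothetical numbers chosen for illustration.

```python
def execution_time(instructions: int, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = instruction count x cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


def mips(instructions: int, seconds: float) -> float:
    """MIPS rating = instruction count / (execution time x 10^6)."""
    return instructions / (seconds * 1e6)


# Hypothetical program: 200 million instructions, an average of 2 clock
# cycles per instruction (CPI), running on a 100 MHz clock.
clock_hz = 100e6
cycle_time_ns = 1e9 / clock_hz                    # 10 ns per clock cycle
t = execution_time(200_000_000, 2.0, clock_hz)    # 4.0 seconds
rating = mips(200_000_000, t)                     # 50.0 MIPS
print(cycle_time_ns, t, rating)
```

If the same program used instructions averaging 4 cycles each, it would take 8 seconds on the same 100 MHz clock and rate only 25 MIPS. This is why, as noted above, MIPS figures depend on the instruction mix and are not a standard measure.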
Bus Architecture
Performance also depends on the bus architecture of a system. The width of the
bus, that is, the number of bits it can carry at once, determines how much data can be
transferred through it in one operation. For example, a system can have a 16-bit or a 32-bit bus.
Other things being equal, a 32-bit bus transfers twice as much data per operation as a
16-bit bus and is therefore faster.
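The effect of bus width on data transfer can be estimated with a one-line formula: peak bytes per second = (width in bits / 8) × transfers per second. In this sketch the function name is hypothetical, the 33 MHz bus clock is an illustrative value, and one transfer per clock cycle is assumed; real buses deliver less than this theoretical peak.

```python
def bus_bandwidth(width_bits: int, transfers_per_second: float) -> float:
    """Peak bytes per second a bus can move, assuming one transfer per clock cycle."""
    return width_bits / 8 * transfers_per_second


# The same hypothetical 33 MHz bus clock with two different bus widths
narrow = bus_bandwidth(16, 33e6)   # 66,000,000 bytes per second
wide = bus_bandwidth(32, 33e6)     # 132,000,000 bytes per second
print(narrow, wide)
```

Doubling the width from 16 to 32 bits doubles the peak transfer rate at the same clock speed.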
FLOPS
FLOPS stands for floating point operations per second. It is a benchmark
measure of the speed of a microprocessor, particularly for scientific and engineering work.
An arithmetic operation on fractional numbers, which the computer represents in
floating-point form, is called a floating-point operation.
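A FLOPS rating is simply the number of floating-point operations divided by the time taken. The sketch below counts the operations in a straightforward multiplication of two n × n matrices: each of the n² results needs n multiplications and n − 1 additions. The function names and the 2-second timing are hypothetical values chosen for illustration, not measurements.

```python
def matmul_flops(n: int) -> int:
    """Floating-point operations in a straightforward n x n matrix multiplication.

    Each of the n * n results needs n multiplications and n - 1 additions.
    """
    return n * n * (2 * n - 1)


def flops_rate(operations: int, seconds: float) -> float:
    """Floating-point operations per second."""
    return operations / seconds


ops = matmul_flops(1000)        # 1,999,000,000 operations
rate = flops_rate(ops, 2.0)     # hypothetical 2-second run: 999,500,000 FLOPS
print(ops, rate)
```

A machine that finished this multiplication in 2 seconds would therefore be rated at roughly one billion FLOPS (about 1 GFLOPS) on this workload.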
Student Activity
1.
2. Write down at least 15 shortcut keys each for Excel, Word and PowerPoint.
Summary
The computer can be defined as a data-manipulating digital electronic device. The
evolution of computers is explained in terms of the generations of computers.
Computers in each generation have better capabilities and features than those of the
previous generations. A computer has four components: Input, Storage,
Processing and Output. The basic architecture of the computer has modules that
realize these components. Besides the decimal system, three basic number systems are
used with computers: the binary, octal and hexadecimal number systems. The computer
itself follows the binary number system.
Keywords
CPU: Central Processing Unit.
RAM: Random Access Memory.
MIPS: Million Instructions Per Second.
FLOPS: Floating Point Operations Per Second.
Review Questions
1.
2.
3. The basic architecture of a computer system can be divided into four major
components. What are they?
4.
5.
Further Readings
Hamacher, V., et al., Computer Organization, 4th ed., McGraw Hill, 1996
Hennessy, J.L., Patterson, D.A., Computer Organization and Design: The
Hardware/Software Interface, Morgan Kaufmann, 1994
Stallings, W., Computer Organization and Architecture, 2nd ed., Prentice Hall of India,
New Delhi
Unit 2 Introduction of MS Word
Unit Structure
Introduction
Starting of MS Word
File-related Operations
Save a Document
Summary
Keywords
Review Questions
Further Readings
Learning Objectives
At the conclusion of this unit, you will be able to:
Introduction
Microsoft Word 2000 is a highly sophisticated word-processing application
included in Microsoft's Office 2000 suite. At the time of writing, it is the newest version of MS-Word.
Its functionalities include the following:
Redo, Undo
Pagination
Spell Checking
Importing/Exporting Text
Mail Merging
Tables
Graphical Drawing
Document Template
Document wizard
Write letters
Thesis
Newsletters
Resumes
Applications
Books
We will discuss its features and working, as well as the new features of
Microsoft Word 2000 for readers who are already familiar with earlier versions
of MS-Word (i.e. MS-Word 95 and 97).
the set that always appears. Word also displays abbreviated versions of the Standard
and Formatting toolbars on the same line, including only those buttons Microsoft
expects you to use most. You can add buttons directly from the toolbar, rather than
using MS-Word's traditional customisation tools.
Better support for international users and documents. For the first time, Word is built
on a single code base. A separate language resources component customizes MS-Word's
user interface for different languages. With the right language resource files,
you can change your copy of Word to display a foreign-language user interface.
Perhaps more valuable, MS-Word now permits you to edit and proof documents in
multiple languages.
An improved help system based on Microsoft's new HTML Help technology, and
new Office Assistant characters, such as Rocky the dog.
Somewhat smarter IntelliSense automated features, including an improved
AutoCorrect feature that fixes many more spelling mistakes automatically.
A new Collect and Paste feature (and Clipboard toolbar) that makes it easier to copy
multiple elements into the Clipboard and paste them together into one location.
Improved spelling dictionary, thesaurus, and grammar checking tools.
New Open and Save dialog boxes that make it easier to access and store
documents quickly.
Click and Type, which enables you to double-click anywhere on a page and type
there, even if there's no existing text anywhere in sight.
A new Themes feature that enables users to change the entire look of a Web (or
other) document quickly.
More flexible printing features, including easy zooming to different paper sizes
and printing multiple pages on a single sheet.
Somewhat more effective protection against macro viruses, including support for
authenticated trusted sources, but still no built-in virus detection features.
Finally, according to Microsoft, Word and the rest of Office are now fully Y2K
compliant.
An improved wizard for building Web pages or even small Web sites.
By far the most important change is the improvement to the HTML format. Microsoft
has cleverly incorporated a wide variety of new Web technologies and languages into
its Web page format. This new alphabet soup of technologies includes:
You don't need to know much about how to use any of these technologies to build
or edit Web pages in Word 2000. The main effect of adding them is to enhance the
ability of browsers to display the data, improve formatting and increase the range of
graphical object types that can be included in Web pages.
What this means is that, in many situations, it doesn't matter any more whether you
save a document as a .doc binary file or as a Web page. Either file format looks the same
in both MS-Word 2000 and a properly equipped browser. All the information
normally contained in the .doc binary file format is also included in the Web page
format, and vice versa. Microsoft calls this interchangeability of file formats "round
tripping". You can use any Web page created in Word 2000 to completely
regenerate the binary .doc format. This was not possible with Word 97.
In MS-Word 2000, creating a Web page is no different from creating a Word
document. You do not need to open a special environment.
Starting of MS Word
Microsoft Word is a Windows-based word-processing application. On a computer
where it is already installed, it can be started as follows:
Click Start, point to Programs, and then click the Microsoft Word icon.
This command will launch Microsoft Word 2000 on your computer. Its window
typically looks as shown below:
[Figure: The MS Word 2000 window, labelling the Menu Bar, Tool Bars, Ruler, Cursor, Status Bar, Scroll Bars and Document Navigator.]
1. Title Bar: The title bar shows the name of the document and is situated at the top
of the application window.
2. Menu Bar: The menu bar contains commands, grouped under various headings, for
performing specific tasks. It is located under the title bar.
3. Ruler: The window provides a horizontal ruler along the top of the document and
a vertical ruler along its left side. Rulers make it easier to set margins and indents,
and they also provide measurements for page formatting.
4. Cursor: The cursor is MS-Word's pointer. It shows where in the document the
action you choose will appear or take effect. The cursor can be moved and placed
anywhere in the document using a pointing device such as a mouse.
5. Status Bar: This bar displays the position of the cursor, the status of some
important keyboard keys, messages for a toolbar button when the mouse points to it,
messages for a menu option when it is selected or pointed to, and much other
relevant information. It is located at the bottom of the window.
6. Scroll Bars: Scroll bars are sliders that can be moved using the mouse. As a scroll
bar is moved, the window pans through the document, exposing different regions
of it. There are two types of scroll bars:
(i) Horizontal Scrollbar
(ii) Vertical Scrollbar
7. View Buttons: View buttons are shortcuts to the various views in the View menu,
placed next to the horizontal scroll bar. These buttons select different ways in which
the document can be viewed, as we shall see later.
8. Office Assistant: The Office Assistant provides online help and real-time tips
while you work with MS-Word.
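Keys for moving the cursor:
Left Arrow or Right Arrow: one character to the left or right
Ctrl+Left Arrow or Ctrl+Right Arrow: one word to the left or right
Up Arrow or Down Arrow: one line up or down
Ctrl+Up Arrow or Ctrl+Down Arrow: one paragraph up or down
Home or End: to the beginning or end of the current line
Ctrl+Home or Ctrl+End: to the beginning or end of the document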
File-related Operations
File (or document) related operations can be performed through the File menu. The
different file operations are:
Select Blank Document on the General tab of the dialog box and then press the
OK button.
Note that many types of pre-designed documents are available in this dialog
window.
Blank Document
Start with a blank document when you want to create a traditional printed document.
Web Page
Use a Web page when you want to display the document's contents on an
intranet or the Internet in a Web browser. A Web page opens in Web Layout view.
Web pages are saved in HTML format, i.e. as a file with an .htm or .html extension.
E-mail Messages
If you use Outlook 2000 or Outlook Express, use an e-mail message when you want to
compose and send a message or a document to others directly from Word. An e-mail
message includes an e-mail envelope toolbar so that you can fill in the recipient
names and subject of the message, set message properties, and then send the message.
Templates
Use a template when you want to reuse boilerplate text, custom toolbars, macros,
shortcut keys, styles and AutoText entries.
Save a Document
For saving a document:
Click Save on the File menu.
OR
Press Ctrl+S.
OR
Click the Save button on the Standard toolbar.
The Save and Save As options both save a document. The difference is that the
Save As command allows the user to save a file under a different name and in a
different format, whereas Save stores the document under the same name and in
the same format it was given when it was first saved.
Closing of Document
To close an open document, choose Close from the File menu. Keep in mind that
only the current window or document will close, since Microsoft Word works in an
MDI (Multiple Document Interface) environment, unlike Notepad, which works in an
SDI (Single Document Interface) environment.
Version
A document can be saved in different versions with the Versions option on the File
menu. The steps for saving different versions of a Word document are as follows:
1. Click Versions on the File menu. The following Versions window will appear.
2. Click Save Now in the Versions window. The following window will appear.
3. Write comments for the version, for example Ist version or IInd version.
4. Then press OK.
Make some changes to the document and repeat steps 1 to 4. These steps save
your document in different versions. To see the difference between the
versions, do the following:
Here you can see the two different versions of the same document. The last version,
i.e. the IInd version, is your current version. To compare them:
Select the earlier version in the Versions window and click Open.
Now you can see two windows, one for each version, and you can compare the text in
the two documents.
Page Setup
From the Page Setup option you can set up the page layout (margins, etc.). To use
the Page Setup option, perform the following steps:
1. Click Page Setup on the File menu. The following Page Setup window will appear.
2. Adjust the different margins or apply different options on the Margins tab,
where:
(i) In Top, enter the distance you want between the top of the page and
the top of the first line on the page.
(ii) In Bottom, enter the distance you want between the bottom of the
page and the bottom of the last line on the page.
(iii) In Left, enter the distance you want between the left edge of the page
and the left edge of unindented lines.
(iv) In Right, enter the distance you want between the right edge of the
page and the right end of a line with no right indent.
(v) In Gutter, enter the amount of extra space you want to add to the
margin for binding. Word adds the extra space to the left margin of all pages
if you clear the Mirror margins check box, or to the inside margin of all pages
if you select the Mirror margins check box.
(vi) In Header, under From edge, enter the distance you want from
the top edge of the paper to the top edge of the header. If the Header setting is
larger than the Top setting, Word prints the body text below the header.
(vii) In Footer, under From edge, enter the distance you want from the
bottom edge of the paper to the bottom edge of the footer. If the Footer setting
is larger than the Bottom setting, Word stops printing the body text above the
footer.
(viii) Select the Mirror margins check box to adjust the left and right margins so
that, when you print on both sides of the paper, the inside margins of facing pages
are the same width and the outside margins are the same width.
(ix) Select the 2 pages per sheet check box to print the second page of a document
on the same sheet as the first page. This check box is used when the printed sheet
is folded in half with the two pages on the inside. The outer margins (gutter) of the
page will be the same width, and the inner margins will be the same width.
(x) In the Apply to list box, click the portion of the document to which you want
to apply the current Page Setup settings. The options in this list box, such as
Whole document and This point forward, vary according to the situation.
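The margin settings above decide how much width is left for the text itself. The following is a minimal sketch of that arithmetic, assuming the gutter rule described in item (v), and assuming that with Mirror margins the odd-numbered pages are right-hand pages whose inside edge is on the left. The function `text_area_width` and the sample measurements are hypothetical; this is not Word's own calculation.

```python
def text_area_width(page_width, left, right, gutter=0.0, mirror=False, page_number=1):
    """Return (text width, effective left margin, effective right margin).

    All lengths share one unit (e.g. inches).
    Without mirror margins the gutter always widens the left margin.
    With mirror margins, `left` acts as the inside margin and `right` as the
    outside margin; odd pages are ASSUMED to be right-hand pages."""
    if mirror and page_number % 2 == 0:
        # Even (left-hand) page: the inside edge, and so the gutter, is on the right.
        eff_left, eff_right = right, left + gutter
    else:
        # No mirroring, or an odd page: the gutter is added on the left.
        eff_left, eff_right = left + gutter, right
    width = page_width - eff_left - eff_right
    if width <= 0:
        raise ValueError("margins leave no room for text")
    return width, eff_left, eff_right


# An 8.5-inch-wide page, 1.25-inch side margins and a 0.5-inch gutter.
print(text_area_width(8.5, 1.25, 1.25, gutter=0.5))
```

Here the call returns `(5.5, 1.75, 1.25)`: the gutter simply widens the left margin. With `mirror=True`, the same gutter moves to the right-hand side on even pages, which is what keeps the binding edge consistent when the pages face each other.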
From the Paper Size tab you can set the width and height of the page. When you
click the Paper Size tab, the following window will appear.
1. From the Paper size list box you can select one of the predefined paper sizes.
2. In the Width and Height boxes you can define a custom paper size by
adjusting the width and height of the paper.
3. Select the orientation of the paper, Portrait or Landscape, from the Orientation
frame. In portrait orientation the page is taller than it is wide; in landscape
orientation it is wider than it is tall.
From the Paper Source tab you can select where the printer takes its paper from.
When you click the Paper Source tab, the following window will appear:
In this window you can set the paper source for the first page and for the other pages.
Print Option
To print a document, select Print on the File menu. The window given below will
appear.
1. From the Name combo box you can select the printer, if more than one printer is
installed.
2. From the Page range frame you can select the range of pages to print: all pages,
the current page, or the page numbers you require.
3. From the Print what option you can choose which part of the document you
want to print, i.e. the whole document, the comments or something else.
4. From the Print option you can choose which pages to print: all pages in the range,
odd pages or even pages.
5. You can choose the number of copies from the Number of copies box in the Copies
frame.
6. From the Pages per sheet option in the Zoom frame you can select the number of
document pages that you want to print on each sheet of paper.
7. From the Scale to paper size option you can select the paper size on which you want
to print the document. For example, you can specify that a B4-size document be
printed on A4-size paper, in which case the fonts and graphics are reduced in size.
This feature is similar to the reduce/enlarge feature on a photocopier.
8. The Collate check box prints the copies of the document in proper binding
order, i.e. one complete set at a time.
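The Print option (odd or even pages), Number of copies, Collate and Pages per sheet settings combine in a simple way. A minimal sketch, assuming pages are numbered from 1; the names `print_order` and `sheets_per_copy` are hypothetical, and this is not how Word or a printer driver works internally.

```python
import math


def print_order(num_pages, which="all", copies=1, collate=True):
    """Return the order in which pages come out of the printer.

    which: "all", "odd" or "even" (the Print option in the dialog).
    collate=True prints complete sets (1, 2, 3, 1, 2, 3);
    collate=False groups the copies of each page (1, 1, 2, 2, 3, 3)."""
    if which == "all":
        pages = list(range(1, num_pages + 1))
    elif which == "odd":
        pages = list(range(1, num_pages + 1, 2))
    elif which == "even":
        pages = list(range(2, num_pages + 1, 2))
    else:
        raise ValueError("which must be 'all', 'odd' or 'even'")
    if collate:
        return pages * copies  # one complete set after another
    return [page for page in pages for _ in range(copies)]


def sheets_per_copy(num_pages, pages_per_sheet=1):
    """Sheets of paper needed for one copy when several pages share a sheet."""
    return math.ceil(num_pages / pages_per_sheet)


print(print_order(3, copies=2))                 # collated
print(print_order(3, copies=2, collate=False))  # uncollated
print(sheets_per_copy(5, pages_per_sheet=2))
```

Collating matters only when more than one copy is printed; with a single copy both orders are the same.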
Send To Option
From the Send To option on the File menu you can send the document to various
recipients (applications or people) at various places through technologies such as
e-mail, fax, etc.
Properties Option
From the Properties option on the File menu you can set various properties of the
document. When you click the Properties option you will see the following
window:
From this window you can set or see the various properties of the document. You can
set the author name, company name, title of the document and various other
properties. The other tabs show further properties of the document, such as:
1. The General tab shows the type of document, the file size, the path of the
file where it is saved, the date of creation, the date of last modification, etc.
2. The Statistics tab shows various statistics of the document, such as the number of
words, characters, lines, etc.
Below the Properties option, the File menu lists the most recently modified or
created files. The number of files listed can be set by the user; by default it is 4.
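The recently-used file list behaves like a small, capped list with the newest entry on top. A minimal sketch, assuming that reopening a file already in the list moves it to the top rather than listing it twice (our assumption of typical behaviour); `remember_file` is a hypothetical name.

```python
def remember_file(recent, path, max_entries=4):
    """Return the updated recently-used list after opening or saving `path`.

    The newest file goes to the top; an existing entry for the same file is
    removed first so it is not listed twice; the list is cut to max_entries
    (4 by default, as stated above)."""
    updated = [path] + [p for p in recent if p != path]
    return updated[:max_entries]


recent = []
for name in ["a.doc", "b.doc", "c.doc", "d.doc", "e.doc"]:
    recent = remember_file(recent, name)
print(recent)  # a.doc has dropped off the end of the list
```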
All the necessary editing commands have been grouped into Edit menu or edit
toolbar. See the edit menu given below.
Block Operations
There may be situations where you have to perform a task on a single character.
Performing that action is very simple: just put the cursor on the character and perform
the task. But consider a situation where you have to perform the same action on a
group of words. Let's take the example of deleting a whole paragraph. To delete this
paragraph you could press the <Del> key repeatedly until the text of the entire paragraph is
deleted. However, as you will realise, this is a very tedious way of deleting
multiple characters. A better solution is to make a block of the paragraph by selecting
it and then deleting it in one go. Any general method of text selection may be
employed to select the text block. The selected block appears in reverse video (white
foreground on a black background). While the block is selected, pressing a key replaces
the entire selection with the character typed. If instead the <Del> key is
pressed, the selected block is deleted without being replaced by any character.
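The block behaviour just described (typing replaces the selection, <Del> removes it) can be modelled with a plain string and a selection range. This is a hypothetical `TextBuffer` class written only to illustrate the rule; it is not how Word stores documents.

```python
class TextBuffer:
    """Hypothetical plain-text buffer with a cursor and at most one selected block."""

    def __init__(self, text=""):
        self.text = text
        self.cursor = len(text)
        self.sel_start = self.sel_end = None  # no block selected yet

    def select(self, start, end):
        """Mark text[start:end] as the selected block."""
        self.sel_start, self.sel_end = start, end

    def _remove_selection(self):
        # Cut the block out and leave the cursor where the block began.
        self.text = self.text[:self.sel_start] + self.text[self.sel_end:]
        self.cursor = self.sel_start
        self.sel_start = self.sel_end = None

    def type_char(self, ch):
        """Typing while a block is selected replaces the whole block."""
        if self.sel_start is not None:
            self._remove_selection()
        self.text = self.text[:self.cursor] + ch + self.text[self.cursor:]
        self.cursor += 1

    def delete(self):
        """<Del> removes the selected block, or else the character after the cursor."""
        if self.sel_start is not None:
            self._remove_selection()
        else:
            self.text = self.text[:self.cursor] + self.text[self.cursor + 1:]


buf = TextBuffer("Hello brave new world")
buf.select(6, 12)   # selects "brave "
buf.delete()        # block deleted in one go
print(buf.text)
```

Deleting the block takes one call instead of one <Del> press per character, which is exactly the saving the paragraph above describes.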
Text can be selected in several ways, for example:
1. Selecting a word
2. Selecting a line
To select a word
Press the left mouse button and drag it to the end of the word.
Or
Place the cursor before the word and then press Ctrl+Shift+Right Arrow.
To select a line
1. Point the mouse pointer at the selection bar, the blank area to the left of the
text. There the mouse pointer changes to an arrow pointing in the direction opposite
to the usual one.
2. Click to select the line.
To select a paragraph
Place the mouse pointer in the selection bar and double-click. The entire paragraph
will be selected.
OR
Place the cursor on the first character of the paragraph.
Press Ctrl+Shift+Down Arrow.
1. Undo
To reverse the last action:
Click Undo on the Edit menu
Or
Click the Undo button on the Standard toolbar
Or
Press Ctrl+Z.
The drop-down list of the Undo button displays the recent actions that MS-Word can
undo. You can select from this list the appropriate action to be undone.
Similarly, after undoing certain changes, you may want to reapply the action
that was undone. MS-Word provides a command for redoing whatever was undone
in the previous step.
2. Redo
Reversing the last undo is known as Redo. To perform a redo you can:
Click Redo on the Edit menu
Or
Click the Redo button on the Standard toolbar
Or
Press Ctrl+Y.
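Undo and Redo are commonly explained with two stacks: undone actions move from the undo stack to the redo stack, and redone ones move back. The sketch below uses that textbook technique purely as an illustration; Word's internal design is not documented here, and the `History` class is hypothetical. Each action is recorded as a pair of functions, one that performs it and one that reverses it.

```python
class History:
    """Two-stack undo/redo history (illustrative, not Word's implementation)."""

    def __init__(self):
        self.undo_stack = []  # actions that can be undone, newest last
        self.redo_stack = []  # actions that were undone and can be redone

    def do(self, apply, revert):
        apply()
        self.undo_stack.append((apply, revert))
        self.redo_stack.clear()  # a fresh edit makes the old redo list meaningless

    def undo(self):  # Edit > Undo, or Ctrl+Z
        if self.undo_stack:
            apply, revert = self.undo_stack.pop()
            revert()
            self.redo_stack.append((apply, revert))

    def redo(self):  # Edit > Redo, or Ctrl+Y
        if self.redo_stack:
            apply, revert = self.redo_stack.pop()
            apply()
            self.undo_stack.append((apply, revert))


# A document modelled as a list of paragraphs.
doc = []
history = History()
history.do(lambda: doc.append("Hello"), lambda: doc.pop())
history.do(lambda: doc.append("World"), lambda: doc.pop())
history.undo()  # doc is now ["Hello"]
history.redo()  # doc is back to ["Hello", "World"]
```

Picking an older action from the Undo drop-down list corresponds to calling `undo()` several times in a row, since the later actions have to be reversed first.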
A word: Double-click the word.
A graphic: Click the graphic.
A line of text: Move the pointer to the left of the line until it changes to a right-pointing
arrow, and then click.
Multiple lines of text: Move the pointer to the left of the lines until it changes to a
right-pointing arrow, and then drag up or down.
A sentence: Hold down CTRL, and then click anywhere in the sentence.
A paragraph: Move the pointer to the left of the paragraph until it changes to a
right-pointing arrow, and then double-click. Or triple-click anywhere in the
paragraph.
Multiple paragraphs: Move the pointer to the left of the paragraphs until it changes to a
right-pointing arrow, and then double-click and drag up or down.
A large block of text: Click at the start of the selection, scroll to the end of the selection,
and then hold down SHIFT and click.
An entire document: Move the pointer to the left of any document text until it changes to a
right-pointing arrow, and then triple-click.
Headers and footers: In Normal view, click Header and Footer on the View menu; in Print Layout
view, double-click the dimmed header or footer text. Move the pointer to
the left of the header or footer until it changes to a right-pointing arrow,
and then triple-click.
Comments, footnotes, and endnotes: Click in the pane, move the pointer to the left of the
text until it changes to a right-pointing arrow, and then triple-click.
A vertical block of text (except within a table cell): Hold down ALT, and then drag the
mouse over the text.
Click the right mouse button and select Paste from the context menu, as shown in
the figure.
(ii) In the Find what box, enter the text that you want to search for.
(iii) Select the direction of the search from the Search list box.
(iv) Select any other options that you want from the following:
Match case: Finds only text whose capitalisation also matches. With this
option on, f will not match F.
Find whole words only: Finds the characters only where they form a whole word
by themselves and are not part of another word.
Use wildcards: Lets you use the wildcard characters ? and * in the Find what
box. These characters stand for other characters in a comparison, hence they
are called wildcards: ? matches any single character, whereas * matches
any string of characters.
Sounds like: Finds words that sound similar but are spelled differently. For
example, hair, hare and heir sound alike, as do here and hear.
Find all word forms: Finds all grammatical forms of the word. For example,
on entering the word eat, Word also searches for ate, eaten and eating.
(v) For Help on an option, click the question mark and then click the option.
(vi) Click Find Next to proceed.
(vii) MS-Word starts searching in the specified direction from the current cursor
position and stops at the first match found. To continue searching the rest of
the document, click the Find Next button again.
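The Match case, Find whole words only and Use wildcards options can be imitated with Python's `re` module, which helps show what each option changes. This is a rough analogy, not Word's matcher: the translation of `?` and `*` into regular-expression syntax is our assumption, Word's wildcard mode supports more operators, and here `*` is made non-greedy so a match stops as early as possible. `build_pattern` and `find_all` are hypothetical names.

```python
import re


def build_pattern(find_what, match_case=False, whole_word=False, use_wildcards=False):
    """Translate Find options into a compiled regular expression."""
    if use_wildcards:
        parts = []
        for ch in find_what:
            if ch == "?":
                parts.append(".")        # ? matches any one character
            elif ch == "*":
                parts.append(".*?")      # * matches a run of characters (shortest first)
            else:
                parts.append(re.escape(ch))
        pattern = "".join(parts)
    else:
        pattern = re.escape(find_what)   # treat the text literally
    if whole_word:
        pattern = r"\b" + pattern + r"\b"  # must not be part of a longer word
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(pattern, flags)


def find_all(text, find_what, **options):
    """Return (position, matched text) for every match, scanning forward."""
    rx = build_pattern(find_what, **options)
    return [(m.start(), m.group()) for m in rx.finditer(text)]


text = "The Fish, the fisherman and the fox."
print(find_all(text, "the"))                   # three matches, any case
print(find_all(text, "the", match_case=True))  # "The" is skipped
print(find_all(text, "fish", whole_word=True)) # "fisherman" is skipped
print(find_all(text, "f?x", use_wildcards=True))
```

Each click of Find Next corresponds to moving on to the next pair in the returned list.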
2. To find text with specific formatting:
i. On the Edit menu, click Find. The Find and Replace window will pop up on the
screen.
ii. Do one of the following: to search for text with specific formatting, enter the
text in the Find what box; to search for specific formatting only, delete any text
in the Find what box.
iii. If you don't see the Format button, click More.
iv. If you want to clear previously specified formatting, click No Formatting.
v. Click Format, and then select the formats you want.
vi. Click Find Next.
Replace
If you have to replace a word in the document with another word, you can use the
Replace command. We have discussed Find above; now it is the turn of
Replace.
1. On the Edit menu, click Replace. You will see the following window.
2. In the Find what box, enter the text that you want to search for.
3. In the Replace with box, enter the replacement text.
4. For Help on an option, click the question mark and then click the option.
Go to
To go to a particular location or item, use the Go To option on the Edit
menu. The steps are as follows.
1. Click Go To on the Edit menu. The following window will appear on the
screen.
2. From the Go to what list box, select the kind of item you want to move to in the
document.
3. Click Previous or Next, depending on the direction you want to go.
You can also use the Document Navigator to move around the document. The
Document Navigator can be invoked by clicking the 3-D ball on the vertical scroll bar.
Its browse methods include the Go To method and browsing by:
Edits
Headings
Graphics
Tables
Fields
Endnotes
Footnotes
Comments
Sections
Pages
MS-Word provides a wide variety of views for seeing your document in different ways.
Various commands on the View menu facilitate this. To access the View menu,
click View on the menu bar or press Alt+V.
MS-Word 2000 provides the following options to view a document in different styles.
1.
Normal
2.
Web Layout
3.
Print Layout
4.
Outline
5.
Full Screen
Normal View
In Normal view you see only the horizontal ruler, not both the horizontal and
vertical rulers. Normal view does not display the margin areas of the page, which is
why you can't see the headers and footers.
The advantage of Normal view is that you can differentiate between a soft page
break (one that MS-Word inserts when the text flows past the end of a page), which
appears as a horizontal line running across the page, and a hard page break (one that
you insert to end the page before it is full), which appears as a dotted horizontal
line labelled Page Break, as shown below.
Web Layout
In this layout, as in Normal view, only the horizontal ruler is shown rather than both
rulers. But no page breaks are displayed in Web Layout view; the whole document
looks as if it were a single page.
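The soft and hard page breaks that Normal view displays come from a simple rule: a page ends when it is full (soft) or when the user forces it (hard). A minimal sketch of that rule, under the simplifying assumption that every line has the same height so a page holds a fixed number of lines; real pagination measures heights. `paginate` and the `PAGE_BREAK` marker are hypothetical.

```python
PAGE_BREAK = "\f"  # hypothetical marker for a hard (manual) page break


def paginate(lines, lines_per_page):
    """Split a list of text lines into pages.

    A soft break happens automatically when a page already holds
    lines_per_page lines; a hard break (PAGE_BREAK) ends the page at once."""
    pages, current = [], []
    for line in lines:
        if line == PAGE_BREAK:
            pages.append(current)  # hard break: close the page even if not full
            current = []
            continue
        if len(current) == lines_per_page:
            pages.append(current)  # soft break: the page is full
            current = []
        current.append(line)
    pages.append(current)          # the last, possibly partly filled, page
    return pages


print(paginate(["a", "b", "c", "d", "e"], 2))
print(paginate(["a", PAGE_BREAK, "b", "c", "d"], 2))
```

Adding or deleting text moves every soft break after the edit, while a hard break stays attached to the place where it was inserted.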
Print Layout
This layout is the default view. Print Layout view shows the document as it will
appear on the hard copy when printed. It includes both horizontal and vertical
rulers to tell you the exact position of your text or pictures in the document. It also shows
all four margins (top, bottom, left and right), as well as the text you have typed
in the header or footer of the document, in light gray. To change
the view to Print Layout, click Print Layout on the View menu.
As you can see, in Print Layout view each page looks like a separate
page of a notebook.
Outline view
Outline view displays the contents of your document in a traditional outline format,
with text indented beneath headings in a hierarchical structure. In this view, you can
display headings, or any level of detail beneath the headings that you wish. The
figure given below shows a document in Outline view. Notice that when you are in
this view there is an additional toolbar that enables you to open and close headings to
reveal more or less detail and to promote or demote headings to change their position
in the outline hierarchy.
1. Move your mouse pointer to the plus symbol to the left of a main heading. The
pointer changes to a four-way arrow.
2. Click to select this heading and all of its subheadings.
3. Choose the Collapse button on the Outlining toolbar. All the subheadings
disappear; they have temporarily been collapsed, or hidden from view. The
wavy line under the heading indicates that there are collapsed headings underneath it.
4. To demote a heading by one level or promote it by one level, select it and
then click the Promote or Demote button on the Outlining toolbar; Word applies the
appropriate heading style. For example, if you demote a heading formatted with
the Heading 2 style, Word reformats it with the Heading 3 style.
5. To move a heading (along with all the subheadings and body text it contains) to a
new location in the document, drag its plus sign. As you drag, a horizontal line
indicates where the heading will appear. When the line is in the right place,
release the mouse button.
[Figure: The Outlining toolbar, showing the Promote, Demote, Expand and Collapse buttons.]
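The outline operations above reduce to simple rules on heading levels. A minimal sketch, assuming the outline is a list of `(level, text)` pairs holding headings only (body text is left out for simplicity), and assuming the nine built-in heading styles, Heading 1 to Heading 9. All function names are hypothetical.

```python
def demote(level):
    """Heading n becomes Heading n+1, but never beyond Heading 9."""
    return min(level + 1, 9)


def promote(level):
    """Heading n becomes Heading n-1, but never above Heading 1."""
    return max(level - 1, 1)


def collapse_view(outline, show_levels):
    """Keep only headings at or above show_levels, as when collapsing."""
    return [(lvl, txt) for lvl, txt in outline if lvl <= show_levels]


def heading_block(outline, i):
    """The heading at index i plus everything nested under it, i.e. what moves
    when you drag its plus sign: items up to the next heading of the same or
    higher rank."""
    level = outline[i][0]
    end = i + 1
    while end < len(outline) and outline[end][0] > level:
        end += 1
    return outline[i:end]


outline = [(1, "Intro"), (2, "Scope"), (3, "Detail"), (2, "Method"), (1, "Results")]
print(heading_block(outline, 1))  # "Scope" carries "Detail" with it
print(collapse_view(outline, 1))  # only the main headings remain visible
```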
1. Display the Standard toolbar if it isn't visible, using the Toolbars command on
the View menu.
2. Click the drop-down arrow of the Zoom box to display a list of zoom percentages.
3. Select a percentage from the list or type a different percentage in the Zoom box,
as shown in the figure below.
The four options at the bottom of the Zoom list also come in handy. They
automatically adjust your document's magnification by just the right amount to display
the full width of the page (Page Width), the width of the text only (Text Width), the
entire page (Whole Page), or two entire pages (Two Pages). In Normal view, the
Text Width, Whole Page and Two Pages options are not available.
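Each of these fit-to-window options is a ratio between the space available on screen and the part of the page that must fit. A rough sketch under stated assumptions: the gaps between pages and the space taken by rulers and scroll bars are ignored, so Word's actual percentages will differ somewhat. `zoom_options` and the sample sizes are hypothetical.

```python
def zoom_options(window_w, window_h, page_w, page_h, left, right):
    """Approximate the four fit-to-window Zoom choices as percentages.

    All lengths share one unit (inches here)."""
    text_w = page_w - left - right  # width of the text column only
    fit_w = window_w / page_w       # scale at which the page width fills the window
    fit_h = window_h / page_h       # scale at which the page height fills the window
    return {
        "Page Width": round(100 * fit_w),
        "Text Width": round(100 * window_w / text_w),
        "Whole Page": round(100 * min(fit_w, fit_h)),  # both dimensions must fit
        "Two Pages": round(100 * min(window_w / (2 * page_w), fit_h)),
    }


# A 12.75 x 8.8 inch window showing an 8.5 x 11 inch page with 1-inch side margins.
print(zoom_options(12.75, 8.8, 8.5, 11, 1.0, 1.0))
```

Text Width always gives the largest magnification of the four, because the text column is narrower than the page.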
This will display all the toolbar names. Toolbars that are currently displayed have
check marks in front of them; those that are hidden don't have any check mark.
1. Select Header and Footer from the View menu. The Header and Footer toolbar
is displayed, the header area is activated and the document's text changes to light
gray.
2. If you want to create a footer, click the Switch Between Header and Footer button
on the Header and Footer toolbar to make the footer area active.
3. Apply any formatting that you like to the header or footer of the document.
Student Activity
Write down at least 15 shortcut keys available in each of Excel, Word and
PowerPoint.
Summary
Microsoft Office 2000, the successor to Microsoft Office 97, was designed as a fully
32-bit, Y2K-compliant version to match Windows 2000 features. All the Office
2000 applications have OLE 2 capability, which allows data to be moved automatically
between various programs. All the packages in Office 2000 provide shortcut
keys.
Keywords
Office 2000: Y2K compliant application software
Word 2000: Word-processing software that comes as part of Office 2000
Excel 2000: Spreadsheet software
PowerPoint 2000: Presentation Software
Review Questions
What are the potential benefits of Microsoft Office over its contemporaries?
Further Readings
Dharminder Kumar, Management Information Systems, Excel Books, New Delhi.
Dhiraj Sharma, Foundations of IT, Excel Books, New Delhi.
Bhuwanesh Jha, Elements of basic computing, Khanna Publication.
Chee wong lee, Fundamentals of Office 2000, China Publishing.
Rajiv Gupta, Foundations of Office 2000, Rajasthan Publishers.
Narender, Singh and Naruka, Office 2000 in 7 Days, Jalandhar Publishing House.
Unit 3 Operating Systems
Unit Structure
Introduction
Summary
Keywords
Review Questions
Further Readings
Learning Objectives
At the conclusion of this unit, you will be able to:
Introduction
Operating systems are so ubiquitous in computer operations that one hardly realises
their presence. Most likely you have already interacted with one or more different
operating systems. Names like DOS, UNIX, etc. should not be unknown to you;
these are the names of very popular operating systems.
From a very simple standpoint, a computer cannot become operational without an
operating system, hence the name. An operating system is simply a very complex
computer program. You will learn about various issues related to operating systems
in this unit.
on or reset. There may be some additional activities on some machines as well. These
activities are called power-on routines. Why do these activities always happen? You
will learn about this later in this unit.
You know that a computer does not do anything unless it is properly instructed. Thus,
each of the above power-on activities also requires instructions.
These instructions are stored in non-volatile memory, usually a ROM. The CPU of
the computer takes one instruction from this ROM and executes it before taking the next
instruction. Since ROMs are of finite size, they can store only a few kilobytes of
instructions. The CPU executes these instructions one by one. Once these instructions
are over, the CPU must obtain further instructions from somewhere else.
Usually, further instructions are stored on a secondary storage device such as a hard disk,
floppy disk or CD-ROM. These instructions are collectively known as the operating
system, and their primary function is to provide an environment in which users may
execute their own instructions.
Once the operating system is loaded into main memory, the CPU starts executing
its instructions. The operating system runs in an infinite loop, each time taking
instructions in the form of commands or programs from the users and executing them
in that order. This loop continues until either the user deliberately terminates it
by shutting the system down or something goes wrong during operation.
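The loop just described can be shown as a toy model: read a command, execute it, repeat, and stop only on shutdown or on a fault. This is a teaching sketch, not how any real operating system is written; `command_loop` and the command names are hypothetical, and "unknown command" is treated as a reported error rather than a fault, much as a shell would do.

```python
def command_loop(read_command, commands):
    """Toy model of an operating system's top-level loop.

    read_command: a function returning the next command from the user.
    commands: maps (hypothetical) command names to functions that run them.
    Returns a log of what happened, so the behaviour can be inspected."""
    log = []
    while True:  # the "infinite" loop
        cmd = read_command()
        if cmd == "shutdown":  # the user deliberately ends the loop
            log.append("shutting down")
            break
        action = commands.get(cmd)
        if action is None:  # unknown command: report it and keep looping
            log.append(f"unknown command: {cmd}")
            continue
        try:
            log.append(action())  # execute the command and record its result
        except Exception as exc:  # something went wrong: the loop stops
            log.append(f"fault: {exc}")
            break
    return log


script = iter(["date", "dir", "foo", "shutdown"])
commands = {"date": lambda: "Mon 01-01-2001", "dir": lambda: "a.txt b.txt"}
print(command_loop(lambda: next(script), commands))
```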
Please note that a user almost never interacts with the hardware directly, and that a lot
depends on the operating system loaded and running on a computer. For all practical
purposes, as far as users are concerned, a computer is nothing more than the operating
system controlling it. To get the most out of a computer, therefore, a
deep understanding of the operating system is a must.
An operating system is the most important program in a computer system. This is one
program that runs all the time, as long as the computer is operational and exits only
when the computer is shut down.
In general, however, there is no completely adequate definition of an operating
system. Operating systems exist because they are a reasonable way to solve the
problem of creating a usable computing system.
The fundamental goal of computer systems is to execute user programs and to make
solving user problems easier. The hardware of a computer is equipped with extremely
capable resources: memory, CPU, I/O devices, etc. All these hardware units interact
with each other in a well-defined manner. Bare hardware, however, is not enough to solve a
problem. Application programs are developed for the purpose, and they require certain
common operations, such as those controlling the I/O devices. These common functions
of controlling and allocating resources are brought together into one piece of
software: the operating system.
It is easier to define operating systems by their functions, i.e. by what they do, than by
what they are. The primary goal of an operating system is to make the computer easier
for users to operate. Operating systems exist because computing with them is
supposed to be easier than computing without them. This view is particularly
clear when you look at operating systems for small personal computers.
Efficient operation of the computer system is a secondary goal of an operating system.
This goal is particularly important for large, shared multi-user systems. These systems
are typically expensive, so it is desirable to make them as efficient as possible.
Operating systems and computer architecture have had a great deal of influence on
each other. To facilitate the use of the hardware, operating systems were developed.
As operating systems were designed and used, it became obvious that changes in the
design of the hardware could simplify them.
Operating systems are the programs that make computers operational, hence the
name. Without an operating system, the hardware of a computer is just an inactive
electronic machine, possessing great computational power but doing nothing for the
user. All it can do is execute a fixed set of instructions stored in its internal
memory (ROM: Read Only Memory) each time you switch the power on, and
nothing else.
Operating systems are programs (fairly complex ones) that act as interface between
the user and the computer hardware. They sit between the user and the hardware of
the computer providing an operational environment to the users and application
programs. For a user, therefore, a computer is nothing but the operating system
running on it. It is extended machine.
Users do not interact with the hardware of a computer directly but through the
services offered by the operating system. This is because the language that users employ
is different from that of the hardware. Whereas users prefer to use natural language
or near-natural language for interaction, the hardware uses machine language. It is the
operating system that does the necessary translation back and forth and lets the user
interact with the hardware. The operating system speaks the user's language on one hand
and machine language on the other. It takes instructions in the form of commands from
the user, translates them into machine-understandable instructions, gets these
instructions executed by the CPU and translates the result back into user-understandable form.
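As a rough illustration of this translation step, the Python sketch below maps a few command names typed by a user onto functions in Python's standard os module, which in turn call into the operating system. The command table and the run_command helper are hypothetical and only imitate, in a very small way, what a real command interpreter such as the UNIX shell does.

```python
import os

# Hypothetical command table: each user-level command name is translated into
# a call on Python's os module, which asks the operating system for the service.
COMMANDS = {
    "pwd": lambda args: os.getcwd(),
    "ls": lambda args: sorted(os.listdir(args[0] if args else ".")),
}


def run_command(line):
    """Split a command line, look up the matching service and return its result."""
    parts = line.split()
    if not parts:
        return None  # empty input: nothing to do
    name, args = parts[0], parts[1:]
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    return COMMANDS[name](args)


print(run_command("pwd"))  # current directory, as reported by the OS
print(run_command("ls"))   # sorted names of the entries in that directory
```

The user never sees the system calls themselves; they see only the command names, which is the sense in which the operating system acts as an extended machine.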
A user can interact with a computer only if he/she understands the language of the
resident operating system. You cannot interact with a computer running the UNIX
operating system, for instance, if you do not know the UNIX command language. A UNIX
user can always interact with a computer running the UNIX operating system, no matter
what type of computer it is. Thus, for a user, the operating system itself is the
machine: an extended machine, as shown in Figure 3.1.
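Figure 3.1: The operating system as an extended machine. Users 1 to n work through application programs (compiler, text editor, database, ...), which reach the operating system through system calls and the shell; the operating system controls the computer hardware (CPU, memory, I/O).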
Functions
As has been stated earlier, the prime function of an operating system is to provide an
environment for the execution of users' programs. Besides, the operating system also
provides certain services to programs and to the users of those programs to enhance
the primary function of program execution in various ways.
The specific services provided differ from one operating system to another, but there
are some common classes that we can identify. These operating system services are
provided for the convenience of the programmer, to make the programming task
easier. Some of these services are listed below:
1.
2.
3.
4. Communication services
5.
6. Accounting services
7. Protection services
Components
An operating system performs large number of functions. Each function is carried out
by a component of the operating system called its subsystems. The typical
components of an operating system are:
1.
2.
3.
4.
5.
6.
7. Protection sub-system
8. User-interface sub-system
Classification
The variations and differences in the nature of different operating systems may give
the impression that all operating systems are absolutely different from each other. But
this is not true. All operating systems contain the same components, whose
functionalities are almost the same. For instance, all operating systems perform
the functions of storage management, process management, protection of users from
one another, etc. The procedures and methods that are used to perform these
functions might be different, but the fundamental concepts behind these techniques
are the same. In general, operating systems perform similar functions but may
have distinguishing features. Therefore, they can be classified into different categories
on different bases. Let us quickly look at the different types of operating systems.
Figure: The user works through an application program, which runs on the operating system, which in turn runs on the hardware.
printed outputs that belong to different jobs. As the printing and sorting of the results
is done for all the jobs of a batch together, the turnaround time for a job becomes a
function of the execution time requirements of all jobs in the batch. You can reduce the
turnaround time for different jobs by recording the jobs on faster input/output media
like magnetic tape or disk surfaces. It takes very little time to read a record from these
media. For instance, it takes about five milliseconds for a magnetic tape and
about one millisecond for a fast fixed-head disk, in comparison to a card reader or
printer that takes around 50-100 milliseconds. Thus, using a disk or tape
reduces the amount of time the central processor has to wait for an input/output
operation to finish before resuming processing. This reduces the time taken to
process a job, which in turn brings down the turnaround times for all the
jobs in the batch.
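The arithmetic behind this argument can be sketched in Python. The per-record times are the figures quoted above (75 ms is taken here as an assumed midpoint of the 50-100 ms card-reader range), while the three-job batch is purely hypothetical. Because results are printed only after the whole batch completes, every job in the batch sees the same turnaround, namely the total time of the batch.

```python
# Per-record I/O times from the text, in milliseconds
# (75 ms is an assumed midpoint of the 50-100 ms card-reader range).
IO_TIME_MS = {"card reader": 75, "magnetic tape": 5, "fixed-head disk": 1}


def batch_turnaround_ms(jobs, device):
    """Turnaround time seen by every job when results are printed only after
    the whole batch finishes: the sum, over all jobs, of CPU time plus time
    spent waiting for record I/O on the given device."""
    per_record = IO_TIME_MS[device]
    return sum(cpu_ms + records * per_record for cpu_ms, records in jobs)


# Hypothetical batch: (CPU time in ms, number of records read or written) per job.
batch = [(200, 100), (500, 40), (100, 300)]

for device in IO_TIME_MS:
    print(f"{device}: {batch_turnaround_ms(batch, device)} ms")
# card reader: 33800 ms
# magnetic tape: 3000 ms
# fixed-head disk: 1240 ms
```

The CPU time (800 ms in total) is the same in every case; only the time spent waiting for records changes, which is why faster media shorten turnaround for the whole batch.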
Figure 3.3: Several jobs/tasks of a batch run on top of the operating system, which runs on the hardware.
Another term that is commonly used in a batch processing system is job scheduling.
Job scheduling is the process of sequencing jobs so that they can be executed on the
processor. Because of the sequential nature of the batch, jobs are normally sequenced on a
first-come-first-served (FCFS) basis: the batch monitor always starts the next job in the
batch. However, in exceptional cases, you could also arrange the different jobs in the
batch according to the priority of each job. Sequencing jobs according to some criterion
requires scheduling them at the time of creating or executing a batch. On the basis of the
relative importance of jobs, priorities could be set for each batch of jobs, and several
batches could be formed on the same priority criteria. The batch having the highest
priority can then be run earlier than other batches, giving a better turnaround service
to the selected jobs.
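The two orderings described above can be compared with a short Python sketch. The job names, arrival order and priority numbers are hypothetical, and the convention that a lower number means a more important batch is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: int   # order in which the job was submitted
    priority: int  # hypothetical convention: lower number = more important


def fcfs_order(jobs):
    """First-come-first-served: run jobs strictly in submission order."""
    return [j.name for j in sorted(jobs, key=lambda j: j.arrival)]


def priority_batches(jobs):
    """Group jobs into batches by priority; the most important batch runs
    first, and jobs inside each batch keep their FCFS order."""
    batches = {}
    for job in sorted(jobs, key=lambda j: j.arrival):
        batches.setdefault(job.priority, []).append(job.name)
    return [batches[p] for p in sorted(batches)]


jobs = [Job("payroll", 1, 2), Job("report", 2, 1),
        Job("backup", 3, 2), Job("billing", 4, 1)]
print(fcfs_order(jobs))        # ['payroll', 'report', 'backup', 'billing']
print(priority_batches(jobs))  # [['report', 'billing'], ['payroll', 'backup']]
```

Under FCFS the important "report" job waits behind "payroll"; forming priority batches lets it run first without disturbing the order of jobs within a batch.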
Now, we discuss the concept of storage management. At any point in time, the main
store of the computer is shared by the batch monitor program and the current user job
of a batch. The key question is how much storage should be kept for the monitor
program and how much should be provided for the user jobs of a batch. If too much
main storage is given to the monitor, the user programs will not get enough storage.
Therefore, an overlay structure has to be devised so that sections of monitor code that
are not currently needed do not occupy storage simultaneously.
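Why an overlay structure helps can be shown with a little arithmetic. In this hypothetical Python sketch, the monitor has a small resident part plus several sections that are never needed at the same time; with overlays those sections take turns in one shared area, so the monitor costs only its largest section rather than all of them together. All the sizes are invented for illustration.

```python
def user_space_kb(total_kb, resident_kb, section_sizes_kb, use_overlays=True):
    """Main storage left for the user job.

    Without overlays, every monitor section is in storage at once, so the
    monitor occupies resident + sum(sections). With an overlay structure the
    sections share one overlay area that only has to fit the largest one.
    """
    if use_overlays:
        monitor = resident_kb + max(section_sizes_kb)
    else:
        monitor = resident_kb + sum(section_sizes_kb)
    return total_kb - monitor


# Hypothetical figures: 256 KB of main storage, a 16 KB resident core and
# three monitor sections that are never needed at the same time.
sections = [24, 12, 20]
print(user_space_kb(256, 16, sections, use_overlays=False))  # 184
print(user_space_kb(256, 16, sections))                      # 216
```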
Next we will discuss the concept of sharing and protection. The efficiency of
utilization of a computer system is judged by its ability to share the system's
hardware and software resources amongst its users. Whenever the idea of sharing
system resources comes to mind, certain doubts also arise about the fairness and
security of the system. Every user wants all of his reasonable requests to be
taken care of, and no intentional or unintentional act of another user to tamper
with his data. A batch processing system guarantees the fulfillment of these user
requirements. All the user jobs are performed one after the other; there is no
simultaneous execution of more than one job at a time. So, all the system resources,
like storage, I/O devices and the central processing unit, are shared sequentially or
serially. This is how sharing of resources is enforced on a batch processing system.
Next comes the question of protection. Even though the jobs are processed one at a time,
a loss of security or protection can still occur. Suppose that there are two users, A
and B. User A creates a file of his own, and a later job of User B deletes that file. Many
similar situations can occur in day-to-day use. So, the files
and other data of all the users should be protected against unauthorized usage. In
order to avoid such loss of protection, each user is bound by certain rules and
regulations. These take the form of a set of control statements, which every user is
required to follow.
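A minimal sketch of one such rule, assuming a system file table that records the owner of every file and refuses deletion by anyone else. The FileTable class and its methods are hypothetical; real systems express such rules through control statements, access lists or permission bits rather than this exact interface.

```python
class FileTable:
    """Hypothetical system file table that records the owner of each file."""

    def __init__(self):
        self._owners = {}

    def create(self, user, filename):
        if filename in self._owners:
            raise FileExistsError(filename)
        self._owners[filename] = user

    def delete(self, user, filename):
        # Protection rule: only the owner of a file may delete it.
        owner = self._owners.get(filename)
        if owner is None:
            raise FileNotFoundError(filename)
        if owner != user:
            raise PermissionError(f"{user} may not delete {owner}'s file {filename}")
        del self._owners[filename]

    def exists(self, filename):
        return filename in self._owners


table = FileTable()
table.create("A", "results.dat")
try:
    table.delete("B", "results.dat")  # User B's attempt is refused
except PermissionError as err:
    print(err)
print(table.exists("results.dat"))    # True
```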
files, or may be rigidly formatted. In general, a file is a sequence of bits, bytes, lines, or
records whose meaning is defined by its creator and user. The operating system
implements the abstract concept of a file by managing mass-storage devices, such as
tapes and disks. Files are normally organized into logical clusters, or directories,
which make them easier to locate and access. Since multiple users have access to files,
it is desirable to control by whom and in what ways files may be accessed.
Batch systems are appropriate for executing large jobs that need little interaction. The
user can submit jobs and return later for the results; it is not necessary for the user to
wait while the job is processed.
Interactive jobs tend to be composed of many short actions, where the results of the
next command may be unpredictable. The user submits the command and then waits
for the results. Accordingly, the response time should be short, on the order of seconds
at most.
An interactive system is used when a short response time is required. Early
computers with a single user were interactive systems. That is, the entire system was
at the immediate disposal of the programmer/operator. This situation allowed the
programmer great flexibility and freedom in program testing and development. But,
as we saw, this arrangement resulted in substantial idle time while the CPU waited
for some action to be taken by the programmer/operator. Because of the high cost of
these early computers, idle CPU time was undesirable. Batch operating systems were
developed to avoid this problem. Batch systems improved system utilization for the
owners of the computer systems.
Time-sharing systems were developed to provide interactive use of a computer
system at a reasonable cost. A time-shared operating system uses CPU scheduling
and multiprogramming to provide each user with a small portion of a time-shared
computer.
Each user has at least one separate program in memory. A program that is loaded into
memory and is executing is commonly referred to as a process. When a process
executes, it typically executes for only a short time before it either finishes or needs to
perform I/O. I/O may be interactive; that is, output is to a display for the user and
input is from a user keyboard.
Since interactive I/O typically runs at people speeds, it may take a long time to
complete. Input, for example, may be bounded by the user's typing speed; five
characters per second is fairly fast for people, but is incredibly slow for computers.
Rather than let the CPU sit idle when this interactive input takes place, the operating
system will rapidly switch the CPU to the program of some other user.
A time-shared operating system allows many users to share the computer
simultaneously. Since each action or command in a time-shared system tends to be
short, only a little CPU time is needed for each user. As the system switches rapidly
from one user to the next, each user is given the impression that she has her own
computer, whereas actually one computer is being shared among many users.
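The rapid switching can be simulated in a few lines of Python. The text does not name a particular CPU-scheduling policy; this sketch assumes round-robin with a fixed time quantum, a common choice for time-sharing, and the users and their CPU demands are hypothetical.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Simulate time-sharing of one CPU among users.

    bursts maps each user to the CPU time (ms) their command needs. The CPU
    serves users in turn for at most `quantum` ms each. Returns the order of
    time slices and each user's completion time.
    """
    queue = deque(bursts.items())
    clock, slices, finished = 0, [], {}
    while queue:
        user, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        slices.append(user)
        if remaining > run:
            queue.append((user, remaining - run))  # back of the line
        else:
            finished[user] = clock
    return slices, finished


slices, finished = round_robin({"ann": 30, "bob": 10, "cy": 20}, quantum=10)
print(slices)    # ['ann', 'bob', 'cy', 'ann', 'cy', 'ann']
print(finished)  # {'bob': 20, 'cy': 50, 'ann': 60}
```

Each user gets the CPU every few slices, so a short command (bob's) finishes quickly even though a longer one arrived first.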
The idea of time-sharing was demonstrated as early as 1960, but since time-shared
systems are difficult and expensive to build, they did not become common until the
early 1970s. As the popularity of time-sharing has grown, researchers have attempted
to merge batch and time-shared systems. Many computer systems that were
designed as primarily batch systems have been modified to create a time-sharing
subsystem. For example, IBM's OS/360, a batch system, was modified to support the
time-sharing option (TSO). At the same time, time-sharing systems have often added
a batch subsystem. Today, most systems provide both batch processing and time
sharing, although their basic design and use tend to be of one type or the other.
Time-sharing operating systems are even more complex than are multi-programmed
operating systems. As in multiprogramming, several jobs must be kept
Parallel Systems
Most systems to date are single-processor systems; that is, they have only one main
CPU. However, there is a trend toward multiprocessor systems. Such systems have
more than one processor in close communication, sharing the computer bus, the clock,
and sometimes memory and peripheral devices. These systems are referred to as
tightly coupled systems.
There are several reasons for building such systems. One advantage is increased
throughput. By increasing the number of processors, we hope to get more work done
in a shorter period of time. The speed-up ratio with n processors is not n, however,
but rather is less than n. When multiple processors cooperate on a task, a certain
amount of overhead is incurred in keeping all the parts working correctly. This
overhead, plus contention for shared resources, lowers the expected gain from
additional processors. Similarly, a group of n programmers working closely together
does not result in n times the amount of work being accomplished.
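One common way to put numbers on this effect is Amdahl's law, swapped in here as an illustration rather than taken from the text: if a fraction s of the work cannot be spread across processors (standing in for coordination overhead, contention and inherently serial steps), the speed-up on n processors is 1 / (s + (1 - s)/n), which always stays below n. The 10 percent serial fraction used in the Python sketch is an arbitrary assumption.

```python
def amdahl_speedup(n, serial_fraction):
    """Amdahl's law: speed-up on n processors when `serial_fraction` of the
    work cannot be shared out among them."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(n, 0.1), 2))
# prints one pair per line: 1 1.0, 2 1.82, 4 3.08, 8 4.71, 16 6.4
```

With 16 processors the system does only about 6.4 times the work of one, in line with the observation that n processors never give a speed-up of n.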
Multiprocessors can also save money compared to multiple single systems because
the processors can share peripherals, cabinets, and power supplies. If several
programs are to operate on the same set of data, it is cheaper to store those data on
one disk and to have all the processors share them, rather than to have many
computers with local disks and many copies of the data.
Another reason for multiprocessor systems is that they increase reliability. If functions
can be distributed properly among several processors, then the failure of one
processor will not halt the system, but rather will only slow it down. If we have
10 processors and one fails, then each of the remaining nine processors must pick up a
share of the work of the failed processor. Thus, the entire system runs only 10 percent
slower, rather than failing altogether.
This ability to continue providing service proportional to the level of surviving
hardware is called graceful degradation. Systems that are designed for graceful
degradation are also called fault-tolerant.
Continued operation in the presence of failures requires a mechanism to allow the
failure to be detected, diagnosed, and corrected (if possible). The Tandem system uses
both hardware and software duplication to ensure continued operation despite faults.
The system consists of two identical processors, each with its own local memory. The
processors are connected by a bus. One processor is the primary, and the other is the
backup. Two copies of each process are kept: one on the primary machine and the
other on the backup. At fixed checkpoints in the execution of the system, the state
information of each job (including a copy of the memory image) is copied from the
primary machine to the backup. If a failure is detected, the backup copy is activated,
and is restarted from the most recent checkpoint.
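The checkpoint-and-restart idea can be sketched in Python. This is a deliberate simplification, not Tandem's actual implementation: the "memory image" is a dictionary, a checkpoint is a deep copy handed to the backup, and the checkpoint interval of three steps is an arbitrary assumption.

```python
import copy


class PrimaryBackupPair:
    """Simplified primary/backup checkpointing (illustrative only)."""

    def __init__(self, state):
        self.primary = state
        self.backup = copy.deepcopy(state)  # state as of the last checkpoint

    def step(self, update):
        """Apply one unit of work on the primary machine."""
        update(self.primary)

    def checkpoint(self):
        """Copy the primary's full state (its memory image) to the backup."""
        self.backup = copy.deepcopy(self.primary)

    def fail_over(self):
        """On a detected failure, activate the backup and restart from the
        most recent checkpoint; work done since then must be redone."""
        self.primary = copy.deepcopy(self.backup)
        return self.primary


def add_one(state):
    state["count"] += 1


pair = PrimaryBackupPair({"count": 0})
for i in range(1, 8):
    pair.step(add_one)
    if i % 3 == 0:  # hypothetical checkpoint every third step
        pair.checkpoint()
print(pair.fail_over())  # {'count': 6}
```

The seventh step is lost because it happened after the last checkpoint; checkpointing more often loses less work but costs more copying, which is part of why this solution is expensive.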
This solution is obviously an expensive one, since there is considerable hardware
duplication. The most common multiple-processor systems now use the symmetric
multiprocessing model, in which each processor runs an identical copy of the
operating system, and the copies communicate with one another as needed. Some
systems use asymmetric multiprocessing, in which each processor is assigned a specific
task. A master processor controls the system; the other processors either look to the
master for instruction or have predefined tasks. This scheme defines a master-slave
relationship. The master processor schedules and allocates work to the slave
processors.
An example of the symmetric multiprocessing system is Encore's version of UNIX for
the Multimax computer. This computer can be configured to employ dozens of
processors, all running a copy of UNIX. The benefit of this model is that many
processes can run at once (N processes if there are N CPUs) without causing a
deterioration of performance. However, we must carefully control I/O to ensure that
data reach the appropriate processor. Also, since the CPUs are separate, one may be
sitting idle while another is overloaded, resulting in inefficiencies. To avoid these
inefficiencies, the processors can share certain data structures. A multiprocessor
system of this form will allow jobs and resources to be shared dynamically among the
various processors, and can lower the variance in load among the processors. However, such a
system must be written carefully.
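The effect of sharing a data structure can be illustrated with a small Python simulation. Here the shared structure is assumed to be a single ready queue from which any idle CPU takes the next job; the job lengths and the unlucky initial assignment are hypothetical.

```python
def finish_time_separate(queues):
    """Each CPU runs only the jobs in its own queue; all work is done when
    the most heavily loaded CPU is done."""
    return max(sum(q) for q in queues)


def finish_time_shared(jobs, n_cpus):
    """All CPUs take the next job from one shared ready queue as soon as
    they become free (jobs are taken in queue order)."""
    free_at = [0] * n_cpus
    for job in jobs:
        cpu = free_at.index(min(free_at))  # first CPU to become idle
        free_at[cpu] += job
    return max(free_at)


# Hypothetical job lengths (ms): CPU 0 was handed all the long jobs.
queues = [[40, 30, 30], [5, 5]]
print(finish_time_separate(queues))                      # 100
print(finish_time_shared([40, 30, 30, 5, 5], n_cpus=2))  # 60
```

With separate queues the second CPU sits idle after 10 ms while the first is overloaded; the shared queue spreads the same jobs across both.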
Asymmetric multiprocessing is more common in extremely large systems, where one
of the most time-consuming activities is simply processing I/O. In older batch
systems, small processors, located at some distance from the main CPU, were used to
run card readers and line printers and to transfer these jobs to and from the main
computer. These locations are called remote-job-entry (RJE) sites. In a time-sharing
system, a main I/O activity is processing the I/O of characters between the terminals
and the computer. If the main CPU must be interrupted for every character for every
terminal, it may spend all its time simply processing characters. To avoid this
situation, most systems have a separate front-end processor that handles all
terminal I/O.
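The saving a front-end processor provides can be sketched in Python: the hypothetical front_end generator collects characters from a terminal and passes the main CPU one complete line at a time, so the main CPU sees one transfer per line instead of one interrupt per keystroke. Treating the newline character as the end of a line is an assumption of the sketch.

```python
def front_end(keystrokes):
    """Hypothetical front-end processor: buffer characters arriving from a
    terminal and hand the main CPU one complete line at a time."""
    buffer = []
    for ch in keystrokes:
        if ch == "\n":
            yield "".join(buffer)  # one transfer to the main CPU per line
            buffer.clear()
        else:
            buffer.append(ch)
    # Characters after the last newline stay buffered (line not finished yet).


typed = "ls -l\ncat notes.txt\n"
lines = list(front_end(typed))
print(lines)  # ['ls -l', 'cat notes.txt']
print(len(typed), "interrupts without a front-end;", len(lines), "transfers with one")
```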
For example, a large IBM system might use an IBM Series/1 minicomputer as a front-end.
The front-end acts as a buffer between the terminals and the main CPU, allowing
the main CPU to handle lines and blocks of characters, instead of individual
characters. Such systems suffer from decreased reliability through increased
specialization. It is important to recognize that the difference between symmetric and
asymmetric multiprocessing may be the result of either hardware or software.
Special hardware may exist to differentiate the multiple processors, or the software
may be written to allow only one master and multiple slaves. For instance, Sun's
operating system SunOS Version 4 provides asymmetric multiprocessing, whereas
Version 5 (Solaris 2) is symmetric. As microprocessors become less expensive and
more powerful, additional operating system functions are off-loaded to slave
processors, or back-ends.
For example, it is fairly easy to add a microprocessor with its own memory to manage
a disk system. The microprocessor could receive a sequence of requests from the main
CPU and implement its own disk queue and scheduling algorithm. This arrangement
relieves the main CPU of the overhead of disk scheduling. PCs contain a
microprocessor in the keyboard to convert the keystrokes into codes to be sent to the
CPU. In fact, this use of microprocessors has become so common that it is no longer
considered multiprocessing.
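The text does not say which disk-scheduling algorithm such a microprocessor would run. As one assumed example, the Python sketch below orders pending cylinder requests with a LOOK-style elevator (a SCAN variant that reverses at the last pending request instead of at the edge of the disk) and compares the total head movement with plain first-come-first-served order. The request queue and starting head position are hypothetical.

```python
def look_order(requests, head, direction="up"):
    """LOOK-style elevator: serve requests in the current direction of head
    travel, then reverse and serve the remaining ones."""
    lower = sorted(c for c in requests if c < head)
    upper = sorted(c for c in requests if c >= head)
    if direction == "up":
        return upper + lower[::-1]
    return lower[::-1] + upper


def head_movement(order, head):
    """Total number of cylinders the head travels to serve `order`."""
    total = 0
    for cylinder in order:
        total += abs(cylinder - head)
        head = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
order = look_order(queue, head=53)
print(order)                     # [65, 67, 98, 122, 124, 183, 37, 14]
print(head_movement(order, 53))  # 299 cylinders
print(head_movement(queue, 53))  # 640 cylinders in first-come-first-served order
```

Whatever algorithm is chosen, the point of the text is that this work runs on the disk's own microprocessor rather than on the main CPU.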
Distributed Systems
A recent trend in computer systems is to distribute computation among several
processors. In contrast to the tightly coupled systems, the processors do not share
memory or a clock. Instead, each processor has its own memory and clock. The
processors communicate with one another through various communication lines,
such as high-speed buses or telephone lines.
These systems are usually referred to as loosely coupled systems, or distributed
systems. The processors in a distributed system may vary in size and function. They
may include small microprocessors, workstations, minicomputers and large general-purpose
computer systems. These processors are referred to by a number of different
names, such as sites, nodes, computers, and so on, depending on the context in which
they are mentioned.
There are a variety of reasons for building distributed systems, the major ones being:
• Reliability: If one site fails in a distributed system, the remaining sites can
potentially continue operating. If the system is composed of a number of large
autonomous installations (that is, general-purpose computers), the failure of one
of them should not affect the rest. If, on the other hand, the system is composed of
a number of small machines, each of which is responsible for some crucial system
function (such as terminal character I/O or the file system), then a single failure
may effectively halt the operation of the whole system. In general, if sufficient
redundancy exists in the system (in both hardware and data), the system can
continue with its operation, even if some of its sites have failed.
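The role of redundancy can be made concrete with a little probability. Assuming, purely for illustration, that sites fail independently and that each is available 99 percent of the time, the Python sketch compares a system in which every one of five machines is essential with one in which any one of five redundant sites is enough.

```python
from math import prod


def all_required(availabilities):
    """System works only if every machine works (each handles a crucial
    function with no backup). Failures are assumed independent."""
    return prod(availabilities)


def any_suffices(availabilities):
    """System works if at least one of several redundant sites works."""
    return 1 - prod(1 - a for a in availabilities)


sites = [0.99] * 5  # hypothetical: each site is up 99% of the time
print(round(all_required(sites), 4))  # 0.951
print(any_suffices(sites))            # very close to 1
```

Five essential machines make the whole system less reliable than any one of them, while five redundant sites make it far more reliable: the contrast drawn in the paragraph above.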
bounded, from the retrieval of stored data to the time that it takes the operating
system to finish any request made of it. Such time constraints dictate the facilities that
are available in hard real-time systems. Secondary storage of any sort is usually
limited or missing, with data instead being stored in short-term memory or in Read-only
Memory (ROM). ROM is located on nonvolatile storage devices that retain their
contents even in the case of a power outage; most other types of memory are volatile.
Most advanced operating-system features are absent too, since they tend to separate
the user further from the hardware, and that separation results in uncertainty about
the amount of time an operation will take. For instance, virtual memory is almost
never found on real-time systems. Therefore, hard real-time systems conflict with the
operation of time-sharing systems, and the two cannot be mixed. Since none of the
existing general-purpose operating systems support hard real-time functionality, we
do not concern ourselves with this type of system in this text.
A less restrictive type of real-time system is a soft real-time system, where a critical
real-time task gets priority over other tasks, and retains that priority until it completes.
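A soft real-time policy can be sketched as preemptive priority scheduling in which priorities never change. In this Python simulation, time advances in unit steps, a lower number means a more urgent task, and the task list is hypothetical; once the critical "sensor" task arrives it keeps the CPU until it completes, and the ordinary tasks resume afterwards.

```python
import heapq


def simulate(tasks):
    """Unit-time, preemptive priority scheduling with fixed priorities.

    tasks: list of (arrival_time, priority, name, run_time); a lower
    priority number is more urgent. Returns the task run in each time unit
    (idle time units are skipped, not recorded).
    """
    pending = sorted(tasks)  # ordered by arrival time
    ready, timeline, t = [], [], 0
    while pending or ready:
        while pending and pending[0][0] <= t:
            arrival, prio, name, run = pending.pop(0)
            heapq.heappush(ready, (prio, arrival, name, run))
        if not ready:  # CPU idle until the next arrival
            t = pending[0][0]
            continue
        prio, arrival, name, run = heapq.heappop(ready)
        timeline.append(name)
        t += 1
        if run > 1:  # not finished: back into the ready queue, same priority
            heapq.heappush(ready, (prio, arrival, name, run - 1))
    return timeline


tasks = [(0, 5, "editor", 3), (1, 0, "sensor", 2), (2, 5, "print", 1)]
print(simulate(tasks))
# ['editor', 'sensor', 'sensor', 'editor', 'editor', 'print']
```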
Monolithic Architecture
There are numerous commercial systems that do not have a well-defined structure.
This architecture is referred to as monolithic architecture because of the lack of any
identifiable structure. Frequently, such operating systems started as small, simple,
and limited systems, and then grew beyond their original scope. MS-DOS is an
example of such a system. It was originally designed and implemented by a few
people who had no idea that it would become so popular. It was written to provide
the most functionality in the least space, because of the limited hardware on which it
ran, so it was not divided into modules carefully. Figure 3.4 shows its structure.
8088 for which it was written provides no dual mode and no hardware protection, the
designers of MS-DOS had no choice but to leave the base hardware accessible.
Layered Architecture
New versions of operating systems are designed to use more advanced hardware.
Given proper hardware support, operating systems may be broken into smaller, more
appropriate pieces than those allowed by the original MS-DOS or UNIX. The
operating system can then retain much greater control over the computer and the
applications that make use of that computer. Implementers have more freedom to
make changes to the inner workings of the system. Familiar techniques are used to aid
in the creation of modular operating systems. Under the top-down approach, the
overall functionality and features can be determined and separated into components.
Information hiding is also important, leaving programmers free to implement the
low-level routines as they see fit, provided that the external interface of the routine
stays unchanged and the routine itself performs the advertised task.
The modularization of a system can be done in many ways; the most appealing is the
layered approach, which consists of breaking the operating system into a number of
layers (levels), each built on top of lower layers. The bottom layer (layer 0) is the
hardware; the highest (layer N) is the user interface.
An operating-system layer is an implementation of an abstract object that is the
encapsulation of data and of the operations that can manipulate those data. A typical
operating-system layer, say layer M, is depicted in Figure 3.5.
to know only what these operations do. Hence, each layer hides the existence of
certain data structures, operations, and hardware from higher-level layers.
The layer approach to design was first used in the operating system at the Technische
Hogeschool Eindhoven. The system was defined in six layers, as shown in Figure 1.8.
The bottom layer was the hardware. The next layer implemented CPU scheduling.
The next layer implemented memory management; the memory-management scheme
was virtual memory. Layer 3 contained the device driver for the operator's console.
Because it and I/O buffering (level 4) were placed above memory management, the
device buffers could be placed in virtual memory. The I/O buffering was also above
the operator's console, so that I/O error conditions could be output to the operator's
console.
layer 5: user programs
layer 4: buffering for input and output devices
layer 3: operator-console device driver
layer 2: memory management
layer 1: CPU scheduling
layer 0: hardware
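The layering rule can be expressed as a simple check. The Python sketch below numbers the THE layers as described above and flags any call that goes from a layer to one at the same level or higher; the list of calls is hypothetical. For instance, I/O buffering (layer 4) may use memory management (layer 2) and the console driver (layer 3), which is why device buffers could live in virtual memory.

```python
# Layers of the THE system as described in the text (0 = bottom).
THE_LAYERS = {
    0: "hardware",
    1: "CPU scheduling",
    2: "memory management",
    3: "operator-console device driver",
    4: "I/O buffering",
    5: "user programs",
}


def may_call(caller, callee):
    """Layering rule: a layer may use the operations of lower layers only."""
    return callee < caller


def check_calls(calls):
    """Return the (caller, callee) name pairs that break the layering rule."""
    return [(THE_LAYERS[a], THE_LAYERS[b]) for a, b in calls if not may_call(a, b)]


calls = [(4, 2), (4, 3), (3, 2), (2, 1), (2, 4)]  # hypothetical call graph
print(check_calls(calls))  # [('memory management', 'I/O buffering')]
```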
MS-DOS structure to that of the OS/2. It should be clear that, from both the system-design
and implementation standpoints, OS/2 has the advantage. For instance, direct
user access to low-level facilities is not allowed, providing the operating system with
more control over the hardware and more knowledge of which resources each user
program is using.
As a further example, consider the history of Windows NT. The first release had a
very layer-oriented organization. However, this version suffered from low performance
compared to that of Windows 95. Windows NT 4.0 redressed some of these
performance issues by moving layers from user space to kernel space and more
closely integrating them.
Exokernel Architecture
Operating systems define the interface between applications and physical resources.
Unfortunately, this interface can significantly limit the performance and
implementation freedom of applications. Traditionally, operating systems hide
information about machine resources behind high-level abstractions such as
processes, files, address spaces and interprocess communication. These abstractions
define a virtual machine on which applications execute; their implementation cannot
be replaced or modified by untrusted applications.
Hardcoding the implementations of these abstractions is inappropriate for three main
reasons:
• it restricts the flexibility of application builders, since new abstractions can only
be added by awkward emulation on top of existing ones (if they can be added at
all).
• exokernels can be made efficient due to the limited number of simple primitives
they must provide.
Finally, many of the hardware resources in microkernel systems, such as the network,
screen, and disk, are encapsulated in heavyweight servers that cannot be bypassed or
tailored to application-specific needs. These heavyweight servers can be viewed as
fixed kernel subsystems that run in user-space.
Client-Server Architecture
This architecture abstracts each component of the operating system as clients and
servers. Every system operation is carried out by one component requesting one or
more components for one or more system service. The servers respond to the requests
and the intended action takes place in this manner.
Among the advantages of this architecture is that it makes the whole system
highly modular and hence easy to maintain and modify.
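Request and reply messages between a client and a server can be sketched with Python's standard queue and threading modules. The file server, its one-entry file store and the message format (operation, name, reply queue) are hypothetical; in a real client-server operating system such messages would typically travel through the kernel's message-passing mechanism rather than in-process queues.

```python
import queue
import threading


def file_server(requests):
    """Hypothetical file server: wait for request messages and reply on the
    reply queue that each client sends along with its request."""
    files = {"motd": "welcome"}
    while True:
        op, name, reply_to = requests.get()
        if op == "shutdown":
            break
        if op == "read":
            reply_to.put(files.get(name, "error: no such file"))


def client_read(requests, name):
    """A client asks for a service by sending a message, then waits for the reply."""
    reply = queue.Queue()
    requests.put(("read", name, reply))
    return reply.get(timeout=5)


requests = queue.Queue()
server = threading.Thread(target=file_server, args=(requests,))
server.start()
print(client_read(requests, "motd"))     # welcome
print(client_read(requests, "missing"))  # error: no such file
requests.put(("shutdown", None, None))
server.join()
```

Because the client knows only the message format, the server could be replaced or moved to another machine without changing the client, which is the modularity the text refers to.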
and user-friendliness are two very important features that an end-user program
should support.
Depending on the varied needs of the users, the interface has to comply with all the
user requirements. Therefore, there are various types of interfaces available for
different types of users. The common types of interface are:
1. System calls
2.
3.
We have already covered system calls in previous units. We shall now discuss the
remaining two types of interface.
do not follow any particular style or format. They only provide a means of producing
GUI systems.
The X system is based on a client-server model. The application programs form the
clients, which require graphical display and input facilities. These facilities are
provided by the server. The communication between the client and the server
happens through messages, which are carried in a standard protocol. The client
and the server may exist as separate processes on one system, or they may exist on
separate computers linked over a network. The X system is entirely machine-independent.
The client application is not concerned with the internals of the
target display terminal it is using. The application concerns itself with a logical or
virtual terminal, and the X system must map the requests made by the user onto
the actual hardware. In X Windows, a single server terminal is capable of handling
many applications at one time.
In GUI systems, a window manager takes care of the size, location, movement, etc. of
windows. In X Windows, a window manager functions as an ordinary client
application program.
The facilities provided by the X Window library (Xlib) are at a low level. Therefore, a
considerable amount of coding is required to produce useful applications. The
applications that are coded at this level do not always provide uniformity in the user
interface. To meet these problems, the programmer uses a higher-level
tool called an X Toolkit. An X Toolkit comprises two parts. One is a set of functions
called intrinsics, which reside above the Xlib functions. The second part
is an additional set of tools called widgets. The widgets are provided as
separate products, such as OPEN LOOK from UNIX International. Widgets include menus,
slide bars, icons, buttons, etc.
Student Activity
1.
2.
Summary
An operating system is the most important program in a computer system. This is one
program that runs all the time, as long as the computer is operational and exits only
when the computer is shut down. Operating systems are the programs that make
computers operational, hence the name. Operating systems are the computer's resource
manager. Job scheduling is the process of sequencing jobs so that they can be executed
on the processor. The main advantage of the layered approach is modularity. The
layers are selected such that each uses functions (operations) and services of only
lower level layers. System calls provide the interface between a process and the
operating system. These calls are generally available as assembly-language
instructions, and are usually listed in the manuals used by assembly-language
programmers.
Keywords
Operating System: A set of complex programs that makes a computer operational by
providing an environment in which users can use the power of the computer.
Batch Processing: A mode of data processing in which a number of tasks are lined up
and the entire task set is submitted to the computer in one go.
Multiprogramming: A style of programming in which multiple programs share
the resources, appearing to execute simultaneously.
Time Sharing: A mode of programming in which the CPU is shared between multiple
programs each getting a share of CPU time in turn.
Parallel System: A system that is capable of executing a number of programs
in parallel.
Multiprocessor System: A computer system that has an array of a number of
processors.
Distributed System: A computer system in which computation tasks are distributed
among several processors.
Real Time System: A special-purpose operating system in which there are rigid time
requirements on the operation of a processor or the flow of data.
Monolithic Architecture: An operating system architecture that lacks any identifiable
structure.
Command Line Interface: A kind of operating system interface that allows the user to
interact with the operating system through commands given at the command prompt.
Review Questions
1.
2.
3.
4.
5.
6.
7.
8.
9.
10. Application programs interact with operating systems through system calls. Is
there any other method of interaction between the two?
Unit 4 Bioinformatics Internet Applications
Unit Structure
Introduction
Statistics
Summary
Keywords
Review Questions
Further Readings
Learning Objectives
At the conclusion of this unit, you will be able to:
Introduction
The most important use of the Internet in bioinformatics is for biological databases, which
consist of long strings of nucleotides (guanine, adenine, thymine, cytosine and uracil)
and/or amino acids (threonine, serine, glycine, etc.). Each sequence of nucleotides or
amino acids represents a particular gene or protein (or section thereof), respectively.
Sequences are represented in shorthand, using single-letter designations. This
decreases the space needed to store the information and increases processing speed for
analysis.
5′ ACGAGCAGCTACGCACTACGATCG 3′
3′ TGCTCGTCGATGCGTGATGCTAGC 5′
A nucleotide sequence
N-SDFHKJSDHFKDHGLKDSKJG-C
An amino acid sequence
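As a minimal sketch of how the single-letter shorthand is handled in practice, the Python snippet below derives the complementary strand of the nucleotide example from its top strand. The names `COMPLEMENT`, `complement` and `reverse_complement` are our own illustrative choices, not part of any particular library, and only the four DNA letters are handled.

```python
# Watson-Crick base pairing for the single-letter DNA alphabet (A, C, G, T only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Complement each base, keeping the original left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposite strand written in the conventional 5' -> 3' direction."""
    return complement(seq)[::-1]


top = "ACGAGCAGCTACGCACTACGATCG"   # written 5' -> 3'
print(complement(top))              # TGCTCGTCGATGCGTGATGCTAGC (3' -> 5', as drawn above)
print(reverse_complement(top))      # CGATCGTAGTGCGTAGCTGCTCGT (same strand, read 5' -> 3')
```

Databases conventionally store each strand 5′ to 3′, which is why the reverse complement, rather than the plain complement, is the form usually searched.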
Over the last three decades, contributions from the fields of biology and chemistry
have greatly increased the speed at which genes and proteins can be sequenced. The
advent of cloning technology allowed foreign DNA sequences to be easily introduced
into bacteria. In this way, rapid mass production of particular DNA sequences, a
necessary prelude to sequence determination, became possible. Oligonucleotide
synthesis provided researchers with the ability to construct short fragments of DNA
with sequences of their own choosing. These oligonucleotides could then be used to
probe vast DNA libraries and extract genes containing that sequence. Alternatively,
these DNA fragments could be used as primers in polymerase chain reactions to
amplify existing DNA sequences or to modify them. With these techniques in place,
progress in biological research accelerated dramatically.
The starting point for a database search may be either a single sequence, with the
goal of identifying its relatives, or a family of sequences, with the goal of
identifying further members of that family. A database search needs to be both fast
and sensitive, but these two objectives work against each other. Fast methods have
been developed primarily for searching with a single sequence, and this is the topic
of this section.
When searching a database with a newly determined DNA or amino acid sequence (the
so-called query sequence), the user typically does not know whether an expected
similarity will span the entire query or just part of it. Likewise, the user cannot
know whether the match will extend along the full length of a database sequence or
only part of it. Therefore, one needs to look for a local alignment between the query
and each sequence in the database. This immediately suggests applying the
Smith-Waterman algorithm to each database sequence. One should take care, though, to
apply a fairly stringent gap penalty so that the algorithm focuses on the regions that
really match. After the resulting scores are sorted, the top-scoring database
sequences are the candidates of interest.
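A minimal sketch of this procedure in Python, assuming a toy scoring scheme (match +2, mismatch -1, linear gap penalty -3) instead of a real substitution matrix with affine gap costs, and a small made-up database. `smith_waterman` and `rank_database` are hypothetical helper names. Only the score is computed, which is all that ranking needs; the traceback that recovers the alignment itself is omitted.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -3) -> int:
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)              # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap             # gap in b
            left = curr[j - 1] + gap       # gap in a
            curr[j] = max(0, diag, up, left)   # 0 lets a new local alignment start here
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query: str, database: dict) -> list:
    """Score the query against every database entry; best-scoring entries first."""
    scores = [(name, smith_waterman(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


db = {"seq1": "TTACGTTT", "seq2": "GGGGGGGG", "seq3": "ACGTACGT"}
print(rank_database("ACGTAC", db))   # [('seq3', 12), ('seq1', 8), ('seq2', 2)]
```

Keeping only two rows of the matrix makes memory proportional to the database sequence length, but the running time is still proportional to the product of both lengths, which is why a full search takes minutes.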
Several implementations of this procedure are available, most prominently the
SSEARCH program from the FASTA package. There are versions of this program tuned for
speed, such as the one by Phil Green, one that runs especially fast on SUN computers,
and one by Geoff Barton. Depending on the implementation, the computer and the
database size, a search with such a program takes on the order of several minutes.
The motivation behind the development of other database search programs has been to
emulate, as closely as possible, the Smith-Waterman algorithm's ability to discern
related sequences, while performing the job in much less time. To this end, one
usually assumes that any good alignment one wishes to identify contains, in
particular, some stretch of ungapped similarity. Furthermore, this stretch will tend
to contain a certain number of identically matching residues, not only conservative
replacements. Based on these assumptions, most heuristic programs rely on identifying
a well-matching core and then extending it or combining several such cores. In
hindsight, the developments in this area can be classified according to a
traditional distinction in computer science: one either preprocesses the query or the
text (i.e., the database). Preprocessing means that the string is represented in a
different form that allows faster answers to particular questions, e.g., whether it
contains a certain subword.
be adapted to focus on regions on diagonals where the match density is high and link
nearby good diagonals into alignments.
The following example is taken from the GCG package.
Description
FASTA uses the method of Pearson and Lipman (Proc. Natl. Acad. Sci. USA 85; 2444-2448
(1988)) to search for similarities between one sequence (the query) and any group of
sequences of the same type (nucleic acid or protein) as the query sequence.
In the first step of this search, the comparison can be viewed as a set of dot plots,
with the query as the vertical sequence and the group of sequences to which the query
is being compared as the different horizontal sequences. This first step finds the
registers of comparison (diagonals) having the largest number of short perfect
matches (words) for each comparison. In the second step, these "best" regions are
rescored using a scoring matrix that allows conservative replacements, ambiguity
symbols, and runs of identities shorter than the size of a word. In the third step,
the program checks whether some of these initial highest-scoring diagonals can be
joined together. Finally, the search set sequences with the highest scores are
aligned to the query sequence for display.
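A simplified Python sketch of the first step, which also illustrates the "preprocess the query" strategy: the query is turned into a lookup table of its words, and each database sequence is then scanned once, counting exact word hits per diagonal. The helper names and the diagonal convention (query position minus subject position) are our own; rescoring, joining and the final alignment are not shown, and a short DNA example is used for brevity with word size 2.

```python
from collections import defaultdict


def word_index(query: str, k: int) -> dict:
    """Preprocess the query: map every k-letter word to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def diagonal_hits(query: str, subject: str, k: int = 2) -> dict:
    """Count exact k-word matches per diagonal (offset = query_pos - subject_pos)."""
    index = word_index(query, k)
    counts = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            counts[i - j] += 1
    return dict(counts)


def best_diagonals(query: str, subject: str, k: int = 2, top: int = 3) -> list:
    """Return the `top` diagonals with the most word hits."""
    hits = diagonal_hits(query, subject, k)
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)[:top]


print(best_diagonals("ACGTACGT", "TTACGTAC"))   # [(-2, 5), (2, 4), (-6, 1)]
```

Diagonal -2 wins because query positions 0 to 5 (ACGTAC) line up exactly with subject positions 2 to 7, giving five overlapping 2-letter word hits.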
What is a Word?
A word is any short sequence (n-mer or k-tuple) where you have set n to some small
integer less than or equal to six. The word GGATGG is one of the 4,096 possible
words of length six that can be created from an alphabet consisting of the four letters
G, A, T, and C. The word QL is one of the 400 possible words of length two that you
can make with the 20 letters of the amino acid alphabet.
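The counts quoted above follow from simple arithmetic: with an alphabet of a letters there are a to the power n possible words of length n. A short Python check (function names are illustrative only):

```python
DNA = "GATC"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids


def number_of_words(alphabet: str, n: int) -> int:
    """Each of the n positions can hold any letter, so there are len(alphabet)**n words."""
    return len(alphabet) ** n


def words_in(seq: str, n: int) -> list:
    """List the overlapping words of length n that occur in seq."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


print(number_of_words(DNA, 6))       # 4096
print(number_of_words(PROTEIN, 2))   # 400
print(words_in("GGATGG", 3))         # ['GGA', 'GAT', 'ATG', 'TGG']
```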
Example: Here is a session using FastA to identify sequences in the PIR protein
sequence database that are similar to a human globin protein sequence:
% fasta
FASTA with what query sequence? ggamma.pep
Removing terminal * from query sequence...
Begin (* 1 *) ?
End (* 147 *) ?
Search for query in what sequence(s) (* PIR:* *) ?
What word size (* 2 *) ?
Don't show scores whose E() value exceeds: (* 10.0 *):
What should I call the output file (* ggamma.fasta *) ?
1 Sequences
105 aa searched
PIR1:CCHU
501 Sequences
93,217 aa searched
PIR1:IHQFT
Output
The output from FastA is a list file, and is suitable for input to any GCG program that
allows indirect file specifications.
Here is some of the output file:
!!SEQUENCE_LIST 1.0
(Peptide) FASTA of: ggamma.pep from: 1 to: 147 September 25, 1998 11:18
TRANSLATE of: gamma.seq check: 6474 from: 2179 to: 2270 and of: gamma.seq check:
6474 from: 2393 to: 2615 and of: gamma.seq check: 6474 from: 3502 to: 3630 generated
symbols 1 to: 148.
Human fetal beta globins G and A gamma from Shen, Slightom and Smithies, Cell 26;
191-203. . . .
TO: PIR:* Sequences: 109,075 Symbols: 34,814,664 Word Size: 2
Databases Searched
NBRF, Release 57.0, Released on 30-Jun-1998, Formatted on 18-Aug-1998
Scoring matrix: GenRunData:Blosum50.Cmp
Variable pamfactor used
Gap creation penalty: 12 Gap extension penalty: 2
Histogram Key
Each histogram symbol represents 179 search set sequences
Each inset symbol represents 17 search set sequences
z-scores computed from opt scores
[Histogram: observed (obs, =) and expected (exp, *) numbers of search set sequences for each z-score bin, from < 20 to > 120]
Joining threshold: 36, opt. threshold: 24, opt. width: 16, reg.-scaled
The best scores are:
PIR1:HGCZG
! hemoglobin gamma-G chain - chimpanzee 971 971 971 1145.0 6.2e-57
PIR1:I37025
PIR1:HGHUG
! hemoglobin gamma-G chain - human
End of List
ggamma.pep
PIR1:HGCZG
P1;HGCZG - hemoglobin gamma-G chain - chimpanzee
N;Alternate names: hemoglobin gamma-1 chain
C;Species: Pan troglodytes (chimpanzee)
C;Date: 31-May-1996 #sequence_revision 21-Jan-1997 #text_change 14-Nov-1997
C;Accession: I36939; I61853
R;Slightom, J.L.; Chang, L.Y.; Koop, B.F.; Goodman, M. . . .
SCORES Init1: 971 Initn: 971 Opt: 971 z-score: 1145.0 E(): 6.2e-57
Smith-Waterman score: 971; 100.0% identity in 147 aa overlap
ggamma.pep    1 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK 60
HGCZG         1 MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK 60

ggamma.pep   61 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG 120
HGCZG        61 VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG 120

ggamma.pep  121 KEFTPEVQASWQKMVTGVASALSSRYH 147
HGCZG       121 KEFTPEVQASWQKMVTGVASALSSRYH 147
Blast
The other widely used program for searching a database is called BLAST (Basic Local
Alignment Search Tool). BLAST follows a similar scheme in that it identifies core
similarities for later extension, although it places less emphasis on the occurrence
of exact matches. The core similarity is defined by a window with a certain match
density on DNA, or by an amino acid similarity score above some threshold for
proteins. Independent of the exact definition of the core similarity, BLAST rests on
precomputing all strings that are, in this sense, similar to any position in the
query. The resulting list may contain a thousand or more words, each of which, if
detected in a database sequence, gives rise to a core similarity. In BLAST
nomenclature, this set of strings is called the neighborhood of the query. The code
that generates this neighborhood is in fact exceedingly fast.
Given the neighborhood, a so-called finite automaton is used to detect occurrences in
the database of any string from the neighborhood. This automaton is a program
constructed on the fly, specifically for the word neighborhood computed for a
particular query. While reading through a database of sequences, the automaton is fed
one letter at a time and decides whether the string ending in this letter is part of
the neighborhood. If so, BLAST attempts to extend the similarity around the
neighborhood word and, if this is successful, it reports a match.
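To make the scan-and-extend idea concrete, here is a simplified Python sketch for DNA, where the neighborhood reduces to the query's own words of length w. Several simplifications are assumptions of this sketch, not BLAST's actual design: a dictionary lookup replaces the finite automaton (it finds the same fixed-length words, just by a different mechanism), scoring is a toy +1/-1, and extension is ungapped and stops once the running score falls more than `xdrop` below its best value. All function names are hypothetical.

```python
def dna_score(x: str, y: str) -> int:
    """Toy scoring: +1 for identical bases, -1 otherwise (an assumption, not BLAST's defaults)."""
    return 1 if x == y else -1


def extend_hit(query, subject, qi, sj, w, score=dna_score, xdrop=3):
    """Ungapped X-drop extension of the word hit query[qi:qi+w] / subject[sj:sj+w].

    Returns (score, query_start, query_end, subject_start), with query_end exclusive.
    """
    seed = sum(score(query[qi + k], subject[sj + k]) for k in range(w))

    # Extend to the right, remembering the best running total and where it occurred.
    best_right = right_len = cur = k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        cur += score(query[qi + w + k], subject[sj + w + k])
        k += 1
        if cur > best_right:
            best_right, right_len = cur, k
        elif best_right - cur > xdrop:
            break   # the score has fallen too far below its best value

    # Extend to the left in the same way.
    best_left = left_len = cur = k = 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        k += 1
        cur += score(query[qi - k], subject[sj - k])
        if cur > best_left:
            best_left, left_len = cur, k
        elif best_left - cur > xdrop:
            break

    total = seed + best_right + best_left
    return total, qi - left_len, qi + w + right_len, sj - left_len


def blast_like_search(query, subject, w=4, threshold=8):
    """Find word hits and keep the extended, ungapped matches scoring >= threshold."""
    # For DNA the "neighborhood" is simply the query's own words (exact matches).
    neighborhood = {}
    for i in range(len(query) - w + 1):
        neighborhood.setdefault(query[i:i + w], []).append(i)

    hsps = set()   # several seeds on one diagonal usually extend to the same match
    for j in range(len(subject) - w + 1):
        # A dictionary lookup stands in for BLAST's finite automaton here.
        for i in neighborhood.get(subject[j:j + w], []):
            hsp = extend_hit(query, subject, i, j, w)
            if hsp[0] >= threshold:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


print(blast_like_search("ACGTACGTAA", "TTTACGTACGTAATTT"))   # [(10, 0, 10, 3)]
```

In this example every seed on the main diagonal extends to the same full-length match; lowering `threshold` to 6 would also report a shorter, shifted match, (6, 3, 9, 2).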
Like FASTA, BLAST has also been adapted to connect good diagonals and report local
alignments with gaps. BLAST additionally converts the database file into its own
format to allow faster reading. This makes it somewhat unwieldy to use in a local
installation unless someone maintains that installation. FASTA, on the other hand,
is slower but easier to use. There are excellent web servers that offer these
programs, in particular at the NCBI, where BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
can be used on up-to-date DNA and protein databases.
2. A list of words of length 3 in the query protein sequence is made (length 11-12
for DNA sequences).
3. The words are evaluated for matches with any other combination of 3 amino acids,
using the BLOSUM62 scoring matrix by default. For example, matches of PQG to PEG
would score 15, to PRG 14, to PSG 13 and to PQA 12.
6. The above procedure is repeated for each 3-letter word in the query sequence. For
a sequence of length 250 amino acids, the total number of words to search for is
approximately 50 x 250 = 12,500.
7. The words are organized into an efficient search tree for comparing them rapidly
to the database sequences.
8. Each database sequence is scanned for an exact match to one of the 50 high-scoring
amino acid words corresponding to the first query sequence position.
9. In BLAST2, or gapped BLAST, short matched regions called HSPs (high-scoring
segment pairs) that lie on the same diagonal and within a certain distance of each
other are extended in each direction as long as the score keeps rising.
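A minimal Python sketch of how the word neighborhood in steps 2 to 6 can be generated by brute force. The matrix below is the standard BLOSUM62 table transcribed by hand (double-check it against an official copy before reusing it), and the threshold of 12 is chosen only so that all of the example words in step 3 are included; it is not presented as a BLAST default. Function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"   # row/column order of the matrix below

# BLOSUM62 substitution scores, rows and columns in AMINO_ACIDS order.
BLOSUM62_TEXT = """
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""


def parse_matrix(text: str) -> dict:
    """Turn the text table into a {(row_letter, column_letter): score} dictionary."""
    matrix = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != len(AMINO_ACIDS) + 1:
            raise ValueError(f"malformed matrix row: {line!r}")
        for column, value in zip(AMINO_ACIDS, fields[1:]):
            matrix[fields[0], column] = int(value)
    return matrix


BLOSUM62 = parse_matrix(BLOSUM62_TEXT)


def word_score(word1: str, word2: str) -> int:
    """Ungapped score of two equal-length words: sum of per-position substitution scores."""
    return sum(BLOSUM62[a, b] for a, b in zip(word1, word2))


def neighborhood(word: str, threshold: int) -> dict:
    """All words of the same length scoring >= threshold against `word` (brute force)."""
    result = {}
    for letters in product(AMINO_ACIDS, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            result[candidate] = score
    return result


nearby = neighborhood("PQG", threshold=12)
print(nearby["PQG"], nearby["PEG"], nearby["PRG"], nearby["PSG"], nearby["PQA"])   # 18 15 14 13 12
```

Summing `len(neighborhood(word, T))` over all query words gives the total number of words to search for, as in the 50 x 250 estimate of step 6; how many words pass depends strongly on the threshold T. Enumerating all 8,000 three-letter words is fine for illustration, although faster generation schemes exist.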
Blast 2 Statistics
The probability p of observing a score S equal to or greater than x is given by the
equation

$$p(S \ge x) = 1 - \exp\left(-e^{-\lambda (x - u)}\right), \qquad u = \frac{\ln(K m' n')}{\lambda}$$

where K and λ are parameters calculated by BLAST for the amino acid substitution
scoring, n' is the effective length of the query sequence and m' is the effective
length of the database sequence.
The expect value E is the number of database sequences unrelated to the query that
would nevertheless give a score of at least x with the query by chance. It is
calculated from p using D, the length of the database divided by m, and E ≈ Dp for
small p, as in the FASTA calculation.
Specifically, p is treated as the average probability of a Poisson distribution of
scores, in which 0, 1, 2, 3, ... such scores can be found. The probability of finding
at least one sequence giving the score is then $1 - e^{-Dp}$. An E value of roughly
0.02-0.05 is considered significant.
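Substituting u = ln(K m' n')/λ into p(S ≥ x) = 1 - exp(-e^{-λ(x-u)}) shows that the inner term equals K m' n' e^{-λx}, the expected number of chance high-scoring pairs with score at least x; p is therefore 1 minus e raised to minus that quantity, and is close to the quantity itself when it is small. A Python sketch of this check, with assumed parameter values (K = 0.041 and λ = 0.267 are illustrative only; BLAST computes its own K and λ for each scoring system, and the lengths are made up):

```python
import math


def expected_chance_hits(x, K, lam, m_eff, n_eff):
    """K * m' * n' * exp(-lambda * x): expected number of chance HSPs scoring >= x."""
    return K * m_eff * n_eff * math.exp(-lam * x)


def p_value(x, K, lam, m_eff, n_eff):
    """p(S >= x) = 1 - exp(-exp(-lambda * (x - u))), with u = ln(K m' n') / lambda."""
    u = math.log(K * m_eff * n_eff) / lam
    return -math.expm1(-math.exp(-lam * (x - u)))   # expm1 keeps precision when p is tiny


# Illustrative values only: m' = database sequence length, n' = query length.
K, lam, m_eff, n_eff = 0.041, 0.267, 1_000_000, 250
for x in (40, 60, 80, 100):
    hits = expected_chance_hits(x, K, lam, m_eff, n_eff)
    print(x, hits, p_value(x, K, lam, m_eff, n_eff))
```

For the higher scores the two printed columns become practically identical, which is the convergence of E and P values at low E described in the Statistics section.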
Blast Programs
Program | Query sequence | Database | Type of alignment
Blastp | protein | protein | gapped
Blastn | nucleic acid | nucleic acid | gapped
Blastx | nucleic acid (translated in all 6 reading frames) | protein |
Tblastn | protein | nucleic acid (translated in all 6 reading frames) |
Tblastx | | |
(500 letters)
Database: em_pln
98,350 sequences; 292,516,774 total letters
Searching.................................................done
Score
ATAC2330 AC002330 Arabidopsis thaliana BAC T10P11 from chromoso... 131 3e-30
ATCHRIV6 AL161494 Arabidopsis thaliana DNA chromosome 4, contig... 131 3e-30
AC068901 AC068901 Arabidopsis thaliana chromosome 1 BAC F1O3 ge... 131 3e-30
AB018120 AB018120 Arabidopsis thaliana genomic DNA, chromosome ... 131 4e-30
AC007505 AC007505 Arabidopsis thaliana chromosome I BAC F28L22 ... 131 4e-30
AC020646 AC020646 Genomic sequence for Arabidopsis thaliana BAC... 131 4e-30
ATT18B22 AL138652 Arabidopsis thaliana DNA chromosome 3, BAC cl... 130 5e-30
>ATT5C2 AL138664 Arabidopsis thaliana DNA chromosome 3, BAC clone T5C2
Length = 103098
Score = 40.9 bits (83), Expect = 0.006
Identities = 28/68 (41%), Positives = 32/68 (46%)
Frame = +1 / +1
Query: 235 TKATWFLLIIKGMMDLNLAVRILEIIFKNTIQPAMGLYSRILLA*SPLGIKGTTVLLKPL 414
TKA W L + G +L
I E+ T+Q MG S I LA LGIK V L P
RILEIIFKNTIQPAMGLYSRILLA*SPLGIKGTTVLLKPLGI*PVLKKSLTA
Frame = -1 / -2
Query: 488 KSP*HIMQNDVCAAVRDFFRTGQIPKGFNKTVVPLIPKGDHAKSIREYRPIAGCIVFLKI 309
KS I+ + A++ FF G +PKG N T++ LIPK AK +++YRPI+ C V K+
ascribed to each subsequent residue in the gap. There is no widely accepted theory for
selecting gap costs.
Statistics
Local alignments with no gaps are referred to as High-scoring Segment Pairs (HSPs).
The number of random HSPs with scores equal to or greater than S is described by a
Poisson distribution; the probability of finding at least one such HSP by chance is
the P value associated with the score S. Highly significant scores have P values
close to zero. For gapped alignments, the significance of a given alignment with
score S is represented by the E (expect) value, the expected number of chance
alignments with a score of S or better. This can be evaluated by examining the
alignment scores obtained from mock databases of random sequences of comparable
length and composition. The E value decreases exponentially as the score (S) assigned
to a match between two sequences increases. The E value reflects the size of the
database and the scoring system in use. At very low E values, the E and P values
become nearly equal.
of the probability that a model has produced a particular sequence. The optimal
alignment is computed with an algorithm exactly analogous to the dynamic programming
algorithm, maximising the probability that a series of states has given rise to a
particular sequence. In contrast, when the correct path of states is not known, the
probability that a model has given rise to a particular sequence should rather be
computed as the sum over the different sets of states that could have produced the
sequence. This interpretation leads to a summation over all paths instead of the
choice of the best one. In practice, little is known about the difference in
performance between the two approaches.
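The contrast between choosing the best path and summing over all paths can be seen in a few lines of Python. This is a minimal sketch with a made-up two-state model (an "AT-rich" and a "GC-rich" state with invented probabilities), not a profile HMM for sequence alignment; the same recursion computes the Viterbi value with `max` and the forward value with `sum`. Probabilities are multiplied directly, so long sequences would underflow; practical implementations work with logarithms.

```python
def best_path_and_total_probability(seq, states, start, trans, emit):
    """Return (Viterbi probability of the single best state path,
    forward probability summed over all state paths) for a discrete HMM."""
    best = {s: start[s] * emit[s][seq[0]] for s in states}
    total = dict(best)
    for symbol in seq[1:]:
        best = {s: max(best[r] * trans[r][s] for r in states) * emit[s][symbol]
                for s in states}
        total = {s: sum(total[r] * trans[r][s] for r in states) * emit[s][symbol]
                 for s in states}
    return max(best.values()), sum(total.values())


# Hypothetical two-state model; all numbers are invented for illustration.
states = ("AT-rich", "GC-rich")
start = {"AT-rich": 0.5, "GC-rich": 0.5}
trans = {"AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
         "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9}}
emit = {"AT-rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "GC-rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}

viterbi_p, forward_p = best_path_and_total_probability("AG", states, start, trans, emit)
print(viterbi_p, forward_p)   # approximately 0.018 and 0.0445
```

The forward value is always at least as large as the Viterbi value, because the best path is one of the terms in the sum.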
about 10 reactions in series to read the whole thing. The first reaction can use a
primer site on the cloning vector. After you get the results from that reaction, you
can use them to design a new primer to read 1,000 bases further. If it takes one day
to run a sequencing reaction, study the results and design a primer, and another two
days to have the new primer made, it will take you about a month to sequence the
whole 10,000 bases.
Now, instead of using just one universal primer to read in from one end, consider how
much you could speed up the process by reading in from both ends at the same time.
With automated machines, two reactions probably take little longer than one, since
they can be run in parallel. So, reading in from both ends, we could sequence the
insert in about 15 days.
It might occur to you that the process would be even faster if we could break up the
template into smaller fragments and sequence each of them from both ends. For
example, if we could break it into five pieces of 2,000 bases and read 1,000 bases in
from each end, we could obtain all of the sequence in one set of reactions, using
only universal primers. This would take one day in our scenario, assuming we can run
10 reactions at once. The trouble is, we don't have a good way to break the template
up into five non-overlapping pieces of 2,000 base pairs. With a good restriction map,
we could perhaps invest some time in cloning smaller pieces, but that would take a
considerable amount of work. The good news is that the speedup from sequencing many
small clones in parallel using universal primers is so great that it can be well
worthwhile to do so, even if the clones represent overlapping pieces and we end up
sequencing some parts multiple times. In fact, we do have good methods of generating
random fragments from a large piece of DNA.
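The time estimates above follow from a simple model, sketched here in Python under the assumptions of the scenario: each round of reactions takes one day, every round after the first must wait two extra days for new custom primers, and each read covers 1,000 new bases. Because the last round needs no new primer, the model gives slightly lower figures than the rounded "about a month" and "about 15 days". The function name is hypothetical.

```python
import math


def days_to_sequence(insert_len, read_len, reads_per_round, run_days=1, primer_days=2):
    """Estimate days to read an insert when each round after the first
    must wait for new custom primers (run_days + primer_days per extra round)."""
    rounds = math.ceil(insert_len / (read_len * reads_per_round))
    return rounds * run_days + (rounds - 1) * primer_days


print(days_to_sequence(10_000, 1_000, reads_per_round=1))    # 28: walking in from one end
print(days_to_sequence(10_000, 1_000, reads_per_round=2))    # 13: walking in from both ends
print(days_to_sequence(10_000, 1_000, reads_per_round=10))   # 1: five 2,000-base pieces, both ends
```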
It also turns out that sequencing the same part several times from different clones,
at different distances from the primer and on both strands, lets us determine the
sequence more accurately. As with most experimental results, the interpretation of
sequencing reactions often leaves some ambiguity. For example, sometimes peaks blur
together on an electropherogram, and though we can tell there are several 'A's in a
row, it can be difficult to tell whether there are five or six. Ideally, peaks would
be spaced evenly, but in fact peak separations can vary a bit due to the way the
strands of DNA fold during electrophoresis. Since peaks tend to get shorter and
broader farther from the primer, the signal-to-noise ratio drops until eventually we
can't read with confidence. Since peak migration is influenced by folding of the DNA
strands, artifacts due to strand folding are likely to be different on the two
strands. This means that if the sequence from both strands agrees, we can be fairly
confident that it is correct.
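Combining several reads of the same region into one sequence can be sketched as a column-by-column majority vote. This Python sketch assumes the reads are already aligned to common coordinates, with reverse-strand reads already reverse-complemented and '-' marking positions a read does not cover; it calls 'N' where there is no clear majority. Practical assembly tools typically also weight each base call by its quality value, which is not modelled here. The function name is hypothetical.

```python
from collections import Counter


def consensus(reads: list) -> str:
    """Column-wise majority vote over aligned reads; 'N' where no strict majority exists."""
    length = max(len(read) for read in reads)
    calls = []
    for i in range(length):
        column = [read[i] for read in reads if i < len(read) and read[i] != "-"]
        if not column:
            calls.append("N")              # no read covers this position
            continue
        base, count = Counter(column).most_common(1)[0]
        calls.append(base if count > len(column) / 2 else "N")
    return "".join(calls)


print(consensus(["ACGTTA", "ACGATA", "ACGTTA"]))   # ACGTTA
print(consensus(["AC", "AG"]))                     # AN (a tie cannot be resolved)
```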
5. Check to see that the following modules are selected (you can read the help
documentation to see what these modules do):
Look in the "shotgun" directory again. You should find a new file named
"HIV.0.aux": double click this file to launch Gap4.
8. Two windows will open: the main Gap4 window and the "Contig Selector".
In the main Gap4 window, select "View: Template Display". This opens a "Show
templates" dialog box. Be sure the "all contigs" radio button is selected and the
"Templates" and "Readings" checkboxes are checked. Click "OK" (see Figure 4.3:
"original assembly").
The graphical template display shows how the 25 sequencing "reads" have been
assembled into 7 contiguous blocks ("contigs"). The "reads" are arrows, and the
lines between them are templates. Note that there are two reads per template.
Each template was sequenced on each end using universal primers that read into
the insert from the plasmid cloning vector. Two ends read from the same
template are called a "read pair".
All of our templates are about 1200 bases long, plus or minus about 200 bases, and
the read pairs both read in from the ends of the template. We can rearrange the
contigs so that the display of read pairs is consistent with these facts.
Note that the two contigs at the right end of Figure 4.3 are pointing out from their
templates, rather than in. Right-click on the contig lines at the bottom of the
Templates Display window to bring up a context menu that lets you
"Complement" the contig. Figure 4.4 ("complemented contigs") shows what they
should look like when you're done.
Notice the templates drawn in yellow. Each has one read in one contig, and the
other read in a different contig. Later in the exercise we will sequence the middle
parts of these templates, which will let us join the contigs that these templates
span. But first, note that some of the contigs do not have templates that would let
us connect them. We must go back to the clone library and sequence more clones.
Save your Gap4 database with a new version number (version 1) by choosing
"File: Copy database" from the main window's top menu and entering "1" in the
box marked "New version character". Exit Gap4.
9.
the forward strand, and "q" for the reverse. The number "1" indicates that you said
you were using a universal primer. Be sure these items are correct, and save the
trace file.
Once all groups have sequenced their assigned clones, we will collect them and
give everyone a copy of all the traces to use in the next round of assembly.
10. Adding New Traces to the Assembly
Copy the new sequence traces into your project folder. Open Pregap4 again by
double-clicking the "congif.pg4" file. Select the "Files to process" tab, and click the
"Add files" button. Select the new SCF files, starting with "HIVsubclone026-p1t.scf".
Be sure to set "Files of type" to "SCF"!
On the "Configure modules" tab, click on the "Gap4 shotgun assembly" module.
Enter "1" in the "Gap4 database version" field, and select the "Append to existing
database" radio button. Click the "Run" button in the lower left corner of the
window. When the program reports "processing finished", close Pregap4.
Now open Gap4 again, but this time by double-clicking the file "HIV.1.aux". This
is the new version of the database where Pregap4 put the latest trace data. Open
the Templates Display window ("View: Template Display", "OK"). It should
resemble Figure 4.5.
Note that several of the templates do not seem to be displayed correctly. All the
templates in our subclone library have inserts of roughly the same size (1200 +/- 200
bp). Since each should have been sequenced from both ends using the forward and
reverse universal primers, there should be an arrow representing the sequencing
read from each end pointing in toward the middle of the insert.
Because some of our templates are drawn much too long, and not all of the
arrows point in from the ends, we need to rearrange the contigs so they are
consistent with what we know about our templates and sequence reads.
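The rule used here to judge the display can be written as a small check. This Python sketch uses a hypothetical `Read` record (not Gap4's own data model) and assumes both reads of a pair have been placed in the same coordinate system after the contigs are rearranged: a pair is consistent when the left read points right, the right read points left, and the implied template length is 1200 +/- 200 bases.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    start: int      # leftmost position in the (rearranged) contig coordinates
    end: int        # rightmost position, exclusive
    forward: bool   # True if the read points toward higher positions


def pair_is_consistent(read1: Read, read2: Read, expected: int = 1200, tolerance: int = 200) -> bool:
    """Check a read pair against the library: reads point inward and span about 1200 bp."""
    left, right = sorted((read1, read2), key=lambda read: read.start)
    if not left.forward or right.forward:
        return False                      # the two reads must point toward each other
    span = right.end - left.start         # implied template (insert) length
    return expected - tolerance <= span <= expected + tolerance


print(pair_is_consistent(Read("p1", 0, 500, True), Read("q1", 700, 1150, False)))    # True
print(pair_is_consistent(Read("p1", 0, 500, False), Read("q1", 700, 1150, True)))    # False: reads point outward
```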
As we saw earlier, clicking on a contig with the right mouse button brings up a
context menu that lets you complement the contig. This will change the direction
of all the read arrows in that contig. Click on a contig using the middle mouse
button to drag it left or right to a new position (if your mouse has a wheel, it will
probably work as a middle mouse button, too). You may have to click a few times
in slightly different spots to grab the contig line successfully.
Use these operations to rearrange the contigs until all templates are drawn about
the right length, with one read coming in from each end, as in Figure 4.6.
Note that the templates drawn in grey or dark grey all cross boundaries between
contigs. The next part of the exercise will be to use custom primers to sequence
the middle parts of some of these templates, to see if we can obtain enough
sequence to join some of our contigs together. For example, the contigs named
"HIVsubclone010-q1t" and "HIVsubclone021-p1t" would presumably be joined if
we had better sequence from the middle part of clone 9, 19, or 21. (Point the
mouse at a contig, template, or read to see its name. Each contig is named after
the leftmost read that it contains.)
11. Sequencing Clones that Connect Contigs
At this point, the sequence is assembled into 7 contigs, which means that there are
6 boundaries between contigs. Each group of students will be assigned one of
these boundaries, and will do additional sequencing reactions to try to get enough
information to join the contigs.
Clones
9, 19 or 21
29
30 or 29
18, 24 or 30
26 or 28
15, 16, 27
Figure 4.4: The two Rightmost Contigs have been Complemented so that the Read
Pairs Point toward each Other (into the Template)
Figure 4.6: Contigs have been Rearranged so Templates are within the Expected
Size and have Reads Coming in from both Ends
Although shotgun sequencing was the most highly developed technique for sequencing
genomes from about 1995-2005, other technologies, called next-generation sequencing,
have since emerged. These technologies produce shorter reads (anywhere from 25-500
bp) but many hundreds of thousands or millions of reads in a relatively short time
(on the order of a day). This results in high coverage, but the assembly process is
much more computationally costly. These technologies are vastly superior to shotgun
sequencing because of the high volume of data and the comparatively short time it
takes to sequence a whole genome. The major disadvantage is that accuracy is usually
lower (although this is compensated for by the high coverage).
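Coverage here means the average number of reads that include a given base. A minimal Python sketch, assuming the idealized Lander-Waterman picture in which reads land uniformly at random, so the fraction of bases hit by no read is about e^(-c) at coverage c; the read and genome numbers are hypothetical.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_len / genome_len


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered by no read: e^(-c)."""
    return math.exp(-c)


c = coverage(num_reads=2_000_000, read_len=100, genome_len=5_000_000)
print(c)                                  # 40.0
print(expected_uncovered_fraction(1.0))   # about 0.37
```

At 40x coverage each base is read about forty times on average, which is how high coverage can offset a lower per-read accuracy.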
First, the genome is broken up into a collection of large fragments (between 40 and
200 kbp), which are cloned as Bacterial Artificial Chromosomes, or BACs. The location
of each BAC along the genome is then mapped using specialized laboratory experiments.
A minimal tiling path of BACs is chosen such that each base in the genome is covered
by at least one BAC and the overlap between BACs is minimized. Each BAC is then
sequenced by the standard shotgun method, and the resulting assemblies are combined
into an assembly for each chromosome using the information provided by the tiling
path.
Figure 4.7: BAC-by-BAC Approach (The Long Lines Represent Individual BACs)
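Once BAC positions are mapped, choosing a tiling path is an interval-covering problem. A minimal Python sketch, assuming BAC coordinates in kbp and a greedy strategy (among the BACs that start inside the region already covered, take the one reaching furthest). This minimizes the number of BACs; it does not separately minimize the total overlap, which a real project may also weigh. BAC names and coordinates are hypothetical.

```python
def minimal_tiling_path(bacs: list, genome_len: int) -> list:
    """Greedy interval cover: pick the fewest BACs covering [0, genome_len).

    bacs is a list of (name, start, end) tuples with end exclusive.
    """
    ordered = sorted(bacs, key=lambda bac: bac[1])
    chosen, covered, i = [], 0, 0
    while covered < genome_len:
        best = None
        # Among BACs starting inside the covered region, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in BAC coverage at position {covered}")
        chosen.append(best)
        covered = best[2]
    return [name for name, _, _ in chosen]


bacs = [("B1", 0, 150), ("B2", 100, 200), ("B3", 140, 300), ("B4", 250, 400), ("B5", 290, 420)]
print(minimal_tiling_path(bacs, genome_len=400))   # ['B1', 'B3', 'B5']
```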
Student Activity
Summary
The motivation behind the development of other database search programs has been to
emulate, as closely as possible, the Smith-Waterman algorithm's ability to discern
related sequences while performing the job in much less time. The other widely used
program for searching a database is called BLAST (Basic Local Alignment Search Tool).
This program also aims at identifying core similarities for later extension. The core
similarity is defined by a window with a certain match density on DNA, or by an amino
acid similarity score above some threshold for proteins. BLAST detects these core
similarities with a finite automaton which, while reading through a database of
sequences, is fed one letter at a time and decides whether the string ending in this
letter is part of the query's neighborhood. Dideoxy chain-termination sequencing
depends on synthetic DNA primer sequences to initiate the reaction. These primers
must match a portion of the template whose sequence we are trying to determine.
Although shotgun sequencing was the most highly developed technique for sequencing
genomes from about 1995-2005, other technologies, called next-generation sequencing,
have since emerged.
Keywords
Modularity: Ensures that, for the particular task at hand, the data will be collected
and stored in an appropriate manner, which differs greatly from one level of activity
(simply gathering the raw data) to another (storing analyzed data) and from one type
of high-throughput system to another. ... The best system is one that employs
integration at those levels where it is an advantage but maintains enough modularity
to ensure that (1) there are no major compromises regarding how any one type of data
is handled and (2) all the key elements in a researcher's information system can be
adjusted or updated independently.
Codon: A set of three nucleotides along a strand of mRNA that determines (or codes
for) the placement of an amino acid during protein synthesis. The number of possible
arrangements of these three nucleotides (triplet codes) available for protein
synthesis is 4^3 = 64.
Beauty (Blast Enhanced Alignment Utility): A tool developed at Baylor College of
Medicine (Worley et al. 1995) which uses BLAST to search several custom databases
and incorporates sequence family information, location of conserved domains, and
information about any annotated sites or domains directly into the BLAST query
results.
Blast: Basic Local Alignment Search Tool. A program for searching biosequence
databases, developed and maintained by a group at the National Center for
Biotechnology Information (NCBI). There are several versions of BLAST: BLASTP, which
searches a protein database; BLASTN, which searches a nucleotide database; TBLASTN,
which searches for a protein sequence in a nucleotide database by translating the
nucleotide sequences in all 6 reading frames; BLASTX, which searches a nucleotide
sequence against a protein database by translating the query in all 6 reading
frames; gapped BLAST; and PSI-BLAST. BLAST locates patches of regional similarity
instead of calculating the best overall alignment using gaps. The program then uses a
scoring matrix to score these matches as positive, negative or zero. If an initial
match scores highly, the search is extended in both directions until the score falls
off.
Review Questions
1. Determine the amino acid composition of the encoded polypeptide. Are all 20 amino
acids present? Do certain amino acids predominate? Compare the composition to the
frequency observed in proteins on average.
5. What is BLAST?
Further Readings
Andreas D. Baxevanis, B.F. Francis Ouellette and Dharminder Kumar, Bioinformatics: A
Practical Guide to the Analysis of Genes and Proteins, John Wiley & Sons, Inc., New
York, 2001.
Jin Xiong, Essential Bioinformatics, Cambridge University Press, 2006.
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng
Zhang, Webb Miller and David J. Lipman, Gapped BLAST and PSI-BLAST: A New Generation
of Protein Database Search Programs, 1997.